
Python XPath: whitespace is not removed


I am trying to get some data from an HTML page that has tables in it. I got a list of rows using XPath, and now I am trying to get the text() inside each td element of a tr. Here is the basic structure of a tr:

<tr>
  <td>
    <a href="#" onclick="WhoisOrderDomain('bank'); return false;">
      SHOP
    </a>
  </td>
  <td>COUNTRY</td>
  <td class="text-right">1 038,00 USD</td>
  <td class="text-right">899,00 USD</td>
  <td class="text-right">899,00 USD</td>
  <td class="text-center">
    <a class="btn btn-sm btn-info" href="#" onclick="WhoisOrderDomain('bank'); return false;"><i class="fa fa-shopping-cart"></i> Order</a>
  </td>
</tr>

Below is my XPath in Python:

td_xpath = XPath("./td/a/text()[normalize-space()] | ./td/text()[normalize-space()]")

And I am getting this output:

['\r\n SHOP\r\n ', 'COUNTRY', '1038,00 USD', '899,00 USD', '899,00 USD', ' Order']

Why is the whitespace not removed from the first element?

Also, how can I use XPath to remove ',' and 'USD' from the prices?

[td.xpath('normalize-space()') for td in tree.xpath('//tr/td')]

out:

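['SHOP', 'COUNTRY', '1 038,00 USD', '899,00 USD', '899,00 USD', 'Order']

[normalize-space()] is a predicate: it only filters out text nodes that are empty or contain nothing but whitespace, and it does not modify the nodes it keeps. If you need the normalized string under a tag, call normalize-space() as a function, for example normalize-space(tag).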
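To illustrate the function form with an argument, here is a minimal sketch using lxml. The ROW markup is a shortened, hypothetical copy of the row from the question, and the variable names are only for illustration.

```python
from lxml import html

# Hypothetical, shortened version of the row from the question.
ROW = """<table><tr>
  <td>
    <a href="#"> SHOP
    </a>
  </td>
  <td>COUNTRY</td>
  <td class="text-right">1 038,00 USD</td>
</tr></table>"""

tree = html.fromstring(ROW)
row = tree.xpath("//tr")[0]

# normalize-space(path) takes the string value of the first node matched by
# `path`, trims both ends, and collapses inner runs of whitespace.
shop = row.xpath("normalize-space(./td[1])")
country = row.xpath("normalize-space(./td[2])")
price = row.xpath("normalize-space(./td[3])")

print([shop, country, price])  # ['SHOP', 'COUNTRY', '1 038,00 USD']
```

Because the function returns a single string, it has to be evaluated once per cell; it cannot be mapped over a node set inside one XPath 1.0 expression, which is why the answer loops over the td elements in Python.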

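Use replace to get rid of USD. (strip(' USD') is not suitable here: it removes any of the characters ' ', 'U', 'S' and 'D' from both ends of the string, so 'SHOP' would become 'HOP'.)

[td.xpath('normalize-space()').replace(' USD', '') for td in tree.xpath('//tr/td')]

out:

['SHOP', 'COUNTRY', '1 038,00', '899,00', '899,00', 'Order']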
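If the goal is a number rather than a cleaned string, the remaining ',' can be handled in Python as well. The sketch below is not part of the original answer: parse_price is a hypothetical helper, and it assumes that the comma is the decimal separator and that whitespace is only used to group thousands.

```python
from decimal import Decimal


def parse_price(text, currency="USD"):
    """Hypothetical helper: turn '1 038,00 USD' into Decimal('1038.00')."""
    number = text.replace(currency, "")   # drop the currency code
    number = "".join(number.split())      # drop all whitespace, including grouping spaces
    return Decimal(number.replace(",", "."))  # assumption: ',' is the decimal separator


cells = ['SHOP', 'COUNTRY', '1 038,00 USD', '899,00 USD', '899,00 USD', 'Order']

# Only the cells that end with the currency code are treated as prices.
prices = [parse_price(cell) for cell in cells if cell.endswith("USD")]
print(prices)  # [Decimal('1038.00'), Decimal('899.00'), Decimal('899.00')]
```

Decimal is used instead of float so that the two decimal places of the price are kept exactly.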

EDIT:

tree.xpath('//tr/td//text()')

out:

['\n ',           # empty, discarded
 ' SHOP\n ',
 '\n ',           # empty, discarded
 'COUNTRY',
 '1 038,00 USD',
 '899,00 USD',
 '899,00 USD',
 '\n',            # empty, discarded
 ' Order',
 '\n ']           # empty, discarded

If [normalize-space()] changed the strings, your output ' Order' would not contain the whitespace at the beginning. A predicate in [] only acts as a boolean test that filters out nodes for which it is false; it does not change the values of the nodes it keeps.
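The filtering behaviour can be checked with a small sketch. ROW is again a shortened, hypothetical version of the question's markup. The predicate decides which text nodes are kept, and the trimming is then done in Python with str.strip().

```python
from lxml import html

# Hypothetical, shortened version of the row from the question.
ROW = """<table><tr>
  <td>
    <a href="#"> SHOP
    </a>
  </td>
  <td>COUNTRY</td>
</tr></table>"""

tree = html.fromstring(ROW)

# Every text node under the cells, including whitespace-only ones.
all_text = tree.xpath("//tr/td//text()")

# The predicate drops whitespace-only nodes but leaves the rest untouched,
# so the surrounding whitespace of ' SHOP...' is still there.
kept = tree.xpath("//tr/td//text()[normalize-space()]")

# Trimming has to happen outside the predicate, here with str.strip().
cleaned = [text.strip() for text in kept]
print(cleaned)  # ['SHOP', 'COUNTRY']
```

Note that strip() only trims the ends of each string; unlike normalize-space(), it does not collapse whitespace inside the text.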
