Python XPath: whitespace is not removed

[Category: Development (Python) | Posted by: 店小二04 | Date: 2018 | Author: 红领巾]

I am trying to get some data from an HTML page that has tables in it. I got a list of rows using XPath, and now I am trying to get the text() of each td element inside a tr. Here is the basic structure of a tr:

<tr>
  <td>
    <a href="#" onclick="WhoisOrderDomain('bank'); return false;">
      SHOP
    </a>
  </td>
  <td>COUNTRY</td>
  <td class="text-right">1 038,00 USD</td>
  <td class="text-right">899,00 USD</td>
  <td class="text-right">899,00 USD</td>
  <td class="text-center">
    <a class="btn btn-sm btn-info" href="#" onclick="WhoisOrderDomain('bank'); return false;"><i class="fa fa-shopping-cart"></i> Order</a>
  </td>
</tr>

Below is my XPath in Python:

td_xpath = XPath("./td/a/text()[normalize-space()] | ./td/text()[normalize-space()]")

and I am getting this output:

['\r\n SHOP\r\n ', 'COUNTRY', '1038,00 USD', '899,00 USD', '899,00 USD', ' Order']

Why is the whitespace not removed from the first element?

Also, how can I use XPath to remove ',' and 'USD' from the prices?

[td.xpath('normalize-space()') for td in tree.xpath('//tr/td')]

out:

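['SHOP', 'COUNTRY', '1 038,00 USD', '899,00 USD', '899,00 USD', 'Order']

A predicate such as [normalize-space()] is only a filter: it drops the text nodes whose normalized value is the empty string and returns the remaining nodes unchanged. If you need the normalized string under a tag, call the function on the node instead, e.g. normalize-space(tag).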
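To make the difference between the predicate form and the function form concrete, here is a minimal sketch. It assumes lxml is installed and uses a shortened, made-up row (the `ROW` string is an illustration, not the original page):

```python
from lxml import html

# Shortened stand-in for the row in the question (an assumption for this sketch).
ROW = """
<table><tr>
  <td>
    <a href="#"> SHOP
    </a>
  </td>
  <td>COUNTRY</td>
  <td class="text-right">899,00 USD</td>
</tr></table>
"""

tree = html.fromstring(ROW)
row = tree.xpath('//tr')[0]

# Predicate form: keeps only non-blank text nodes, but returns them unchanged.
filtered = row.xpath(
    './td/a/text()[normalize-space()] | ./td/text()[normalize-space()]'
)

# Function form: evaluates normalize-space() on each td and returns the
# trimmed, whitespace-collapsed string value of that cell.
normalized = [td.xpath('normalize-space()') for td in row.xpath('./td')]

print(filtered)    # first item still carries its surrounding whitespace
print(normalized)  # every item is trimmed
```

The predicate decides which nodes are selected; only calling the function as the expression itself changes the returned string.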

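Use replace to get rid of USD. Do not use strip(' USD') for this: strip treats its argument as a set of characters to remove from both ends, so it would also turn 'SHOP' into 'HOP'.

[td.xpath('normalize-space()').replace(' USD', '') for td in tree.xpath('//tr/td')]

out:

['SHOP', 'COUNTRY', '1 038,00', '899,00', '899,00', 'Order']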
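As a rough illustration of why str.strip(' USD') damages 'SHOP', and of one way to clean the prices in plain Python, consider the sketch below. The helpers `remove_currency` and `parse_price` are hypothetical names, and the sketch assumes that the comma is the decimal separator and that the space (or non-breaking space) is a thousands separator:

```python
from decimal import Decimal

cells = ['SHOP', 'COUNTRY', '1 038,00 USD', '899,00 USD', '899,00 USD', 'Order']

# str.strip(' USD') removes any of the characters ' ', 'U', 'S', 'D'
# from both ends, so the leading 'S' of 'SHOP' is lost as well.
stripped = [c.strip(' USD') for c in cells]


def remove_currency(text, suffix=' USD'):
    """Hypothetical helper: drop the suffix only when it is really there."""
    return text[:-len(suffix)] if text.endswith(suffix) else text


def parse_price(text):
    """Hypothetical helper: turn '1 038,00 USD' into Decimal('1038.00').

    Assumes ',' is the decimal separator and ' ' (or a non-breaking
    space) is the thousands separator.
    """
    number = remove_currency(text)
    number = number.replace('\xa0', '').replace(' ', '').replace(',', '.')
    return Decimal(number)


cleaned = [remove_currency(c) for c in cells]
prices = [parse_price(c) for c in cells if c.endswith(' USD')]

print(stripped)
print(cleaned)
print(prices)
```

Doing the numeric cleanup in Python rather than in XPath keeps the XPath expression simple and makes the separator assumptions explicit.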

EDIT:

tree.xpath('//tr/td//text()')

out:

['\n ',          # empty, discarded
 ' SHOP\n ',
 '\n ',          # empty, discarded
 'COUNTRY',
 '1 038,00 USD',
 '899,00 USD',
 '899,00 USD',
 '\n',           # empty, discarded
 ' Order',
 '\n ']          # empty, discarded

If [normalize-space()] changed the string, the ' Order' in your output would not contain the whitespace at the beginning. A predicate in [] only acts as a boolean test that filters out false values; it does not change the values that pass.
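The filter-versus-transform distinction can be mimicked in plain Python. In this sketch, `normalize_space` is a hypothetical helper that approximates XPath's normalize-space() (it collapses runs of space, tab, CR and LF and trims the ends); it is an illustration, not lxml's implementation:

```python
import re

text_nodes = ['\n ', ' SHOP\n ', '\n ', 'COUNTRY', '1 038,00 USD',
              '899,00 USD', '899,00 USD', '\n', ' Order', '\n ']


def normalize_space(s):
    """Approximate XPath normalize-space(): collapse whitespace runs, trim ends."""
    return re.sub(r'[ \t\r\n]+', ' ', s).strip(' ')


# Like text()[normalize-space()]: the function is used only as a truth test,
# so the surviving strings are returned exactly as they were.
kept = [s for s in text_nodes if normalize_space(s)]

# Like calling normalize-space(...) as the expression: the value itself changes.
normalized = [normalize_space(s) for s in kept]

print(kept)
print(normalized)
```

Here `kept` still contains ' SHOP\n ' and ' Order', while `normalized` holds the trimmed values.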
