I am using Scrapy to extract data from the web. I am trying to extract the text of the anchor (<a>) tags under a span tag, as shown below:

    <span>.....</span>
    <span id = "size_selection_list">
        <a>....</a>
        <a>....</a>
        . . .
        <a>....</a>
    </span>

I am using the following XPath logic:

    t = sel.xpath('//div[starts-with(@id,"size_selection_container")]/span[2]')
    for x in t.xpath('.//a'):
        ....

The problem is that the span element is matched, but the loop over the <a> tags never runs. What is the mistake here? Also, each <a> has an href containing JavaScript. Could that be the cause of the problem?

If I were you, I would use requests and BeautifulSoup4.

Please note that this code is untested, but it should work.

    import requests
    from bs4 import BeautifulSoup

    r = requests.get(yoururlhere).text  # yoururlhere: the URL of the page you are scraping
    soup = BeautifulSoup(r, 'html.parser')  # You can use lxml or other parsers; I am using the standard parser for compatibility
    span = soup.find('span', {'id': 'size_selection_list'})  # the span from the question
    tags = span.find_all('a', href=True)  # only <a> tags that have an href
    for i in tags:
        print(i.get_text())  # the link text
        print(i['href'])  # the link target
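To try the BeautifulSoup approach without a network request, the sketch below parses an inline, hypothetical copy of the markup instead of a downloaded page. It uses `select()` with the CSS selector `#size_selection_list a`, an alternative to the `find`/`find_all` chain above that picks every `<a>` descendant of the element with that id in one call.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page.
html = """
<div id="size_selection_container">
  <span>Size:</span>
  <span id="size_selection_list">
    <a href="javascript:void(0)">S</a>
    <a href="javascript:void(0)">M</a>
  </span>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# CSS selector: every <a> that is a descendant of the element with this id.
links = soup.select('#size_selection_list a')

texts = [a.get_text(strip=True) for a in links]  # link text, whitespace trimmed
hrefs = [a['href'] for a in links]               # raw href attribute values

print(texts)  # ['S', 'M']
print(hrefs)  # ['javascript:void(0)', 'javascript:void(0)']
```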

I hope this works; please let me know.

Article title: Scrap text in any <A> Labels under a ...
Source: https://www.codesec.net/view/611117.html

