I am using Scrapy to extract data from the web. I am trying to extract the text of the anchor tags under a span tag, as shown below:

    <span>.....</span>
    <span id="size_selection_list">
        <a>....</a>
        <a>....</a>
        . . .
        <a>
    </span>

I am using the following xpath logic:

    t = sel.xpath('//div[starts-with(@id,"size_selection_container")]/span[2]')
    for x in t.xpath('.//a'):
        ....

The problem is that the span element is reached, but the <a> tags are not iterated. What is the mistake here? Also, each <a> has an href that contains JavaScript. Could this be the reason for the problem?

If I were you, I would use requests and BeautifulSoup4.

Please note, this code is untested, but it should work.

    import requests
    from bs4 import BeautifulSoup

    # yoururlhere is a placeholder for the URL of the page you want to scrape
    r = requests.get(yoururlhere).text

    # You can use lxml or another parser; the standard parser is used here for compatibility
    soup = BeautifulSoup(r, 'html.parser')

    # Locate the span from the question by its id
    span = soup.find('span', {'id': 'size_selection_list'})

    # Collect every anchor inside the span that has an href attribute
    tags = span.find_all('a', href=True)
    for i in tags:
        print(i.get_text())  # the text of the anchor
        print(i['href'])     # the link itself
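On the second part of the question: as far as I know, an href that contains JavaScript is just an attribute value, so by itself it should not stop the <a> tags from being selected. Here is a minimal sketch that shows this using only the standard library (xml.etree.ElementTree, not Scrapy or BeautifulSoup). The SAMPLE markup and the anchor_texts helper are made up for illustration, and the markup is assumed to be well-formed, which real pages often are not. If the same kind of loop finds nothing on the real page, it may be worth checking whether the anchors are actually present in the HTML that was downloaded.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page markup (not the real page).
SAMPLE = """
<div id="size_selection_container_1">
  <span>label</span>
  <span id="size_selection_list">
    <a href="javascript:void(0)">S</a>
    <a href="javascript:void(0)">M</a>
    <a href="javascript:void(0)">L</a>
  </span>
</div>
"""


def anchor_texts(markup):
    """Return the stripped text of every <a> under the size-list span."""
    root = ET.fromstring(markup)
    # Select the span by its id instead of by position (span[2])
    span = root.find(".//span[@id='size_selection_list']")
    if span is None:
        return []
    # Relative search: only anchors below the span that was found
    return [(a.text or "").strip() for a in span.findall(".//a")]


sizes = anchor_texts(SAMPLE)
print(sizes)
```

Selecting the span by its id rather than by index also avoids depending on how many sibling spans come before it.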

I hope this works. Please let me know.
