
Find all elements based on a namespaced attribute


If I have something like this:

    <p>blah</p>
    <p foo:bar="something">blah</p>
    <p foo:xxx="something">blah</p>

How would I get Beautiful Soup to select elements that have an attribute in the foo namespace?

For example, I would like the second and third p elements returned.

From the documentation:

Beautiful Soup provides a special argument called attrs which you can use in these situations. attrs is a dictionary that acts just like the keyword arguments:

    soup.findAll(id=re.compile("para$"))
    # [<p id="firstpara" align="center">This is paragraph <b>one</b>.</p>,
    #  <p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>]

    soup.findAll(attrs={'id' : re.compile("para$")})
    # [<p id="firstpara" align="center">This is paragraph <b>one</b>.</p>,
    #  <p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>]

You can use attrs if you need to put restrictions on attributes whose names are Python reserved words, like class, for, or import; or attributes whose names are non-keyword arguments to the Beautiful Soup search methods: name, recursive, limit, text, or attrs itself.

    from BeautifulSoup import BeautifulStoneSoup
    xml = '<person name="Bob"><parent rel="mother" name="Alice">'
    xmlSoup = BeautifulStoneSoup(xml)

    xmlSoup.findAll(name="Alice")
    # []

    xmlSoup.findAll(attrs={"name" : "Alice"})
    # [<parent rel="mother" name="Alice"></parent>]

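Note that the keys of attrs are looked up as literal attribute names: a key of "foo" only matches an attribute named exactly foo, and a compiled regular expression is not matched against attribute names when used as a key. So for your given example, attrs alone is not enough. Pass a function that inspects each tag's attribute names instead (in Beautiful Soup 3, tag.attrs is a list of (name, value) pairs):

    # Keep every tag that has at least one attribute whose name starts with "foo:"
    soup.findAll(lambda tag: any(name.startswith("foo:") for name, value in tag.attrs))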
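For readers on the newer bs4 package, here is a minimal sketch of selecting tags by attribute-name prefix. It assumes the markup is parsed with the built-in "html.parser", which keeps the prefix as part of the attribute name (for example "foo:bar"); an XML parser may treat prefixes differently. The helper names has_prefixed_name and find_prefixed are made up for this illustration and are not part of Beautiful Soup.

```python
def has_prefixed_name(attrs, prefix="foo:"):
    """Return True if any attribute name in `attrs` starts with `prefix`."""
    return any(name.startswith(prefix) for name in attrs)


def find_prefixed(markup, prefix="foo:"):
    """Return the tags in `markup` that carry at least one `prefix` attribute."""
    # Third-party import kept local so the helper above has no dependencies.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(markup, "html.parser")
    # find_all accepts a function; it is called with each tag and keeps
    # the tags for which the function returns True.
    return soup.find_all(lambda tag: has_prefixed_name(tag.attrs, prefix))


# The snippet from the question, for trying the helper out.
markup = (
    '<p>blah</p> '
    '<p foo:bar="something">blah</p> '
    '<p foo:xxx="something">blah</p>'
)
```

Calling find_prefixed(markup) on the snippet from the question should return the second and third p elements, since only those carry an attribute whose name begins with "foo:".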
