beautifulsoup4 Tutorial (4): CSS Selectors

2019-02-02 12:32 176 views

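beautifulsoup4 Tutorial (1): Basics and a First Crawler

beautifulsoup4 Tutorial (2): The Four Main Objects in bs4

beautifulsoup4 Tutorial (3): Traversing and Searching the Document Tree

beautifulsoup4 Tutorial (4): CSS Selectors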

6. CSS Selectors
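
The snippets in this section all use a soup object that is not defined here. The following is a minimal sketch of that setup, assuming the "three sisters" sample document; the markup is reconstructed from the results printed in this section, and the choice of html.parser is an assumption rather than something the tutorial states.

```python
from bs4 import BeautifulSoup

# Sample markup reconstructed from the printed results (an assumption).
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

# "html.parser" ships with Python; another parser such as lxml would also work.
soup = BeautifulSoup(html_doc, "html.parser")
```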

6.1 Selecting by tag name
print(soup.select('title'))
print(soup.select('a'))
print(soup.select('b'))

result:
[<title>The Dormouse's story</title>]
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
[<b>The Dormouse's story</b>]
6.2 Selecting by class name
print(soup.select('.story'))

result:
[<p class="story">Once upon a time there were three little sisters; and their names were\n<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,\n<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and\n<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;\nand they lived at the bottom of a well.</p>, <p class="story">...</p>]
6.3 Selecting by id
print(soup.select('#link1'))

result:
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
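6.4 Combining selectors

Conditions separated by a space are applied from left to right, one level at a time: each condition matches a descendant of the node matched by the previous one, not the same node. So 'p #link1' finds the element with id link1 inside a <p>, while 'a #link1' finds nothing, because that element is not nested inside an <a>.

print(soup.select('p #link1'))
print(soup.select('a #link1'))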

result:
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
[]

The same layering is easier to see with the child combinator ">", which matches direct children only:

print(soup.select('p > #link1'))
print(soup.select('a > #link1'))

result:
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
[]
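
The difference between a space, ">", and no separator at all can be sketched on a small document. The markup below is hypothetical and chosen only so that the three forms give different results; it is not the tutorial's sample document.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <a> sits inside a <span>, which sits inside the <p>.
html = '<div><p class="story"><span><a id="link1" class="sister">Elsie</a></span></p></div>'
soup = BeautifulSoup(html, "html.parser")

# Space (descendant): #link1 may be nested at any depth inside a <p>.
descendant = soup.select("p #link1")

# ">" (child): #link1 must be a direct child of <p>; here its parent is <span>.
child = soup.select("p > #link1")

# No separator: all conditions apply to the same node.
same_node = soup.select("a#link1.sister")

print(len(descendant), len(child), len(same_node))  # prints: 1 0 1
```

Writing the conditions without a space is the way to require that one node satisfies all of them at once.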
6.5 Selecting by attribute
print(soup.select('p > a'))
print(soup.select('p > a[href="http://example.com/tillie"]'))

result:
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
[<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
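
Besides exact matches, CSS attribute selectors can also test for the presence of an attribute or for part of its value. The following sketch uses hypothetical markup (the id anchor-only and the trimmed-down links are illustrative, not from the tutorial) to show the standard forms.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two links with href, one <a> without.
html = """
<p>
<a href="http://example.com/elsie" id="link1">Elsie</a>
<a href="http://example.com/lacie" id="link2">Lacie</a>
<a id="anchor-only">no href</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

def ids(tags):
    """Collect the id attribute of each matched tag."""
    return [tag["id"] for tag in tags]

has_href = ids(soup.select("a[href]"))             # attribute is present
prefix = ids(soup.select('a[href^="http://"]'))    # value starts with "http://"
suffix = ids(soup.select('a[href$="lacie"]'))      # value ends with "lacie"
contains = ids(soup.select('a[href*="els"]'))      # value contains "els"

print(has_href, prefix, suffix, contains)
```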
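6.6 Iterating over results
  • Every call above returns a list (recent Beautiful Soup releases return a ResultSet, which is a subclass of list), so the result can be indexed and iterated. The output below was produced under Python 2, hence <type 'list'>.
print(soup.select('p > a'))
print(type(soup.select('p > a')))
print("====")
print(soup.select('p > a')[0])
print("====")
for a in soup.select('p > a'):
    print(a)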

result:
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
<type 'list'>
====
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>
====
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
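
Since each item in the result is a Tag, a loop can do more than print it. The following is a small sketch, on a trimmed-down and hypothetical copy of the sample markup, of pulling out the text and href of each link, and of select_one(), which returns only the first match.

```python
from bs4 import BeautifulSoup

# Trimmed-down, hypothetical copy of the sample paragraph.
html = """
<p class="story">
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

# Each item is a Tag, so its text and attributes are available.
links = [(a.get_text(), a["href"]) for a in soup.select("p > a")]
print(links)

# select_one() returns the first match, or None when nothing matches.
first = soup.select_one("p > a")
missing = soup.select_one("p > b")
print(first["id"], missing)  # prints: link2 None
```

Checking for None before using the result of select_one() avoids a TypeError when the selector matches nothing.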