Python Beautiful Soup library
2015-09-07 00:00
Beautiful Soup: a parser for HTML and XML
http://www.crummy.com/software/BeautifulSoup/bs4/doc/
apt-get install python-bs4
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

print(soup.prettify())
soup.title
# <title>The Dormouse's story</title>

soup.title.name
# u'title'

soup.title.string
# u'The Dormouse's story'

soup.title.parent.name
# u'head'

soup.p
# <p class="title"><b>The Dormouse's story</b></p>

# class is a multi-valued attribute, so bs4 returns a list
soup.p['class']
# [u'title']

soup.a
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

soup.find_all('a')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.find(id="link3")
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
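A common next step after locating tags is pulling data out of them. The sketch below is one possible way to do that, not the only one: it collects the text and href of every link with find_all(), and filters by CSS class with the class_ keyword. The helper name extract_links is made up for this illustration, and the HTML is a shortened copy of the sample document used above.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample document used in this post.
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
"""

soup = BeautifulSoup(html_doc, 'html.parser')


def extract_links(soup):
    """Hypothetical helper: return (text, href) pairs for every <a> tag."""
    # Tag.get() returns None instead of raising if the attribute is missing.
    return [(a.get_text(), a.get('href')) for a in soup.find_all('a')]


links = extract_links(soup)
for text, href in links:
    print(text, href)

# class_ (with a trailing underscore) filters on the CSS class,
# because "class" is a reserved word in Python.
sister_ids = [a['id'] for a in soup.find_all('a', class_='sister')]
print(sister_ids)
```

Using a.get('href') rather than a['href'] is a defensive choice: real pages often contain anchors without an href, and get() lets the caller decide how to handle the resulting None.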