BeautifulSoup: a powerful tool for matching content in Python 3 crawlers

2015-05-17 20:19

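Compared with regular expressions, BeautifulSoup is much better suited to parsing web pages. It works on the document's tag structure rather than on raw text, so content can be extracted quickly and conveniently.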
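To make the comparison concrete, here is a minimal sketch that extracts link targets both ways. The sample markup, the regular expression, and the two helper function names are assumptions made up for illustration; they are not part of the original script.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup: the second link puts href last and uses single quotes.
SAMPLE_HTML = """
<p class="story">
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a class="sister" id="link3" href='http://example.com/tillie'>Tillie</a>
</p>
"""


def hrefs_with_regex(html):
    # A naive pattern: it only matches when href is the first attribute
    # and its value is wrapped in double quotes.
    return re.findall(r'<a href="([^"]+)"', html)


def hrefs_with_soup(html):
    # BeautifulSoup parses the tags, so attribute order and quoting do not matter.
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


print(hrefs_with_regex(SAMPLE_HTML))
print(hrefs_with_soup(SAMPLE_HTML))
```

The naive pattern misses the second link, while the tag-based lookup finds both. A more careful regular expression could handle this case too, but it would have to anticipate every variation in the markup by hand.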

Example code (tested and working):

# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup

# Despite its name, `url` holds an HTML document, not a URL.
url = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

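# Name the parser explicitly so bs4 does not warn about guessing one.
soup = BeautifulSoup(url, "html.parser")
print(soup.prettify())  # print the parsed document, indented for readability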

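# One of the four main object types: Tag
print(soup.title)       # the <title> tag
print(soup.head)        # the <head> tag
print(soup.a)           # the first <a> tag in the document
print(soup.p)           # the first <p> tag in the document
print(soup.p.name)      # the tag's name
print(soup.p.attrs)     # the tag's attributes as a dict
print(soup.p['class'])  # the value of the class attribute, as a list
print(type(soup.a))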

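# One of the four main object types: NavigableString
# Get the text inside a single tag
print(soup.p.string)
# Get the text inside every tag, with surrounding whitespace stripped
for i in soup.stripped_strings:
    print(repr(i))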
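The script above refers to four main object types but only demonstrates Tag and NavigableString. In the bs4 library the other two are BeautifulSoup (the parsed document itself) and Comment (a special kind of NavigableString). The sketch below shows all four side by side; the markup, including the comment placed inside the `<a>` tag, is a hypothetical snippet chosen for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical markup; the HTML comment inside <a> is an assumption for this demo.
snippet = (
    '<p class="title"><b>The Dormouse\'s story</b></p>'
    '<a id="link1"><!-- Elsie --></a>'
)

soup = BeautifulSoup(snippet, "html.parser")

# Record the class name of one object of each kind.
kinds = {
    "soup": type(soup).__name__,                    # the whole document
    "soup.p": type(soup.p).__name__,                # a tag
    "soup.p.string": type(soup.p.string).__name__,  # text inside a tag
    "soup.a.string": type(soup.a.string).__name__,  # an HTML comment
}

for expression, kind in kinds.items():
    print(expression, "->", kind)

# A Comment is also a NavigableString, so check for Comment first
# when the two need to be told apart.
print(isinstance(soup.a.string, NavigableString))
```

Because `.string` returns a comment's content without the `<!-- -->` markers, checking `isinstance(value, Comment)` is a simple way to avoid treating commented-out markup as visible text.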
