
BeautifulSoup: a powerful matching tool for web scrapers in Python 3

2015-05-17 20:19
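Compared with regular expressions, BeautifulSoup is far better suited to parsing web pages. It works on the document's tags rather than on raw text, so content can be located and extracted quickly and conveniently.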
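
To make the comparison concrete, here is a minimal sketch that extracts link targets both ways. The `html` snippet and the regex pattern are made up for illustration (assumptions, not taken from a real page); the pattern deliberately assumes that `class` always comes before `href`, which is the kind of fragility that tag-based matching avoids.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical snippet for illustration: the two links list their attributes in a different order
html = (
    '<p>Links: '
    '<a class="sister" href="http://example.com/lacie">Lacie</a> '
    '<a href="http://example.com/tillie" class="sister">Tillie</a></p>'
)

# Regex approach: this pattern only matches when class comes before href
pattern = re.compile(r'<a class="sister" href="([^"]+)">')
regex_links = pattern.findall(html)

# Tag approach: attribute order does not matter once the markup is parsed
soup = BeautifulSoup(html, "html.parser")
tag_links = [a["href"] for a in soup.find_all("a", class_="sister")]

print(regex_links)  # only the first link is found
print(tag_links)    # both links are found
```

A more careful regex could handle both attribute orders, but every such variation has to be anticipated by hand, while `find_all` matches on the parsed tag regardless of how the attributes were written.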

Example code (tested and working):

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# Sample HTML to parse (a string literal, not a URL)
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
# Name the parser explicitly; without it bs4 picks one itself and emits a warning
soup = BeautifulSoup(html_doc, "html.parser")

print(soup.prettify())   # Pretty-print the parsed document

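# One of the four main object types: Tag
# Accessing a tag name as an attribute returns the first matching tag
print(soup.title)
print(soup.head)
print(soup.a)
print(soup.p)
print(soup.p.name)       # The tag's name
print(soup.p.attrs)      # All attributes of the tag, as a dict
print(soup.p['class'])   # A single attribute, looked up by key
print(type(soup.a))      # The Tag class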

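# One of the four main object types: NavigableString
# Get the text inside a tag
print(soup.p.string)
# Get the text of every tag in the document, with surrounding whitespace stripped
for i in soup.stripped_strings:
    print(repr(i))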
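
The code above covers two of the four object types, Tag and NavigableString. As far as the bs4 documentation describes them, the remaining two are the BeautifulSoup object itself and Comment. The sketch below is an addition for illustration rather than part of the original post; it uses a shortened, made-up `html_doc` that reuses the first link from the sample, whose content is an HTML comment.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Shortened sample for illustration: the <a> tag contains only an HTML comment
html_doc = (
    '<p><a href="http://example.com/elsie" class="sister" id="link1">'
    '<!-- Elsie --></a><b>Lacie</b></p>'
)
soup = BeautifulSoup(html_doc, "html.parser")

# The BeautifulSoup object represents the whole document
print(type(soup))
print(soup.name)

# .string on a tag whose only child is a comment returns a Comment object
comment = soup.a.string
print(type(comment))
print(repr(comment))

# Ordinary text comes back as a plain NavigableString
text = soup.b.string
print(type(text))

# Comment is a subclass of NavigableString, so check for Comment first when filtering text
print(isinstance(comment, NavigableString))
```

Because printing a Comment shows only its text, comment content can look like ordinary text in the output; checking `isinstance(value, Comment)` before using `.string` is one way to tell them apart.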