BeautifulSoup: a powerful matching tool for web scrapers in Python 3
2015-05-17 20:19
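Compared with regular expressions, BeautifulSoup is much better suited to parsing web pages: it works on tags, so content can be located and extracted quickly and conveniently.
The example code below runs correctly: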
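To make the comparison concrete, here is a minimal sketch that extracts link targets both ways. The sample markup and the variable names (`html`, `regex_links`, `soup_links`) are made up for illustration, and the regular expression is just one plausible pattern someone might write, not a recommended one.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this comparison.
# The second link uses single quotes and a different attribute order.
html = """
<p class="story">
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
<a id="link2" href='http://example.com/lacie' class="sister">Lacie</a>
</p>
"""

# Regex approach: this pattern assumes double quotes around href,
# so it silently misses the second link.
regex_links = re.findall(r'<a[^>]*href="([^"]+)"', html)

# Tag-based approach: the parser handles quoting and attribute order.
soup = BeautifulSoup(html, "html.parser")
soup_links = [a["href"] for a in soup.find_all("a")]

print(regex_links)
print(soup_links)
```

The regex could be patched to accept single quotes, but each new variation in the markup needs another patch, whereas the tag-based lookup stays the same.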
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# Note: despite its name, `url` holds an HTML document as a string, not a URL.
url = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
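# Name the parser explicitly; without it bs4 guesses one and emits a warning.
soup = BeautifulSoup(url, "html.parser")
print(soup.prettify())  # Print the parsed document, pretty-printed

# One of the four main object types: Tag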
print(soup.title)
print(soup.head)
print(soup.a)
print(soup.p)
print(soup.p.name)
print(soup.p.attrs)
print(soup.p['class'])
print(type(soup.a))
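
# One of the four main object types: NavigableString
# Get the text inside a single tag
print(soup.p.string)
# Get the text of every tag in the document, with surrounding whitespace stripped
for i in soup.stripped_strings:
    print(repr(i))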
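The code above covers two of the four object types, Tag and NavigableString. The other two in the bs4 documentation are BeautifulSoup (the whole document) and Comment. The following is a minimal sketch showing all four on a shortened version of the same markup; the `visible_text` helper is hypothetical, added here only to show one way of skipping comment text.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Shortened markup based on the example above: the first link contains
# only an HTML comment, the second contains ordinary text.
html = '<p class="story"><a id="link1"><!-- Elsie --></a><a id="link2">Lacie</a></p>'
soup = BeautifulSoup(html, "html.parser")

first = soup.find("a", id="link1")
second = soup.find("a", id="link2")

print(type(soup).__name__)           # BeautifulSoup
print(type(first).__name__)          # Tag
print(type(second.string).__name__)  # NavigableString
print(type(first.string).__name__)   # Comment


def visible_text(tag):
    """Hypothetical helper: return a tag's single string, ignoring comments."""
    s = tag.string
    # Comment is a subclass of NavigableString, so test for it explicitly.
    if s is None or isinstance(s, Comment):
        return None
    return str(s)


print(visible_text(first))   # None
print(visible_text(second))  # Lacie
```

Because `.string` returns a Comment for the first link, printing it directly would show the comment's text as if it were page content, which is why the explicit `isinstance` check can be useful.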