
BeautifulSoup_python3

2016-04-19 17:15

1. Troubleshooting

bsObj = BeautifulSoup(html.read())

This call raises a warning (the code still runs):

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

Fix: pass the parser name explicitly as the second argument:

bsObj = BeautifulSoup(html.read(),"html.parser")
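The following is a minimal sketch of how one might confirm that naming the parser silences the warning. It assumes bs4 is installed; `SAMPLE_HTML` and `parse_and_collect_user_warnings` are hypothetical names introduced only for illustration, and an inline string is used instead of a real page so that no network access is needed.

```python
import warnings

from bs4 import BeautifulSoup

# Hypothetical inline document, used so the example needs no network access.
SAMPLE_HTML = "<html><body><h1>Hello</h1></body></html>"


def parse_and_collect_user_warnings(markup, parser=None):
    """Parse markup and return (soup, list of UserWarning categories raised)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        if parser is None:
            soup = BeautifulSoup(markup)  # parser is guessed by the library
        else:
            soup = BeautifulSoup(markup, parser)  # parser is named explicitly
    categories = [w.category for w in caught if issubclass(w.category, UserWarning)]
    return soup, categories


soup, categories = parse_and_collect_user_warnings(SAMPLE_HTML, "html.parser")
print(soup.h1)
print(categories)
```

Calling the helper without the second argument should reproduce the "No parser was explicitly specified" warning quoted above, although the exact warning class may differ between bs4 versions.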

BeautifulSoup

Overview: BeautifulSoup organizes and formats complex web content by locating HTML tags, and exposes the XML/HTML structure as simple Python objects.
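As a rough illustration of what "simple Python objects" means here, the sketch below parses a small inline document and reads tags, text, and attributes from it. The markup in `SAMPLE_HTML` is hypothetical (loosely modelled on a blog header) and the two `<p class="entry">` elements are invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup; the <p> entries are invented for illustration.
SAMPLE_HTML = """
<html><body>
<h1><a class="headermaintitle" href="http://www.cnblogs.com/kamil/" id="Header1_HeaderTitle">kamil</a></h1>
<p class="entry">first</p>
<p class="entry">second</p>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

title_link = soup.h1.a                # attribute access returns the first matching Tag
link_text = title_link.get_text()     # text content of the tag
link_href = title_link["href"]        # attributes are read like dictionary keys
link_classes = title_link["class"]    # class is multi-valued, so this is a list
entries = [p.get_text() for p in soup.find_all("p", class_="entry")]

print(link_text, link_href, link_classes, entries)
```

Attribute-style access such as `soup.h1` only ever returns the first match; `find_all` is the usual choice when every matching tag is needed.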

On Python 3, install version 4 of the library: BeautifulSoup4 (BS4).

Example:

#!/usr/bin/env python
# encoding: utf-8
"""
@author: 侠之大者kamil
@file: beautifulsoup.py
@time: 2016/4/19 16:36
"""
from bs4 import BeautifulSoup
from urllib.request import urlopen
html = urlopen('http://www.cnblogs.com/kamil/')
print(type(html))
bsObj = BeautifulSoup(html.read(), "html.parser")  # html.read() returns the page content, which is passed to the BeautifulSoup constructor.
print(type(bsObj))
print(bsObj.h1)

Note line 12 (the BeautifulSoup(...) call): "html.parser" must be passed as the second argument.

Output:

ssh://kamil@xzdz.hk:22/usr/bin/python3 -u /home/kamil/windows_python3/python3/Day11/day12/beautifulsoup.py
<class 'http.client.HTTPResponse'>
<class 'bs4.BeautifulSoup'>
<h1><a class="headermaintitle" href="http://www.cnblogs.com/kamil/" id="Header1_HeaderTitle">侠之大者kamil</a></h1>

Process finished with exit code 0
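The script above assumes that the request succeeds and that the page contains an `<h1>`. A slightly more defensive variant might look like the sketch below. This is not part of the original script: `get_first_h1` and its `opener` parameter are hypothetical names, and `opener` exists only so the function can be tried without network access.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

from bs4 import BeautifulSoup


def get_first_h1(url, opener=urlopen):
    """Return the first <h1> Tag found at url, or None on failure.

    `opener` is a hypothetical hook (defaulting to urlopen) that lets the
    function be exercised with a fake response instead of a real request.
    """
    try:
        response = opener(url)
    except (HTTPError, URLError):
        # The server returned an error status or could not be reached.
        return None
    soup = BeautifulSoup(response.read(), "html.parser")
    return soup.h1  # None when the page has no <h1>
```

With a live connection, `get_first_h1('http://www.cnblogs.com/kamil/')` would be expected to return the same `<h1>` tag printed above, and `None` if the site cannot be reached.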

Official documentation
