
Python Web Scraping - Beautiful Soup4 (Part 1) - Scraping Local Files

2017-11-22 19:04

1. Installing Beautiful Soup4 (BS4 for short)

Install it with pip, or with the older easy_install (deprecated in setuptools):

easy_install beautifulsoup4
pip install beautifulsoup4
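To confirm the install worked, a quick check like the one below (a minimal sketch) imports the package and prints its version string. `bs4.__version__` is set by the package itself; the exact value depends on which release pip installed.

```python
# Minimal install check: if this import fails, beautifulsoup4 is not
# installed for the Python interpreter you are running.
import bs4

print(bs4.__version__)
```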

2. Installing an HTML parser

Available parsers include html.parser (bundled with Python), lxml, and html5lib. Install lxml with pip or easy_install:

easy_install lxml
pip install lxml
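Which of these parsers you can actually use depends on what is installed. The sketch below uses a hypothetical helper, `available_parsers` (not part of BS4), that tries each parser name on a tiny snippet and treats `bs4.FeatureNotFound`, which BeautifulSoup raises when the requested parser library is missing, as "not installed".

```python
from bs4 import BeautifulSoup, FeatureNotFound


def available_parsers(candidates=("html.parser", "lxml", "html5lib")):
    """Return the parser names from `candidates` that BeautifulSoup can use here."""
    found = []
    for name in candidates:
        try:
            BeautifulSoup("<p>test</p>", name)
        except FeatureNotFound:
            # The parser library behind this name is not installed.
            continue
        found.append(name)
    return found


print(available_parsers())
```

html.parser should always be listed; lxml and html5lib only appear after their pip install. Keep in mind that different parsers can build different trees from the same invalid markup, so results may vary slightly between them.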

3. Using Beautiful Soup4

Documentation (Chinese): https://beautifulsoup.readthedocs.io/zh_CN/latest/#replace-with

Create a file named python.py with the following content and run it:

from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<font><!--Hey, buddy. Want to buy a used parser?--></font>
<p class="story">...</p>
"""

# Name the parser explicitly. If it is omitted, BS4 uses the best parser
# installed (lxml, then html5lib, then html.parser) and emits a warning.

# Parse HTML from a string
soup = BeautifulSoup(html_doc, "html.parser")

# Parse a local HTML file (the "lxml" parser must be installed)
# with open("index.html", encoding="utf-8") as f:
#     soup = BeautifulSoup(f, "lxml")

# To parse a page fetched from the network, pass its HTML text the same way as html_doc

# print(soup.prettify())  # the whole tree, indented

print(soup.name)                # [document]
print(soup.title)               # <title>The Dormouse's story</title>
print(type(soup.title))         # <class 'bs4.element.Tag'>
print(soup.title.name)          # title
print(soup.title.string)        # The Dormouse's story
print(type(soup.title.string))  # <class 'bs4.element.NavigableString'>

# A comment inside a tag comes back as a Comment object
print(soup.font.string)         # Hey, buddy. Want to buy a used parser?
print(type(soup.font.string))   # <class 'bs4.element.Comment'>

# {'href': 'http://example.com/elsie', 'class': ['sister'], 'id': 'link1'}
print(soup.a.attrs)
print(soup.a["href"])           # http://example.com/elsie

# All <a> tags
print(soup.find_all("a"))

# .contents: list of the direct children of <html>
print(soup.html.contents)
# Number of direct children
print(len(soup.html.contents))
# First direct child
print(soup.html.contents[0])

# .children: iterator over the direct children only
for child in soup.html.children:
    print(child)

# .descendants: every nested tag and string, depth-first
for child in soup.html.descendants:
    print(child)
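Reading a local file, the topic of this part, appears in the script above only as a commented-out line. Here is a minimal sketch of that step, assuming the file is UTF-8 encoded and using the built-in html.parser so lxml is not required. The helper names `soup_from_file` and `link_targets` are hypothetical, and the demo writes a throwaway sample page to a temporary directory so it runs anywhere; with your own file you would call `soup_from_file("index.html")` directly.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def soup_from_file(path, parser="html.parser", encoding="utf-8"):
    """Parse a local HTML file; BeautifulSoup accepts an open file handle."""
    with open(path, encoding=encoding) as f:
        return BeautifulSoup(f, parser)


def link_targets(soup):
    """Return the href of every <a> tag that actually has an href attribute."""
    return [a["href"] for a in soup.find_all("a", href=True)]


# Hypothetical sample page, written to a temp dir so the demo is self-contained.
sample = """<html><head><title>Local page</title></head>
<body><a href="http://example.com/elsie" id="link1">Elsie</a>
<a id="no-href">no link</a></body></html>"""

with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "index.html"
    page.write_text(sample, encoding="utf-8")
    # The tree is fully built in memory, so it stays usable after cleanup.
    soup = soup_from_file(page)

print(soup.title.string)   # Local page
print(link_targets(soup))  # ['http://example.com/elsie']
```

`find_all("a", href=True)` keeps only `<a>` tags that carry an href, which is why the second link in the sample is skipped.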
