
Exercises 008-009

2016-05-06 15:04

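Problem 0008: Given an HTML file, extract its body text.

Problem 0009: Given an HTML file, extract the links it contains.

I solved both with BeautifulSoup. Each one takes a single method call, which is convenient.

The program is as follows: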

#!/usr/bin/env python
#coding:utf-8
from bs4 import BeautifulSoup
html='''
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
'''

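# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(html, 'html.parser')

# Problem 0008: print the text content of the page.
print(soup.get_text())

# Problem 0009: print the href of every <a> tag.
for i in soup.find_all('a'):
    print(i.get('href'))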
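Both problems talk about an HTML file, and real pages usually contain script and style blocks as well as relative links. As a rough sketch rather than part of the original solution, the two calls can be wrapped in small functions that handle those cases. The function names extract_text and extract_links and the base_url parameter are made up for this illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_text(html):
    """Return the visible text of an HTML string, one text fragment per line."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove elements whose contents are code or CSS, not page text.
    for tag in soup.find_all(["script", "style"]):
        tag.decompose()
    # strip=True trims each fragment and drops the ones that are only whitespace.
    return soup.get_text(separator="\n", strip=True)


def extract_links(html, base_url=""):
    """Return the href of every <a> tag that has one, resolved against base_url."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips anchors without an href attribute.
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]
```

To run this on a file, read it into a string first, for example with open(path, encoding="utf-8").read(), then pass that string to either function. Leaving base_url empty returns the href values unchanged.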

If you are interested, have a look at this documentation:

Beautiful Soup 4.2.0 documentation

(Written on May 6, 2016, http://blog.csdn.net/bzd_111)
