
Python: BeautifulSoup, a super simple framework for parsing website HTML

2018-02-03 09:53, 926 views

Python: BeautifulSoup, a super simple framework for scraping website data

Example

For example, suppose we want to scrape every list entry in the left sidebar of this site, as shown in the screenshot.



Let's look at the HTML structure of the page's left sidebar, as shown in the screenshot.



It turns out the entries are simply all of the a tags under the element whose id is leftcolumn. So how do we write the Python code?

# coding: utf-8
# Written for Python 2; on Python 3, urllib2.urlopen is urllib.request.urlopen.

import urllib2
from bs4 import BeautifulSoup

# Download the page and decode it, ignoring any invalid bytes.
url_request = urllib2.urlopen('http://www.runoob.com/python/python-tutorial.html')
html_doc = url_request.read().decode('utf-8', 'ignore')

soup = BeautifulSoup(html_doc, 'html.parser')
# print(soup.prettify())

# Collect every <a> tag inside the element whose id is "leftcolumn".
anchor_list = soup.find(id='leftcolumn').find_all('a')
for anchor in anchor_list:
    astring = "title: " + anchor.get('title') + ", href=http://www.runoob.com/" + anchor.get('href')
    print(astring)


The output is:

title: Python 基础教程, href=http://www.runoob.com//python/python-tutorial.html
title: Python 简介, href=http://www.runoob.com//python/python-intro.html
title: Python 环境搭建, href=http://www.runoob.com//python/python-install.html
title: Python 中文编码, href=http://www.runoob.com/python-chinese-encoding.html
title: Python 基础语法, href=http://www.runoob.com//python/python-basic-syntax.html
title: Python 变量类型, href=http://www.runoob.com//python/python-variable-types.html
title: Python 运算符, href=http://www.runoob.com//python/python-operators.html
title: Python 条件语句, href=http://www.runoob.com//python/python-if-statement.html
title: Python 循环语句, href=http://www.runoob.com//python/python-loops.html
title: Python While 循环语句, href=http://www.runoob.com//python/python-while-loop.html
title: Python for 循环语句, href=http://www.runoob.com//python/python-for-loop.html
title: Python 循环嵌套, href=http://www.runoob.com//python/python-nested-loops.html
title: Python break 语句, href=http://www.runoob.com//python/python-break-statement.html
title: Python continue  语句, href=http://www.runoob.com//python/python-continue-statement.html
title: Python pass 语句, href=http://www.runoob.com//python/python-pass-statement.html
title: Python Number(数字), href=http://www.runoob.com//python/python-numbers.html
title: Python 字符串, href=http://www.runoob.com//python/python-strings.html
title: Python 列表(List), href=http://www.runoob.com//python/python-lists.html
title: Python 元组, href=http://www.runoob.com//python/python-tuples.html
title: Python 字典(Dictionary), href=http://www.runoob.com//python/python-dictionary.html
title: Python 日期和时间, href=http://www.runoob.com//python/python-date-time.html
title: Python 函数, href=http://www.runoob.com//python/python-functions.html
title: Python 模块, href=http://www.runoob.com//python/python-modules.html
title: Python 文件I/O, href=http://www.runoob.com//python/python-files-io.html
title: Python File 方法, href=http://www.runoob.com/file-methods.html
title: Python 异常处理, href=http://www.runoob.com//python/python-exceptions.html
title: Python OS 文件/目录方法, href=http://www.runoob.com/os-file-methods.html
title: Python 内置函数, href=http://www.runoob.com/python-built-in-functions.html
title: Python 面向对象, href=http://www.runoob.com//python/python-object.html
title: Python正则表达式, href=http://www.runoob.com//python/python-reg-expressions.html
title: Python CGI编程, href=http://www.runoob.com//python/python-cgi.html
title: python 操作MySQL数据库, href=http://www.runoob.com//python/python-mysql.html
title: Python 网络编程, href=http://www.runoob.com/python-socket.html
title: Python SMTP发送邮件, href=http://www.runoob.com//python/python-email.html
title: Python 多线程, href=http://www.runoob.com//python/python-multithreading.html
title: Python XML解析, href=http://www.runoob.com//python/python-xml.html
title: Python GUI 编程(Tkinter), href=http://www.runoob.com//python/python-gui-tkinter.html
title: Python2.x与3.x版本区别, href=http://www.runoob.com//python/python-2x-3x.html
title: Python IDE, href=http://www.runoob.com//python/python-ide.html
title: Python JSON, href=http://www.runoob.com//python/python-json.html
title: Python 100例, href=http://www.runoob.com//python/python-100-examples.html
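
Most of the URLs above contain a double slash because the prefix ends in "/" and the href already starts with "/". A few have no "/python/" segment, which suggests those href values are relative to the current page. One possible way to handle both cases is the standard library's urljoin, sketched below under the assumption of Python 3. This is not part of the original script: the two sample href values are inferred from the output above, and PAGE_URL and sample_hrefs are names made up for this illustration.

```python
from urllib.parse import urljoin  # Python 3; on Python 2 this function lives in the urlparse module

# The page the links were scraped from (same URL as in the script above).
PAGE_URL = 'http://www.runoob.com/python/python-tutorial.html'

# Two href shapes inferred from the output: one root-relative, one page-relative.
sample_hrefs = ['/python/python-intro.html', 'python-chinese-encoding.html']

# urljoin resolves each href against the page URL, following standard URL resolution rules.
absolute_urls = [urljoin(PAGE_URL, href) for href in sample_hrefs]

for url in absolute_urls:
    print(url)
```

With this approach the page-relative href resolves under /python/, which differs from what plain string concatenation printed. Whether that is the address the site actually serves has not been checked here.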


This is just one simple example. For richer HTML analysis, let your imagination run wild!
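
As one possible direction, the same lookup can be written with a CSS selector through BeautifulSoup's select method, and made tolerant of links that have no title attribute. The sketch below assumes Python 3 and runs on a small inline snippet instead of the live page. SAMPLE_HTML is hypothetical markup invented for illustration, not the site's real HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real sidebar.
SAMPLE_HTML = """
<div id="leftcolumn">
  <a href="/python/python-intro.html" title="Python 简介">Python 简介</a>
  <a href="/python/python-install.html">Python 环境搭建</a>
</div>
<div id="other"><a href="/ignored.html" title="ignored">ignored</a></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

links = []
# '#leftcolumn a' matches every <a> that is a descendant of the element with id="leftcolumn".
for anchor in soup.select('#leftcolumn a'):
    # Fall back to the link text when the title attribute is missing,
    # so string handling never receives None.
    title = anchor.get('title') or anchor.get_text(strip=True)
    links.append((title, anchor.get('href')))

for title, href in links:
    print(title, href)
```

The fallback matters because anchor.get('title') returns None for a link without that attribute, and concatenating None with a string raises a TypeError.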