BeautifulSoup: A Super Simple Python Library for Parsing Website HTML
2018-02-03 09:53
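Example
Suppose we want to scrape every entry in the left sidebar list of this site (shown in the figure). Start by inspecting the HTML structure of the page's left sidebar (also shown in the figure).
It turns out the entries are simply all the a tags under the element whose id is leftcolumn. So how do we write the Python code?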
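    # coding: utf-8
    # Written for Python 2 (urllib2); on Python 3, urllib.request.urlopen plays the same role.
    import urllib2

    from bs4 import BeautifulSoup

    # Download the page and decode it, skipping any bytes that are not valid UTF-8
    url_request = urllib2.urlopen('http://www.runoob.com/python/python-tutorial.html')
    html_doc = url_request.read().decode('utf-8', 'ignore')

    soup = BeautifulSoup(html_doc, 'html.parser')
    # print(soup.prettify())

    # Collect every <a> tag under the element with id="leftcolumn"
    anchor_list = soup.find(id='leftcolumn').find_all('a')
    for anchor in anchor_list:
        astring = "title: " + anchor.get('title') + ", href=http://www.runoob.com/" + anchor.get('href')
        print(astring)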
The output is:
    title: Python 基础教程, href=http://www.runoob.com//python/python-tutorial.html
    title: Python 简介, href=http://www.runoob.com//python/python-intro.html
    title: Python 环境搭建, href=http://www.runoob.com//python/python-install.html
    title: Python 中文编码, href=http://www.runoob.com/python-chinese-encoding.html
    title: Python 基础语法, href=http://www.runoob.com//python/python-basic-syntax.html
    title: Python 变量类型, href=http://www.runoob.com//python/python-variable-types.html
    title: Python 运算符, href=http://www.runoob.com//python/python-operators.html
    title: Python 条件语句, href=http://www.runoob.com//python/python-if-statement.html
    title: Python 循环语句, href=http://www.runoob.com//python/python-loops.html
    title: Python While 循环语句, href=http://www.runoob.com//python/python-while-loop.html
    title: Python for 循环语句, href=http://www.runoob.com//python/python-for-loop.html
    title: Python 循环嵌套, href=http://www.runoob.com//python/python-nested-loops.html
    title: Python break 语句, href=http://www.runoob.com//python/python-break-statement.html
    title: Python continue 语句, href=http://www.runoob.com//python/python-continue-statement.html
    title: Python pass 语句, href=http://www.runoob.com//python/python-pass-statement.html
    title: Python Number(数字), href=http://www.runoob.com//python/python-numbers.html
    title: Python 字符串, href=http://www.runoob.com//python/python-strings.html
    title: Python 列表(List), href=http://www.runoob.com//python/python-lists.html
    title: Python 元组, href=http://www.runoob.com//python/python-tuples.html
    title: Python 字典(Dictionary), href=http://www.runoob.com//python/python-dictionary.html
    title: Python 日期和时间, href=http://www.runoob.com//python/python-date-time.html
    title: Python 函数, href=http://www.runoob.com//python/python-functions.html
    title: Python 模块, href=http://www.runoob.com//python/python-modules.html
    title: Python 文件I/O, href=http://www.runoob.com//python/python-files-io.html
    title: Python File 方法, href=http://www.runoob.com/file-methods.html
    title: Python 异常处理, href=http://www.runoob.com//python/python-exceptions.html
    title: Python OS 文件/目录方法, href=http://www.runoob.com/os-file-methods.html
    title: Python 内置函数, href=http://www.runoob.com/python-built-in-functions.html
    title: Python 面向对象, href=http://www.runoob.com//python/python-object.html
    title: Python正则表达式, href=http://www.runoob.com//python/python-reg-expressions.html
    title: Python CGI编程, href=http://www.runoob.com//python/python-cgi.html
    title: python 操作MySQL数据库, href=http://www.runoob.com//python/python-mysql.html
    title: Python 网络编程, href=http://www.runoob.com/python-socket.html
    title: Python SMTP发送邮件, href=http://www.runoob.com//python/python-email.html
    title: Python 多线程, href=http://www.runoob.com//python/python-multithreading.html
    title: Python XML解析, href=http://www.runoob.com//python/python-xml.html
    title: Python GUI 编程(Tkinter), href=http://www.runoob.com//python/python-gui-tkinter.html
    title: Python2.x与3.x版本区别, href=http://www.runoob.com//python/python-2x-3x.html
    title: Python IDE, href=http://www.runoob.com//python/python-ide.html
    title: Python JSON, href=http://www.runoob.com//python/python-json.html
    title: Python 100例, href=http://www.runoob.com//python/python-100-examples.html
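Most links in this output contain a double slash (`http://www.runoob.com//python/...`), and a few, such as `http://www.runoob.com/python-chinese-encoding.html`, look like relative hrefs that lost their directory. Both are typical results of gluing an href onto the site root by string concatenation. Below is a minimal sketch of one way around this, assuming Python 3 and resolving each href against the page URL with `urllib.parse.urljoin`. The helper name `extract_links` and the inline `SAMPLE_HTML` are made up for illustration and are not taken from the real page; the sketch also falls back to the link text when an anchor has no `title`, which the code above does not handle.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

PAGE_URL = "http://www.runoob.com/python/python-tutorial.html"

# Hypothetical snippet that only imitates the sidebar structure described above.
SAMPLE_HTML = """
<div id="leftcolumn">
  <a title="Python 基础教程" href="/python/python-tutorial.html">Python 基础教程</a>
  <a title="Python 中文编码" href="python-chinese-encoding.html">Python 中文编码</a>
  <a href="/python/python-intro.html">Python 简介</a>
</div>
"""


def extract_links(html_doc, page_url):
    """Return (title, absolute_url) pairs for every link under id="leftcolumn"."""
    soup = BeautifulSoup(html_doc, "html.parser")
    sidebar = soup.find(id="leftcolumn")
    if sidebar is None:
        # The page has no such element, so there is nothing to collect.
        return []
    links = []
    for anchor in sidebar.find_all("a", href=True):
        # Fall back to the visible text when the title attribute is missing.
        title = anchor.get("title") or anchor.get_text(strip=True)
        # urljoin handles both root-relative and page-relative hrefs.
        links.append((title, urljoin(page_url, anchor["href"])))
    return links


for title, href in extract_links(SAMPLE_HTML, PAGE_URL):
    print("title: " + title + ", href=" + href)
```

To run it against the live page, the HTML could be fetched with `urllib.request.urlopen(PAGE_URL).read().decode('utf-8', 'ignore')` and passed in place of `SAMPLE_HTML`.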
This is just one simple example. For richer HTML analysis, let your imagination run wild!
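As one possible direction, BeautifulSoup also accepts CSS selectors through `select`, which can express the same "all a tags under leftcolumn" query in a single string and narrow it further. The sketch below assumes Python 3 and uses a made-up `SAMPLE_HTML` string (including the `footer` element) purely for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the id "leftcolumn" comes from the example above.
SAMPLE_HTML = """
<div id="leftcolumn">
  <a title="Python 基础教程" href="/python/python-tutorial.html">Python 基础教程</a>
  <a href="/python/python-intro.html">Python 简介</a>
</div>
<div id="footer"><a title="About" href="/about">About</a></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Descendant selector: every <a> inside the element with id="leftcolumn".
sidebar_anchors = soup.select("#leftcolumn a")

# Attribute selector: only the sidebar anchors that carry a title attribute.
titled_anchors = soup.select("#leftcolumn a[title]")

texts = [a.get_text(strip=True) for a in sidebar_anchors]
titles = [a["title"] for a in titled_anchors]
print(texts)
print(titles)
```

The footer link is excluded by both selectors because it does not sit under `leftcolumn`.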