Scrapy: A Python Crawling Framework
2015-06-29 00:29
Scrapy makes it easy to collect data from the web: it takes care of a large share of the work for you, so you do not have to build everything yourself.
Items are containers that will be loaded with the scraped data.
Spiders are classes that you define and that Scrapy uses to scrape information from a domain.
They define an initial list of URLs to download.
Scrapy Engine
The engine is responsible for controlling the data flow between all components of the system, and triggering events when certain actions occur.
In short: it manages the data flow of the whole system and triggers events.
Scheduler
The Scheduler receives requests from the engine and enqueues them, then feeds them back later when the engine asks for them.
In short: it accepts requests from the engine, pushes them onto a queue, and returns them when the engine asks again.
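The enqueue-then-return contract can be illustrated with a toy queue. This is not Scrapy's scheduler, which does considerably more; the class below is a hypothetical stand-in that stores plain URL strings and only borrows the method names `enqueue_request` and `next_request`.

```python
from collections import deque


class ToyScheduler:
    """Illustrative stand-in for a scheduler; not Scrapy's implementation."""

    def __init__(self):
        self._queue = deque()

    def enqueue_request(self, url):
        # The engine hands over a request; it waits at the back of the queue.
        self._queue.append(url)

    def next_request(self):
        # Hand back the oldest pending URL, or None when nothing is waiting.
        return self._queue.popleft() if self._queue else None
```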
Downloader
The Downloader is responsible for fetching web pages and feeding them to the engine which, in turn, feeds them to the spiders.
In short: it downloads page content and returns it to the spiders (by way of the engine).
Spiders
Spiders are custom classes written by Scrapy users to parse responses and extract from them either items or additional URLs to follow. Each spider is able to handle a specific domain.
In short: a spider defines the parsing rules for a specific domain or set of pages.
Item Pipeline
The Item Pipeline is responsible for processing the items once they have been extracted by the spiders. Typical tasks include cleansing, validation and persistence.
In short: it processes the items that spiders extract from pages; its main tasks are cleansing, validating, and storing the data.
Downloader middlewares
Downloader middlewares are hooks that sit between the Engine and the Downloader. They process requests as they pass from the Engine to the Downloader, and responses as they pass from the Downloader to the Engine. They provide a convenient mechanism for extending Scrapy functionality by plugging in custom code.
In short: a hook framework between the Scrapy engine and the downloader, mainly handling the requests and responses that travel between the two.
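As a sketch of such a hook, the class below sets a default User-Agent header on outgoing requests and passes responses through untouched. The class name and the user-agent string are hypothetical.

```python
class DefaultUserAgentMiddleware:
    """Hypothetical downloader middleware that fills in a User-Agent."""

    def __init__(self, user_agent="example-bot/0.1"):
        self.user_agent = user_agent

    def process_request(self, request, spider):
        # Engine -> Downloader: add the header unless one is already set.
        request.headers.setdefault("User-Agent", self.user_agent)
        return None  # None lets the request continue to the downloader

    def process_response(self, request, response, spider):
        # Downloader -> Engine: pass the response through unchanged.
        return response
```

Such a class is switched on through the `DOWNLOADER_MIDDLEWARES` setting, which maps the class path to an order number.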
Spider middlewares
Spider middlewares are hooks that sit between the Engine and the Spiders and are able to process spider input and output. They provide a convenient mechanism for extending Scrapy functionality by plugging in custom code.
In short: a hook framework between the Scrapy engine and the spiders, mainly handling the responses that go into a spider and the requests that come out of it.
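To illustrate the input and output sides, here is a sketch of a spider middleware that lets every response through and filters what the spider yields. The class name and the rule (skip dict items with no `title`) are assumptions for this example.

```python
class RequireTitleMiddleware:
    """Hypothetical spider middleware that filters the spider's output."""

    def process_spider_input(self, response, spider):
        # Spider input: returning None lets the response reach the spider.
        return None

    def process_spider_output(self, response, result, spider):
        # Spider output: forward everything except dict items lacking a title.
        for element in result:
            if isinstance(element, dict) and not element.get("title"):
                continue
            yield element
```

It would be enabled through the `SPIDER_MIDDLEWARES` setting, in the same way as a downloader middleware.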