Python.Scrapy.14-scrapy-source-code-analysis-part-4
2015-07-15 20:48
Scrapy Source Code Analysis Series, Part 4: The scrapy.commands Subpackage
The scrapy.commands subpackage defines the subcommands available through the scrapy command: bench, check, crawl, deploy, edit, fetch, genspider, list, parse, runspider, settings, shell, startproject, version, and view. Each subcommand module defines a class named Command that inherits from ScrapyCommand.
Let's start with the crawl subcommand, which launches a spider.
1. crawl.py
The part to focus on is the run(self, args, opts) method:

```python
def run(self, args, opts):
    if len(args) < 1:
        raise UsageError()
    elif len(args) > 1:
        raise UsageError("running 'scrapy crawl' with more than one spider is no longer supported")
    spname = args[0]

    crawler = self.crawler_process.create_crawler()         # A
    spider = crawler.spiders.create(spname, **opts.spargs)  # B
    crawler.crawl(spider)                                   # C
    self.crawler_process.start()                            # D
```
So where is this run() method called from? Recall the discussion of _run_print_help() in section "1.2 cmdline.py command.py" of Python.Scrapy.11-scrapy-source-code-analysis-part-1.
A: Creates a Crawler object named crawler. While the Crawler is being constructed, its spiders instance attribute (a SpiderManager) is created as well, as shown here:
```python
class Crawler(object):

    def __init__(self, settings):
        self.configured = False
        self.settings = settings
        self.signals = SignalManager(self)
        self.stats = load_object(settings['STATS_CLASS'])(self)
        self._start_requests = lambda: ()
        self._spider = None
        # TODO: move SpiderManager to CrawlerProcess
        spman_cls = load_object(self.settings['SPIDER_MANAGER_CLASS'])
        self.spiders = spman_cls.from_crawler(self)  # spiders is a SpiderManager
```
A Crawler object owns a single SpiderManager, and that SpiderManager in turn manages multiple Spiders.
B: Obtains the Spider object from the SpiderManager.
C: Attaches the Crawler to the Spider object.
D: The start() method of the CrawlerProcess class looks like this:
```python
def start(self):
    if self.start_crawling():
        self.start_reactor()

def start_crawling(self):
    log.scrapy_info(self.settings)
    return self._start_crawler() is not None

def start_reactor(self):
    if self.settings.getbool('DNSCACHE_ENABLED'):
        reactor.installResolver(CachingThreadedResolver(reactor))
    reactor.addSystemEventTrigger('before', 'shutdown', self.stop)
    reactor.run(installSignalHandlers=False)  # blocking call

def _start_crawler(self):
    if not self.crawlers or self.stopping:
        return
    name, crawler = self.crawlers.popitem()
    self._active_crawler = crawler
    sflo = log.start_from_crawler(crawler)
    crawler.configure()
    crawler.install()
    crawler.signals.connect(crawler.uninstall, signals.engine_stopped)
    if sflo:
        crawler.signals.connect(sflo.stop, signals.engine_stopped)
    crawler.signals.connect(self._check_done, signals.engine_stopped)
    crawler.start()  # calls Crawler.start()
    return name, crawler
```
The start() method of the Crawler class:
```python
def start(self):
    yield defer.maybeDeferred(self.configure)
    if self._spider:
        # this is where the connection to the Engine (ExecutionEngine) is made
        yield self.engine.open_spider(self._spider, self._start_requests())
    yield defer.maybeDeferred(self.engine.start)
```
The ExecutionEngine class will be covered in the analysis of the scrapy.core subpackage.
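To make the Crawler → SpiderManager → Spider relationship concrete, here is a self-contained toy version (all Toy* names are invented for illustration; the real SpiderManager discovers spider classes from the SPIDER_MODULES setting rather than a hard-coded list):

```python
class ToySpider:
    """Hypothetical spider; real spiders subclass scrapy.Spider."""
    name = 'toy'

    def __init__(self, name, **kwargs):
        self.name = name
        self.kwargs = kwargs  # spider arguments, cf. opts.spargs at step B

class ToySpiderManager:
    """Minimal stand-in for SpiderManager: maps spider names to classes."""
    def __init__(self, spider_classes):
        self._spiders = {cls.name: cls for cls in spider_classes}

    @classmethod
    def from_crawler(cls, crawler):
        # The real SpiderManager reads crawler.settings here.
        return cls([ToySpider])

    def create(self, spider_name, **spider_kwargs):
        return self._spiders[spider_name](spider_name, **spider_kwargs)

class ToyCrawler:
    """One Crawler owns one SpiderManager (its spiders attribute)."""
    def __init__(self):
        self.spiders = ToySpiderManager.from_crawler(self)

crawler = ToyCrawler()
spider = crawler.spiders.create('toy', category='books')
print(spider.name, spider.kwargs)  # -> toy {'category': 'books'}
```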
2. startproject.py
3. How subcommands are loaded
The execute() method in cmdline.py contains the following lines:

```python
inproject = inside_project()
cmds = _get_commands_dict(settings, inproject)
cmdname = _pop_command_name(argv)
```
_get_commands_dict():
```python
def _get_commands_dict(settings, inproject):
    cmds = _get_commands_from_module('scrapy.commands', inproject)
    cmds.update(_get_commands_from_entry_points(inproject))
    cmds_module = settings['COMMANDS_MODULE']
    if cmds_module:
        cmds.update(_get_commands_from_module(cmds_module, inproject))
    return cmds
```
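The COMMANDS_MODULE branch above is also the hook for per-project custom subcommands: point COMMANDS_MODULE at a module of your own Command classes and they get merged into the dict. A hypothetical custom command might look like the following (the base class here is a local stand-in so the sketch avoids importing Scrapy; in a real project you would subclass scrapy.commands.ScrapyCommand):

```python
class ScrapyCommand:
    """Local stand-in for scrapy.commands.ScrapyCommand."""
    requires_project = False

    def short_desc(self):
        return ""

    def run(self, args, opts):
        raise NotImplementedError

# In a real project this class would live in e.g. myproject/commands/hello.py
# (module name = subcommand name), enabled via
# COMMANDS_MODULE = 'myproject.commands' in the project settings.
class Command(ScrapyCommand):
    requires_project = True

    def short_desc(self):
        return "print a greeting (demo custom subcommand)"

    def run(self, args, opts):
        return "hello from a custom subcommand"
```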
_get_commands_from_module():
```python
def _get_commands_from_module(module, inproject):
    d = {}
    for cmd in _iter_command_classes(module):
        if inproject or not cmd.requires_project:
            cmdname = cmd.__module__.split('.')[-1]
            d[cmdname] = cmd()
    return d
```
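_iter_command_classes() (not excerpted here) walks every module under the given package and yields each ScrapyCommand subclass it finds; note how the subcommand name is simply the last component of the defining module's path. Its effect can be mimicked with a self-contained toy that filters a hard-coded list instead of importing modules (all names below are illustrative):

```python
import inspect

class BaseCommand:
    """Stand-in for ScrapyCommand."""
    requires_project = False

class CrawlCommand(BaseCommand):       # pretend source: scrapy/commands/crawl.py
    __module__ = 'scrapy.commands.crawl'
    requires_project = True

class VersionCommand(BaseCommand):     # pretend source: scrapy/commands/version.py
    __module__ = 'scrapy.commands.version'

def iter_command_classes(candidates):
    # The real _iter_command_classes() imports every module in the package
    # and inspects its members; here we just filter a list of classes.
    for obj in candidates:
        if inspect.isclass(obj) and issubclass(obj, BaseCommand) \
                and obj is not BaseCommand:
            yield obj

def get_commands_dict(candidates, inproject):
    d = {}
    for cmd in iter_command_classes(candidates):
        if inproject or not cmd.requires_project:
            d[cmd.__module__.split('.')[-1]] = cmd()
    return d

print(sorted(get_commands_dict([CrawlCommand, VersionCommand], False)))
# -> ['version']  (crawl requires a project, so it is hidden outside one)
```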
To Be Continued
Next up is the settings-related logic: Python.Scrapy.15-scrapy-source-code-analysis-part-5