pyspider Crawler Study - Documentation Translation - About-Tasks.md
2017-08-30 00:00
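Summary: This is my first attempt at translating the documentation while reading the source code. Corrections are welcome; please be kind.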
About Tasks
===========

Tasks are the basic unit to be scheduled.

Basis
-----

* A task is identified by its `taskid`. (Default: `md5(url)`; this can be changed by overriding the `def get_taskid(self, task)` method.)
* Tasks are isolated between different projects.
* A task has 4 statuses:
    - active
    - failed
    - success
    - bad (not used)
* Only tasks in active status will be scheduled.
* Tasks are served in order of `priority`.

Schedule
--------

#### new task

When a new task (never seen before) comes in:

* If `exetime` is set but has not yet arrived, the task is put into a time-based queue to wait.
* Otherwise it is accepted.

When the task is already in the queue:

* It is ignored unless `force_update` is set.

When a completed task comes out:

* If `age` is set and `last_crawl_time + age < now`, it is accepted. Otherwise it is discarded.
* If `itag` is set and not equal to its previous value, it is accepted. Otherwise it is discarded.

#### task retry

When a fetch error or script error happens, the task is retried 3 times by default.

Retries are delayed by 30 seconds, 1 hour, 6 hours, and 12 hours in turn; any further retries are postponed by 24 hours.

If `age` is specified, the retry delay will not be larger than `age`.

You can configure the retry delay by adding a variable named `retry_delay` to the handler. `retry_delay` is a dict that specifies the retry intervals. The items in the dict are `{retried: seconds}`, and a special key `''` (empty string) gives the default delay for any retry count that is not listed.

For example, the default `retry_delay` is declared like this:

```
from pyspider.libs.base_handler import BaseHandler


class MyHandler(BaseHandler):
    # Keys are the number of retries already made; values are delays in seconds.
    retry_delay = {
        0: 30,
        1: 1*60*60,
        2: 6*60*60,
        3: 12*60*60,
        '': 24*60*60  # default for any retry count not listed above
    }
```
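
To make the rules above concrete, here is a minimal sketch of how a retry delay could be looked up and how a completed task could be judged for re-crawling. It is not pyspider's actual scheduler code. The helper names `next_retry_delay` and `should_recrawl` are hypothetical, `age` is assumed to be a non-negative number of seconds or `None`, and the order in which `itag` and `age` are checked is only one plausible reading of the rules.

```python
import time

# Default retry schedule as given in the document: {retried: seconds},
# with '' (empty string) as the fallback key.
DEFAULT_RETRY_DELAY = {
    0: 30,
    1: 1 * 60 * 60,
    2: 6 * 60 * 60,
    3: 12 * 60 * 60,
    '': 24 * 60 * 60,
}


def next_retry_delay(retried, retry_delay=None, age=None):
    """Return the wait in seconds before the next retry (hypothetical helper)."""
    table = DEFAULT_RETRY_DELAY if retry_delay is None else retry_delay
    # Fall back to the '' entry when this retry count is not listed.
    delay = table.get(retried, table.get('', DEFAULT_RETRY_DELAY['']))
    # When `age` is specified, the retry delay is capped at `age`.
    if age is not None:
        delay = min(delay, age)
    return delay


def should_recrawl(last_crawl_time, age=None, itag=None, old_itag=None, now=None):
    """Decide whether a completed task is accepted again (hypothetical helper)."""
    now = time.time() if now is None else now
    # Assumption: a changed `itag` is enough on its own to accept the task.
    if itag is not None and itag != old_itag:
        return True
    # Accept when the previous crawl result is older than `age` seconds.
    if age is not None and last_crawl_time + age < now:
        return True
    return False
```

With the default table, `next_retry_delay(1)` returns 3600, and `next_retry_delay(3, age=600)` returns 600 because the 12-hour delay is capped by `age`.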