Scrapy crawler proxy IPs, plus some free IPs to share
2019-02-27 12:00
The settings file
```python
DOWNLOADER_MIDDLEWARES = {
    'Suning.middlewares.SuningDownloaderMiddleware': 543,
}

USER_AGENTS = [
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)",
    "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)",
    "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)",
    "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0",
    "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5",
]

# Proxy IP pool -- add as many entries here as you like
PROXIES = [
    {'ip_port': '112.95.27.113:8088'},
]
```
The middlewares file
The whole idea: for every request, pull one proxy from the proxy pool and one User-Agent from the header pool defined in settings, and attach them to the request.
```python
import random

from .settings import USER_AGENTS
from .settings import PROXIES


class SuningDownloaderMiddleware(object):
    # Not all methods need to be defined. If a method is not defined,
    # Scrapy acts as if the downloader middleware does not modify the
    # passed objects.

    def process_request(self, request, spider):
        # Pick a random User-Agent and a random proxy for this request.
        useragent = random.choice(USER_AGENTS)
        proxy = random.choice(PROXIES)
        request.meta['proxy'] = "http://" + proxy['ip_port']
        request.headers.setdefault("User-Agent", useragent)
```

(Note: `request` here is the request object Scrapy passes into `process_request`; there is no `request` module to import.)
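Free proxies die constantly, so once one fails it is worth dropping it from the pool so `random.choice` never returns it again. A minimal sketch of that idea (the `drop_failed_proxy` helper is my own name, not part of the original middleware; in Scrapy you would call something like it from a `process_exception` hook):

```python
# Hypothetical helper: return a new pool with the failing proxy removed,
# so subsequent random.choice calls can no longer select it.
def drop_failed_proxy(proxies, failed_ip_port):
    return [p for p in proxies if p['ip_port'] != failed_ip_port]

pool = [{'ip_port': '112.95.27.113:8088'}, {'ip_port': '10.0.0.1:3128'}]
pool = drop_failed_proxy(pool, '112.95.27.113:8088')
print(pool)
```

Keeping the pool in a list like this also makes it easy to top it back up from a fresh scrape of a free-proxy site.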
As a bonus, here is a website that publishes free proxy IPs:
http://www.89ip.cn/index_7.html?tdsourcetag=s_pctim_aiomsg
Friendly reminder: paid proxies always beat free ones.
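If you do copy a batch of addresses from a free list like the one above, they still need to be put into the `PROXIES` format that settings.py expects. A small sketch (the `to_proxy_entries` helper is hypothetical, not from this post):

```python
# Hypothetical helper: turn raw "ip:port" lines copied from a
# free-proxy listing into the {'ip_port': ...} dicts used in PROXIES.
def to_proxy_entries(lines):
    return [{'ip_port': line.strip()} for line in lines if line.strip()]

raw = ["112.95.27.113:8088", "  117.69.12.34:3000  ", ""]
print(to_proxy_entries(raw))
```

Blank lines and stray whitespace are common when copying from these sites, which is why the helper strips and filters.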