Using a proxy for web scraping with Python Scrapy
2016-05-01 00:22
http://www.sharejs.com/codes/python/8309
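1. Create "middlewares.py" in the Scrapy project
2. Register the middleware in the project settings file (./project_name/settings.py)
That is all it takes: after these two steps, every request goes through the proxy. Then run a quick test ^_^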
1. Create "middlewares.py" in the Scrapy project
# base64 is needed only if the proxy requires authentication
import base64

# Start your middleware class
class ProxyMiddleware(object):
    # Override process_request so every outgoing request is routed through the proxy
    def process_request(self, request, spider):
        # Set the location of the proxy
        request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"
        # Use the following lines only if your proxy requires authentication
        proxy_user_pass = "USERNAME:PASSWORD"
        # Set up HTTP Basic authentication for the proxy.
        # b64encode replaces base64.encodestring, which was removed in Python 3.9
        # and appended a newline that does not belong in a header value.
        encoded_user_pass = base64.b64encode(proxy_user_pass.encode('utf-8')).decode('ascii')
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass
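The authentication step can be tried on its own, outside Scrapy. The following is a minimal sketch using only the standard library; the helper name basic_proxy_auth is hypothetical and not part of Scrapy, and the credentials are placeholders.

```python
import base64


def basic_proxy_auth(username, password):
    """Build a Proxy-Authorization header value for HTTP Basic authentication.

    Hypothetical helper for illustration; it is not part of Scrapy.
    """
    credentials = "{}:{}".format(username, password).encode("utf-8")
    # b64encode returns bytes without a trailing newline; decode to get a str.
    return "Basic " + base64.b64encode(credentials).decode("ascii")


if __name__ == "__main__":
    # Placeholder credentials, as in the middleware above.
    print(basic_proxy_auth("USERNAME", "PASSWORD"))
```

Inside process_request, the returned string could be assigned to request.headers['Proxy-Authorization'] directly.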
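2. Add the following to the project settings file (./project_name/settings.py)
# Lower numbers run first, so ProxyMiddleware (100) sets request.meta['proxy']
# before the built-in HttpProxyMiddleware (110) sees the request.
DOWNLOADER_MIDDLEWARES = {
    # Scrapy releases before 1.0 used 'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware'
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    'project_name.middlewares.ProxyMiddleware': 100,
}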
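The numbers in DOWNLOADER_MIDDLEWARES decide the order in which Scrapy calls each middleware's process_request: smaller values run earlier, and a value of None disables an entry. The sketch below only illustrates that ordering with plain Python; the function process_request_order is a hypothetical helper, not a Scrapy API, and the built-in middleware is written with the module path used by current Scrapy releases.

```python
# Mapping in the same shape as settings.py; a lower number runs earlier.
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    'project_name.middlewares.ProxyMiddleware': 100,
}


def process_request_order(middlewares):
    """Return middleware paths in the order their process_request hooks run.

    Hypothetical helper for illustration; entries set to None are disabled.
    """
    enabled = {path: order for path, order in middlewares.items() if order is not None}
    return sorted(enabled, key=enabled.get)


if __name__ == "__main__":
    for path in process_request_order(DOWNLOADER_MIDDLEWARES):
        print(path)
```

With the values from this article, the custom ProxyMiddleware is listed first, which is why it can set the proxy before the built-in middleware handles the request.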
Test: run the spider below with "scrapy crawl test" and inspect test.html
import scrapy

# A plain Spider is used here: CrawlSpider reserves parse() for its own rule handling
class TestSpider(scrapy.Spider):
    name = "test"
    domain_name = "whatismyip.com"
    # The following url is subject to change, you can get the last updated one from here:
    # http://www.whatismyip.com/faq/automation.asp
    start_urls = ["http://xujian.info"]

    def parse(self, response):
        # Save the fetched page so the result can be inspected afterwards
        with open('test.html', 'wb') as f:
            f.write(response.body)
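To judge whether the proxy was actually used, one option is to point start_urls at a page that echoes the caller's IP address and then look for the proxy's address in the saved page. The sketch below is only an illustration under that assumption; the helper find_ipv4_addresses is hypothetical, and the sample bytes stand in for the contents of test.html.

```python
import re

# One IPv4 octet, limited to 0-255.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
# Four octets joined by dots, not embedded in a longer dotted number.
IPV4_RE = re.compile(r"(?<![\d.])" + r"\.".join([_OCTET] * 4) + r"(?![\d.])")


def find_ipv4_addresses(html_bytes):
    """Return the distinct IPv4 addresses found in a page body, in order of appearance.

    Hypothetical helper for illustration; pass it the bytes read from test.html.
    """
    text = html_bytes.decode("utf-8", errors="replace")
    found = []
    for address in IPV4_RE.findall(text):
        if address not in found:
            found.append(address)
    return found


if __name__ == "__main__":
    # Sample body standing in for a saved IP-echo page (documentation address range).
    sample = b"<html><body>Your IP address is 203.0.113.7</body></html>"
    print(find_ipv4_addresses(sample))
```

If the address reported by the page matches YOUR_PROXY_IP rather than the machine's own address, the request went through the proxy.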