Fetching proxy IP addresses with Scrapy
2017-05-28 21:05
items.py
# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class GetproxyItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    ip = scrapy.Field()        # proxy IP address
    port = scrapy.Field()      # proxy port
    type = scrapy.Field()      # anonymity level as listed by the site
    location = scrapy.Field()  # country or region of the proxy
    protocol = scrapy.Field()  # e.g. 'http'
    source = scrapy.Field()    # site the proxy was scraped from
pipelines.py
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html


class GetproxyPipeline(object):
    # Order of the columns written to each line of proxy.txt.
    fields = ('ip', 'port', 'protocol', 'type', 'location', 'source')

    def process_item(self, item, spider):
        fileName = 'proxy.txt'
        # Python 3: item values are already str, so open the file as UTF-8 text
        # and write them directly instead of calling .encode('utf8') on each one.
        with open(fileName, 'a', encoding='utf-8') as fp:
            # One proxy per line, fields separated by tabs.
            fp.write('\t'.join(item[f].strip() for f in self.fields) + '\n')
        return item
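The comment in pipelines.py is a reminder that the pipeline only runs once it is registered in the project's settings. A minimal sketch of that entry follows, assuming the project package is named getProxy (as the spider's `from getProxy.items import ...` line suggests). The priority 300 is an arbitrary choice, not something the original post specifies.

```python
# settings.py (fragment)
# Assumption: the Scrapy project package is called getProxy.
ITEM_PIPELINES = {
    # Lower numbers run earlier; values conventionally fall in the 0-1000 range.
    'getProxy.pipelines.GetproxyPipeline': 300,
}
```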
proxy360pider.py
# -*- coding: utf-8 -*-
import scrapy
from getProxy.items import GetproxyItem


class Proxy360piderSpider(scrapy.Spider):
    name = "proxy360pider"
    allowed_domains = ["proxy360.cn"]

    # Build one start URL per region page.
    start_urls = []
    nations = ['Brazil', 'China', 'Taiwan', 'Japan', 'Thailand', 'Vietnam', 'bahrenin']
    for nation in nations:
        start_urls.append('http://www.proxy360.cn/Region/' + nation)

    def parse(self, response):
        # Each proxy entry is a div with class "proxylistitem" and name "list_proxy_ip".
        subSelector = response.xpath('//div[@class="proxylistitem" and @name="list_proxy_ip"]')
        items = []
        for sub in subSelector:
            item = GetproxyItem()
            # The first four spans hold IP, port, anonymity type and location.
            item['ip'] = sub.xpath('.//span[1]/text()').extract()[0]
            item['port'] = sub.xpath('.//span[2]/text()').extract()[0]
            item['type'] = sub.xpath('.//span[3]/text()').extract()[0]
            item['location'] = sub.xpath('.//span[4]/text()').extract()[0]
            item['protocol'] = 'http'
            item['source'] = 'proxy360'
            items.append(item)
        return items
Some of the proxy IPs collected (高匿 = high anonymity, 透明 = transparent, 泰国 = Thailand):

210.246.192.149 80 http 高匿 泰国 proxy360
118.175.255.10 80 http 高匿 泰国 proxy360
203.158.167.152 8080 http 高匿 泰国 proxy360
58.147.80.194 3128 http 高匿 泰国 proxy360
122.155.0.244 3128 http 透明 泰国 proxy360
203.151.233.143 80 http 高匿 泰国 proxy360
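To use the collected proxies later, the file has to be read back in. The sketch below is one possible way to do that, assuming proxy.txt holds one proxy per line with six tab-separated fields in the order the pipeline writes them (ip, port, protocol, type, location, source). The helper names `parse_proxy_line`, `proxy_url` and `load_proxies` are hypothetical and do not appear in the original post.

```python
# Column order assumed to match what GetproxyPipeline writes.
FIELDS = ('ip', 'port', 'protocol', 'type', 'location', 'source')


def parse_proxy_line(line):
    """Turn one tab-separated line of proxy.txt into a dict (hypothetical helper)."""
    values = line.rstrip('\n').split('\t')
    if len(values) != len(FIELDS):
        raise ValueError('expected %d fields, got %d' % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


def proxy_url(record):
    """Format a parsed record as protocol://ip:port (hypothetical helper)."""
    return '%s://%s:%s' % (record['protocol'], record['ip'], record['port'])


def load_proxies(path='proxy.txt'):
    """Read every non-empty line of the file into a list of dicts."""
    with open(path, encoding='utf-8') as fp:
        return [parse_proxy_line(line) for line in fp if line.strip()]
```

For example, `proxy_url(parse_proxy_line(line))` yields a string that can be passed to an HTTP client's proxy option. Whether a given proxy still works has to be checked separately.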