Writing a Python crawler to scrape a "pure girls" photo site
2017-07-14 18:50
Reposted from: https://zhuanlan.zhihu.com/p/26395979
# encoding: utf-8
import requests
from lxml import html


def get_page_number(num):
    # Collect the album links listed on index page `num` (passed as a string)
    url = "http://www.mmjpg.com/home/" + num
    response = requests.get(url).content
    selector = html.fromstring(response)
    urls = []
    for i in selector.xpath("//ul/li/a/@href"):
        urls.append(i)
    return urls


def get_image_title(url):
    # The album title is the text of the first <h2> on the album page
    response = requests.get(url).content
    selector = html.fromstring(response)
    image_title = selector.xpath("//h2/text()")[0]
    return image_title


def get_image_amount(url):
    # The second-to-last pager link holds the number of images in the album
    response = requests.get(url).content
    selector = html.fromstring(response)
    image_amount = selector.xpath("//div[@class='page']/a[last()-1]/text()")[0]
    return image_amount


def get_image_detail_website(url):
    # Visit url/1 .. url/N and collect the image download link from each page
    response = requests.get(url).content
    selector = html.fromstring(response)
    image_detail_websites = []
    image_amount = selector.xpath("//div[@class='page']/a[last()-1]/text()")[0]
    for i in range(int(image_amount)):
        image_detail_link = '{}/{}'.format(url, i + 1)
        response = requests.get(image_detail_link).content
        sel = html.fromstring(response)
        image_download_link = sel.xpath("//div[@class='content']/a/img/@src")[0]
        image_detail_websites.append(image_download_link)
    return image_detail_websites


def download_image(image_title, image_detail_websites):
    # Save each image to the current directory as <title><index>.jpg
    num = 1
    amount = len(image_detail_websites)
    for i in image_detail_websites:
        filename = '%s%s.jpg' % (image_title, num)
        # print('Downloading image: %s, %s of %s' % (image_title, num, amount))
        print(image_title, num, amount)
        with open(filename, 'wb') as f:
            f.write(requests.get(i).content)
        num += 1


if __name__ == '__main__':
    # page_number = input('Enter the page number to crawl: ')
    for link in get_page_number("2"):
        print(link)
        download_image(get_image_title(link), get_image_detail_website(link))
    # urlss = get_page_number("1")
    # print(urlss)
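The URL and filename patterns used in the script above can be checked offline, without sending any requests. The following is a minimal sketch only: `build_detail_links` and `build_filename` are hypothetical helper names that do not appear in the original script, and `http://example.com/album` is a placeholder address. They simply isolate the `'{}/{}'.format(url, i + 1)` and `'%s%s.jpg'` expressions from `get_image_detail_website` and `download_image`.

```python
def build_detail_links(album_url, image_amount):
    """Return the per-image page URLs for one album: album_url/1 .. album_url/N.

    image_amount may be a string, since the script reads it from page text.
    """
    return ['{}/{}'.format(album_url, i + 1) for i in range(int(image_amount))]


def build_filename(image_title, num):
    """Return the local filename the script uses for the num-th image."""
    return '%s%s.jpg' % (image_title, num)


if __name__ == '__main__':
    # Placeholder album URL and title, used only to illustrate the pattern.
    links = build_detail_links('http://example.com/album', '3')
    for n, link in enumerate(links, start=1):
        print(link, '->', build_filename('demo', n))
```

Keeping these two steps as pure functions makes it easy to confirm the page numbering starts at 1 and that filenames are numbered in the same order as the pages.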