Crawling the Daomubiji (盗墓笔记) Novel with Scrapy
2019-05-26 15:11
Use the Scrapy framework to crawl chapter data of the novel Daomubiji (盗墓笔记) and store it in a MongoDB database.
```python
# settings.py — MongoDB connection configuration
MONGODB_HOST = '127.0.0.1'
MONGODB_PORT = 27017
MONGODB_DBNAME = 'MySpider'
MONGODB_DOCNAME = 'daomubiji'
```
```python
# items.py — fields to scrape
import scrapy


class NovelItem(scrapy.Item):
    bookName = scrapy.Field()     # series name
    bookTitle = scrapy.Field()    # book title
    chapterNum = scrapy.Field()   # chapter number
    chapterName = scrapy.Field()  # chapter name
    chapterUrl = scrapy.Field()   # chapter URL
```
```python
# spider — crawl the chapter index pages
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

from novel.items import NovelItem


class DaomubijiSpider(CrawlSpider):
    name = 'daomubiji'
    allowed_domains = ['daomubiji.com']
    start_urls = ['http://www.daomubiji.com/']

    rules = (
        Rule(LinkExtractor(restrict_xpaths='//article[@class="article-content"]//a'),
             callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        bookName = response.xpath(
            '//h1[@class="focusbox-title"]/text()').get().split(':')[0]
        for article in response.xpath('//div[@class="excerpts"]//article'):
            # link text has the shape "<book title> <chapter number> <chapter name>"
            parts = article.xpath('.//a/text()').get().split(' ')
            if len(parts) < 3:
                continue  # skip irregularly titled entries
            item = NovelItem()  # one fresh Item per chapter, not one shared instance
            item['bookName'] = bookName
            item['bookTitle'] = parts[0]
            item['chapterNum'] = parts[1]
            item['chapterName'] = parts[2]
            item['chapterUrl'] = article.xpath('.//a/@href').get()
            yield item
```
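The splitting logic in `parse_item` can be exercised on its own, without Scrapy. A minimal sketch, assuming the link text has the space-separated shape described above (the sample string and the `parse_chapter_link` helper are hypothetical, for illustration only):

```python
def parse_chapter_link(text, href):
    # Expected shape: "<book title> <chapter number> <chapter name>"
    parts = text.split(' ')
    if len(parts) < 3:
        return None  # guard against irregular titles
    return {
        'bookTitle': parts[0],
        'chapterNum': parts[1],
        'chapterName': ' '.join(parts[2:]),  # names may themselves contain spaces
        'chapterUrl': href,
    }

item = parse_chapter_link('盗墓笔记1 第一章 七星鲁王宫',
                          'http://www.daomubiji.com/qi-xing-lu-wang-01.html')
```

Joining the tail with `' '.join(parts[2:])` is slightly more forgiving than indexing `parts[2]`, since chapter names with internal spaces are not silently truncated.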
```python
# pipeline — store items in MongoDB
import pymongo


class NovelPipeline(object):
    def __init__(self, host, port, db_name, doc_name):
        client = pymongo.MongoClient(host=host, port=port)
        self.post = client[db_name][doc_name]

    @classmethod
    def from_crawler(cls, crawler):
        # Read the MongoDB settings from settings.py via crawler.settings
        # (the old `from scrapy.conf import settings` import is deprecated).
        s = crawler.settings
        return cls(s['MONGODB_HOST'], s.getint('MONGODB_PORT'),
                   s['MONGODB_DBNAME'], s['MONGODB_DOCNAME'])

    def open_spider(self, spider):
        print('This spider is starting!')

    def process_item(self, item, spider):
        # insert() is deprecated in pymongo 3+; use insert_one()
        self.post.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        print('This spider is finished!')
```
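The pipeline's `process_item` flow can be checked without a running `mongod` by swapping in an in-memory stand-in for the collection. A minimal sketch; `FakeCollection` and the sample item are hypothetical, for illustration only:

```python
class FakeCollection:
    """In-memory stand-in mimicking pymongo's Collection.insert_one."""
    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(doc)


class NovelPipeline:
    def __init__(self, collection):
        self.post = collection

    def process_item(self, item, spider):
        # Convert the (dict-like) Item to a plain dict before storing.
        self.post.insert_one(dict(item))
        return item


col = FakeCollection()
pipe = NovelPipeline(col)
pipe.process_item({'bookName': '盗墓笔记', 'chapterName': '七星鲁王宫'}, spider=None)
```

The `dict(item)` conversion matters: Scrapy `Item` objects behave like mappings, but pymongo expects a plain document, so the pipeline copies the fields into a `dict` before inserting.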