
A Simple Python Crawler in Practice: CSDN

2017-07-16 14:57 · 603 views
This post is just for sharing <(o゜▽゜)o☆[BINGO!]

The implementation is simple: the crawler just keeps requesting the article pages in a loop.

It mainly relies on the requests library.
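The flow has two stages: first collect the article URLs from the blog's index page, then fetch them in a loop. The link-collection step can be illustrated offline with a made-up HTML fragment (the paths below are invented for the demo; only the regex matches the real crawler):

```python
import re

# A made-up fragment mimicking the old CSDN article-list markup
html = (
    '<span class="link_title">'
    '<a href="/vonsdite/article/details/10001">Post one</a></span>'
    '<span class="link_title">'
    '<a href="/vonsdite/article/details/10002">Post two</a></span>'
)

# Capture the relative article paths, same pattern as the crawler uses
part = re.compile(r'<span class="link_title"><a href="(/vonsdite/article/details/.+?)"')
paths = part.findall(html)

# The captured paths start with "/", so join without a trailing slash
urls = ['http://blog.csdn.net' + p for p in paths]
print(urls)
```

Note the join detail: since each captured path already begins with `/`, the site root must not end with one, or every URL would contain a double slash.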

Don't overdo it :-O

Sample code:

# -*- coding: utf-8 -*-
# @Author   : Sdite
# @DateTime : 2017-07-16 14:17:22

import requests
from bs4 import BeautifulSoup
import re
import time

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36',
}

# Preparation stage: collect the article links from the blog
# and store them in the variable url
url = "http://blog.csdn.net/vonsdite"
res = requests.get(url=url, headers=headers)
part = re.compile(r'<span class="link_title"><a href="(/vonsdite/article/details/.+?)"')
url = part.findall(res.text)
# The captured paths already start with "/", so join without a trailing slash
url = ['http://blog.csdn.net' + tmp for tmp in url]

# View-count boosting stage
while True:
    for u in url:
        res = requests.get(url=u, headers=headers)
        soup = BeautifulSoup(res.text, 'lxml')
        # The sidebar stats live in the element with id "blog_rank";
        # the regex keeps the Chinese literals ("访问" = visits, "次" = times)
        # because they must match the page's own text
        rank = soup.select('#blog_rank')
        part = re.compile(r'<li>(访问:)<span>(\d+次)</span></li>')
        rank = part.findall(str(rank[0]))
        rank = rank[0][0] + rank[0][1]
        print('Blog: ' + rank)
        time.sleep(2)
Tags: python crawler csdn