Book recommendation: Web Scraping with Python
2017-04-19 20:28
I've recently been learning to scrape data from the web with Python and came across a solid textbook, Web Scraping with Python, which I'd like to recommend. If you need Python learning materials, you can join this QQ group: the first part is 472, the middle is 309, and the last is 261; it has plenty of downloadable resources. I also write small scraper programs from time to time; comments and discussion are welcome.
Case study: collecting data with a web scraper to find a Python internship
import os

import requests
import xlwt
from bs4 import BeautifulSoup
from fake_useragent import UserAgent
from lxml import etree

# Randomize the User-Agent so requests look less like a bot.
ua = UserAgent()
headers = {'User-Agent': ua.random}  # note: ua.random, not the literal string 'ua.random'

job, location, company, salary, link = [], [], [], [], []

# Crawl the first nine pages of Python internship listings.
for k in range(1, 10):
    url = 'http://www.shixiseng.com/interns?k=python&p=' + str(k)
    r = requests.get(url, headers=headers).text
    s = etree.HTML(r)
    job1 = s.xpath('//a/h3/text()')
    location1 = s.xpath('//span/span/text()')
    company1 = s.xpath('//p/a/text()')
    salary1 = s.xpath('//span[contains(@class,"money_box")]/text()')
    link1 = s.xpath('//div[@class="job_head"]/a/@href')
    for href in link1:
        link.append('http://www.shixiseng.com' + href)
    # The money_box span yields alternating whitespace and value text nodes;
    # keep every second node, then strip the stray newlines.
    for s1 in salary1[1::2]:
        salary.append(s1.replace('\n\n', ''))
    job.extend(job1)
    location.extend(location1)
    company.extend(company1)

# Visit each posting's detail page and extract the job description.
detail = []
for url in link:
    r = requests.get(url, headers=headers).text
    soup = BeautifulSoup(r, 'lxml')
    for tag in soup.find_all(class_="dec_content"):
        detail.append(tag.get_text())

# Write everything to an Excel sheet, one posting per row (row 0 left free for a header).
book = xlwt.Workbook()
sheet = book.add_sheet('sheet', cell_overwrite_ok=True)
os.chdir('D:\\Pycharm\\spider')
for i in range(len(job)):
    try:
        sheet.write(i + 1, 0, job[i])
        sheet.write(i + 1, 1, location[i])
        sheet.write(i + 1, 2, company[i])
        sheet.write(i + 1, 3, salary[i])
        sheet.write(i + 1, 4, link[i])
        sheet.write(i + 1, 5, detail[i])
    except Exception as e:
        print('Exception: ' + str(e))
        continue
book.save('d:\\python.xls')
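The salary cleanup is the least obvious step above: `xpath('...text()')` returns the whitespace-only text nodes between elements as well as the values, so the code keeps every second item (`[1::2]`) and strips the newlines. A minimal sketch with made-up node text (the real node layout depends on shixiseng.com's markup at the time):

```python
# Simulated list as etree's xpath() might return it: whitespace-only
# text nodes alternating with the actual salary strings.
salary1 = ['\n\n', '150-200/天\n\n', '\n\n', '120-180/天\n\n']

# Keep every second element (the real values), then strip the newlines.
salary = [s.replace('\n', '') for s in salary1[1::2]]
print(salary)  # → ['150-200/天', '120-180/天']
```

If the site ever changes its markup so that the whitespace nodes disappear, the `[1::2]` slice would silently drop half the values, so it is worth re-checking this assumption when the scraper breaks.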