
Scraping 51job job listings with Python (XPath), getting past the User-Agent check

2018-02-28 21:20
# -*- coding:utf-8 -*-

import requests
from fake_useragent import UserAgent
from lxml import etree

agent = UserAgent()
url = "http://search.51job.com/list/010000%252C020000%252C030200%252C040000%252C080200,000000,0000,00,9,99,python,2,1.html?lang=c&stype=&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&providesalary=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare="
response = requests.get(
    url,
    # Send a random browser User-Agent so the request is not rejected as a script
    headers={"User-Agent": agent.random},
)
response.encoding = response.apparent_encoding
root = etree.HTML(response.text)
div_list = root.xpath('//div[@class="dw_table"]/div[@class="el"]')

for div in div_list:
    # XPath expressions here are relative to the current row; text() returns a list
    name = div.xpath('p/span/a/text()')[0]
    name = name.strip()
    company = div.xpath('span[@class="t2"]/a/text()')[0]
    place = div.xpath('span[@class="t3"]/text()')[0]
    money = div.xpath('span[@class="t4"]/text()')
    time = div.xpath('span[@class="t5"]/text()')
    # if not money:
    #     money = "Negotiable"
    # else:
    #     money = money[0]
    # Salary and date may be missing, so fall back to a placeholder
    money = money[0] if money else "Negotiable"
    time = time[0] if time else "No date"
    print("Job title: %s" % name)
    print("Company: %s" % company)
    print("Location: %s" % place)
    print("Salary: %s" % money)
    print("Posted: %s" % time)
    print("----------------------------")
    # with open('job.csv', 'a', encoding='gb18030') as f:
    #     f.write(name+','+company+','+place+','+money+','+time)
    #     f.write('\n')
    # Append one row per listing; gb18030 lets Excel on Chinese Windows open the file
    with open('51job.csv', 'a', encoding='gb18030') as f:
        im_list = [name, company, place, money, time]
        # Add the newline after joining so the row does not end with a stray comma
        f.write(','.join(im_list) + '\n')
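
The script above builds each CSV row by joining fields with commas, which breaks if a field (a company name, for example) itself contains a comma. As a minimal sketch, and not part of the original post, the standard-library `csv` module can handle the quoting. The helper names `first_or_default` and `write_rows` and the sample row are hypothetical, chosen only for illustration.

```python
import csv
import io


def first_or_default(values, default):
    """Return the first item of an XPath text() result list, stripped,
    or `default` when the list is empty (hypothetical helper)."""
    return values[0].strip() if values else default


def write_rows(rows, fileobj):
    """Write rows to an open text file object, letting csv quote
    any field that contains a comma (hypothetical helper)."""
    writer = csv.writer(fileobj)
    writer.writerows(rows)


# Made-up sample row standing in for one scraped listing
rows = [
    [
        "Python Developer",
        "Example Co., Ltd.",                 # contains a comma
        "Beijing",
        first_or_default([], "Negotiable"),  # empty XPath result
        "02-28",
    ],
]

# An in-memory buffer keeps the example self-contained
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, one would presumably pass `open('51job.csv', 'a', encoding='gb18030', newline='')` to `write_rows`; `newline=''` is what the `csv` documentation recommends so that no extra blank lines appear on Windows.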