Scraping 51job listings with Python (XPath), bypassing the User-Agent check
2018-02-28 21:20
# -*- coding:utf-8 -*-
import requests
from fake_useragent import UserAgent
from lxml import etree

agent = UserAgent()
url = ("http://search.51job.com/list/010000%252C020000%252C030200%252C040000%252C080200"
       ",000000,0000,00,9,99,python,2,1.html"
       "?lang=c&stype=&postchannel=0000&workyear=99&cotype=99&degreefrom=99"
       "&jobterm=99&companysize=99&providesalary=99&lonlat=0%2C0&radius=-1"
       "&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line="
       "&specialarea=00&from=&welfare=")

# Send the request with a random User-Agent so the site does not block us
response = requests.get(url, headers={"User-Agent": agent.random})
response.encoding = response.apparent_encoding

root = etree.HTML(response.text)
div_list = root.xpath('//div[@class="dw_table"]/div[@class="el"]')

for div in div_list:
    name = div.xpath('p/span/a/text()')[0].strip()
    company = div.xpath('span[@class="t2"]/a/text()')[0]
    place = div.xpath('span[@class="t3"]/text()')[0]
    money = div.xpath('span[@class="t4"]/text()')
    pub_time = div.xpath('span[@class="t5"]/text()')
    # The salary and date cells can be empty; fall back to placeholders
    money = money[0] if money else "Negotiable"
    pub_time = pub_time[0] if pub_time else "No date"

    print("Job title: %s" % name)
    print("Company: %s" % company)
    print("Location: %s" % place)
    print("Salary: %s" % money)
    print("Posted: %s" % pub_time)
    print("----------------------------")

    # Append the record to a CSV file (gb18030 so Excel on Chinese Windows opens it)
    with open('51job.csv', 'a', encoding='gb18030') as f:
        f.write(','.join([name, company, place, money, pub_time]) + '\n')
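One caveat about the manual `','.join(...)` write above: if a scraped field itself contains a comma (common in company names), the resulting CSV columns shift. The standard-library `csv` module quotes such fields automatically. A minimal sketch, using a made-up sample row in place of the scraped values:

```python
import csv

# Hypothetical sample record; real rows would come from the XPath extraction above
row = ["Python Developer", "ACME, Inc.", "Shanghai", "15-25k/month", "02-28"]

with open("51job.csv", "a", encoding="gb18030", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(row)  # "ACME, Inc." gets quoted, so its comma survives a round trip
```

Reading the file back with `csv.reader` then yields the original five fields intact, whereas the hand-rolled join would have split the company name into two columns.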