Python 3.x Crawler Study: Notes on a Targeted Stock-Data Crawler
2017-03-15 21:48
import re

import requests
from bs4 import BeautifulSoup


def getHTMLtext(url, code="utf-8"):
    """Fetch a page and return its text, or "" if the request fails."""
    try:
        # The timeout keeps one unresponsive page from stalling the whole crawl.
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = code
        return r.text
    except requests.RequestException:
        return ""


def getStockList(stock_list, stockURL):
    """Collect stock codes (sh/sz plus six digits) from the links on the list page."""
    html = getHTMLtext(stockURL, "GB2312")
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a"):
        try:
            href = a.attrs["href"]
            stock_list.append(re.findall(r"s[hz]\d{6}", href)[0])
        except (KeyError, IndexError):
            # The tag has no href, or the href contains no stock code.
            continue


def getStockInfo(stock_list, stockURL, filePath):
    """Scrape each stock's detail page and append its fields to filePath, one dict per line."""
    count = 0
    for stock in stock_list:
        url = stockURL + stock + ".html"
        html = getHTMLtext(url)
        if html == "":
            continue
        try:
            infoDict = {}
            soup = BeautifulSoup(html, "html.parser")
            stockInfo = soup.find("div", attrs={"class": "stock-bets"})
            name = stockInfo.find_all(attrs={"class": "bets-name"})[0]
            # '股票名称' means "stock name"; the remaining keys are the page's own <dt> labels.
            infoDict["股票名称"] = name.text.split()[0]
            keyList = stockInfo.find_all("dt")
            valueList = stockInfo.find_all("dd")
            for key, value in zip(keyList, valueList):
                infoDict[key.text] = value.text
            with open(filePath, "a", encoding="utf-8") as f:
                f.write(str(infoDict) + "\n")
        except Exception:
            # Pages without the expected layout are skipped but still counted as processed.
            pass
        count += 1
        # "\r" plus end="" redraws the progress figure on a single console line.
        print("\rProgress: {:.2f}%".format(count * 100 / len(stock_list)), end="")


def main():
    print("start")
    stock_list_url = "http://quote.eastmoney.com/stocklist.html"
    stock_info_url = "https://gupiao.baidu.com/stock/"
    output_file = "D:/BaiduStockInfo.txt"
    slist = []
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)
    print("end")


if __name__ == "__main__":
    main()
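The stock-code extraction inside getStockList can be tried without any network access. The sketch below is only an illustration: the helper name extract_stock_codes and the sample links are made up for this example and are not part of the original notes. It applies the same pattern (sh or sz followed by six digits) to a few hrefs.

```python
import re

# Same pattern as in getStockList: "sh" or "sz" followed by six digits.
STOCK_CODE = re.compile(r"s[hz]\d{6}")


def extract_stock_codes(hrefs):
    """Return the first stock code found in each href, skipping hrefs without one.

    Hypothetical helper, written for illustration only.
    """
    codes = []
    for href in hrefs:
        match = STOCK_CODE.search(href)
        if match:
            codes.append(match.group(0))
    return codes


# Made-up sample links in the style of a stock list page (an assumption).
sample_hrefs = [
    "http://quote.eastmoney.com/sh600000.html",
    "http://quote.eastmoney.com/sz000001.html",
    "http://quote.eastmoney.com/center/list.html",
]
print(extract_stock_codes(sample_hrefs))
```

Using search with an explicit check instead of findall(...)[0] avoids relying on an exception to skip links that contain no stock code.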