
Scraping AJAX-generated dynamic data with Python, using Taobao reviews as an example

2017-04-05 18:55
import re
import json

import requests
# AJAX endpoint that returns the review list as JSONP (note the callback parameter)
url='https://rate.taobao.com/feedRateList.htm?auctionNumId=538039793643&userNumId=2779992133&currentPageNum=6&pageSize=20&rateType=&orderType=sort_weight&attribute=&sku=&hasSku=false&folded=0&ua=154UW5TcyMNYQwiAiwQRHhBfEF8QXtHcklnMWc%3D%7CUm5Ockt%2FQnpHfktxTXBOdCI%3D%7CU2xMHDJ7G2AHYg8hAS8XKQcnCU8uSDRFaz1r%7CVGhXd1llXGhVbVBpXGZaZ1ljVGlLdUxwRH5GfkZzTHZCd0xxS2Uz%7CVWldfS0TMw05AyMfKwslGScNNwMmAHoQeQQ0BG8Tf1hnQmw6bA%3D%3D%7CVmJCbEIU%7CV2lJGSYaOgI6GiYZLRY2DzsFOhomGCMYOAI5DCwQLhIuDjQNN2E3%7CWGFBET8RMQU7BycbJBAtDTQKPwA9az0%3D%7CWWFBET8RMWFZbFV1SXZCfSsLNBQ6FDQMMQ80AFYA%7CWmNeY0N%2BXmFBfUR4WGZeZER%2BRWVbe09vU2k%2F&_ksTS=1490504947774_2145&callback=jsonp_tbcrate_reviews_list'
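# Decode the raw response bytes as GBK
cont=requests.get(url).content.decode("gbk")

print(cont)
# Capture the JSON payload between the callback's parentheses
rex=re.compile(r'\w+\((.*)\)', re.S)
content=rex.findall(cont)[0]
print(content)
# content is already a decoded string, so json.loads takes no encoding argument
con=json.loads(content)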
print("@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@")
# Print each review with a 1-based index
for i in range(len(con['comments'])):
    print(i+1,con['comments'][i]['content'])
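
The JSONP-unwrapping step in the script can also be written as a small helper that is easy to try without touching the network. This is only a sketch and not the original author's code: the function name `strip_jsonp` and the sample payload are made up for illustration, and only the callback name is taken from the URL above.

```python
import json
import re


def strip_jsonp(text):
    """Return the parsed JSON payload of a JSONP response.

    Hypothetical helper (not part of the original script): it takes
    everything between the callback's opening parenthesis and the
    last closing parenthesis in the text, then parses it as JSON.
    """
    match = re.search(r'\w+\((.*)\)', text, re.S)
    if match is None:
        raise ValueError("no JSONP wrapper found")
    return json.loads(match.group(1))


# Made-up sample shaped like the response the script expects
sample = 'jsonp_tbcrate_reviews_list({"comments": [{"content": "ok"}, {"content": "很好"}]})'
data = strip_jsonp(sample)
for i, comment in enumerate(data["comments"], 1):
    print(i, comment["content"])
```

Raising an error when no wrapper is found makes a changed response format obvious, whereas indexing `findall(...)[0]` would fail with a less descriptive IndexError.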

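If the Chinese text does not print correctly (for example under Python 2, where print(a, b) prints a tuple and shows the text as escape sequences), replace the print call inside the for loop with:

    print(i+1)
    print(con['comments'][i]['content'])
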
For details, see: http://www.jb51.net/article/73780.htm
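
The request URL carries `currentPageNum` and `pageSize` parameters, so other pages of reviews can presumably be requested by changing the page number. The sketch below only builds the URLs and sends nothing. The function name `review_url` is hypothetical, and it is an assumption that the endpoint accepts this reduced parameter set: the original URL also includes `ua` and `_ksTS` values copied from a browser session, which are omitted here and may well be required.

```python
from urllib.parse import urlencode

BASE = "https://rate.taobao.com/feedRateList.htm"


def review_url(page, page_size=20):
    """Build the review-list URL for one page.

    Assumption: the endpoint accepts this reduced parameter set. The
    `ua` and `_ksTS` values from the original URL are left out and
    may be needed for a real request.
    """
    params = {
        "auctionNumId": "538039793643",
        "userNumId": "2779992133",
        "currentPageNum": page,
        "pageSize": page_size,
        "orderType": "sort_weight",
        "callback": "jsonp_tbcrate_reviews_list",
    }
    return BASE + "?" + urlencode(params)


# URLs for the first three pages
urls = [review_url(page) for page in range(1, 4)]
for u in urls:
    print(u)
```

Each URL can then be fetched and parsed in the same way as the single page in the script above.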