Sina Weibo Data Mining Cookbook, Part 14: Users (Analyzing a User's Followers and Friends)
2015-01-10 08:15
#!/usr/bin/python
# -*- coding: utf-8 -*-
'''
Created on 2015-1-10
@author: beyondzhou
@name: analyze_friends_followers.py
'''

# Analyze user's friends and followers
def analyze_friends_followers():

    from login import weibo_login
    from users import get_friends_followers_ids, setwise_friends_followers_analysis

    # Access to sina api
    weibo_api = weibo_login()

    screen_name = 'beyondzhou8'
    friends_ids, followers_ids = get_friends_followers_ids(weibo_api,
                                                           screen_name=screen_name,
                                                           friends_limit=10,
                                                           followers_limit=10)
    setwise_friends_followers_analysis(screen_name, friends_ids, followers_ids)

if __name__ == '__main__':
    analyze_friends_followers()

# Setwise friends followers analysis (imported above from the users module)
def setwise_friends_followers_analysis(screen_name, friends_ids, followers_ids):

    friends_ids, followers_ids = set(friends_ids), set(followers_ids)

    print('{0} is following {1}'.format(screen_name, len(friends_ids)))
    print('{0} is being followed by {1}'.format(screen_name, len(followers_ids)))

    # Friends who are not followers: accounts that do not follow back
    print('{0} of {1} are not following {2} back'.format(
        len(friends_ids.difference(followers_ids)),
        len(friends_ids), screen_name))

    # Followers who are not friends: accounts the user does not follow back
    print('{0} of {1} are not being followed back by {2}'.format(
        len(followers_ids.difference(friends_ids)),
        len(followers_ids), screen_name))

    print('{0} has {1} mutual friends'.format(
        screen_name, len(friends_ids.intersection(followers_ids))))

# Crawl a friendship graph
def crawl_weibo_followers(weibo_api, screen_name, limit=1000000, depth=2):

    from data import save_to_mongo

    # Resolve the ID for screen_name and start working with IDs for consistency in storage
    seed_id = str(weibo_api.users.show.get(screen_name=screen_name)['id'])

    _, next_queue = get_friends_followers_ids(weibo_api, user_id=seed_id,
                                              friends_limit=0, followers_limit=limit)

    # Store a seed_id => follower_ids mapping in MongoDB
    save_to_mongo({'followers' : list(next_queue)}, 'followers_crawl',
                  '{0}-follower_ids'.format(seed_id))

    d = 1
    while d < depth:
        d += 1
        (queue, next_queue) = (next_queue, [])
        for fid in queue:
            # get_friends_followers_ids returns (friends_ids, followers_ids);
            # only the followers are needed here
            _, follower_ids = get_friends_followers_ids(weibo_api, user_id=fid,
                                                        friends_limit=0,
                                                        followers_limit=limit)

            # Store a fid => follower_ids mapping in MongoDB
            save_to_mongo({'followers' : list(follower_ids)}, 'followers_crawl',
                          '{0}-follower_ids'.format(fid))

            next_queue += follower_ids
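The level-by-level crawl can be tried without a Weibo login or MongoDB. The following is a minimal sketch, not the author's code: `crawl_followers`, `fake_fetch`, and `FAKE_FOLLOWERS` are hypothetical names, the fetch function stands in for `get_friends_followers_ids`, and a plain dict stands in for `save_to_mongo`. It also skips users that were already fetched, which is an assumption added here to avoid repeated requests.

```python
def crawl_followers(fetch_follower_ids, seed_id, depth=2):
    """Breadth-first crawl. Returns a dict mapping user_id -> list of follower ids.

    fetch_follower_ids is a hypothetical callable standing in for the
    real API call; it takes a user id and returns a list of follower ids.
    """
    graph = {}

    # Level 1: followers of the seed user
    next_queue = list(fetch_follower_ids(seed_id))
    graph[seed_id] = list(next_queue)

    d = 1
    while d < depth:
        d += 1
        queue, next_queue = next_queue, []
        for fid in queue:
            if fid in graph:
                # Already fetched (assumption: no need to request it twice)
                continue
            follower_ids = list(fetch_follower_ids(fid))
            graph[fid] = follower_ids
            next_queue += follower_ids

    return graph


# Hypothetical in-memory follower data used instead of the Weibo API
FAKE_FOLLOWERS = {'1': ['2', '3'], '2': ['3', '4'], '3': [], '4': ['1']}


def fake_fetch(user_id):
    return FAKE_FOLLOWERS.get(user_id, [])


graph = crawl_followers(fake_fetch, '1', depth=2)
print(sorted(graph.keys()))
```

With `depth=2` the crawl stores the seed's followers and the followers of each of those users; user `'4'` is discovered but not expanded, because that would be a third level.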
callback_url: https://api.weibo.com/oauth2/authorize?redirect_uri=http%3A//apps.weibo.com/guaguastd&response_type=code&client_id=2925245021
return_redirect_uri: http://weibo.com/login.php?url=http%3A%2F%2Fapps.weibo.com%2Fguaguastd%3Fcode%3D2924445625943200f9c7b3870bac7da1
code: ['2924445625943200f9c7b3870bac7da1']
Fetched 3 total friends ids for beyondzhou8
Fetched 3 total friends ids for beyondzhou8
Fetched 0 total followers ids for beyondzhou8
beyondzhou8 is following 3
beyondzhou8 is being followed by 0
3 of 3 are not following beyondzhou8 back
0 of 0 are not being followed back by beyondzhou8
beyondzhou8 has 0 mutual friends
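The counts in this output come from plain set operations, so they can be checked offline. The sketch below is an illustration only: `summarize_relationships` is a hypothetical helper that returns the numbers instead of printing them, and the IDs are made up.

```python
def summarize_relationships(friends_ids, followers_ids):
    """Return the set-wise counts for a user's friends and followers."""
    friends, followers = set(friends_ids), set(followers_ids)
    return {
        'following': len(friends),
        'followed_by': len(followers),
        # Friends who are not followers do not follow the user back
        'not_following_back': len(friends - followers),
        # Followers who are not friends are not followed back by the user
        'not_followed_back': len(followers - friends),
        # Accounts present in both sets
        'mutual': len(friends & followers),
    }


# Same shape as the run above: 3 friends, 0 followers (IDs are made up)
summary = summarize_relationships([101, 102, 103], [])
print(summary)

# A case with some overlap (IDs are made up)
overlap = summarize_relationships([101, 102, 103], [102, 103, 104, 105])
print(overlap)
```

In the second call, two accounts (102 and 103) appear in both lists, so one friend does not follow back and two followers are not followed back.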