Difficulties in Mining Social Circles on Weibo

2012-02-28 15:04
I really like the TV series Drawing Sword (《亮剑》). Li Yunlong often says: we can't fight for ages without knowing who the enemy is.

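So in this post I will briefly analyze the problems that mining social circles on Weibo currently faces. We should not spend all our effort analyzing, caring only about how the results look, while forgetting what the most fundamental problems and difficulties are. Community structure detection in complex networks has been studied for many years: there are divisive methods, agglomerative methods, methods based on network dynamics, and many other, more unusual approaches. Each of these methods suits its own kind of network structure, for example with respect to the two properties mentioned in my previous post: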

Hierarchy

Overlap
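To make the two properties concrete, here is a minimal sketch using networkx on a small hypothetical graph (not Weibo data). It runs networkx's Girvan-Newman implementation (`girvan_newman`), whose successive levels form a hierarchy of nested partitions, and its k-clique percolation implementation (`k_clique_communities`), which lets one node belong to several communities. The graph and the helper names `toy_graph`, `gn_levels` and `overlapping_nodes` are made up for illustration.

```python
import itertools

import networkx as nx
from networkx.algorithms.community import girvan_newman, k_clique_communities


def toy_graph():
    """Hypothetical toy graph (not Weibo data): two 4-cliques sharing node 4,
    a triangle {8, 9, 10}, and a single bridge edge 7-8."""
    G = nx.Graph()
    for clique in ([1, 2, 3, 4], [4, 5, 6, 7], [8, 9, 10]):
        G.add_edges_from(itertools.combinations(clique, 2))
    G.add_edge(7, 8)
    return G


def gn_levels(G, max_levels=3):
    """First few levels of the Girvan-Newman hierarchy, coarse to fine.
    Each level removes high-betweenness edges until one more component appears,
    so every level refines the one before it."""
    return [[set(c) for c in partition]
            for partition in itertools.islice(girvan_newman(G), max_levels)]


def overlapping_nodes(communities):
    """Nodes that appear in more than one community."""
    seen, shared = set(), set()
    for community in communities:
        shared |= seen & set(community)
        seen |= set(community)
    return shared


if __name__ == "__main__":
    G = toy_graph()
    for depth, partition in enumerate(gn_levels(G), start=1):
        print(f"GN level {depth}: {partition}")
    cliques = [set(c) for c in k_clique_communities(G, 3)]
    print("k-clique (k=3):", cliques)
    print("nodes in more than one community:", overlapping_nodes(cliques))
```

In this toy graph the first GN level cuts the bridge 7-8 and separates {8, 9, 10}; with k = 3, k-clique percolation finds {1, 2, 3, 4}, {4, 5, 6, 7} and {8, 9, 10}, so node 4 belongs to two communities, something a GN partition cannot express.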

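Some methods handle hierarchy very well, such as GN and Newman's fast algorithm; others handle overlap very well, with the k-clique method as the typical example. Later, researchers proposed methods that tackle both problems together. For example, a 2009 paper by @沈华伟_ICT of the Institute of Computing Technology (ICT) presented the EAGLE algorithm, which handles both hierarchy and overlap well; on a word network and a network of scientists it achieved good results, outperforming both Newman's fast algorithm and the k-clique method. It is a solid piece of work. However, when these algorithms are applied to mining Weibo circles, the results are unsatisfactory. Why? Because Weibo is a network even more "complex" than a "complex network". For example, the GN algorithm achieves decent accuracy when the average node degree is 5, 6 or 7, but at 8 its results become poor, and the same holds for Newman's fast algorithm. When the k-clique method is applied to a network with a relatively large average degree, it produces very large communities, as the example in my previous post clearly showed. In @沈华伟_ICT's paper the average degrees of the test networks were 9 and 3 (rounded), which is still small compared with the average out-degree on Weibo. I computed statistics on the network of accounts I follow, keeping only mutual follows to reduce the size of the network. Please forgive me for not showing it as a figure: the plot networkx drew was a mess because the graph is too dense. The statistics are as follows: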

number_of_nodes 232
number_of_edges 1467
degree_of_刘大鸿8
degree_of_摇摆巴赫25
degree_of_周运洪yunhong15
degree_of_搜狗郭昂21
degree_of_东坡门人5
degree_of_暖暖cathy5
degree_of_淘宝虚云9
degree_of_IR-Lucene35
degree_of_杨先锋UU4
degree_of_hszhsh19156470476
degree_of_谭卫国Forest5
degree_of_荣名为宝4
degree_of_王志超14
degree_of_悦晓07097
degree_of_Joseph星之海洋2
degree_of_bicloud16
degree_of_张刚-bert13
degree_of_gycheng15
degree_of_Humyy6
degree_of_张某_ICT230
degree_of_bill3237
degree_of_蕃茄me7
degree_of_nancy_46332
degree_of_冷建成5
degree_of_kafka01024
degree_of_檀林_hootch8
degree_of_jluwanghui1
degree_of_杨彦闯12
degree_of_崔冲儿2
degree_of_yellowleaf201010
degree_of_郭嘉丰_ICT20
degree_of_柯南小胖道尔4
degree_of_公帅_ICT18
degree_of_aslyc5
degree_of_sunli122323
degree_of_创业者徐仁禄6
degree_of_张诚zhangcheng1
degree_of_Darryn1
degree_of_李-曙光11
degree_of_foxmailed15
degree_of_Firewind4
degree_of_四正9
degree_of_燕子_lynn3
degree_of_王东wd3
degree_of_蒋涛CSDN22
degree_of_常佳佳-Jason16
degree_of_詹剑锋_中科院62
degree_of_GUCAS老H17
degree_of_宋波simba13
degree_of_马金柱focus8
degree_of_梁公军20
degree_of_小木1181
degree_of_Cherwen1
degree_of_威廉他11
degree_of_雨前LYQ12
degree_of_苏牧洋1
degree_of_elvar20113
degree_of_沈华伟_ICT9
degree_of_霍泰稳27
degree_of_Yahoo韩轶平10
degree_of_小象公主-猎头一枚6
degree_of_炼心-自强8
degree_of_王向东6
degree_of_泰山泰山1
degree_of_视觉研究10
degree_of_八六孩儿3
degree_of__Diaoer3
degree_of_孟二利22
degree_of_雨梦_yumengkk14
degree_of_wpwei2
degree_of_IT民工-老蓝1
degree_of_TimYang37
degree_of_陈利人5
degree_of_桂林山水786
degree_of_Saylove浣熊1
degree_of_崔卫兵2
degree_of_sigmod9
degree_of_林乐宇_冰山雪豹4
degree_of_杨逍Venus11
degree_of_新IT民工21
degree_of_头不疼1
degree_of_庞崇-1
degree_of_爱的马斯特14
degree_of_Binos_ICT7
degree_of_豆爸何锐8
degree_of_幸运coming琳琳12
degree_of_RefuseBT1
degree_of_网路冷眼32
degree_of_橘子郡_guy15
degree_of_秋实Li4
degree_of_BetaCafe10
degree_of_AmyDeng_Fusionio38
degree_of_jingmouren10
degree_of_jqliu4
degree_of_影子猎手9
degree_of_liangjz36
degree_of_bodd6
degree_of_海带丝丝3
degree_of_宗秀倩5
degree_of_程序媛8
degree_of_互联网聚焦2
degree_of_李猛-Mn29
degree_of_51CTO官方微博7
degree_of_MapReduce31
degree_of_小樱Daisy1
degree_of_IT技术博客大学习22
degree_of_XiaoJunHong19
degree_of_肖瑞麟Jerry13
degree_of_凌峰TB6
degree_of_董安民3
degree_of_美国经济1
degree_of_张启达4
degree_of_万树-杨7
degree_of_陈房伟2
degree_of_围棋搜索引擎9
degree_of_wenzhihong2
degree_of_吴尔平-andy6
degree_of_大时代投资1
degree_of_KissDev46
degree_of_forchenyun13
degree_of_Ken王健4
degree_of_琦大头5
degree_of_花开花落10032
degree_of_张夏天_机器学习15
degree_of_-林鸿飞-28
degree_of_solochar50
degree_of_luketty3
degree_of_黄麟晰2
degree_of_张永生10
degree_of_arpro7
degree_of_董明楷4
degree_of_淘解伦35
degree_of_ICT_朱亚东21
degree_of_胡云华MSRA6
degree_of_zangxt11
degree_of_lordhong6
degree_of_hny1014
degree_of_鱼晓-五毛12
degree_of_guoyipeng10
degree_of_TreapDB40
degree_of_张杰_NoahArk6
degree_of_吕慧伟4
degree_of___那谁__15
degree_of_梁斌penny80
degree_of_飞雪巴啦巴巴巴1
degree_of_建文的马甲5
degree_of_THUIRDB61
degree_of_花火易碎5
degree_of_潘少宁_腾讯_LAMP人26
degree_of_Eva奶奶2
degree_of_alue-fabre5
degree_of_肖隆平1
degree_of_loveEmma11
degree_of_创业-育森19
degree_of_独孤虎-李利鹏10
degree_of_berlinix5
degree_of_gongbin8
degree_of_Abioy10
degree_of_即刻搜索JIKE11
degree_of_触景无限3
degree_of_王上游4
degree_of_易观胡斌1
degree_of_小鱼西游4
degree_of_兔杰列夫4
degree_of_草木菁菁无畏1
degree_of_wenzhong25
degree_of_丁国栋_ICT40
degree_of_袁小晕9
degree_of_拓尔思11
degree_of_zhh_1211卉2
degree_of_nzinfo25
degree_of_王大美10
degree_of_刘德超richard1
degree_of_聋瞎的世界1
degree_of_丕子34
degree_of_工体东路5
degree_of_fishermen17
degree_of_刘克庄2
degree_of_任勇_东京大学23
degree_of_leeyanva15
degree_of_闫刚20121
degree_of_贺志明_ICT39
degree_of_winston9
degree_of_yaronli18
degree_of_bian15
degree_of_fengyuncrawl73
degree_of_王斌_ICTIR33
degree_of_SunGis6
degree_of_数据挖掘_PHP11
degree_of_OnlyXP8
degree_of_王联辉22
degree_of_张凯197611
degree_of_罗大维15
degree_of_奔三北P2
degree_of_net_ashes4
degree_of_小五丫头1
degree_of_九州-姬野31
degree_of_温小燕儿1
degree_of_薄荷糖糖3
degree_of_宁怡2
degree_of_wave2future2
degree_of_Qunar-JarnTang7
degree_of_关毅的围脖18
degree_of_猎头-Kevin19
degree_of_玉宇金辉2
degree_of_liudaoru19
degree_of_王栋PKU13
degree_of_淘宝日照19
degree_of_争一言1
degree_of_蓝俊杰1
degree_of_武卫东23
degree_of_ElmerZhang5
degree_of_生活精选1
degree_of_object3
degree_of_soker6
degree_of_asddew23f3
degree_of_孟鸿7
degree_of_c背a井t医y志y猫c3
degree_of_佟怡峦7
degree_of_hi郭海峰2
degree_of_图灵杨海玲32
degree_of_顾平Baidu7
degree_of_林语姿1
degree_of_QLeelulu4
degree_of_张颖峰21
degree_of_淘宝褚霸41
degree_of_墨顏Shiver1
degree_of_微博Koth38
degree_of_魔时科技张首华33
degree_of_longxibendi2
degree_of_timo10
average_of_degree 12
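Statistics like these could be produced by a script along the following lines. This is a minimal sketch, assuming the follow relations have already been crawled into (follower, followee) pairs; the input format, the sample data and the function names `mutual_follow_graph` and `degree_stats` are hypothetical, not the author's actual code. It keeps only mutual follows, as described in the text, and reports the node count, the edge count, each node's degree and the average degree 2m/n.

```python
import networkx as nx


def mutual_follow_graph(follows):
    """Undirected graph keeping only reciprocated follows.

    `follows` is assumed to be an iterable of (follower, followee) pairs.
    Users with no mutual follow never get an edge and so are not added."""
    directed = nx.DiGraph(follows)
    G = nx.Graph()
    G.add_edges_from((u, v) for u, v in directed.edges() if directed.has_edge(v, u))
    return G


def degree_stats(G):
    """Node count, edge count, per-node degree and average degree 2m/n."""
    n, m = G.number_of_nodes(), G.number_of_edges()
    return {
        "number_of_nodes": n,
        "number_of_edges": m,
        "degrees": dict(G.degree()),
        "average_of_degree": 2 * m / n if n else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical sample data; a real run would use the crawled follow list.
    follows = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"), ("b", "c"), ("d", "a")]
    stats = degree_stats(mutual_follow_graph(follows))
    print("number_of_nodes", stats["number_of_nodes"])
    print("number_of_edges", stats["number_of_edges"])
    for node, deg in stats["degrees"].items():
        print(f"degree_of_{node}", deg)
    print("average_of_degree", round(stats["average_of_degree"], 2))
```

networkx's `DiGraph.to_undirected(reciprocal=True)` is a shorter alternative, but it keeps users who have no mutual follow as isolated nodes, which would lower the average degree.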
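In the statistics above, the first two lines give the number of nodes and edges, the lines in between give each node's degree, and the last line gives the average degree, 12 (2 × 1467 / 232 ≈ 12.65, apparently truncated to an integer). This is far larger than the average degrees of the networks used in existing work. (Note: this is last year's data, so it certainly no longer matches the current network.) From the analysis and data above, we can see that the main problem facing Weibo circle mining is that the Weibo network is too complex: for me, a user following only a little over 200 accounts, the average degree already reaches 12, so one can imagine how much more complex the networks of celebrity accounts are. New methods are therefore urgently needed, and I hope this post helps students who are about to work on social network mining. As for my own solution, I have some preliminary ideas and will share them once I have results. Some may also argue that large-scale analysis is the real problem. I think that can wait until we have a method that works; what I want most is good results. I look forward to discussing this with everyone.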
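As a rough illustration of why an average degree around 12 is troublesome for clique-based methods, the sketch below measures, on Erdős-Rényi random graphs (a stand-in chosen for illustration, not Weibo data), what fraction of nodes ends up in the largest community found by networkx's `k_clique_communities` with k = 3. For such random graphs, k-clique percolation is known to have a threshold near p_c = [(k-1)N]^(-1/(k-1)); for k = 3 and N = 232 this corresponds to an expected average degree of roughly 11, and well past that point one community tends to absorb most of the graph. The function name `largest_kclique_share`, the chosen degrees and the seed are arbitrary assumptions.

```python
import networkx as nx
from networkx.algorithms.community import k_clique_communities


def largest_kclique_share(n, avg_degree, k=3, seed=0):
    """Fraction of nodes in the largest k-clique community of a G(n, p)
    random graph whose expected average degree is `avg_degree`."""
    p = avg_degree / (n - 1)
    G = nx.gnp_random_graph(n, p, seed=seed)
    sizes = [len(c) for c in k_clique_communities(G, k)]
    return max(sizes, default=0) / n


if __name__ == "__main__":
    n = 232  # same node count as the mutual-follow network above
    for d in (4, 8, 12, 16, 20):
        share = largest_kclique_share(n, d)
        print(f"avg degree {d:>2}: largest 3-clique community covers {share:.0%} of nodes")
```

A real follow network is far from random, so this only hints at the trend; the exact percentages depend on the seed.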