
Web page ranking: a Python implementation of the HITS algorithm

2016-03-26 19:05
The theory behind the algorithm is not repeated here; for that, see:
http://blog.csdn.net/hguisu/article/details/8013489
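Save the code as a .py file. By default it reads pgr_data.txt from the data directory next to the code file as its source data (the path data/pgr_data.txt is relative, so run the script from the directory that contains it). The input path can be changed in the source code or passed as a command-line argument, as in the following invocation: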

python hits.py pgr_data.txt
The argument after the script name is the path to the input data.
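The post does not spell out the input format, but the parsing code reads the first two whitespace-separated integers on each line as a link from a source page to a target page. A minimal sketch of that parsing follows, under that assumption; `parse_edges` and the sample lines are hypothetical and not part of the original script.

```python
def parse_edges(lines):
    """Parse 'source target' integer pairs from an iterable of text lines."""
    edges = []
    for line in lines:
        fields = line.split()  # split() also discards the trailing newline
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        edges.append((int(fields[0]), int(fields[1])))
    return edges


# Hypothetical sample data in the assumed format: one link per line.
sample = ["1 2\n", "1 3\n", "2 3\n", "\n", "3 1\n"]
edges = parse_edges(sample)
print(edges)
```

The same function accepts an open file object, since iterating over a file yields its lines.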

The code defines three parameters:

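size = 100 ### the size of the networks
times = 200 ### the maximum number of iterations
error = 0.0001 ### the error threshold for stopping the iterations

These are, respectively, the maximum number of nodes in the HITS network, the maximum number of iterations, and the maximum allowed error. The last two bound the number of iterations: the loop stops as soon as either limit is reached.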
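To illustrate how the iteration cap and the error threshold interact, here is a compact sketch of the same kind of loop. It is not the script itself: `hits_scores`, `max_iter`, and `tol` are illustrative names (standing in for `times` and `error`), nodes are assumed to be numbered from 0, and scores are normalized to sum to 1 as the script does.

```python
def hits_scores(edges, n, max_iter=200, tol=0.0001):
    """Return (iterations, hub, authority) for nodes 0..n-1.

    edges is a list of (source, target) pairs.
    """
    hub = [1.0] * n
    aut = [1.0] * n
    k = 0
    for k in range(1, max_iter + 1):
        new_hub = [0.0] * n
        new_aut = [0.0] * n
        for src, dst in edges:
            new_hub[src] += aut[dst]  # a good hub links to good authorities
            new_aut[dst] += hub[src]  # a good authority is linked from good hubs
        hub_total = sum(new_hub)
        aut_total = sum(new_aut)
        if hub_total == 0 or aut_total == 0:
            break  # no links at all: nothing to normalize
        new_hub = [h / hub_total for h in new_hub]
        new_aut = [a / aut_total for a in new_aut]
        # Total absolute change since the previous iteration.
        change = sum(abs(x - y) for x, y in zip(new_hub, hub))
        change += sum(abs(x - y) for x, y in zip(new_aut, aut))
        hub, aut = new_hub, new_aut
        if change < tol:
            break  # converged before hitting max_iter
    return k, hub, aut


# Tiny illustrative graph: 0 -> 1, 0 -> 2, 1 -> 2.
iterations, hub, aut = hits_scores([(0, 1), (0, 2), (1, 2)], 3)
print(iterations, hub, aut)
```

On this graph node 0 ends up as the strongest hub and node 2 as the strongest authority. Lowering `tol` or `max_iter` trades accuracy against running time.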

The Python source code is as follows:

__author__ = 'Administrator'
import sys

size = 100 ### the size of the networks (node ids must be smaller than this)
times = 200 ### the maximum number of iterations
error = 0.0001 ### the error threshold for stopping the iterations

# tr_data[i][j] == 1 means that page i links to page j (ids as in the input file)
tr_data = [[0 for i in range(size)] for j in range(size)]
out_links = [0 for i in range(size)]  # out-degree per page (was `sum`, which shadows the builtin)
tr_lg = 0

st = set()

def hits():
    # Uses the module-level ha, hub, aut and tr_lg set up in the main block.
    # Node ids in the input are assumed to be 1..tr_lg; shift them to 0-based.
    for i in range(tr_lg):
        for j in range(tr_lg):
            ha[i][j] = tr_data[i+1][j+1]
    k = 0
    while k < times - 1:
        err = 0
        k += 1
        for i in range(tr_lg):
            for j in range(tr_lg):
                if ha[i][j] != 0:
                    hub[k][i] += aut[k-1][j]
                    aut[k][j] += hub[k-1][i]
        a = b = 0
        for i in range(tr_lg):
            a += hub[k][i]
            b += aut[k][i]
        if a == 0 or b == 0:
            break  # no links: avoid dividing by zero

        # normalize so that each score vector sums to 1
        for i in range(tr_lg):
            hub[k][i] = float(hub[k][i]) / a
            aut[k][i] = float(aut[k][i]) / b
            err += abs(hub[k][i] - hub[k-1][i]) + abs(aut[k][i] - aut[k-1][i])
        if err < error:
            break

    return k

if __name__ == '__main__':
    sour = "data/pgr_data.txt"
    if len(sys.argv) > 1:
        sour = sys.argv[1]
    with open(sour, "r") as fp:
        for line in fp:
            ls = line.split()
            if len(ls) < 2:
                continue  # skip blank or incomplete lines
            for item in ls:
                st.add(item)
            tr_data[int(ls[0])][int(ls[1])] = 1
            out_links[int(ls[0])] += 1
    tr_lg = len(st)
    print("the number of websites:", tr_lg)
    hub = [[0 for i in range(tr_lg)] for j in range(times)]
    aut = [[0 for i in range(tr_lg)] for j in range(times)]
    # Start from all ones; with an all-zero start every later score stays zero
    # and the normalization divides by zero.
    hub[0] = [1.0 for i in range(tr_lg)]
    aut[0] = [1.0 for i in range(tr_lg)]

    print("\n")
    ha = [[0 for i in range(tr_lg)] for j in range(tr_lg)]
    n = hits()
    # Print only the final iteration's scores rather than the whole history.
    print("iteration times:", n, "\n", "the hub:", hub[n],
          "\nthe authority:", aut[n])