
python-recsys: A Python Library for Building Recommender Systems

2017-11-20 20:29




python-recsys is a Python library for implementing recommender systems.


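Installation


Dependencies

python-recsys is built on top of Divisi2 (a library for common-sense reasoning over semantic networks) and uses csc-pysparse (a sparse matrix library). Divisi2 in turn depends on NumPy and NetworkX. python-recsys also requires SciPy.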

To install the dependencies (using Ubuntu as an example):

Shell

sudo apt-get install python-scipy python-numpy
sudo apt-get install python-pip
sudo pip install csc-pysparse networkx divisi2

# If you don't have pip installed then do:
# sudo easy_install csc-pysparse
# sudo easy_install networkx
# sudo easy_install divisi2

Download the source archive from GitHub first, then install python-recsys:

Shell

tar xvfz python-recsys.tar.gz
cd python-recsys
sudo python setup.py install


Example

Load the Movielens dataset:

Python

from recsys.algorithm.factorize import SVD
svd = SVD()
svd.load_data(filename='./data/movielens/ratings.dat',
              sep='::',
              format={'col':0, 'row':1, 'value':2, 'ids': int})
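
To make the format argument more concrete, here is a minimal sketch of how one '::'-separated line could be turned into a (value, row, column) triple. The helper parse_rating_line and the sample lines are hypothetical and exist only for this illustration; python-recsys does its own parsing inside load_data. The sketch assumes the Movielens layout UserID::MovieID::Rating::Timestamp, so that columns are users and rows are movies.

```python
def parse_rating_line(line, sep, fmt):
    """Split one line and pick fields by the positions given in fmt.

    fmt mirrors the dict passed to load_data: 'row', 'col' and 'value'
    are field positions, 'ids' is the type used to convert the ids.
    Hypothetical helper, for illustration only.
    """
    fields = line.strip().split(sep)
    to_id = fmt.get('ids', str)
    value = float(fields[fmt['value']])
    row_id = to_id(fields[fmt['row']])
    col_id = to_id(fields[fmt['col']])
    return value, row_id, col_id


fmt = {'col': 0, 'row': 1, 'value': 2, 'ids': int}
# Illustrative lines in the UserID::MovieID::Rating::Timestamp layout.
sample = ["1::1193::5::978300760", "1::661::3::978302109"]
triples = [parse_rating_line(line, '::', fmt) for line in sample]
print(triples)
```

Each triple reads as (rating, movie id, user id), which matches the convention used later in this example where rows are items and columns are users.
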

Compute the Singular Value Decomposition (SVD), M = U Sigma V^t:

Python

k = 100
svd.compute(k=k,
            min_values=10,
            pre_normalize=None,
            mean_center=True,
            post_normalize=True,
            savefile='/tmp/movielens')
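
As a rough illustration of what the factorization M = U Sigma V^t means, the sketch below approximates only the largest singular value and its two vectors with power iteration, in plain Python. This is not how python-recsys computes the SVD; the function top_singular_triple and the toy matrix are assumptions made for this example. Keeping k = 100 in the call above corresponds to keeping the 100 largest such components instead of one.

```python
import math


def top_singular_triple(matrix, iterations=100):
    """Approximate the largest singular value and its vectors.

    Power iteration on M^T M: v converges to the top right singular
    vector, then sigma = |M v| and u = M v / sigma.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0] * n_cols
    for _ in range(iterations):
        # w = M v
        w = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        # v = M^T w, normalized to unit length
        v = [sum(matrix[i][j] * w[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    mv = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in mv))
    u = [x / sigma for x in mv]
    return u, sigma, v


# Toy items-by-users matrix of rank 1 (made-up numbers).
M = [[2.0, 4.0],
     [1.0, 2.0],
     [3.0, 6.0]]
u, sigma, v = top_singular_triple(M)

# Rank-1 reconstruction: sigma * u * v^T
approx = [[sigma * u[i] * v[j] for j in range(2)] for i in range(3)]
print(sigma)
print(approx)
```

Because the toy matrix has rank 1, the single component reproduces it up to rounding error. A real ratings matrix needs many components, and the reconstruction from the top k is only an approximation.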

Compute the similarity between two movies:

Python

ITEMID1 = 1    # Toy Story (1995)
ITEMID2 = 2355 # A bug's life (1998)

svd.similarity(ITEMID1, ITEMID2)
# 0.67706936677315799

Get the movies most similar to Toy Story:

Python

svd.similar(ITEMID1)

# Returns: <ITEMID, Cosine Similarity Value>
[(1,    0.99999999999999978), # Toy Story
 (3114, 0.87060391051018071), # Toy Story 2
 (2355, 0.67706936677315799), # A bug's life
 (588,  0.5807351496754426),  # Aladdin
 (595,  0.46031829709743477), # Beauty and the Beast
 (1907, 0.44589398718134365), # Mulan
 (364,  0.42908159895574161), # The Lion King
 (2081, 0.42566581277820803), # The Little Mermaid
 (3396, 0.42474056361935913), # The Muppet Movie
 (2761, 0.40439361857585354)] # The Iron Giant
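
The values returned above are cosine similarities. The following sketch shows how such a ranking can be produced from a set of item vectors. The functions cosine and most_similar and the two-dimensional vectors are hypothetical, chosen for illustration; they are not the library's implementation or real Movielens factors.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(target_id, vectors, n=10):
    """Rank all ids by cosine similarity to target_id, highest first."""
    target = vectors[target_id]
    scored = [(item_id, cosine(target, vec)) for item_id, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical 2-dimensional item vectors.
item_vectors = {
    1: [1.0, 0.0],
    2: [1.0, 1.0],
    3: [0.0, 1.0],
}
ranking = most_similar(1, item_vectors)
print(ranking)
```

As in the Toy Story output, the target item ranks first with a similarity of about 1, since every vector is maximally similar to itself.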

Predict the rating that a user (USERID) would give to a movie (ITEMID):

Python

MIN_RATING = 0.0
MAX_RATING = 5.0
ITEMID = 1
USERID = 1

svd.predict(ITEMID, USERID, MIN_RATING, MAX_RATING)
# Predicted value 5.0

svd.get_matrix().value(ITEMID, USERID)
# Real value 5.0
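
A minimal sketch of the idea behind such a prediction, under two assumptions that this article does not state: that the predicted value is the dot product of an item factor vector and a user factor vector (with the singular values folded in and mean-centering ignored), and that the MIN_RATING and MAX_RATING arguments are used to clip the result to the rating scale. The function predict_rating and the factor values are hypothetical.

```python
def predict_rating(item_factors, user_factors, min_rating, max_rating):
    """Dot product of two factor vectors, clipped to the rating scale."""
    raw = sum(i * u for i, u in zip(item_factors, user_factors))
    return max(min_rating, min(max_rating, raw))


# Hypothetical factor vectors, not taken from a real model.
print(predict_rating([0.5, 1.0], [2.0, 1.5], 0.0, 5.0))  # raw 2.5, inside the scale
print(predict_rating([1.5, 2.0], [2.0, 1.5], 0.0, 5.0))  # raw 6.0, clipped to 5.0
```
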

Recommend movies to a user (only movies the user has not rated yet):

Python

svd.recommend(USERID, is_row=False) #cols are users and rows are items, thus we set is_row=False

# Returns: <ITEMID, Predicted Rating>
[(2905, 5.2133848204673416), # Shaggy D.A., The
 (318,  5.2052108435956033), # Shawshank Redemption, The
 (2019, 5.1037438278755474), # Seven Samurai (The Magnificent Seven)
 (1178, 5.0962756861447023), # Paths of Glory (1957)
 (904,  5.0771405690055724), # Rear Window (1954)
 (1250, 5.0744156653222436), # Bridge on the River Kwai, The
 (858,  5.0650911066862907), # Godfather, The
 (922,  5.0605327279819408), # Sunset Blvd.
 (1198, 5.0554543765500419), # Raiders of the Lost Ark
 (1148, 5.0548789542105332)] # Wrong Trousers, The
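
The recommendation step can be thought of as "predict everything, drop what the user has already rated, keep the top n". The sketch below shows that selection logic on made-up numbers. The function recommend_unrated and its inputs are hypothetical and are not the library's code.

```python
def recommend_unrated(predicted, already_rated, n=10):
    """Return the n highest (item_id, predicted rating) pairs not yet rated."""
    candidates = [(item_id, score) for item_id, score in predicted.items()
                  if item_id not in already_rated]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical predicted ratings for one user, keyed by item id.
predicted = {10: 4.8, 20: 3.1, 30: 5.2, 40: 4.9}
already_rated = {40}  # the user has already rated item 40
top = recommend_unrated(predicted, already_rated, n=2)
print(top)
```

Note that the scores in the output above go slightly past 5.0, which suggests that recommendation scores are raw predictions and are not clipped to the rating scale.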

Which users should see Toy Story? (That is, which users who have not rated Toy Story would give it a high rating?)

Python

svd.recommend(ITEMID)

# Returns: <USERID, Predicted Rating>
[(283,  5.716264440514446),
 (3604, 5.6471765418323141),
 (5056, 5.6218800339214496),
 (446,  5.5707524860615738),
 (3902, 5.5494529168484652),
 (4634, 5.51643364021289),
 (3324, 5.5138903299082802),
 (4801, 5.4947999354188548),
 (1131, 5.4941438045650068),
 (2339, 5.4916048051511659)]

Documentation

To build the HTML documentation from the doc/source directory:

cd doc
make html

The HTML will be generated at:

doc/build/html/index.html

Source code: https://github.com/ocelma/python-recsys