您的位置:首页 > 编程语言 > Python开发

Scikit-learn实现基于模型的推荐系统(SVD)

2017-08-15 10:11 691 查看
  具体分析稍后再介绍。

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Author  : Peidong
# @Site    :
# @File    : MFRecommendSystem.py
# @Software: PyCharm
import numpy as np
import pandas as pd

# 读取u.data文件
header = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv('ml-100k/u.data', sep='\t', names=header)

# 计算唯一用户和电影的数量
n_users = df.user_id.unique().shape[0]
n_items = df.item_id.unique().shape[0]
print('Number of users = ' + str(n_users) + ' | Number of movies = ' + str(n_items))

# 使用scikit-learn库将数据集分割成测试和训练。Cross_validation.train_test_split根据测试样本的比例(test_size),本例中是0.25,来将数据混洗并分割成两个数据集
from sklearn import cross_validation as cv
train_data, test_data = cv.train_test_split(df, test_size=0.25)

# 计算数据集的稀疏度
sparsity = round(1.0 - len(df)/float(n_users*n_items), 3)
print('The sparsity level of MovieLens100K is ' + str(sparsity*100) + '%')

# 创建uesr-item矩阵,此处需创建训练和测试两个UI矩阵
train_data_matrix = np.zeros((n_users, n_items))
for line in train_data.itertuples():
train_data_matrix[line[1] - 1, line[2] - 1] = line[3]

test_data_matrix = np.zeros((n_users, n_items))
for line in test_data.itertuples():
test_data_matrix[line[1] - 1, line[2] - 1] = line[3]

# 使用SVD进行矩阵分解
import scipy.sparse as sp
from scipy.sparse.linalg import svds

u, s, vt = svds(train_data_matrix, k=20)
s_diag_matrix = np.diag(s)
X_pred = np.dot(np.dot(u, s_diag_matrix), vt)

# 利用均方根误差进行评估
from sklearn.metrics import mean_squared_error
from math import sqrt

def rmse(prediction, ground_truth):
prediction = prediction[ground_truth.nonzero()].flatten()
ground_truth = ground_truth[ground_truth.nonzero()].flatten()
return sqrt(mean_squared_error(prediction, ground_truth))

print('User-based CF MSE: ' + str(rmse(X_pred, test_data_matrix)))
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息