
Understanding Gradient Descent and Stochastic Gradient Descent, with a Simple Code Sample for a Movie Recommender System (Part 2)

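2017-10-22 23:25
This is the second half of the post with this title. I split it because the browser editor apparently cannot cache that much text: once the draft reached a certain length it kept crashing, which nearly drove me mad.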

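For the last part, the teacher gave us a dataset of 800,000 rows to process on our own. Applying the code above would have been enough, but I also drew a 3D plot of my own.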

import numpy as np
import pandas as pd

# three dimensions: x is the item, y is its rating, z is the number of users who rated the item

#axis x

x = np.array(sorted(set(Y4.item)))  # item ids, which stand in for the movie names; sorted so they line up with y and z

#axis y

rating_mean = Y4.rating.mean()

# Y4 is the original dataframe (renamed from Y so that it cannot be confused with the Y used in question 3)

Y4.rating -= Y4.rating.mean()

#get ratings

y = pd.DataFrame(np.linspace(rating_mean,rating_mean,x.shape[0]+1))  # one extra row, because row 0 is dropped below

#gY.index=x

y = y.drop(0)  # drop row 0 so the index starts at 1 like the item ids, with no loop needed to relabel it

# user ids 1 to 943

users = np.array(range(1,944))

def movie_stochastic_gradient(Y4, y):

    gy = pd.DataFrame(np.zeros(y.shape), index=y.index)

    random_user = users[np.random.randint(users.shape[0])]  # randint excludes its upper bound, so this can pick any user, including the last

    items = list(set(Y4.item[Y4.user==random_user]))

    #items = list(Y4.item[Y4.user==1])

    #print(items)

    #get all the ratings from this user (there are some same nums)

    Y4_newform = Y4[Y4.user==random_user]#get a new form only belonged to this random_user and then we could easily get the rating

    for item in items:

        rating = list(set(Y4_newform.rating[Y4_newform.item==item]))[0]  # the same (item, rating) row can appear several times, so deduplicate and take one value

        #print(y[item])

        

        gy.loc[gy.index == item, 0] += 2 * (y.loc[y.index == item, 0] - rating)  # gradient of (y - rating)**2; .loc avoids chained assignment

    return gy

learning_rate = 0.01

iterations = 100

for i in range(iterations):

    gy = movie_stochastic_gradient(Y4, y)

    print('^_^ We have iterated', i + 1, 'times.')

    y -= learning_rate*gy

    #print(y)

# axis z: the number of users who rated each film

z = np.zeros(x.shape)  # z[i] counts how many times item i+1 was rated (item ids start at 1)

for item in Y4.item:

    z[item-1] += 1

#show the 3d map

import pylab as py

import mpl_toolkits.mplot3d.axes3d as p3 

fig = py.figure()

ax = p3.Axes3D(fig)

ax.scatter(x, y[0].values, z)  # pass the rating column as a 1-D array so all three inputs have the same shape

ax.set_xlabel('film')

ax.set_ylabel('rating')

ax.set_zlabel('users_num')

fig.add_axes(ax)

py.show()
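
The update performed by the loop above can also be tried on a small made-up table. What follows is only a minimal sketch under stated assumptions, not the assignment's code: `toy` and `sgd_item_estimates` are hypothetical names, the ratings are invented, the mean-centring step is left out, and duplicated rows are averaged rather than deduplicated. It keeps the same idea: start every item's estimate at the global mean, pick one random user per step, and move the estimates of the items that user rated along the negative gradient 2 * (estimate - rating).

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings table with the same columns as Y4 (user, item, rating).
toy = pd.DataFrame({
    "user":   [1, 1, 2, 2, 3, 3, 3],
    "item":   [1, 2, 1, 3, 1, 2, 3],
    "rating": [5.0, 3.0, 4.0, 2.0, 3.0, 4.0, 1.0],
})

def sgd_item_estimates(ratings, learning_rate=0.01, iterations=100, seed=0):
    """Per-item rating estimates fitted by stochastic gradient descent.

    Each step samples one user and moves the estimates of the items that
    user rated along the negative gradient of sum((estimate - rating)**2).
    """
    rng = np.random.default_rng(seed)
    users = ratings["user"].unique()
    items = np.sort(ratings["item"].unique())
    # Start every estimate at the global mean rating.
    est = pd.Series(ratings["rating"].mean(), index=items, dtype=float)
    for _ in range(iterations):
        user = rng.choice(users)
        # Assumption: average duplicated (user, item) rows instead of taking one of them.
        seen = ratings[ratings["user"] == user].groupby("item")["rating"].mean()
        grad = 2.0 * (est.loc[seen.index] - seen)
        est.loc[seen.index] -= learning_rate * grad
    return est

estimates = sgd_item_estimates(toy)
# Number of ratings per item, aligned with the index of the estimates.
counts = toy["item"].value_counts().reindex(estimates.index, fill_value=0)
print(pd.DataFrame({"estimate": estimates, "n_ratings": counts}))
```

Because each step replaces an estimate with a weighted average of itself and an observed rating, the estimates always stay inside the range of the ratings. Counting with `value_counts` gives the same numbers as looping over the rows and incrementing a counter per item.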

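I picked three variables: the movie's name (its item id), the number of times the movie was rated, and the movie's rating after it has been adjusted by the users' ratings.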

The final result looks roughly like this: