
Matrix Factorization: A Simple Tutorial and Implementation in Python

This article is reposted from http://www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-in-python/; all rights belong to the original author.

There is probably no need to say that there is too much information on the Web nowadays. Search engines help us a little bit. What is better is to have something interesting recommended to us automatically without asking. Indeed, from as simple as a list of the most popular bookmarks on Delicious, to some more personalized recommendations we received on Amazon, we are usually offered recommendations on the Web.

Recommendations can be generated by a wide range of algorithms. While user-based or item-based collaborative filtering methods are simple and intuitive, matrix factorization techniques are usually more effective because they allow us to discover the latent features underlying the interactions between users and items. Of course, matrix factorization is simply a mathematical tool for playing around with matrices, and is therefore applicable in many scenarios where one would like to find out something hidden under the data.

In this tutorial, we will go through the basic ideas and the mathematics of matrix factorization, and then we will present a simple implementation in Python. We will proceed with the assumption that we are dealing with user ratings (e.g. an integer score from the range of 1 to 5) of items in a recommendation system.

Table of Contents:

Basic Ideas
The mathematics of matrix factorization
Regularization
Implementation in Python
Further Information
Source Code
References


Basic Ideas

Just as its name suggests, matrix factorization is to, obviously, factorize a matrix, i.e. to find out two (or more) matrices such that when you multiply them you will get back the original matrix.

As I have mentioned above, from an application point of view, matrix factorization can be used to discover latent features underlying the interactions between two different kinds of entities. (Of course, you can consider more than two kinds of entities and you will be dealing with tensor factorization, which would be more complicated.) And one obvious application is to predict ratings in collaborative filtering.
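
To make the idea concrete, here is a minimal sketch (my own illustration, not part of the original tutorial; the numbers are made up) of what "multiplying the factors gives back the original matrix" means, using a small matrix that happens to factor exactly:

import numpy

# A 2x3 matrix of rank 1 ...
A = numpy.array([[1, 2, 3],
                 [2, 4, 6]])

# ... can be written as the product of a 2x1 matrix and a 1x3 matrix.
P = numpy.array([[1],
                 [2]])
Q = numpy.array([[1, 2, 3]])

print(numpy.dot(P, Q))   # reproduces A exactly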

In a recommendation system such as Netflix or MovieLens, there is a group of users and a set of items (movies for the above two systems). Given that each user has rated some items in the system, we would like to predict how the users would rate the items that they have not yet rated, such that we can make recommendations to the users. In this case, all the information we have about the existing ratings can be represented in a matrix. Assume now we have 5 users and 4 items, and ratings are integers ranging from 1 to 5, the matrix may look something like this (a hyphen means that the user has not yet rated the movie):
      D1   D2   D3   D4
U1    5    3    -    1
U2    4    -    -    1
U3    1    1    -    5
U4    1    -    -    4
U5    -    1    5    4
Hence, the task of predicting the missing ratings can be considered as filling in the blanks (the hyphens in the matrix) such that the values would be consistent with the existing ratings in the matrix.
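
In code, one common convention (and the one used by the implementation later in this post) is to store the ratings in a numpy array with 0 standing for "not yet rated". The short sketch below simply restates the table above in that form:

import numpy

# 0 marks a missing rating (a hyphen in the table above)
R = numpy.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
])

observed = R > 0   # boolean mask of the ratings we actually know
print(observed.sum(), "known ratings out of", R.size)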

The intuition behind using matrix factorization to solve this problem is that there should be some latent features that determine how a user rates an item. For example, two users would give high ratings to a certain movie if they both like the actors/actresses of the movie, or if the movie is an action movie, which is a genre preferred by both users. Hence, if we can discover these latent features, we should be able to predict a rating with respect to a certain user and a certain item, because the features associated with the user should match with the features associated with the item.

In trying to discover the different features, we also make the assumption that the number of features would be smaller than the number of users and the number of items. It should not be difficult to understand this assumption because clearly it would not be reasonable to assume that each user is associated with a unique feature (although this is not impossible). And anyway if this is the case there would be no point in making recommendations, because each of these users would not be interested in the items rated by other users. Similarly, the same argument applies to the items.


The mathematics of matrix factorization

Having discussed the intuition behind matrix factorization, we can now go on to work on the mathematics. Firstly, we have a set $U$ of users, and a set $D$ of items. Let $\mathbf{R}$ of size $|U| \times |D|$ be the matrix that contains all the ratings that the users have assigned to the items. Also, we assume that we would like to discover $K$ latent features. Our task, then, is to find two matrices $\mathbf{P}$ (a $|U| \times K$ matrix) and $\mathbf{Q}$ (a $|D| \times K$ matrix) such that their product approximates $\mathbf{R}$:

$$\mathbf{R} \approx \mathbf{P} \times \mathbf{Q}^T = \hat{\mathbf{R}}$$

In this way, each row of $\mathbf{P}$ would represent the strength of the associations between a user and the features. Similarly, each row of $\mathbf{Q}$ would represent the strength of the associations between an item and the features. To get the prediction of a rating of an item $d_j$ by user $u_i$, we can calculate the dot product of the two vectors corresponding to $u_i$ and $d_j$:

$$\hat{r}_{ij} = p_i^T q_j = \sum_{k=1}^{K} p_{ik} q_{kj}$$
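
As a quick illustration (a sketch of my own, not part of the original tutorial; the numbers for $p_i$ and $q_j$ are made up), with the row-vector convention above the predicted rating is literally a dot product:

import numpy

p_i = numpy.array([1.2, 0.8])   # user u_i's affinity to the K = 2 latent features
q_j = numpy.array([0.9, 1.5])   # item d_j's affinity to the K = 2 latent features

r_hat_ij = numpy.dot(p_i, q_j)  # predicted rating of item d_j by user u_i
print(r_hat_ij)                 # 1.2*0.9 + 0.8*1.5 = 2.28
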
Now, we have to find a way to obtain $\mathbf{P}$ and $\mathbf{Q}$. One way to approach this problem is to first initialize the two matrices with some values, calculate how 'different' their product is to $\mathbf{R}$, and then try to minimize this difference iteratively. Such a method is called gradient descent, aiming at finding a local minimum of the difference.

The difference here, usually called the error between the estimated rating and the real rating, can be calculated by the following equation for each user-item pair:

$$e_{ij}^2 = (r_{ij} - \hat{r}_{ij})^2 = \left( r_{ij} - \sum_{k=1}^{K} p_{ik} q_{kj} \right)^2$$

Here we consider the squared error because the estimated rating can be either higher or lower than the real rating.

To minimize the error, we have to know in which direction we have to modify the values of $p_{ik}$ and $q_{kj}$. In other words, we need to know the gradient at the current values, and therefore we differentiate the above equation with respect to these two variables separately:

$$\frac{\partial}{\partial p_{ik}} e_{ij}^2 = -2(r_{ij} - \hat{r}_{ij}) q_{kj} = -2 e_{ij} q_{kj}$$

$$\frac{\partial}{\partial q_{kj}} e_{ij}^2 = -2(r_{ij} - \hat{r}_{ij}) p_{ik} = -2 e_{ij} p_{ik}$$

Having obtained the gradient, we can now formulate the update rules for both $p_{ik}$ and $q_{kj}$:

$$p'_{ik} = p_{ik} - \alpha \frac{\partial}{\partial p_{ik}} e_{ij}^2 = p_{ik} + 2\alpha e_{ij} q_{kj}$$

$$q'_{kj} = q_{kj} - \alpha \frac{\partial}{\partial q_{kj}} e_{ij}^2 = q_{kj} + 2\alpha e_{ij} p_{ik}$$

Here, $\alpha$ is a constant whose value determines the rate of approaching the minimum. Usually we will choose a small value for $\alpha$, say 0.0002. This is because if we make too large a step towards the minimum we may run the risk of overshooting the minimum and end up oscillating around it.
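
The following sketch (my own, not from the original tutorial) applies these update rules once to a single observed user-item pair; the full training loop presented later in this post simply repeats this over all observed pairs for many steps:

import numpy

alpha = 0.0002                     # learning rate
K = 2                              # number of latent features
R = numpy.array([[5, 3, 0, 1],
                 [4, 0, 0, 1],
                 [1, 1, 0, 5],
                 [1, 0, 0, 4],
                 [0, 1, 5, 4]], dtype=float)

numpy.random.seed(0)               # arbitrary seed, for repeatability only
P = numpy.random.rand(5, K)        # users x K: one feature vector per user
Q = numpy.random.rand(4, K)        # items x K: one feature vector per item

i, j = 0, 0                                   # an observed pair: U1 rated D1
eij = R[i, j] - numpy.dot(P[i, :], Q[j, :])   # error e_ij for this pair
for k in range(K):
    P[i, k] += alpha * 2 * eij * Q[j, k]      # p_ik <- p_ik + 2*alpha*e_ij*q_kj
    Q[j, k] += alpha * 2 * eij * P[i, k]      # q_kj <- q_kj + 2*alpha*e_ij*p_ik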

A question might have come to your mind by now: if we find two matrices $\mathbf{P}$ and $\mathbf{Q}$ such that $\mathbf{P} \times \mathbf{Q}^T$ approximates $\mathbf{R}$, won't the predictions of all the unseen ratings simply be zeros? In fact, we are not really trying to come up with $\mathbf{P}$ and $\mathbf{Q}$ such that we can reproduce $\mathbf{R}$ exactly. Instead, we will only try to minimise the errors of the observed user-item pairs. In other words, if we let $T$ be a set of tuples, each of which is in the form of $(u_i, d_j, r_{ij})$, such that $T$ contains all the observed user-item pairs together with the associated ratings, we are only trying to minimise every $e_{ij}$ for $(u_i, d_j, r_{ij}) \in T$. (In other words, $T$ is our set of training data.) As for the rest of the unknowns, we will be able to determine their values once the associations between the users, items and features have been learnt.

Using the above update rules, we can then iteratively perform the operation until the error converges to its minimum. We can check the overall error as calculated using the following equation and determine when we should stop the process:

$$E = \sum_{(u_i, d_j, r_{ij}) \in T} e_{ij}^2 = \sum_{(u_i, d_j, r_{ij}) \in T} \left( r_{ij} - \sum_{k=1}^{K} p_{ik} q_{kj} \right)^2$$
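
In code, the training set $T$ and the overall error $E$ could be computed as in the short sketch below (my own illustration, not part of the original tutorial; P and Q are laid out as in the earlier sketches, with one row per user and per item respectively):

import numpy

R = numpy.array([[5, 3, 0, 1],
                 [4, 0, 0, 1],
                 [1, 1, 0, 5],
                 [1, 0, 0, 4],
                 [0, 1, 5, 4]], dtype=float)

# T: every observed (user index, item index, rating) triple -- the training data
T = [(i, j, R[i, j]) for i in range(len(R)) for j in range(len(R[i])) if R[i, j] > 0]

def overall_error(P, Q, T):
    # Sum of squared errors over the observed pairs only; P is users x K and
    # Q is items x K, so each prediction is the dot product of two rows.
    return sum((r - numpy.dot(P[i, :], Q[j, :])) ** 2 for i, j, r in T)

P = numpy.random.rand(5, 2)
Q = numpy.random.rand(4, 2)
print(overall_error(P, Q, T))   # large before training, shrinks as P and Q are learnt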

Regularization

The above algorithm is a very basic algorithm for factorizing a matrix. There are a lot of methods to make things look more complicated. A common extension to this basic algorithm is to introduce regularization to avoid overfitting. This is done by adding a parameter $\beta$ and modifying the squared error as follows:

$$e_{ij}^2 = \left( r_{ij} - \sum_{k=1}^{K} p_{ik} q_{kj} \right)^2 + \frac{\beta}{2} \sum_{k=1}^{K} \left( p_{ik}^2 + q_{kj}^2 \right)$$

In other words, the new parameter $\beta$ is used to control the magnitudes of the user-feature and item-feature vectors such that $\mathbf{P}$ and $\mathbf{Q}$ would give a good approximation of $\mathbf{R}$ without having to contain large numbers. In practice, $\beta$ is set to some value in the range of 0.02. The new update rules for this squared error can be obtained by a procedure similar to the one described above. The new update rules are as follows:

$$p'_{ik} = p_{ik} + \alpha \left( 2 e_{ij} q_{kj} - \beta p_{ik} \right)$$

$$q'_{kj} = q_{kj} + \alpha \left( 2 e_{ij} p_{ik} - \beta q_{kj} \right)$$

Implementation in Python

Once we have derived the update rules as described above, it actually becomes very straightforward to implement the algorithm. The following is a function that implements the algorithm in Python (note that this implementation requires the numpy module).

Note: The complete Python code is available for download in the section Source Code at the end of this post.

import numpy

def matrix_factorization(R, P, Q, K, steps=5000, alpha=0.0002, beta=0.02):
    Q = Q.T
    for step in xrange(steps):
        for i in xrange(len(R)):
            for j in xrange(len(R[i])):
                if R[i][j] > 0:
                    eij = R[i][j] - numpy.dot(P[i,:],Q[:,j])
                    for k in xrange(K):
                        P[i][k] = P[i][k] + alpha * (2 * eij * Q[k][j] - beta * P[i][k])
                        Q[k][j] = Q[k][j] + alpha * (2 * eij * P[i][k] - beta * Q[k][j])
        eR = numpy.dot(P,Q)
        e = 0
        for i in xrange(len(R)):
            for j in xrange(len(R[i])):
                if R[i][j] > 0:
                    e = e + pow(R[i][j] - numpy.dot(P[i,:],Q[:,j]), 2)
                    for k in xrange(K):
                        e = e + (beta/2) * (pow(P[i][k],2) + pow(Q[k][j],2))
        if e < 0.001:
            break
    return P, Q.T
We can try to apply it to our example mentioned above and see what we would get. Below is a code snippet in Python for running the example.

R = [
     [5,3,0,1],
     [4,0,0,1],
     [1,1,0,5],
     [1,0,0,4],
     [0,1,5,4],
    ]

R = numpy.array(R)

N = len(R)
M = len(R[0])
K = 2

P = numpy.random.rand(N,K)
Q = numpy.random.rand(M,K)

nP, nQ = matrix_factorization(R, P, Q, K)
nR = numpy.dot(nP, nQ.T)
And the matrix obtained from the above process would look something like this:

      D1     D2     D3     D4
U1    4.97   2.98   2.18   0.98
U2    3.97   2.40   1.97   0.99
U3    1.02   0.93   5.32   4.93
U4    1.00   0.85   4.59   3.93
U5    1.36   1.07   4.89   4.12
We can see that for existing ratings we have approximations very close to the true values, and we also get some 'predictions' of the unknown values. In this simple example, we can easily see that U1 and U2 have similar taste and they both rated D1 and D2 high, while the rest of the users preferred D3 and D4. When the number of features (K in the Python code) is 2, the algorithm is able to associate the users and items to two different features, and the predictions also follow these associations. For example, we can see that the predicted rating of U4 on D3 is 4.59, because U4 and U5 both rated D4 high.
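
As a small usage note (my own, continuing the snippet above), the individual predictions can be read straight out of nR; the exact numbers will differ from run to run because P and Q are initialized randomly:

print(numpy.round(nR, 2))   # the whole matrix of approximations and predictions
print(round(nR[3][2], 2))   # U4's predicted rating for D3 (about 4.59 in the run shown above)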


Further Information

We have discussed the intuitive meaning of the technique of matrix factorization and its use in collaborative filtering. In fact, there are many different extensions to the above technique. An important extension is the requirement that all the elements of the factor matrices ($\mathbf{P}$ and $\mathbf{Q}$ in the above example) should be non-negative. In this case it is called non-negative matrix factorization (NMF). One advantage of NMF is that it results in intuitive meanings of the resultant matrices. Since no elements are negative, the process of multiplying the resultant matrices to get back the original matrix would not involve subtraction, and can be considered as a process of generating the original data by linear combinations of the latent features.
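
For readers who want to experiment, below is a minimal sketch of the multiplicative update rules from Lee and Seung (2001), cited in the References. It is my own illustration, not part of the original tutorial: the step count and the small eps constant are arbitrary choices, and, unlike the rating code above, every entry of the input matrix is treated as observed, so it is not a drop-in replacement for the missing-value setting.

import numpy

def nmf(V, K, steps=200, eps=1e-9):
    # Factorize a non-negative matrix V (n x m) into W (n x K) and H (K x m)
    # using the Lee-Seung multiplicative updates; all factors stay non-negative.
    n, m = V.shape
    W = numpy.random.rand(n, K)
    H = numpy.random.rand(K, m)
    for step in range(steps):
        H *= numpy.dot(W.T, V) / (numpy.dot(numpy.dot(W.T, W), H) + eps)
        W *= numpy.dot(V, H.T) / (numpy.dot(W, numpy.dot(H, H.T)) + eps)
    return W, H

V = numpy.random.rand(5, 4)      # any non-negative matrix
W, H = nmf(V, 2)
print(numpy.dot(W, H))           # non-negative, additive approximation of V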


Source Code

The full Python source code of this tutorial is available for download at:

mf.py

The code from the link is reproduced here in case the original download ever becomes unavailable:

#!/usr/bin/python
#
# Created by Albert Au Yeung (2010)
#
# An implementation of matrix factorization
#
try:
    import numpy
except:
    print "This implementation requires the numpy module."
    exit(0)

###############################################################################

"""
@INPUT:
    R     : a matrix to be factorized, dimension N x M
    P     : an initial matrix of dimension N x K
    Q     : an initial matrix of dimension M x K
    K     : the number of latent features
    steps : the maximum number of steps to perform the optimisation
    alpha : the learning rate
    beta  : the regularization parameter
@OUTPUT:
    the final matrices P and Q
"""
def matrix_factorization(R, P, Q, K, steps=5000, alpha=0.0002, beta=0.02):
    Q = Q.T
    for step in xrange(steps):
        for i in xrange(len(R)):
            for j in xrange(len(R[i])):
                if R[i][j] > 0:
                    eij = R[i][j] - numpy.dot(P[i,:],Q[:,j])
                    for k in xrange(K):
                        P[i][k] = P[i][k] + alpha * (2 * eij * Q[k][j] - beta * P[i][k])
                        Q[k][j] = Q[k][j] + alpha * (2 * eij * P[i][k] - beta * Q[k][j])
        eR = numpy.dot(P,Q)
        e = 0
        for i in xrange(len(R)):
            for j in xrange(len(R[i])):
                if R[i][j] > 0:
                    e = e + pow(R[i][j] - numpy.dot(P[i,:],Q[:,j]), 2)
                    for k in xrange(K):
                        e = e + (beta/2) * (pow(P[i][k],2) + pow(Q[k][j],2))
        if e < 0.001:
            break
    return P, Q.T

###############################################################################

if __name__ == "__main__":
    R = [
         [5,3,0,1],
         [4,0,0,1],
         [1,1,0,5],
         [1,0,0,4],
         [0,1,5,4],
        ]

    R = numpy.array(R)

    N = len(R)
    M = len(R[0])
    K = 2

    P = numpy.random.rand(N,K)
    Q = numpy.random.rand(M,K)

    nP, nQ = matrix_factorization(R, P, Q, K)



References

There have been quite a lot of references on matrix factorization. Below are some of the related papers.

Gábor Takács et al (2008). Matrix factorization and neighbor based algorithms for the Netflix prize problem. In: Proceedings of the 2008 ACM Conference on Recommender Systems, Lausanne, Switzerland, October 23-25, 267-274.

Patrick Ott (2008). Incremental Matrix Factorization for Collaborative Filtering. Science, Technology and Design 01/2008, Anhalt University of Applied Sciences.

Daniel D. Lee and H. Sebastian Seung (2001). Algorithms for Non-negative Matrix Factorization. Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference. MIT Press. pp. 556-562.

Daniel D. Lee and H. Sebastian Seung (1999). Learning the parts of objects by non-negative matrix factorization. Nature, Vol. 401, No. 6755 (21 October 1999), pp. 788-791.

