
Recommender Systems: Cosine Similarity and Its Java Implementation

2014-11-14 13:13
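We often use the cosine to measure the similarity between two individuals. As implemented in the Java code in this article, the similarity of users $u$ and $v$ is computed from the sets of items each has rated, $I(u)$ and $I(v)$:

$$\mathrm{sim}(u,v) = \frac{|I(u) \cap I(v)|}{\sqrt{|I(u)|\,|I(v)|}}$$

An unknown rating is then estimated as the similarity-weighted average of the other users' ratings, where a missing rating $r_{v,i}$ contributes 0 to the numerator:

$$\hat{r}_{u,i} = \frac{\sum_{v \ne u} \mathrm{sim}(u,v)\, r_{v,i}}{\sum_{v \ne u} \mathrm{sim}(u,v)}$$

Consider an example.

Example 1. Suppose five users U1, U2, U3, U4, U5 rate four items I1, I2, I3, I4 as shown in the table below ("-" means the user has not rated the item):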
      I1   I2   I3   I4
U1    5    3    -    1
U2    4    -    -    1
U3    1    1    -    5
U4    1    -    -    4
U5    -    1    5    4
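To make the table concrete, here is a minimal sketch that computes one similarity and one predicted rating from it. It assumes the co-rating-count form of the cosine used by the Java code in this article, with "-" stored as 0. The class name `WorkedExample` and its helper names are illustrative and do not come from the original.

```java
public class WorkedExample {
    // Ratings from the table; 0 stands for "-" (no rating).
    static final double[][] RATINGS = {
        {5, 3, 0, 1},   // U1
        {4, 0, 0, 1},   // U2
        {1, 1, 0, 5},   // U3
        {1, 0, 0, 4},   // U4
        {0, 1, 5, 4}    // U5
    };

    // Number of co-rated items divided by sqrt(#rated by u * #rated by v).
    static double sim(double[][] m, int u, int v) {
        if (u == v) return 1.0;
        int nu = 0, nv = 0, both = 0;
        for (int k = 0; k < m[u].length; k++) {
            if (m[u][k] != 0) nu++;
            if (m[v][k] != 0) nv++;
            if (m[u][k] != 0 && m[v][k] != 0) both++;
        }
        return both / Math.sqrt((double) nu * nv);
    }

    // Similarity-weighted average of the other users' ratings for one item.
    // Unrated entries are 0, so they add nothing to the numerator.
    static double predict(double[][] m, int u, int item) {
        double num = 0, den = 0;
        for (int v = 0; v < m.length; v++) {
            if (v == u) continue;
            double s = sim(m, u, v);
            num += s * m[v][item];
            den += s;
        }
        return num / den;
    }

    public static void main(String[] args) {
        System.out.println("sim(U1, U2) = " + sim(RATINGS, 0, 1));
        System.out.println("predicted r(U1, I3) = " + predict(RATINGS, 0, 2));
    }
}
```

U1 rated three items, U2 rated two, and they share two (I1 and I4), so sim(U1, U2) = 2/√6 ≈ 0.816. For item I3 only U5 has a rating, so the prediction for U1 is pulled toward 0 by the three other users who did not rate it, and comes out at about 1.01.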
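Then, if the users and their ratings are stored in a matrix, the Java code for computing the cosine similarity and predicting ratings is as follows: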
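public class Sim {
    /*
     * Class Sim computes the cosine similarity, uses the similarity between
     * users to predict ratings, and compares the predicted values with the
     * true values (the error is averaged over all users).
     */

    // Cosine measure between users i and j. A 0 entry means "not rated";
    // only which items were rated matters, not the rating values.
    public double cosine(double[][] matrix, int i, int j) {
        if (i == j)
            return 1;
        int ai = 0;   // number of items rated by user i
        int aj = 0;   // number of items rated by user j
        int aij = 0;  // number of items rated by both
        for (int k = 0; k < matrix[i].length; k++) {
            if (matrix[i][k] != 0 && matrix[j][k] != 0)
                aij++;
            if (matrix[i][k] != 0)
                ai++;
            if (matrix[j][k] != 0)
                aj++;
        }
        return aij / Math.sqrt(ai * aj);
    }

    // Predicted rating of user u for item i: similarity-weighted average
    // over all other users (unrated entries contribute 0 to the numerator).
    public double r(double[][] matrix, int u, int i) {
        double a = 0, b = 0;
        for (int v = 0; v < matrix.length; v++) {
            if (u != v) {
                double s = cosine(matrix, u, v);
                a = a + s * matrix[v][i];
                b = b + s;
            }
        }
        return a / b;
    }

    public double[][] predict(double[][] matrix) {
        double[][] pred = new double[matrix.length][matrix[0].length];
        for (int i = 0; i < matrix.length; i++) {
            for (int j = 0; j < matrix[i].length; j++) {
                pred[i][j] = r(matrix, i, j);
            }
        }
        return pred;
    }

    // Compute the error of the prediction: sum of absolute differences
    // over all entries, divided by the number of users.
    public double comErr(double[][] matrix) {
        double[][] predictMat = predict(matrix);
        double meanErr = 0;
        for (int i = 0; i < matrix.length; i++) {
            for (int j = 0; j < matrix[i].length; j++) {
                meanErr = meanErr + Math.abs(predictMat[i][j] - matrix[i][j]);
            }
        }
        double a = matrix.length;
        return meanErr / a;
    }

    public static void main(String args[]) {
        Sim sim = new Sim();
        double mat[][] = {{5, 3, 0, 1}, {4, 0, 0, 1}, {1, 1, 0, 5}, {1, 0, 0, 4}, {0, 1, 5, 4}};
        for (int i = 0; i < mat.length; i++) {
            for (int j = 0; j < mat.length; j++) {
                System.out.print(sim.cosine(mat, i, j) + " ");
            }
            System.out.println();
        }
        System.out.println("The similarity matrix is shown above");
        double[][] pre = sim.predict(mat);
        for (int i = 0; i < mat.length; i++) {
            for (int j = 0; j < mat[i].length; j++)
                System.out.print(pre[i][j] + " ");
            System.out.println();
        }
        System.out.println("The predictions are shown above");
        System.out.println(sim.comErr(mat));
        System.out.println("The mean error is shown above");
        //System.out.println(sim.r(mat, 4, 1));
    }
}

In many cases the data is stored as a sparse matrix, because not every user can have used or rated every item. A sparse matrix then has to be built. For a sparse matrix, the Java implementation of the cosine similarity computation and the prediction is as follows:

import java.io.IOException;
import java.util.HashMap;

public class Sim1 {
    // SparseMatrix and Index are separate classes that are not listed in this article.
    // From the calls in r and main, rows and columns are taken to be 1-based
    // and GetMatrixValue returns -1 where there is no rating.
    public static HashMap<Index, Float> T = new HashMap<Index, Float>();

    // Similarity between row i and row j (1-based).
    public double cosine(SparseMatrix sm, int i, int j) {
        if (i == j)
            return 1;
        int ai = 0;
        int aj = 0;
        int aij = 0;
        // Column k; "<=" so that the last column is included (the original used "<").
        for (int k = 1; k <= sm.GetNumOfY(); k++) {
            // Judging from thj1, sm.GetMatrixValue(j, k) != -1 means a value is present.
            if (sm.GetMatrixValue(i, k) != -1 && sm.GetMatrixValue(j, k) != -1) {
                aij++;
            }
            if (sm.GetMatrixValue(i, k) != -1) {
                ai++;
            }
            if (sm.GetMatrixValue(j, k) != -1)
                aj++;
        }
        return aij / Math.sqrt(ai * aj);
    }

    // Predicted rating of user u for item i (both 1-based).
    public double r(SparseMatrix sm, int u, int i) {
        double a = 0, b = 0;
        for (int v = 1; v <= sm.GetNumOfX(); v++) {
            if (u != v) {
                double s = cosine(sm, u, v);
                if (sm.GetMatrixValue(v, i) != -1)
                    a = a + s * sm.GetMatrixValue(v, i);
                b = b + s;   // every other user counts in the denominator
            }
        }
        return a / b;
    }

    public double[][] predict(SparseMatrix sm) {
        double[][] pred = new double[sm.GetNumOfX()][sm.GetNumOfY()];
        // Rows run over users (GetNumOfX); the original looped to GetNumOfY here.
        for (int i = 0; i < sm.GetNumOfX(); i++) {
            for (int j = 0; j < sm.GetNumOfY(); j++) {
                pred[i][j] = r(sm, i + 1, j + 1);
            }
        }
        return pred;
    }

    // Same as r, but reads similarities from a precomputed matrix mat.
    // u and i are 0-based array indices, so they are shifted by one for sm.
    public double r1(SparseMatrix sm, double[][] mat, int u, int i) {
        double a = 0, b = 0;
        for (int v = 0; v < sm.GetNumOfX(); v++) {
            if (u != v) {
                if (sm.GetMatrixValue(v + 1, i + 1) != -1)
                    a = a + mat[u][v] * sm.GetMatrixValue(v + 1, i + 1);
                b = b + mat[u][v];
            }
        }
        return a / b;
    }

    public double[][] predict1(SparseMatrix sm, double[][] mat) {
        double[][] pred = new double[sm.GetNumOfX()][sm.GetNumOfY()];
        for (int i = 0; i < sm.GetNumOfX(); i++) {
            for (int j = 0; j < sm.GetNumOfY(); j++) {
                pred[i][j] = r1(sm, mat, i, j);
            }
        }
        return pred;
    }

    // Compute the error of the prediction. A missing rating (-1) is treated
    // as 0 so that the result matches the dense version in class Sim.
    public double comErr(SparseMatrix sm, double[][] mat) {
        double[][] predictMat = predict1(sm, mat);
        double meanErr = 0;
        for (int i = 0; i < sm.GetNumOfX(); i++) {
            for (int j = 0; j < sm.GetNumOfY(); j++) {
                double actual = sm.GetMatrixValue(i + 1, j + 1);
                if (actual == -1)
                    actual = 0;
                meanErr = meanErr + Math.abs(predictMat[i][j] - actual);
            }
        }
        double a = sm.GetNumOfX();
        return meanErr / a;
    }

    public static void main(String args[]) throws IOException {
        Sim1 sim1 = new Sim1();
        SparseMatrix sm = new SparseMatrix();
        sm.MakeMatrix("E:\\python\\rating.dat");
        long t1 = System.currentTimeMillis();
        // User-by-user similarity matrix, so both dimensions are GetNumOfX.
        double[][] simMat = new double[sm.GetNumOfX()][sm.GetNumOfX()];
        for (int i = 1; i <= sm.GetNumOfX(); i++) {
            for (int j = 1; j <= sm.GetNumOfX(); j++) {
                simMat[i - 1][j - 1] = sim1.cosine(sm, i, j);
                System.out.print(simMat[i - 1][j - 1] + "   ");
            }
            System.out.println();
        }
        System.out.println("The similarity matrix is shown above");
        long t2 = System.currentTimeMillis();
        System.out.println(t2 - t1);   // elapsed time in milliseconds
        double[][] pre = sim1.predict1(sm, simMat);
        for (int i = 0; i < sm.GetNumOfX(); i++) {
            for (int j = 0; j < sm.GetNumOfY(); j++)
                System.out.print(pre[i][j] + "    ");
            System.out.println();
        }
        System.out.println("The predictions are shown above");
        System.out.println(sim1.comErr(sm, simMat));
        System.out.println("The mean error is shown above");
    }
}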
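The `SparseMatrix` and `Index` classes that `Sim1` relies on are not listed in this article. The following is a minimal sketch of what such a store might look like, assuming 1-based (row, column) indices and -1 as the "no rating" value, as the calls in `Sim1` suggest. The class name `SparseRatings`, the `put` method, and the packed `long` key are illustrative assumptions, not the author's implementation; reading a ratings file is left out because the file format is not given.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the article's SparseMatrix (not the author's code).
public class SparseRatings {
    // Only rated cells are stored; the key packs (row, column) into one long.
    private final Map<Long, Float> cells = new HashMap<>();
    private int rows = 0;   // largest row index seen (number of users)
    private int cols = 0;   // largest column index seen (number of items)

    private static long key(int x, int y) {
        return ((long) x << 32) | (y & 0xffffffffL);
    }

    // Store a rating at 1-based position (x, y).
    public void put(int x, int y, float rating) {
        cells.put(key(x, y), rating);
        rows = Math.max(rows, x);
        cols = Math.max(cols, y);
    }

    // Return the rating at (x, y), or -1 if the cell was never set.
    public float getMatrixValue(int x, int y) {
        return cells.getOrDefault(key(x, y), -1f);
    }

    public int getNumOfX() { return rows; }

    public int getNumOfY() { return cols; }

    public static void main(String[] args) {
        SparseRatings sm = new SparseRatings();
        sm.put(1, 1, 5f);
        sm.put(1, 2, 3f);
        sm.put(5, 3, 5f);
        System.out.println(sm.getMatrixValue(1, 2));   // a stored rating
        System.out.println(sm.getMatrixValue(2, 2));   // missing cell, so -1
        System.out.println(sm.getNumOfX() + " x " + sm.getNumOfY());
    }
}
```

Storing only the rated cells keeps memory proportional to the number of ratings rather than to users × items, which is the point of switching to a sparse representation when most users rate only a few items.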