Dissecting the Mahout Recommendation Engine from Its Source Code
2013-12-08 01:47
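Hadoop family series: this series introduces the products of the Hadoop family. Commonly used projects include Hadoop, Hive, Pig, HBase, Sqoop, Mahout, Zookeeper, Avro, Ambari and Chukwa; newer projects include YARN, Hcatalog, Oozie, Cassandra, Hama, Whirr, Flume, Bigtop, Crunch and Hue. Since 2011, China has entered a booming era of big data, and the Hadoop family of software has claimed a large share of big data processing. Open-source communities and vendors alike have aligned their data software with Hadoop, and Hadoop has gone from a niche, elite field to the standard for big data development. On top of Hadoop's original technology, the Hadoop family of products has emerged, continually innovating around the concept of "big data" and driving technological progress. As developers in the IT industry, we should keep pace, seize the opportunity, and rise together with Hadoop!

About the author: Zhang Dan (Conan), programmer (Java, R, PHP, Javascript)
weibo: @Conan_Z
blog: http://blog.fens.me
email: bsspirit@gmail.com

Please credit the source when reposting: http://blog.fens.me/mahout-recommend-engine/

Preface

In the Mahout framework, the cf.taste package implements the recommendation engine. It provides a complete toolkit of recommendation algorithms, defines standard data structures, and standardizes the development process. Applying a recommendation algorithm takes only 7-8 lines of code, almost as concise as R. Staying this simple to use requires careful program design, and this article introduces the design of the Mahout recommendation engine.

Contents

- Mahout recommendation engine overview
- Standardized development process
- Data model
- Similarity algorithm toolkit
- Neighborhood algorithm toolkit
- Recommendation algorithm toolkit
- Creating your own recommendation engine builder

1. Mahout recommendation engine overview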
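Mahout's recommendation engine is best understood starting from the org.apache.mahout.cf.taste package, whose main classes are shown in the diagram.![](http://blog.fens.me/wp-content/uploads/2013/10/mahout-core-class.png)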
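2. Standardized development process
Taking the UserCF recommendation algorithm as an example, the officially recommended development process is:![](http://blog.fens.me/wp-content/uploads/2013/10/mahout_recommendation-process.png)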
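```java
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class UserCF {

    final static int NEIGHBORHOOD_NUM = 2; // neighbors per user
    final static int RECOMMENDER_NUM = 3;  // items recommended per user

    public static void main(String[] args) throws IOException, TasteException {
        String file = "datafile/item.csv"; // one "userID,itemID,preference" per line
        DataModel model = new FileDataModel(new File(file));
        UserSimilarity user = new EuclideanDistanceSimilarity(model);
        NearestNUserNeighborhood neighbor = new NearestNUserNeighborhood(NEIGHBORHOOD_NUM, user, model);
        Recommender r = new GenericUserBasedRecommender(model, neighbor, user);

        // Print the top RECOMMENDER_NUM items for every user
        LongPrimitiveIterator iter = model.getUserIDs();
        while (iter.hasNext()) {
            long uid = iter.nextLong();
            List<RecommendedItem> list = r.recommend(uid, RECOMMENDER_NUM);
            System.out.printf("uid:%s", uid);
            for (RecommendedItem ritem : list) {
                System.out.printf("(%s,%f)", ritem.getItemID(), ritem.getValue());
            }
            System.out.println();
        }
    }
}
```

The program that calls the algorithm needs four objects: DataModel, UserSimilarity, NearestNUserNeighborhood, and Recommender.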
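3. Data model
The data model of Mahout's recommendation engine is rooted in the DataModel interface; FileDataModel and GenericBooleanPrefDataModel, used later in this article, are two of its implementations.![](http://blog.fens.me/wp-content/uploads/2013/10/mahout-datamodel.png)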
![](http://blog.fens.me/wp-content/uploads/2013/10/mahout-pref.png)
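A data model boils down to (user, item, preference) triples. FileDataModel, used in the UserCF example, reads them from a text file with one `userID,itemID,preference` line per rating, while GenericBooleanPrefDataModel keeps only which items each user has. The sketch below is a minimal, stdlib-only illustration of those two views, assuming comma-separated input with all three fields present. `PrefTable`, `withPrefs` and `booleanPrefs` are hypothetical names; Mahout itself stores the data in its own compact structures (such as FastByIDMap and PreferenceArray), not in TreeMaps.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration, not Mahout code: two in-memory views of
// "userID,itemID,preference" lines.
class PrefTable {

    // userID -> (itemID -> preference); assumes every line has all three fields
    static Map<Long, Map<Long, Double>> withPrefs(String[] lines) {
        Map<Long, Map<Long, Double>> data = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split(",");
            long user = Long.parseLong(f[0].trim());
            long item = Long.parseLong(f[1].trim());
            double pref = Double.parseDouble(f[2].trim());
            data.computeIfAbsent(user, k -> new TreeMap<>()).put(item, pref);
        }
        return data;
    }

    // userID -> set of itemIDs: the "boolean" view, preference values dropped
    static Map<Long, Set<Long>> booleanPrefs(Map<Long, Map<Long, Double>> data) {
        Map<Long, Set<Long>> result = new TreeMap<>();
        for (Map.Entry<Long, Map<Long, Double>> e : data.entrySet()) {
            result.put(e.getKey(), new TreeSet<>(e.getValue().keySet()));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] lines = {"1,101,5.0", "1,102,3.0", "2,101,2.0"};
        Map<Long, Map<Long, Double>> data = withPrefs(lines);
        System.out.println(data);               // {1={101=5.0, 102=3.0}, 2={101=2.0}}
        System.out.println(booleanPrefs(data)); // {1=[101, 102], 2=[101]}
    }
}
```

The boolean view is what the `buildDataModelNoPref` method in section 7 asks Mahout for: useful when the data only records that a user viewed or bought an item, without a rating.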
4. Similarity algorithm toolkit
Similarity algorithms come in two kinds:
- user-based (UserCF) similarity
- item-based (ItemCF) similarity

1) User-based (UserCF) similarity
![](http://blog.fens.me/wp-content/uploads/2013/10/mahout-UserSimilarity.png)
2) Item-based (ItemCF) similarity
![](http://blog.fens.me/wp-content/uploads/2013/10/mahout-ItemSimilarity.png)
![](http://blog.fens.me/wp-content/uploads/2013/10/image003.gif)
![](http://blog.fens.me/wp-content/uploads/2013/10/image004.gif)
![](http://blog.fens.me/wp-content/uploads/2013/10/image005.gif)
![](http://blog.fens.me/wp-content/uploads/2013/10/image006.gif)
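The similarity classes compare two users (or items) through the preferences they share. The sketch below writes out textbook versions of three measures named in the factory code later on: a Euclidean-distance similarity `1 / (1 + d)` over co-rated items, Pearson correlation over co-rated items, and the Tanimoto (Jaccard) coefficient on item sets, which ignores preference values. `SimilaritySketch` is a hypothetical helper, not Mahout's code; Mahout's EuclideanDistanceSimilarity in particular may normalize the distance differently, so exact values can differ from these.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not Mahout code: textbook similarity measures between
// two users, each given as itemID -> preference.
class SimilaritySketch {

    // 1 / (1 + Euclidean distance) over co-rated items; NaN if nothing overlaps
    static double euclidean(Map<Long, Double> a, Map<Long, Double> b) {
        double sum = 0;
        int n = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double diff = e.getValue() - other;
                sum += diff * diff;
                n++;
            }
        }
        return n == 0 ? Double.NaN : 1.0 / (1.0 + Math.sqrt(sum));
    }

    // Pearson correlation over co-rated items; NaN if undefined
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        Set<Long> common = new HashSet<>(a.keySet());
        common.retainAll(b.keySet());
        int n = common.size();
        if (n == 0) return Double.NaN;
        double meanA = 0, meanB = 0;
        for (Long item : common) {
            meanA += a.get(item);
            meanB += b.get(item);
        }
        meanA /= n;
        meanB /= n;
        double num = 0, varA = 0, varB = 0;
        for (Long item : common) {
            double da = a.get(item) - meanA;
            double db = b.get(item) - meanB;
            num += da * db;
            varA += da * da;
            varB += db * db;
        }
        double denom = Math.sqrt(varA * varB);
        return denom == 0 ? Double.NaN : num / denom;
    }

    // Tanimoto coefficient: |A intersect B| / |A union B| on item sets
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> inter = new HashSet<>(a);
        inter.retainAll(b);
        int union = a.size() + b.size() - inter.size();
        return union == 0 ? Double.NaN : (double) inter.size() / union;
    }

    public static void main(String[] args) {
        Map<Long, Double> u1 = Map.of(101L, 5.0, 102L, 3.0, 103L, 2.5);
        Map<Long, Double> u2 = Map.of(101L, 2.0, 102L, 2.5, 103L, 5.0, 104L, 2.0);
        System.out.println(euclidean(u1, u2));
        System.out.println(pearson(u1, u2));
        System.out.println(tanimoto(u1.keySet(), u2.keySet())); // 0.75
    }
}
```

Note that Tanimoto only looks at which items were rated, which is why the factory pairs it with boolean (no-preference) data.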
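5. Neighborhood algorithm toolkit
Neighborhood algorithms apply only to UserCF. They rank the other users by similarity and keep the closest ones as the reference users for the final recommendation: NearestNUserNeighborhood keeps the N most similar users, while ThresholdUserNeighborhood keeps every user whose similarity passes a given threshold.![](http://blog.fens.me/wp-content/uploads/2013/10/mahout-UserNeighborhood.png)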
![](http://blog.fens.me/wp-content/uploads/2013/10/mahout-Neighborhood.png)
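The two selection rules can be sketched on their own. Mahout's neighborhood classes take a UserSimilarity and a DataModel and compute the scores themselves; the hypothetical `NeighborhoodSketch` below instead assumes the similarities to one target user are already available as a map, so that only the cut-off logic remains. Undefined (NaN) similarities are skipped, and the threshold rule here uses "at least", an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Mahout code: choosing a user's neighbors from
// precomputed similarity scores (otherUserID -> similarity).
class NeighborhoodSketch {

    // Nearest-N rule: the n users with the highest similarity, best first
    static List<Long> nearestN(Map<Long, Double> sims, int n) {
        List<Map.Entry<Long, Double>> entries = new ArrayList<>(sims.entrySet());
        entries.removeIf(e -> Double.isNaN(e.getValue()));
        entries.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    // Threshold rule: every user whose similarity is at least `threshold`
    static List<Long> threshold(Map<Long, Double> sims, double threshold) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Double> e : sims.entrySet()) {
            if (!Double.isNaN(e.getValue()) && e.getValue() >= threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, Double> sims = Map.of(2L, 0.9, 3L, 0.4, 4L, 0.7, 5L, 0.1);
        System.out.println(nearestN(sims, 2));    // [2, 4]
        System.out.println(threshold(sims, 0.5)); // users 2 and 4, in map order
    }
}
```

The trade-off: nearest-N always yields the same neighborhood size, even if the neighbors are barely similar, while a threshold guarantees a minimum similarity but may yield very few (or very many) neighbors.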
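6. Recommendation algorithm toolkit
Recommendation algorithms are built on the Recommender interface. For a detailed introduction to the recommendation algorithms, see the article "Mahout推荐算法API详解" (Mahout recommendation algorithm API explained).![](http://blog.fens.me/wp-content/uploads/2013/10/mahout-Recommender.png)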
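A Recommender answers two questions: `estimatePreference(userID, itemID)` and `recommend(userID, howMany)`. For a user-based recommender, a common way to answer the first is a similarity-weighted average of the neighbors' preferences for the item, and the second then ranks the items the user has not rated by that estimate. The sketch below is a simplified illustration of this idea, not Mahout's GenericUserBasedRecommender, whose implementation adds further rules; `UserBasedSketch` is hypothetical, the neighbors and their similarities are assumed to be given, and only neighbors with positive similarity are counted (an assumption made to keep the weighted average well-behaved).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Mahout code: user-based scoring as a
// similarity-weighted average of neighbors' preferences.
class UserBasedSketch {

    // prefs: userID -> (itemID -> preference); neighborSims: neighborID -> similarity
    static double estimate(Map<Long, Map<Long, Double>> prefs, Map<Long, Double> neighborSims, long item) {
        double weighted = 0, totalSim = 0;
        for (Map.Entry<Long, Double> e : neighborSims.entrySet()) {
            double sim = e.getValue();
            Double pref = prefs.getOrDefault(e.getKey(), Map.of()).get(item);
            // Assumption: only neighbors with positive similarity who rated the item count
            if (pref != null && sim > 0) {
                weighted += sim * pref;
                totalSim += sim;
            }
        }
        return totalSim == 0 ? Double.NaN : weighted / totalSim;
    }

    // Top-k items the user has not rated yet, ranked by estimated preference
    static List<Long> recommend(Map<Long, Map<Long, Double>> prefs, Map<Long, Double> neighborSims, long user, int k) {
        Map<Long, Double> scores = new HashMap<>();
        Map<Long, Double> own = prefs.getOrDefault(user, Map.of());
        for (Long neighbor : neighborSims.keySet()) {
            for (Long item : prefs.getOrDefault(neighbor, Map.of()).keySet()) {
                if (!own.containsKey(item) && !scores.containsKey(item)) {
                    double est = estimate(prefs, neighborSims, item);
                    if (!Double.isNaN(est)) scores.put(item, est);
                }
            }
        }
        List<Long> items = new ArrayList<>(scores.keySet());
        items.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return items.subList(0, Math.min(k, items.size()));
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> prefs = Map.of(
                1L, Map.of(101L, 5.0),
                2L, Map.of(101L, 4.0, 102L, 4.0, 103L, 1.0),
                3L, Map.of(101L, 5.0, 102L, 2.0, 103L, 5.0));
        Map<Long, Double> neighborsOfUser1 = Map.of(2L, 0.8, 3L, 0.2);
        System.out.println(estimate(prefs, neighborsOfUser1, 102L));
        System.out.println(recommend(prefs, neighborsOfUser1, 1L, 2)); // [102, 103]
    }
}
```

Because user 2 is far more similar to user 1 than user 3 is, user 2's ratings dominate: item 102 (which user 2 liked) outranks item 103 (which user 3 liked).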
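7. Creating your own recommendation engine builder
With the knowledge above, we now understand how the Mahout recommendation engine works and how to use it, so we can write our own builder that combines the algorithms through the Strategy pattern. Create a new file: org.conan.mymahout.recommendation.job.RecommendFactory.java

```java
public final class RecommendFactory {...}
```

1) Build the data model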
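```java
// Data model with preference values, read from a "userID,itemID,preference" file
public static DataModel buildDataModel(String file) throws TasteException, IOException {
    return new FileDataModel(new File(file));
}

// Boolean data model: keeps only which items each user has, dropping preference values
public static DataModel buildDataModelNoPref(String file) throws TasteException, IOException {
    return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File(file))));
}

// Builder the evaluators use to turn training data into a boolean data model
public static DataModelBuilder buildDataModelNoPrefBuilder() {
    return new DataModelBuilder() {
        @Override
        public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
            return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(trainingData));
        }
    };
}
```

2) Build the similarity model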
```java
public enum SIMILARITY {
    PEARSON, EUCLIDEAN, COSINE, TANIMOTO, LOGLIKELIHOOD, FARTHEST_NEIGHBOR_CLUSTER, NEAREST_NEIGHBOR_CLUSTER
}

public static UserSimilarity userSimilarity(SIMILARITY type, DataModel m) throws TasteException {
    switch (type) {
    case PEARSON:
        return new PearsonCorrelationSimilarity(m);
    case COSINE:
        return new UncenteredCosineSimilarity(m);
    case TANIMOTO:
        return new TanimotoCoefficientSimilarity(m);
    case LOGLIKELIHOOD:
        return new LogLikelihoodSimilarity(m);
    case EUCLIDEAN:
    default:
        return new EuclideanDistanceSimilarity(m);
    }
}

// Only LOGLIKELIHOOD and TANIMOTO are mapped here; every other type falls back to Tanimoto
public static ItemSimilarity itemSimilarity(SIMILARITY type, DataModel m) throws TasteException {
    switch (type) {
    case LOGLIKELIHOOD:
        return new LogLikelihoodSimilarity(m);
    case TANIMOTO:
    default:
        return new TanimotoCoefficientSimilarity(m);
    }
}

// Similarity between clusters of users, built on top of a user similarity
public static ClusterSimilarity clusterSimilarity(SIMILARITY type, UserSimilarity us) throws TasteException {
    switch (type) {
    case NEAREST_NEIGHBOR_CLUSTER:
        return new NearestNeighborClusterSimilarity(us);
    case FARTHEST_NEIGHBOR_CLUSTER:
    default:
        return new FarthestNeighborClusterSimilarity(us);
    }
}
```

3) Build the neighborhood model
```java
public enum NEIGHBORHOOD {
    NEAREST, THRESHOLD
}

// num is the neighborhood size N for NEAREST, or the similarity threshold for THRESHOLD
public static UserNeighborhood userNeighborhood(NEIGHBORHOOD type, UserSimilarity s, DataModel m, double num) throws TasteException {
    switch (type) {
    case NEAREST:
        return new NearestNUserNeighborhood((int) num, s, m);
    case THRESHOLD:
    default:
        return new ThresholdUserNeighborhood(num, s, m);
    }
}
```

4) Build the recommender model
```java
public enum RECOMMENDER {
    USER, ITEM
}

// pref == true: recommenders that use preference values; false: boolean-preference variants
public static RecommenderBuilder userRecommender(final UserSimilarity us, final UserNeighborhood un, boolean pref) throws TasteException {
    return pref ? new RecommenderBuilder() {
        @Override
        public Recommender buildRecommender(DataModel model) throws TasteException {
            return new GenericUserBasedRecommender(model, un, us);
        }
    } : new RecommenderBuilder() {
        @Override
        public Recommender buildRecommender(DataModel model) throws TasteException {
            return new GenericBooleanPrefUserBasedRecommender(model, un, us);
        }
    };
}

public static RecommenderBuilder itemRecommender(final ItemSimilarity is, boolean pref) throws TasteException {
    return pref ? new RecommenderBuilder() {
        @Override
        public Recommender buildRecommender(DataModel model) throws TasteException {
            return new GenericItemBasedRecommender(model, is);
        }
    } : new RecommenderBuilder() {
        @Override
        public Recommender buildRecommender(DataModel model) throws TasteException {
            return new GenericBooleanPrefItemBasedRecommender(model, is);
        }
    };
}

public static RecommenderBuilder slopeOneRecommender() throws TasteException {
    return new RecommenderBuilder() {
        @Override
        public Recommender buildRecommender(DataModel dataModel) throws TasteException {
            return new SlopeOneRecommender(dataModel);
        }
    };
}

public static RecommenderBuilder itemKNNRecommender(final ItemSimilarity is, final Optimizer op, final int n) throws TasteException {
    return new RecommenderBuilder() {
        @Override
        public Recommender buildRecommender(DataModel dataModel) throws TasteException {
            return new KnnItemBasedRecommender(dataModel, is, op, n);
        }
    };
}

public static RecommenderBuilder svdRecommender(final Factorizer factorizer) throws TasteException {
    return new RecommenderBuilder() {
        @Override
        public Recommender buildRecommender(DataModel dataModel) throws TasteException {
            return new SVDRecommender(dataModel, factorizer);
        }
    };
}

public static RecommenderBuilder treeClusterRecommender(final ClusterSimilarity cs, final int n) throws TasteException {
    return new RecommenderBuilder() {
        @Override
        public Recommender buildRecommender(DataModel dataModel) throws TasteException {
            return new TreeClusteringRecommender(dataModel, cs, n);
        }
    };
}
```

5) Build the evaluation model
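The evaluators in this step report different kinds of numbers. AverageAbsoluteDifferenceRecommenderEvaluator and RMSRecommenderEvaluator compare estimated preferences with held-out actual preferences, either averaging the absolute errors or taking the square root of the mean squared error (lower is better for both). GenericRecommenderIRStatsEvaluator instead measures precision and recall of the top-N list against a set of relevant items, which in the factory code is chosen with `CHOOSE_THRESHOLD` from each user's own high ratings. The sketch below writes these formulas out on plain arrays and collections; `EvalSketch` and its methods are hypothetical, and it leaves out how Mahout splits data into training and test sets.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Mahout code: the error and IR metrics behind the
// evaluators, computed on already-collected results.
class EvalSketch {

    // Mean absolute difference between estimated and actual preferences
    static double averageAbsoluteDifference(double[] estimated, double[] actual) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(estimated[i] - actual[i]);
        }
        return sum / actual.length;
    }

    // Root mean squared error: penalizes large errors more heavily
    static double rms(double[] estimated, double[] actual) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            double d = estimated[i] - actual[i];
            sum += d * d;
        }
        return Math.sqrt(sum / actual.length);
    }

    // Precision@N: share of the recommended items that are relevant
    static double precision(List<Long> recommended, Set<Long> relevant) {
        if (recommended.isEmpty()) return Double.NaN;
        return (double) countHits(recommended, relevant) / recommended.size();
    }

    // Recall@N: share of the relevant items that were recommended
    static double recall(List<Long> recommended, Set<Long> relevant) {
        if (relevant.isEmpty()) return Double.NaN;
        return (double) countHits(recommended, relevant) / relevant.size();
    }

    private static int countHits(List<Long> recommended, Set<Long> relevant) {
        int hits = 0;
        for (Long item : recommended) {
            if (relevant.contains(item)) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        double[] est = {4.0, 3.0, 5.0};
        double[] act = {5.0, 3.0, 3.0};
        System.out.println(averageAbsoluteDifference(est, act)); // 1.0
        System.out.println(rms(est, act));
        System.out.println(precision(List.of(1L, 2L, 3L), Set.of(2L, 3L, 4L, 5L)));
        System.out.println(recall(List.of(1L, 2L, 3L), Set.of(2L, 3L, 4L, 5L))); // 0.5
    }
}
```

On the same errors (1, 0 and 2), the absolute-difference metric gives 1.0 while RMS gives about 1.29, because squaring weights the single large error more.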
```java
public enum EVALUATOR {
    AVERAGE_ABSOLUTE_DIFFERENCE, RMS
}

public static RecommenderEvaluator buildEvaluator(EVALUATOR type) {
    switch (type) {
    case RMS:
        return new RMSRecommenderEvaluator();
    case AVERAGE_ABSOLUTE_DIFFERENCE:
    default:
        return new AverageAbsoluteDifferenceRecommenderEvaluator();
    }
}

// trainPt: fraction of each user's data used for training; 1.0: evaluate all users
public static void evaluate(EVALUATOR type, RecommenderBuilder rb, DataModelBuilder mb, DataModel dm, double trainPt) throws TasteException {
    System.out.printf("%s Evaluator Score:%s\n", type.toString(), buildEvaluator(type).evaluate(rb, mb, dm, trainPt, 1.0));
}

public static void evaluate(RecommenderEvaluator re, RecommenderBuilder rb, DataModelBuilder mb, DataModel dm, double trainPt) throws TasteException {
    System.out.printf("Evaluator Score:%s\n", re.evaluate(rb, mb, dm, trainPt, 1.0));
}

/** Precision and recall of the top-n recommendations (IR statistics). */
public static void statsEvaluator(RecommenderBuilder rb, DataModelBuilder mb, DataModel m, int topn) throws TasteException {
    RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
    IRStatistics stats = evaluator.evaluate(rb, mb, m, null, topn, GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
    // System.out.printf("Recommender IR Evaluator: %s\n", stats);
    System.out.printf("Recommender IR Evaluator: [Precision:%s,Recall:%s]\n", stats.getPrecision(), stats.getRecall());
}
```

6) Output the recommendation results
```java
// Print one user's recommendations as "uid:x,(item,score)..."; with skip == true, users without recommendations are not printed
public static void showItems(long uid, List<RecommendedItem> recommendations, boolean skip) {
    if (!skip || recommendations.size() > 0) {
        System.out.printf("uid:%s,", uid);
        for (RecommendedItem recommendation : recommendations) {
            System.out.printf("(%s,%f)", recommendation.getItemID(), recommendation.getValue());
        }
        System.out.println();
    }
}
```

7) Complete source files and usage examples: https://github.com/bsspirit/maven_mahout_template/tree/mahout-0.8/src/main/java/org/conan/mymahout/recommendation/job
Related articles
- Dissecting the Mahout Recommendation Engine from Its Source Code
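- Mahout getting-started example: building a social recommendation engine with Apache Mahout, hands-on (based on IBM)
- Building your own search platform with nutch-1.2 and Lucene; building a social recommendation engine with Apache Mahout
- [Repost] Building a job recommendation engine with Mahout
- Completed recommendation engines built with Mahout outside China
- Introduction to the Mahout recommendation engine: understanding it and how to apply it (1)
- Mahout recommendation engine source code analysis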