Mahout in Action: Evaluating a Recommender

Recommenders are usually evaluated with three metrics: MAE (mean absolute error), precision, and recall.

The recommender from "Mahout in Action: Running Your First Recommender Engine" will be measured against each of these three metrics.
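To make the two information-retrieval metrics concrete, here is a minimal sketch in plain Java. It does not use Mahout; the class name PrecisionRecallSketch and the item IDs are made up for illustration. Precision is the share of recommended items that are actually relevant, and recall is the share of relevant items that were recommended.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class PrecisionRecallSketch {
    /** Number of recommended items that are also in the relevant set. */
    public static int hits(Set<Long> recommended, Set<Long> relevant) {
        Set<Long> common = new HashSet<>(recommended);
        common.retainAll(relevant);
        return common.size();
    }

    /** Fraction of recommended items that are relevant. */
    public static double precision(Set<Long> recommended, Set<Long> relevant) {
        return (double) hits(recommended, relevant) / recommended.size();
    }

    /** Fraction of relevant items that were recommended. */
    public static double recall(Set<Long> recommended, Set<Long> relevant) {
        return (double) hits(recommended, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        // Hypothetical item IDs, chosen only for this example.
        Set<Long> recommended = new HashSet<>(Arrays.asList(101L, 102L, 103L, 104L));
        Set<Long> relevant = new HashSet<>(Arrays.asList(102L, 104L, 105L));
        System.out.println("precision = " + precision(recommended, relevant));
        System.out.println("recall    = " + recall(recommended, relevant));
    }
}
```

In this example two of the four recommended items are relevant, so precision is 0.5, and two of the three relevant items were found, so recall is 2/3.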

MAE (mean absolute error)

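MAE is the average absolute deviation between predicted and actual ratings: MAE = (1/N) * Σ |p_i - r_i|, where p_i is the predicted rating, r_i is the actual rating, and N is the number of ratings being compared. These are the held-out test ratings, not the ratings in the training set.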
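As a minimal sketch of this definition (plain Java, not Mahout's implementation; the class name MaeSketch and the sample ratings are made up for illustration), MAE can be computed by hand like this:

```java
public class MaeSketch {
    /** Mean absolute error between predicted and actual ratings of equal, non-zero length. */
    public static double meanAbsoluteError(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            sum += Math.abs(predicted[i] - actual[i]);
        }
        return sum / predicted.length;
    }

    public static void main(String[] args) {
        // Hypothetical ratings: each prediction is off by exactly 1.0.
        double[] predicted = {4.0, 2.5, 5.0};
        double[] actual = {3.0, 3.5, 4.0};
        System.out.println(meanAbsoluteError(predicted, actual)); // prints 1.0
    }
}
```

A lower value is better: 0.0 would mean every predicted rating matched the actual one.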

Mahout already implements this as org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator.

The Java code is as follows:

package com.xxx;

import java.io.File;
import java.io.IOException;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.common.RandomUtils;

/**
 * Evaluates the recommender using mean absolute error (MAE).
 */
public class RecommenderEvaluatorTest {
    public static void main(String[] args) throws IOException, TasteException {
        String projectDir = System.getProperty("user.dir");
        RandomUtils.useTestSeed(); // fixed seed so that results are repeatable
        DataModel model = new FileDataModel(new File(projectDir + "/src/main/intro.csv"));

        // Evaluator that reports the MAE between estimated and actual ratings
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        RecommenderBuilder builder = new RecommenderBuilder() {

            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                // User-based recommender: Pearson similarity with the 2 nearest neighbours
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);

                Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

                return recommender;
            }
        };
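        // null = use the default DataModelBuilder; 0.9 = fraction of the data used for training;
        // 1.0 = fraction of users included in the evaluation
        double score = evaluator.evaluate(builder, null, model, 0.9, 1.0);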
        System.out.println(score);
    }
}
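I hit a problem here at first: when the fourth argument of evaluate() (the fraction of the data set used for training) is fairly small, the result is NaN. The book Mahout in Action uses 0.7, and with that value I got NaN, which was frustrating at first.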

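Solution: following this blog post, http://blog.csdn.net/tangtang5156/article/details/41210407, the cause turned out to be that with too small a training fraction, no recommendation can be made for some test cases, as the log output showed.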
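The NaN itself is easy to reproduce outside Mahout. The following is a minimal sketch, assuming the evaluator averages the error only over test ratings for which an estimate exists and represents a missing estimate as NaN. The class and method names are illustrative, not Mahout's.

```java
public class NanAverageSketch {
    /**
     * Averages |estimate - actual| over the pairs whose estimate is a real number.
     * An estimate of NaN stands for "no prediction could be made" and is skipped.
     */
    public static double averageError(double[] estimates, double[] actuals) {
        double sum = 0.0;
        int count = 0;
        for (int i = 0; i < estimates.length; i++) {
            if (Double.isNaN(estimates[i])) {
                continue; // nothing to compare for this test rating
            }
            sum += Math.abs(estimates[i] - actuals[i]);
            count++;
        }
        // If every estimate was skipped this is 0.0 / 0, which is NaN in Java.
        return sum / count;
    }

    public static void main(String[] args) {
        double[] actuals = {4.0, 3.0}; // hypothetical held-out ratings
        System.out.println(averageError(new double[]{Double.NaN, Double.NaN}, actuals)); // prints NaN
        System.out.println(averageError(new double[]{3.0, Double.NaN}, actuals));        // prints 1.0
    }
}
```

When calling the real evaluator, a Double.isNaN(score) check on the returned value is a cheap way to detect this situation before trusting the number.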

In the end I chose 0.9, that is, 90% of the data as the training set and 10% as the test set.
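To illustrate what the training fraction does, here is a minimal sketch of a random train/test split. It assumes each rating is assigned to the training set independently with probability equal to the training fraction, which is my understanding of how the evaluator splits the data, not something confirmed here. The class name SplitSketch, the rating count of 20, and the seed are all made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class SplitSketch {
    /**
     * Splits rating indices 0..numRatings-1 into a training list and a test list.
     * Returns a two-element list: index 0 is training, index 1 is test.
     */
    public static List<List<Integer>> split(int numRatings, double trainingFraction, long seed) {
        Random random = new Random(seed);
        List<Integer> training = new ArrayList<>();
        List<Integer> test = new ArrayList<>();
        for (int i = 0; i < numRatings; i++) {
            if (random.nextDouble() < trainingFraction) {
                training.add(i);
            } else {
                test.add(i);
            }
        }
        return Arrays.asList(training, test);
    }

    public static void main(String[] args) {
        for (double fraction : new double[]{0.7, 0.9}) {
            List<List<Integer>> parts = split(20, fraction, 42L);
            System.out.println("fraction " + fraction
                    + ": training=" + parts.get(0).size()
                    + ", test=" + parts.get(1).size());
        }
    }
}
```

On a very small data set, a lower training fraction leaves fewer ratings from which to find neighbours, which fits the observation that 0.7 produced no usable estimates while 0.9 did.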

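The final result: the evaluator reports a deviation (MAE) of 1.0, meaning the predicted ratings are off by 1.0 on average.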

 

posted @ 2016-12-22 15:16  博学善思。。ljd