Mahout Recommendation 7: Handling Data Without Preference Values

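Sometimes users and items are associated, but nothing describes how strong the association is, for example when a user views an article.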
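As a rough picture of what such data looks like in memory, the sketch below keeps only which items each user touched and drops any rating column. It is plain Java and does not use Mahout; the class name `BooleanPrefsSketch`, the method `parse`, and the sample lines are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class BooleanPrefsSketch {

    /**
     * Parses "userID,itemID[,rating]" lines into user -> set of item IDs.
     * A rating column, if present, is ignored: only the association is kept.
     */
    static Map<Long, Set<Long>> parse(List<String> lines) {
        Map<Long, Set<Long>> itemsByUser = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            long userId = Long.parseLong(fields[0].trim());
            long itemId = Long.parseLong(fields[1].trim());
            itemsByUser.computeIfAbsent(userId, k -> new TreeSet<>()).add(itemId);
        }
        return itemsByUser;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; the third column is discarded.
        List<String> lines = Arrays.asList("1,101,5.0", "1,102,3.0", "2,101,4.0");
        System.out.println(parse(lines)); // prints {1=[101, 102], 2=[101]}
    }
}
```

Once the ratings are gone, "user 1 viewed item 101" and "user 2 viewed item 101" carry exactly the same weight, which is the situation the rest of this post deals with.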

In-memory implementation without preference values:

  The important parts are the DataModel and DataModelBuilder implementations.

package mahout;

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.DataModelBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.common.RandomUtils;
/**
 * 
 * @author Administrator
 *
 */
public class TestRecommenderEvaluator {

	public static void main(String[] args) throws Exception {
		// Force the same random seed on every run so the results are repeatable
		//RandomUtils.useTestSeed();
		// Load the data and discard the preference values
		DataModel dataModel = new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("data/ua.base"))));
		
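		// Evaluator based on the average absolute difference
		RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
		// Evaluator based on the root mean square error
		//RecommenderEvaluator evaluator = new RMSRecommenderEvaluator();
		// Builder that creates the recommender; same implementation as in the previous example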
		RecommenderBuilder builder = new RecommenderBuilder() {
			
			public Recommender buildRecommender(DataModel model) throws TasteException {
				// User similarity; several implementations are available
				UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
				// User neighborhood: the 10 nearest users
				UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
				// The recommender itself
				return new GenericUserBasedRecommender(model, neighborhood, similarity);
			}
		};
		DataModelBuilder modelBuilder = new DataModelBuilder() {
			
			public DataModel buildDataModel(FastByIDMap<PreferenceArray> arg0) {
				// Wrap the training data in a model without preference values
				return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(arg0));
			}
		};
		// Evaluation score (average difference): train on 90% of the data, test on 10%.
		// "Mahout in Action" uses 0.7, but that produced NaN here.
		double score = evaluator.evaluate(builder, modelBuilder, dataModel, 0.9, 1.0);
		System.out.println(score);
	}
}

Result:

14/08/04 11:33:26 INFO file.FileDataModel: Creating FileDataModel for file data\ua.base
14/08/04 11:33:26 INFO file.FileDataModel: Reading file info...
14/08/04 11:33:27 INFO file.FileDataModel: Read lines: 90570
14/08/04 11:33:27 INFO file.FileDataModel: Reading file info...
14/08/04 11:33:27 INFO file.FileDataModel: Read lines: 0
14/08/04 11:33:27 INFO model.GenericDataModel: Processed 943 users
14/08/04 11:33:27 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.9 of GenericBooleanPrefDataModel[users:1,2,3...]
Exception in thread "main" java.lang.IllegalArgumentException: DataModel doesn't have preference values
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
	at org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity.<init>(PearsonCorrelationSimilarity.java:74)
	at org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity.<init>(PearsonCorrelationSimilarity.java:66)
	at mahout.TestRecommenderEvaluator$1.buildRecommender(TestRecommenderEvaluator.java:45)
	at org.apache.mahout.cf.taste.impl.eval.AbstractDifferenceRecommenderEvaluator.evaluate(AbstractDifferenceRecommenderEvaluator.java:125)
	at mahout.TestRecommenderEvaluator.main(TestRecommenderEvaluator.java:60)

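An exception: illegal argument, the DataModel has no preference values. But data without preference values is exactly what we are using, so why are preference values required?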

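PearsonCorrelationSimilarity is a user similarity metric that depends on preference values. Without them, metrics such as Euclidean distance refuse to work and the Pearson correlation coefficient is undefined. In other words, we chose the wrong similarity metric. Replace PearsonCorrelationSimilarity with LogLikelihoodSimilarity (org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity).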
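To see why Pearson correlation has nothing to work with here, consider what the textbook formula does when every preference is the same constant. The following is a minimal sketch in plain Java, not Mahout's implementation; the class name `PearsonOnBooleanData` and the method `pearson` are invented for this illustration. When all values are 1.0 the variance is zero, so the formula divides 0 by 0.

```java
final class PearsonOnBooleanData {

    /** Textbook Pearson correlation of two equally long, non-empty vectors. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("vectors must be non-empty and of equal length");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double covariance = 0.0;
        double varianceX = 0.0;
        double varianceY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            covariance += dx * dy;
            varianceX += dx * dx;
            varianceY += dy * dy;
        }
        // With constant vectors both variances are 0, so this is 0.0 / 0.0.
        return covariance / Math.sqrt(varianceX * varianceY);
    }

    public static void main(String[] args) {
        // Two users in a model without preference values: every value is 1.0.
        double[] a = {1.0, 1.0, 1.0, 1.0};
        double[] b = {1.0, 1.0, 1.0, 1.0};
        System.out.println(pearson(a, b)); // prints NaN

        // For contrast, real ratings that move together.
        double[] r1 = {5.0, 3.0, 4.0};
        double[] r2 = {4.0, 2.0, 3.0};
        System.out.println(pearson(r1, r2)); // prints 1.0
    }
}
```

A metric such as log-likelihood only looks at which items two users have in common, so it does not need the values to vary.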

Result:

14/08/04 11:40:38 INFO file.FileDataModel: Creating FileDataModel for file data\ua.base
14/08/04 11:40:38 INFO file.FileDataModel: Reading file info...
14/08/04 11:40:39 INFO file.FileDataModel: Read lines: 90570
14/08/04 11:40:39 INFO file.FileDataModel: Reading file info...
14/08/04 11:40:39 INFO file.FileDataModel: Read lines: 0
14/08/04 11:40:39 INFO model.GenericDataModel: Processed 943 users
14/08/04 11:40:39 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.9 of GenericBooleanPrefDataModel[users:1,2,3...]
14/08/04 11:40:39 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation of 882 users
14/08/04 11:40:39 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 882 tasks in 4 threads
14/08/04 11:40:39 INFO eval.StatsCallable: Average time per recommendation: 70ms
14/08/04 11:40:39 INFO eval.StatsCallable: Approximate memory used: 25MB / 112MB
14/08/04 11:40:39 INFO eval.StatsCallable: Unable to recommend in 30 cases
14/08/04 11:40:47 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.0
0.0

The result is 0, a perfect match. Isn't that too good to be true?

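Unfortunately, yes. When every preference value is 1, the average difference between the estimated and actual preferences is naturally 0. This test is meaningless, because it can only ever output 0.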
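The arithmetic behind that 0 can be sketched in a few lines. The example below is plain Java and only an approximation of what the evaluator does: it assumes each estimate is a similarity-weighted average of the neighbors' values, and the class `BooleanEvaluationSketch`, its methods, and the weights are hypothetical. Since every neighbor value is 1.0, every estimate is 1.0, and so is every held-out actual value.

```java
final class BooleanEvaluationSketch {

    /** Similarity-weighted average of the neighbors' preference values. */
    static double estimate(double[] neighborPrefs, double[] weights) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < neighborPrefs.length; i++) {
            weightedSum += weights[i] * neighborPrefs[i];
            totalWeight += weights[i];
        }
        return weightedSum / totalWeight;
    }

    /** Mean of |actual - estimated| over all test preferences. */
    static double averageAbsoluteDifference(double[] actual, double[] estimated) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - estimated[i]);
        }
        return sum / actual.length;
    }

    public static void main(String[] args) {
        // Arbitrary similarity weights; the neighbors' values are all 1.0.
        double[] weights = {0.3, 0.9, 0.5};
        double[] neighborPrefs = {1.0, 1.0, 1.0};
        double estimated = estimate(neighborPrefs, weights);

        // Held-out "actual" preferences are also all 1.0.
        double[] actual = {1.0, 1.0, 1.0};
        double[] estimates = {estimated, estimated, estimated};
        System.out.println(averageAbsoluteDifference(actual, estimates)); // prints 0.0
    }
}
```

Whatever the weights are, the score stays 0, so it says nothing about how good the recommendations are.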

.......................

Precision and recall, however, are still valid; see the next post.

 

posted @ 2014-08-04 11:45  jseven