Mahout Recommenders 8: Evaluating Precision and Recall with Boolean Data
Straight to the code:
package mahout;

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.DataModelBuilder;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class IRSBoolean {

    public static void main(String[] args) throws Exception {
        // Data model without preference values (boolean data)
        DataModel dataModel = new GenericBooleanPrefDataModel(
                GenericBooleanPrefDataModel.toDataMap(
                        new FileDataModel(new File("data/ua.base"))));

        // Evaluator for IR statistics (precision, recall, ...)
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();

        // Recommender builder; it should build the same recommender
        // that would be used in practice
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                // User similarity: log-likelihood instead of Pearson
                UserSimilarity userSimilarity = new LogLikelihoodSimilarity(model);
                // User neighborhood: the 10 nearest users
                UserNeighborhood userNeighborhood =
                        new NearestNUserNeighborhood(10, userSimilarity, model);
                return new GenericUserBasedRecommender(model, userNeighborhood, userSimilarity);
                //return new GenericBooleanPrefUserBasedRecommender(model,userNeighborhood,userSimilarity);
            }
        };

        // Data model builder, so the evaluator also works on boolean data models
        DataModelBuilder modelBuilder = new DataModelBuilder() {
            @Override
            public DataModel buildDataModel(FastByIDMap<PreferenceArray> map) {
                return new GenericBooleanPrefDataModel(
                        GenericBooleanPrefDataModel.toDataMap(map));
            }
        };

        // Evaluate precision and recall at 10, using all of the users (1.0)
        IRStatistics stats = evaluator.evaluate(recommenderBuilder, modelBuilder,
                dataModel, null, 10,
                GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
        System.out.println("Precision: " + stats.getPrecision());
        System.out.println("Recall: " + stats.getRecall());
    }
}
The resulting precision and recall
Output (a lot of log lines are printed along the way):
....................
14/08/04 12:32:28 INFO eval.GenericRecommenderIRStatsEvaluator: Evaluated with user 942 in 31ms
14/08/04 12:32:28 INFO eval.GenericRecommenderIRStatsEvaluator: Precision/recall/fall-out/nDCG/reach: 0.2549125168236878 / 0.2549125168236878 / 0.004461601695666552 / 0.24390219904521424 / 1.0
14/08/04 12:32:28 INFO eval.GenericRecommenderIRStatsEvaluator: Evaluated with user 943 in 31ms
14/08/04 12:32:28 INFO eval.GenericRecommenderIRStatsEvaluator: Precision/recall/fall-out/nDCG/reach: 0.25497311827957 / 0.25497311827957 / 0.004461238812697198 / 0.2439398499255423 / 1.0
Precision: 0.25497311827957
Recall: 0.25497311827957
The book reports roughly 24.7%, so my result (about 25.5%) does not quite match.
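To make the two metrics concrete, here is a minimal sketch of how precision and recall are defined for a single user's recommendation list. The class name PrecisionRecallSketch, its methods, and the item IDs are made up for illustration; this is not Mahout's implementation, only the textbook definitions. It also shows why the two numbers coincide whenever the set of relevant items and the list of recommendations have the same size, which may be what is happening in the log with "at 10" on boolean data (an assumption, not something checked against Mahout's source).

public class PrecisionRecallSketch {

    // Counts how many of the recommended items are also in the relevant set.
    public static int hits(java.util.List<Long> recommended, java.util.Set<Long> relevant) {
        int count = 0;
        for (Long itemId : recommended) {
            if (relevant.contains(itemId)) {
                count++;
            }
        }
        return count;
    }

    // Precision: fraction of the recommended items that are relevant.
    public static double precision(java.util.List<Long> recommended, java.util.Set<Long> relevant) {
        if (recommended.isEmpty()) {
            return 0.0;
        }
        return (double) hits(recommended, relevant) / recommended.size();
    }

    // Recall: fraction of the relevant items that were recommended.
    public static double recall(java.util.List<Long> recommended, java.util.Set<Long> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        return (double) hits(recommended, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        // Hypothetical data: 4 held-out relevant items, 4 recommendations, 2 in common.
        java.util.Set<Long> relevant =
                new java.util.HashSet<>(java.util.Arrays.asList(1L, 2L, 3L, 4L));
        java.util.List<Long> recommended = java.util.Arrays.asList(2L, 4L, 7L, 9L);

        System.out.println("Precision: " + precision(recommended, relevant));
        System.out.println("Recall: " + recall(recommended, relevant));
    }
}

Both methods share the same numerator (the number of hits), so equal denominators give equal values. If a recommender returned fewer items than requested, the two figures would drift apart.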
Now switch to a different recommender:
//return new GenericBooleanPrefUserBasedRecommender(model,userNeighborhood,userSimilarity);
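Uncomment this line (in place of the GenericUserBasedRecommender return statement) and see what the result looks like: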
.................................
14/08/04 12:44:50 INFO eval.GenericRecommenderIRStatsEvaluator: Evaluated with user 942 in 31ms
14/08/04 12:44:50 INFO eval.GenericRecommenderIRStatsEvaluator: Precision/recall/fall-out/nDCG/reach: 0.17321668909825047 / 0.17321668909825047 / 0.004950798268872743 / 0.1803236393639469 / 1.0
14/08/04 12:44:50 INFO eval.GenericRecommenderIRStatsEvaluator: Evaluated with user 943 in 32ms
14/08/04 12:44:50 INFO eval.GenericRecommenderIRStatsEvaluator: Precision/recall/fall-out/nDCG/reach: 0.1731182795698926 / 0.1731182795698926 / 0.004951387547485665 / 0.1801745904157921 / 1.0
Precision: 0.1731182795698926
Recall: 0.1731182795698926
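The book reports 22.9% here, so this time my result (about 17.3%) is noticeably lower. Why? Perhaps the data set has changed since the book was written.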
Other data models have boolean variants as well, for example MySQLBooleanPrefDataModel.