Mahout Recommendation 10: Trying the GroupLens Dataset

Dataset download page: http://grouplens.org/datasets/movielens/ Earlier posts used the 100K dataset; this time download MovieLens 10M and use the ratings.dat file inside it.

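Prerequisite: ratings.dat does not match the input file format Mahout expects, so it would normally have to be converted first. The examples module, however, ships a class that parses this file, GroupLensDataModel, so I used that directly.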
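If you would rather convert the file yourself, a minimal sketch could look like the following. It assumes that ratings.dat lines have the form userID::movieID::rating::timestamp and that a comma-separated file with the same columns is acceptable to FileDataModel. The class name RatingsToCsv, the output path data/ratings.csv, and the sample line in the comment are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Mahout: rewrites "::"-delimited ratings as CSV.
class RatingsToCsv {

	// Converts one line, e.g. "7::42::3.5::1000000000" becomes "7,42,3.5,1000000000".
	static String convertLine(String line) {
		return line.trim().replace("::", ",");
	}

	// Streams the input file line by line and returns the number of lines written.
	static long convertFile(Path in, Path out) throws IOException {
		long count = 0;
		try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
				BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
			String line;
			while ((line = reader.readLine()) != null) {
				if (line.trim().isEmpty()) {
					continue; // skip blank lines
				}
				writer.write(convertLine(line));
				writer.newLine();
				count++;
			}
		}
		return count;
	}

	public static void main(String[] args) throws IOException {
		// Both default paths are assumptions; pass your own as arguments.
		Path in = Paths.get(args.length > 0 ? args[0] : "data/ratings.dat");
		Path out = Paths.get(args.length > 1 ? args[1] : "data/ratings.csv");
		if (!Files.exists(in)) {
			System.err.println("Input file not found: " + in);
			return;
		}
		System.out.println("Converted " + convertFile(in, out) + " lines to " + out);
	}
}
```

The converted file could then be loaded with a plain FileDataModel instead of GroupLensDataModel. I have not benchmarked this route against the one used in this post.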

package mahout;

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.eval.LoadEvaluator;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.cf.taste.similarity.precompute.example.GroupLensDataModel;

public class GroupLensDataModelTest {
 
	public static void main(String[] args) throws Exception {
		// Use the custom GroupLensDataModel, since the dataset has not been converted to CSV format
		DataModel dataModel = new GroupLensDataModel(new File(
				"data/ratings.dat"));
		// Pearson correlation coefficient, used to measure user similarity
		UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(
				dataModel);
		// Build the user neighborhood: the 100 nearest users
		UserNeighborhood userNeighborhood = new NearestNUserNeighborhood(100,
				userSimilarity, dataModel);
		// Recommendation engine
		Recommender recommender = new GenericUserBasedRecommender(dataModel,
				userNeighborhood, userSimilarity);
		// Run the load test
		LoadEvaluator.runLoad(recommender);
	}
}
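
For readers unfamiliar with the similarity measure used in this code, here is a plain-Java sketch of the Pearson correlation between two users over the items both have rated. It is an illustration of the formula only, not Mahout's implementation; the class name PearsonSketch and the sample ratings are made up.

```java
// Plain-Java illustration of the Pearson correlation; not Mahout's implementation.
class PearsonSketch {

	// x[i] and y[i] are two users' ratings for the same item i.
	static double pearson(double[] x, double[] y) {
		if (x.length != y.length || x.length == 0) {
			throw new IllegalArgumentException("need two non-empty arrays of equal length");
		}
		int n = x.length;
		double meanX = 0.0;
		double meanY = 0.0;
		for (int i = 0; i < n; i++) {
			meanX += x[i];
			meanY += y[i];
		}
		meanX /= n;
		meanY /= n;

		double cov = 0.0;
		double varX = 0.0;
		double varY = 0.0;
		for (int i = 0; i < n; i++) {
			double dx = x[i] - meanX;
			double dy = y[i] - meanY;
			cov += dx * dy;
			varX += dx * dx;
			varY += dy * dy;
		}
		double denom = Math.sqrt(varX * varY);
		// The correlation is undefined when either user's ratings have zero variance.
		return denom == 0.0 ? Double.NaN : cov / denom;
	}

	public static void main(String[] args) {
		// Made-up ratings of two users for the same four items.
		double[] userA = {5, 3, 4, 1};
		double[] userB = {4, 2, 5, 1};
		System.out.println("similarity = " + pearson(userA, userB));
	}
}
```

Values near 1 mean the two users rate items in a similar pattern, and values near -1 mean opposite patterns.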

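 Try running the GroupLensDataModelTest class above, if you have enough memory.

Output:

My file has not finished downloading yet!

Update, now that the download is complete:

Output: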

14/08/05 10:05:13 INFO file.FileDataModel: Creating FileDataModel for file C:\Users\ADMINI~1\AppData\Local\Temp\ratings.txt
14/08/05 10:05:17 INFO file.FileDataModel: Reading file info...
14/08/05 10:05:18 INFO file.FileDataModel: Processed 1000000 lines
14/08/05 10:05:19 INFO file.FileDataModel: Processed 2000000 lines
14/08/05 10:05:20 INFO file.FileDataModel: Processed 3000000 lines
14/08/05 10:05:21 INFO file.FileDataModel: Processed 4000000 lines
14/08/05 10:05:23 INFO file.FileDataModel: Processed 5000000 lines
14/08/05 10:05:24 INFO file.FileDataModel: Processed 6000000 lines
14/08/05 10:05:25 INFO file.FileDataModel: Processed 7000000 lines
14/08/05 10:05:26 INFO file.FileDataModel: Processed 8000000 lines
14/08/05 10:05:27 INFO file.FileDataModel: Processed 9000000 lines
14/08/05 10:05:30 INFO file.FileDataModel: Processed 10000000 lines
14/08/05 10:05:30 INFO file.FileDataModel: Read lines: 10000054
14/08/05 10:05:31 INFO model.GenericDataModel: Processed 10000 users
14/08/05 10:05:31 INFO model.GenericDataModel: Processed 20000 users
14/08/05 10:05:33 INFO model.GenericDataModel: Processed 30000 users
14/08/05 10:05:33 INFO model.GenericDataModel: Processed 40000 users
14/08/05 10:05:34 INFO model.GenericDataModel: Processed 50000 users
14/08/05 10:05:34 INFO model.GenericDataModel: Processed 60000 users
14/08/05 10:05:35 INFO model.GenericDataModel: Processed 69878 users
14/08/05 10:05:39 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 982 tasks in 4 threads
14/08/05 10:05:39 INFO eval.StatsCallable: Average time per recommendation: 163ms
14/08/05 10:05:39 INFO eval.StatsCallable: Approximate memory used: 445MB / 815MB
14/08/05 10:05:39 INFO eval.StatsCallable: Unable to recommend in 0 cases

 The log contains no recommendation results.

To test, add the following lines at the end of the main method:

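// Added: print recommendations
// (needs: import java.util.List; import org.apache.mahout.cf.taste.recommender.RecommendedItem;)
		// Recommend 10 items for user 1: recommend(1, 10)
		List<RecommendedItem> recommendedItems = recommender.recommend(1, 10);
		// Print each recommended item
		for (RecommendedItem item : recommendedItems) {
			System.out.println(item);
		}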

 Checking the output: there are still no recommendations, which is odd. I will look into it later.
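
One thing that might be worth checking (an untested guess, not a confirmed cause for this run): the Pearson correlation is undefined for a user whose ratings are all the same value, because the variance is zero. Such a user would have no usable neighbors and so would receive no recommendations. The sketch below checks this for one user directly on ratings.dat-style lines, assuming the format userID::movieID::rating::timestamp. The class name ConstantRatingsCheck and the sample lines are made up.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical diagnostic, not part of Mahout.
class ConstantRatingsCheck {

	// Returns true if the user has at least one rating and all of them are equal.
	static boolean hasConstantRatings(List<String> lines, long userId) {
		Double first = null;
		for (String line : lines) {
			String[] fields = line.trim().split("::");
			if (fields.length < 3 || Long.parseLong(fields[0]) != userId) {
				continue; // malformed line or a different user
			}
			double rating = Double.parseDouble(fields[2]);
			if (first == null) {
				first = rating;
			} else if (rating != first) {
				return false; // found two different rating values
			}
		}
		return first != null;
	}

	public static void main(String[] args) {
		// Made-up sample lines, not taken from the real dataset.
		List<String> lines = Arrays.asList(
				"1::10::5::100",
				"1::11::5::101",
				"2::10::3::102",
				"2::11::4::103");
		System.out.println("user 1 constant: " + hasConstantRatings(lines, 1L));
		System.out.println("user 2 constant: " + hasConstantRatings(lines, 2L));
	}
}
```

For the real 10M file it would be better to stream the lines than to hold them all in a list. Requesting recommendations for a few other user IDs would be another quick way to see whether the empty result is specific to user 1.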

14/08/05 10:09:48 INFO file.FileDataModel: Creating FileDataModel for file C:\Users\ADMINI~1\AppData\Local\Temp\ratings.txt
14/08/05 10:09:48 INFO file.FileDataModel: Reading file info...
14/08/05 10:09:49 INFO file.FileDataModel: Processed 1000000 lines
14/08/05 10:09:50 INFO file.FileDataModel: Processed 2000000 lines
14/08/05 10:09:52 INFO file.FileDataModel: Processed 3000000 lines
14/08/05 10:09:52 INFO file.FileDataModel: Processed 4000000 lines
14/08/05 10:09:54 INFO file.FileDataModel: Processed 5000000 lines
14/08/05 10:09:56 INFO file.FileDataModel: Processed 6000000 lines
14/08/05 10:09:56 INFO file.FileDataModel: Processed 7000000 lines
14/08/05 10:09:57 INFO file.FileDataModel: Processed 8000000 lines
14/08/05 10:09:58 INFO file.FileDataModel: Processed 9000000 lines
14/08/05 10:10:00 INFO file.FileDataModel: Processed 10000000 lines
14/08/05 10:10:00 INFO file.FileDataModel: Read lines: 10000054
14/08/05 10:10:01 INFO model.GenericDataModel: Processed 10000 users
14/08/05 10:10:01 INFO model.GenericDataModel: Processed 20000 users
14/08/05 10:10:02 INFO model.GenericDataModel: Processed 30000 users
14/08/05 10:10:02 INFO model.GenericDataModel: Processed 40000 users
14/08/05 10:10:02 INFO model.GenericDataModel: Processed 50000 users
14/08/05 10:10:03 INFO model.GenericDataModel: Processed 60000 users
14/08/05 10:10:06 INFO model.GenericDataModel: Processed 69878 users
14/08/05 10:10:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 985 tasks in 4 threads
14/08/05 10:10:08 INFO eval.StatsCallable: Average time per recommendation: 116ms
14/08/05 10:10:08 INFO eval.StatsCallable: Approximate memory used: 578MB / 795MB
14/08/05 10:10:08 INFO eval.StatsCallable: Unable to recommend in 0 cases

 

posted @ 2014-08-04 13:17  jseven