Mahout decision tree algorithm experiment

http://blog.sina.com.cn/s/blog_61c463090100pbdh.html

Import the data into HDFS
bin/hadoop fs -put ./KDDTrain+.arff /user/root/
bin/hadoop fs -put ./KDDTest+.arff /user/root/
Generate the dataset description file (.info)
bin/hadoop jar mahout-0.4.jar org.apache.mahout.df.tools.Describe -p "/user/root/KDDTrain+.arff" -f /user/root/KDDTrain+.info -d N 3 C 2 N C 4 N C 8 N 2 C 19 N L
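
The -d argument above is a compact column descriptor. As a rough sketch of how it can be read, the script below expands it into one type code per column, assuming that a number multiplies the type code that follows it and that N, C and L stand for numerical, categorical and label. The function names expand_descriptor and descriptor_width are made up for this illustration and are not part of Mahout.

```shell
#!/usr/bin/env bash
# Expand a compact Describe-style descriptor into one type code per column.
# Assumption: a number is a repeat count for the type code that follows it.
expand_descriptor() {
  local count=1 out=() tok i
  for tok in "$@"; do
    if [[ $tok =~ ^[0-9]+$ ]]; then
      count=$tok                      # remember the repeat count
    else
      for ((i = 0; i < count; i++)); do out+=("$tok"); done
      count=1                         # reset once the type code is emitted
    fi
  done
  echo "${out[*]}"
}

# Number of columns the descriptor covers (hypothetical helper).
descriptor_width() {
  set -- $(expand_descriptor "$@")
  echo $#
}

expand_descriptor N 3 C 2 N C 4 N C 8 N 2 C 19 N L
descriptor_width  N 3 C 2 N C 4 N C 8 N 2 C 19 N L
```

Under that reading the descriptor covers 42 columns, which matches the 41 attributes plus the class label in the NSL-KDD ARFF files.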
Train the forest
bin/hadoop jar  mahout-0.4.jar org.apache.mahout.df.mapreduce.BuildForest  -oob -d /user/root/KDDTrain+.arff -ds /user/root/KDDTrain+.info -sl 5 -p -t 5 -o forest_result
Test the forest (note that this command classifies the training file KDDTrain+.arff)
bin/hadoop jar  mahout-0.4.jar org.apache.mahout.df.mapreduce.TestForest -i  /user/root/KDDTrain+.arff  -ds  /user/root/KDDTrain+.info -m forest_result -a -o predictions


[root@localhost:/usr/local/hadoop/hadoop-0.19.2]#bin/hadoop jar  mahout-0.4.jar org.apache.mahout.df.mapreduce.BuildForest  -oob -d /user/root/KDDTrain+.arff -ds /user/root/KDDTrain+.info -sl 5 -p -t 5 -o forest_result
11/03/04 22:01:41 INFO mapreduce.BuildForest: Partial Mapred implementation
11/03/04 22:01:41 INFO mapreduce.BuildForest: Building the forest...
11/03/04 22:01:42 INFO mapred.FileInputFormat: Total input paths to process : 1
11/03/04 22:01:43 INFO mapred.JobClient: Running job: job_201103042138_0001
11/03/04 22:01:45 INFO mapred.JobClient:  map 0% reduce 0%
11/03/04 22:02:04 INFO mapred.JobClient:  map 50% reduce 0%
11/03/04 22:02:09 INFO mapred.JobClient:  map 100% reduce 0%
11/03/04 22:02:10 INFO mapred.JobClient: Job complete: job_201103042138_0001
11/03/04 22:02:10 INFO mapred.JobClient: Counters: 7
11/03/04 22:02:10 INFO mapred.JobClient:   File Systems
11/03/04 22:02:10 INFO mapred.JobClient:     HDFS bytes read=18745670
11/03/04 22:02:10 INFO mapred.JobClient:     HDFS bytes written=1312106
11/03/04 22:02:10 INFO mapred.JobClient:   Job Counters
11/03/04 22:02:10 INFO mapred.JobClient:     Launched map tasks=2
11/03/04 22:02:10 INFO mapred.JobClient:     Data-local map tasks=2
11/03/04 22:02:10 INFO mapred.JobClient:   Map-Reduce Framework
11/03/04 22:02:10 INFO mapred.JobClient:     Map input records=125973
11/03/04 22:02:10 INFO mapred.JobClient:     Map input bytes=18742306
11/03/04 22:02:10 INFO mapred.JobClient:     Map output records=5
11/03/04 22:02:10 INFO partial.PartialBuilder: Computing partitions' first ids...
11/03/04 22:02:11 INFO mapred.FileInputFormat: Total input paths to process : 1
11/03/04 22:02:12 INFO mapred.JobClient: Running job: job_201103042138_0002
11/03/04 22:02:13 INFO mapred.JobClient:  map 0% reduce 0%
11/03/04 22:02:21 INFO mapred.JobClient:  map 50% reduce 0%
11/03/04 22:02:22 INFO mapred.JobClient:  map 100% reduce 0%
11/03/04 22:02:24 INFO mapred.JobClient: Job complete: job_201103042138_0002
11/03/04 22:02:24 INFO mapred.JobClient: Counters: 7
11/03/04 22:02:24 INFO mapred.JobClient:   File Systems
11/03/04 22:02:24 INFO mapred.JobClient:     HDFS bytes read=18742802
11/03/04 22:02:24 INFO mapred.JobClient:     HDFS bytes written=286
11/03/04 22:02:24 INFO mapred.JobClient:   Job Counters
11/03/04 22:02:24 INFO mapred.JobClient:     Launched map tasks=2
11/03/04 22:02:24 INFO mapred.JobClient:     Data-local map tasks=2
11/03/04 22:02:24 INFO mapred.JobClient:   Map-Reduce Framework
11/03/04 22:02:24 INFO mapred.JobClient:     Map input records=125973
11/03/04 22:02:24 INFO mapred.JobClient:     Map input bytes=18742306
11/03/04 22:02:24 INFO mapred.JobClient:     Map output records=2
11/03/04 22:02:24 INFO partial.Step0Job: mapred.map.tasks = 2
11/03/04 22:02:24 INFO partial.PartialBuilder: Processing the output...
11/03/04 22:02:24 INFO partial.PartialBuilder: *****************************
11/03/04 22:02:24 INFO partial.PartialBuilder: Second Step
11/03/04 22:02:24 INFO partial.PartialBuilder: *****************************
11/03/04 22:02:25 INFO mapred.FileInputFormat: Total input paths to process : 1
11/03/04 22:02:26 INFO mapred.JobClient: Running job: job_201103042138_0003
11/03/04 22:02:27 INFO mapred.JobClient:  map 0% reduce 0%
11/03/04 22:02:36 INFO mapred.JobClient:  map 50% reduce 0%
11/03/04 22:02:38 INFO mapred.JobClient:  map 100% reduce 0%
11/03/04 22:02:40 INFO mapred.JobClient: Job complete: job_201103042138_0003
11/03/04 22:02:40 INFO mapred.JobClient: Counters: 7
11/03/04 22:02:40 INFO mapred.JobClient:   File Systems
11/03/04 22:02:40 INFO mapred.JobClient:     HDFS bytes read=18849734
11/03/04 22:02:40 INFO mapred.JobClient:     HDFS bytes written=1260142
11/03/04 22:02:40 INFO mapred.JobClient:   Job Counters
11/03/04 22:02:40 INFO mapred.JobClient:     Launched map tasks=2
11/03/04 22:02:40 INFO mapred.JobClient:     Data-local map tasks=2
11/03/04 22:02:40 INFO mapred.JobClient:   Map-Reduce Framework
11/03/04 22:02:40 INFO mapred.JobClient:     Map input records=125973
11/03/04 22:02:40 INFO mapred.JobClient:     Map input bytes=18742306
11/03/04 22:02:40 INFO mapred.JobClient:     Map output records=5
11/03/04 22:02:40 INFO common.HadoopUtil: Deleting hdfs://localhost:9000/user/root/forest_result
11/03/04 22:02:40 INFO mapreduce.BuildForest: Build Time: 0h 0m 59s 435
11/03/04 22:02:42 INFO mapreduce.BuildForest: oob error estimate : 0.0026276097483527825
11/03/04 22:02:42 INFO mapreduce.BuildForest: Storing the forest in: forest_result/forest.seq

[root@localhost:/usr/local/hadoop/hadoop-0.19.2]#bin/hadoop jar  mahout19-0.4-alpha.jar org.apache.mahout.df.mapreduce.TestForest -i  /user/root/KDDTrain+.arff  -ds  /user/root/KDDTrain+.info -m forest_result -a -o predictions
11/03/04 22:07:49 INFO mapreduce.TestForest: Loading the forest...
11/03/04 22:07:50 INFO mapreduce.TestForest: Sequential classification...
11/03/04 22:07:54 INFO mapreduce.TestForest: Classification Time: 0h 0m 4s 164
11/03/04 22:07:54 INFO mapreduce.TestForest: =======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :     125862       99.9119%
Incorrectly Classified Instances        :        111        0.0881%
Total Classified Instances              :     125973

=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       <--Classified as
67307   36       |  67343       a     = normal
75      58555    |  58630       b     = anomaly
Default Category: unknown: 2
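
The summary percentages can be recomputed from the four cells of the confusion matrix. The sketch below does this with awk, assuming each row is the actual class and each column is the predicted class, and treating anomaly as the positive class. The function name confusion_metrics is only illustrative.

```shell
#!/usr/bin/env bash
# Recompute summary metrics from a 2x2 confusion matrix.
# Arguments: normal->normal, normal->anomaly, anomaly->normal, anomaly->anomaly
# Assumption: rows are actual classes, columns are predicted classes.
confusion_metrics() {
  awk -v tn="$1" -v fp="$2" -v fn="$3" -v tp="$4" 'BEGIN {
    total = tn + fp + fn + tp
    printf "accuracy=%.4f%%\n",          100 * (tn + tp) / total
    printf "anomaly_recall=%.4f%%\n",    100 * tp / (tp + fn)
    printf "anomaly_precision=%.4f%%\n", 100 * tp / (tp + fp)
  }'
}

# Cell values copied from the TestForest output above.
confusion_metrics 67307 36 75 58555
```

The first line of output should agree with the 99.9119% reported in the summary. Because the forest was tested on the same file it was trained on, this figure describes fit to the training data and not performance on unseen records.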

posted @ 2012-12-03 19:36  subsir