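[hadoop] Running the WordCount example with IDEA, Maven, and Hadoop on Win7
This post walks through running a simple Hadoop example. I have tested it myself and it works.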
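The following environment is assumed to be in place:
1. Win7 hosts three Ubuntu virtual machines, each with a working Hadoop 2.8.1 installation;
2. IDEA 2017 is installed on Win7;
3. Hadoop 2.8.1 is installed on Win7 and the related environment variables are configured;
4. The prebuilt Windows binaries hadoop.dll and winutils.exe have been copied over. They must be the 2.8.1 builds; see https://github.com/steveloughran/winutils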
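A missing winutils.exe or hadoop.dll is an easy thing to overlook, so a small pre-flight check can help. The sketch below is only an illustration: the class name HadoopHomeCheck is hypothetical, and it assumes both files sit in the bin directory under the Hadoop home, which is a common layout but is not stated in this post.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
final class HadoopHomeCheck {

    /** Returns the names of the native files that are missing from <hadoopHome>/bin. */
    static List<String> missingNativeFiles(Path hadoopHome) {
        List<String> missing = new ArrayList<>();
        // Assumption: both files are expected in the bin directory.
        for (String name : new String[] {"winutils.exe", "hadoop.dll"}) {
            if (!Files.isRegularFile(hadoopHome.resolve("bin").resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Take the Hadoop home from the first argument, else from HADOOP_HOME.
        String home = args.length > 0 ? args[0] : System.getenv("HADOOP_HOME");
        if (home == null) {
            System.err.println("HADOOP_HOME is not set and no path was given");
            return;
        }
        Path bin = Paths.get(home).resolve("bin");
        System.out.println("Missing from " + bin + ": " + missingNativeFiles(Paths.get(home)));
    }
}
```

An empty list means both files were found. The check does not verify that they are the 2.8.1 builds.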
[Steps]
1. Follow http://blog.csdn.net/u011654631/article/details/70037219 (referred to below as the reference page);
2. Create a Maven Java project in IDEA;
3. Following the reference page, add the required Hadoop jars to pom.xml: hadoop-mapreduce-client-core, hadoop-hdfs, hadoop-mapreduce-client-jobclient (be sure to remove its provided scope), hadoop-mapreduce-client-common, and hadoop-common;
4. Finally, view the counts with $ hdfs dfs -cat /test/out/part-r-00000.
WordCount code
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.io.IOException;

public class WordCount extends Configured implements Tool {

    @Override
    public int run(String[] strings) throws Exception {
        // Local Hadoop install on Windows, and the user to act as on the cluster
        System.setProperty("hadoop.home.dir", "C:\\LearnTool\\hadoop");
        System.setProperty("HADOOP_USER_NAME", "chendajian");

        Configuration conf = getConf();
        // Jar built from this project; it is shipped to the cluster with the job
        conf.set("mapreduce.job.jar", "C:\\Workspace\\javaweb\\hadoop\\out\\artifacts\\hadoop_jar\\hadoop.jar");
        // conf.set("yarn.resourcemanager.hostname", "10.0.10.231");
        // Needed when submitting from Windows to a Linux cluster
        conf.set("mapreduce.app-submission.cross-platform", "true");

        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setMapperClass(WcMapper.class);
        job.setReducerClass(WcReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Delete the output directory if it already exists
        FileSystem fs = FileSystem.get(conf);
        String out = "hdfs://10.0.10.231:9000/test/out";
        Path outPath = new Path(out);
        if (fs.exists(outPath)) {
            fs.delete(outPath, true);
        }

        FileInputFormat.setInputPaths(job, "hdfs://master:9000/test/testvim.txt");
        FileOutputFormat.setOutputPath(job, outPath);

        // Report failure through the exit code instead of swallowing exceptions
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static class WcMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emits the whole input line as the key, so identical lines are counted;
            // this is a word count when the input holds one word per line.
            String mVal = value.toString();
            context.write(new Text(mVal), new LongWritable(1));
        }
    }

    public static class WcReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the 1s emitted by the mapper for this key
            long sum = 0;
            for (LongWritable lVal : values) {
                sum += lVal.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WordCount(), args));
    }
}
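To see what the map and reduce phases compute without a cluster, the same logic can be sketched in plain Java. This is not the job above: the class name LocalWordCount is hypothetical, no Hadoop classes are used, and unlike WcMapper (which uses each whole line as the key) it splits lines on whitespace so that individual words are counted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch of the map + reduce logic; no Hadoop involved.
final class LocalWordCount {

    /** Counts whitespace-separated words across all lines. */
    static Map<String, Long> count(List<String> lines) {
        // TreeMap keeps keys sorted, similar to how reducer output is ordered by key
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            // "map" step: each word contributes a 1
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    // "reduce" step: sum the 1s per word
                    counts.merge(word, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello hadoop", "hello world");
        // Print key TAB value, the layout TextOutputFormat uses by default
        count(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

For the two sample lines this prints hadoop 1, hello 2, and world 1, one tab-separated pair per line. Applying the same split inside WcMapper would be one way to count words rather than lines.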
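A few additional notes:
1. Copy core-site.xml, mapred-site.xml, yarn-site.xml, and the other site files into the project's resources directory;
2. If the connection to hdfs://master:9000 is refused, try replacing master with the IP address;
3. The input file lives in HDFS; on Linux it can only be accessed through hdfs dfs commands;
4. The 2.8.1 builds of hadoop.dll and winutils.exe must be downloaded separately; see https://github.com/steveloughran/winutils;
5. For user permission problems, add the environment variable HADOOP_USER_NAME on Win7 and set it to the Hadoop user name;
6. Add a log4j.xml logging configuration file to the project's resources directory; for its contents see http://www.cnblogs.com/ftrako/p/7570094.html ;
7. Remove the provided scope from the hadoop-mapreduce-client-jobclient dependency in pom.xml; if it is left in place, the job runs in local mode instead of YARN mode.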
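For note 2, swapping the host name for an IP address can be done in one place rather than by editing each path string. The following is a minimal sketch using only java.net.URI; the class and method names are hypothetical, and the addresses are the ones that appear in the WordCount code.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for rewriting the host part of an HDFS URI.
final class HdfsUriHost {

    /** Returns the same URI with its host replaced; scheme, port, and path are kept. */
    static String withHost(String hdfsUri, String newHost) {
        URI u = URI.create(hdfsUri);
        try {
            return new URI(u.getScheme(), u.getUserInfo(), newHost, u.getPort(),
                    u.getPath(), u.getQuery(), u.getFragment()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot rebuild URI: " + hdfsUri, e);
        }
    }

    public static void main(String[] args) {
        // prints hdfs://10.0.10.231:9000/test/testvim.txt
        System.out.println(withHost("hdfs://master:9000/test/testvim.txt", "10.0.10.231"));
    }
}
```

The returned string can be passed wherever the job currently uses a hard-coded hdfs:// path.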
 
                     
                    
                 
                    
                 
 
                
            
         
 