HBase - Official HBase MapReduce (II) [HdfsToHbase]
Goal: load the data of the student table stored in HDFS into the HBase table student.
Executed in two stages: map + reduce.
I. The HdfsToHbaseMapper class
Reads the table data from HDFS and converts it into the byte (Bytes) form HBase expects.
package HdfsToHbase;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * @Author : ASUS and xinrong
 * @Version : 2020/4/25 & 1.0
 */
public class HdfsToHbaseMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // 1. Read one line of data from HDFS
        String line = value.toString();
        // 2. Split the line on "," into a String array.
        //    Note: the delimiter must match the one actually used in the HDFS data.
        String[] strings = line.split(",");
        // 3. Extract the fields by position
        String rowKey = strings[0];
        String colName = strings[2];
        String val = strings[3];
        // 4. Wrap the row key in the byte form HBase expects
        ImmutableBytesWritable rowKeyWritable = new ImmutableBytesWritable(Bytes.toBytes(rowKey));
        // 5. Create the Put object
        Put put = new Put(Bytes.toBytes(rowKey));
        // 6. Add the cells to the Put: column family, column qualifier, value.
        //    This writes the same row's data as two separate columns; the effect is visible in the results below.
        //    (addColumn replaces the add(byte[], byte[], byte[]) overload, which is deprecated in HBase 1.x.)
        put.addColumn(Bytes.toBytes("CF2"), Bytes.toBytes("colName"), Bytes.toBytes(colName));
        put.addColumn(Bytes.toBytes("CF2"), Bytes.toBytes("val"), Bytes.toBytes(val));
        // 7. Emit
        context.write(rowKeyWritable, put);
    }
}
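The mapper above reads indices 0, 2, and 3 of each line and skips index 1, so it assumes input lines of at least four comma-separated fields. A minimal standalone sketch of that parsing step, using a made-up sample line (the field values here are hypothetical, not from the actual data file):

```java
public class SplitDemo {
    public static void main(String[] args) {
        // Hypothetical sample line; the real file must use the same "," delimiter
        String line = "1001,student,name,zhangsan";
        String[] fields = line.split(",");
        String rowKey = fields[0];   // index 0 -> row key
        String colName = fields[2];  // index 2 -> column qualifier (index 1 is skipped)
        String val = fields[3];      // index 3 -> cell value
        System.out.println(rowKey + " / " + colName + " / " + val);
        // prints: 1001 / name / zhangsan
    }
}
```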
II. The HdfsToHbaseReducer class
Loads the data into the specified HBase table.
package HdfsToHbase;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.NullWritable;

import java.io.IOException;

/**
 * @Author : ASUS and xinrong
 * @Version : 2020/4/25 & 1.0
 */
public class HdfsToHbaseReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable> {
    @Override
    protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context) throws IOException, InterruptedException {
        // Write every Put received for this key into the target HBase table
        for (Put put : values) {
            context.write(NullWritable.get(), put);
        }
    }
}
III. The HdfsToHbaseRunner class (the driver)
package HdfsToHbase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @Author : ASUS and xinrong
 * @Version : 2020/4/25 & 1.0
 */
public class HdfsToHbaseRunner extends Configured implements Tool {
    @Override
    public int run(String[] strings) throws Exception {
        // 1. Get the Configuration
        Configuration conf = this.getConf();
        // 2. Create the Job
        Job job = Job.getInstance(conf, this.getClass().getSimpleName());
        // 3. Set the class that carries the job jar
        job.setJarByClass(HdfsToHbaseRunner.class);
        // 4. Input path -- 9000 is the NameNode port
        Path inPath = new Path("hdfs://bigdata111:9000/user/root/plus/staff_timestamp");
        FileInputFormat.addInputPath(job, inPath);
        // 5. Configure the Mapper
        job.setMapperClass(HdfsToHbaseMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);
        // 6. Configure the Reducer: output table name, reducer class, job
        TableMapReduceUtil.initTableReducerJob("student", HdfsToHbaseReducer.class, job);
        // 7. Set the number of reduce tasks
        job.setNumReduceTasks(1);
        // 8. Run and report status
        boolean status = job.waitForCompletion(true);
        if (status) {
            System.out.println("Job succeeded!");
            return 0;
        } else {
            System.out.println("Job failed!");
            return 1;
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration configuration = HBaseConfiguration.create();
        int status = ToolRunner.run(configuration, new HdfsToHbaseRunner(), args);
        System.exit(status);
    }
}
IV. Execution
1. Package with Maven: clean + package
2. Upload the jar to the /opt/module/hbase-1.3.1/myjob directory:
[root@bigdata111 myjob]# ll
total 20
-rw-r--r-- 1 root root 18525 Apr 25 19:50 hbase-1.0-SNAPSHOT.jar
3. Run:
[root@bigdata111 myjob]# yarn jar hbase-1.0-SNAPSHOT.jar HdfsToHbase.HdfsToHbaseRunner
Tip: before running the job, if the target table does not exist yet, it must be created in advance.
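For example, the target table can be created from the HBase shell with the CF2 column family that the mapper writes to (table and family names as used in this walkthrough):

```
hbase(main):001:0> create 'student', 'CF2'
```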
V. Results
1. The HDFS table and the HBase table before the operation (screenshots omitted):
Note: the column family being written must already exist in the target table; otherwise an error is thrown (screenshot omitted).
2. Result (screenshots omitted):
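Since the result screenshots are not reproduced here, the loaded rows can be checked from the HBase shell after the job finishes, e.g.:

```
hbase(main):001:0> scan 'student'
```

Each imported row should show two cells under CF2, one in the colName qualifier and one in the val qualifier, matching the two addColumn calls in the mapper.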