HBase - Official HBase MapReduce (II) [HdfsToHbase]

Posted on 2020-04-25 22:29 by MissRong


Goal: load the data of the student table stored in HDFS into the student table in HBase.

Executed in two stages: map + reduce.

I. The HdfsToHbaseMapper class

Reads the table data out of HDFS and wraps it into the byte-based form HBase expects (an ImmutableBytesWritable row key plus a Put).

package HdfsToHbase;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * @Author : ASUS and xinrong
 * @Version : 2020/4/25 & 1.0
 */
public class HdfsToHbaseMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // 1. Read one line from HDFS
        String line = value.toString();
        // 2. Split the line on "," -- the separator must match the one actually used in the HDFS data
        String[] strings = line.split(",");
        // 3. Pick out the fields by position (strings[1] is read but never used)
        String rowKey = strings[0];
        String colName = strings[2];
        String val = strings[3];

        // 4. Wrap the row key in the byte form HBase works with
        ImmutableBytesWritable rowKeyWritable = new ImmutableBytesWritable(Bytes.toBytes(rowKey));
        // 5. Create the Put for this row
        Put put = new Put(Bytes.toBytes(rowKey));
        // 6. Attach the data to the Put: column family, qualifier, value.
        //    The two calls below store the column name and the value as two cells of the same row;
        //    addColumn replaces the put.add(byte[], byte[], byte[]) overload deprecated in HBase 1.x.
        put.addColumn(Bytes.toBytes("CF2"), Bytes.toBytes("colName"), Bytes.toBytes(colName));
        put.addColumn(Bytes.toBytes("CF2"), Bytes.toBytes("val"), Bytes.toBytes(val));
        // 7. Emit (row key, Put)
        context.write(rowKeyWritable, put);
    }
}
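For reference, given the four comma-separated fields the mapper indexes into, an input line might look like this (hypothetical sample values; the second field, strings[1], is read but never used):

1001,s01,name,Nick

That line becomes row key 1001 carrying the two cells CF2:colName=name and CF2:val=Nick.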

II. The HdfsToHbaseReducer class

Writes the data into the specified HBase table.

package HdfsToHbase;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.NullWritable;

import java.io.IOException;

/**
 * @Author : ASUS and xinrong
 * @Version : 2020/4/25 & 1.0
 */
public class HdfsToHbaseReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable> {
    @Override
    protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context) throws IOException, InterruptedException {
        // Forward every Put received for this key into the target HBase table
        for (Put put : values) {
            context.write(NullWritable.get(), put);
        }
    }
}
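Since this reducer only forwards each Put unchanged, HBase ships an equivalent class, org.apache.hadoop.hbase.mapreduce.IdentityTableReducer, which could be wired in instead (an alternative sketch, not what this post uses):

TableMapReduceUtil.initTableReducerJob("student", IdentityTableReducer.class, job);

Writing the class by hand, as above, just makes the forwarding explicit.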

III. The HdfsToHbaseRunner class (the driver)

package HdfsToHbase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @Author : ASUS and xinrong
 * @Version : 2020/4/25 & 1.0
 */
public class HdfsToHbaseRunner extends Configured implements Tool {
    @Override
    public int run(String[] strings) throws Exception {
        // 1. Get the Configuration handed in by ToolRunner
        Configuration conf = this.getConf();
        // 2. Create the Job
        Job job = Job.getInstance(conf, this.getClass().getSimpleName());
        // 3. Set the job jar via the driver class
        job.setJarByClass(HdfsToHbaseRunner.class);
        // 4. Input path (9000 is the NameNode's RPC port)
        Path inPath = new Path("hdfs://bigdata111:9000/user/root/plus/staff_timestamp");
        FileInputFormat.addInputPath(job, inPath);
        // 5. Configure the mapper and its output types
        job.setMapperClass(HdfsToHbaseMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);

        // 6. Configure the reducer: target table name, reducer class, job
        TableMapReduceUtil.initTableReducerJob("student", HdfsToHbaseReducer.class, job);
        // 7. Number of reduce tasks
        job.setNumReduceTasks(1);
        // 8. Submit the job and wait for it to finish
        boolean status = job.waitForCompletion(true);
        if (status) {
            System.out.println("Job succeeded!");
            return 0;
        } else {
            System.out.println("Job failed!");
            return 1;
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration configuration = HBaseConfiguration.create();
        int status = ToolRunner.run(configuration, new HdfsToHbaseRunner(), args);
        System.exit(status);
    }
}
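One implicit dependency worth noting: HBaseConfiguration.create() in main() reads hbase-site.xml from the classpath, which is how the job locates the ZooKeeper quorum. If that file is not visible when the job is launched, the quorum can be set explicitly before ToolRunner.run -- a minimal sketch, assuming ZooKeeper runs on bigdata111 as elsewhere in this post:

configuration.set("hbase.zookeeper.quorum", "bigdata111");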

IV. Running the job

1. Package with Maven: clean + package
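(A packaging note: in HBase 1.3.1, TableMapReduceUtil and TableReducer live in the hbase-server artifact, so the pom needs hbase-server in addition to the usual Hadoop dependencies.)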

2. Upload the jar to the /opt/module/hbase-1.3.1/myjob directory:

[root@bigdata111 myjob]# ll

total 20

-rw-r--r-- 1 root root 18525 Apr 25 19:50 hbase-1.0-SNAPSHOT.jar

3. Run:

[root@bigdata111 myjob]# yarn jar hbase-1.0-SNAPSHOT.jar HdfsToHbase.HdfsToHbaseRunner
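If the job fails with a ClassNotFoundException for HBase classes, the HBase jars are probably missing from Hadoop's classpath; exporting them first usually helps (a sketch -- hbase mapredcp prints the classpath entries MapReduce needs):

[root@bigdata111 myjob]# export HADOOP_CLASSPATH=`/opt/module/hbase-1.3.1/bin/hbase mapredcp`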

Heads-up: if the target table does not exist yet, create it before running the job.
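For example, from the HBase shell, using the CF2 column family that the mapper writes to:

hbase(main):001:0> create 'student','CF2'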

V. Results

1. The source table in HDFS and the student table in HBase before the job (screenshots omitted).

Note: the column family being written must already exist in the target table, otherwise the job fails with an error (screenshot omitted).
2. Result (screenshot omitted).
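The imported rows can then be verified with a scan from the HBase shell:

hbase(main):002:0> scan 'student'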