Common HBase & Hadoop Commands

Disable HDFS permission checking:

CM --> HDFS --> Configuration: search for dfs.permissions, uncheck it, deploy the client configuration, then restart HDFS.

Query data in HBase by rowkey prefix:

scan 'test_xiaomifeng_monitoring_log',{FILTER => "(PrefixFilter ('166_20130816080'))"}

Show a table's schema:

describe 'table1'

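Create a table: create 'table1','d'

The first argument is the table name; the second is the column family.

A TTL (expiry time) can be set when the table is created: create 'tableName',{NAME=>'cf',TTL=><TTL in seconds>}

To change it later:

First disable the table: disable 'tableName'

Then change the TTL: alter 'tableName',NAME => 'cf', TTL => <TTL in seconds>

Re-enable the table: enable 'tableName'

Check the table metadata: describe 'tableName'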
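
The TTL change above can be scripted. The sketch below only prints the HBase shell commands, so nothing touches the cluster; the function names, the table and family names, and the 7-day figure are illustrative assumptions, not part of the original notes.

```shell
#!/bin/sh
# Sketch: build the HBase shell commands that change a column family's TTL.
# Table name, family name and day count below are hypothetical examples.

# Convert days to seconds, the unit HBase expects for TTL.
ttl_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}

# Print the disable / alter / enable / describe sequence for one table.
ttl_alter_commands() {
  table="$1"
  family="$2"
  ttl="$(ttl_seconds "$3")"
  printf "disable '%s'\n" "$table"
  printf "alter '%s', NAME => '%s', TTL => %s\n" "$table" "$family" "$ttl"
  printf "enable '%s'\n" "$table"
  printf "describe '%s'\n" "$table"
}

ttl_alter_commands tableName cf 7
```

Once the printed commands look right, they could be pasted into the HBase shell, or the function's output piped into hbase shell.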

Insert data:

put 'tableName', 'RowKey','cf1:qualifier','value'

Delete data (delete removes a single cell, deleteall removes the whole row):

delete 'tableName', 'RowKey', 'ColumnFamily:qualifier'
deleteall 'tableName', 'RowKey'

Bulk-copy data between the local filesystem and HDFS:

copyFromLocal

Usage: hdfs dfs -copyFromLocal <localsrc> URI

Similar to put command, except that the source is restricted to a local file reference.

copyToLocal

Usage: hdfs dfs -copyToLocal [-ignorecrc] [-crc] URI <localdst>

Similar to get command, except that the destination is restricted to a local file reference.

Change the number of replicas kept for each file:

Set it in hdfs-site.xml under Hadoop's conf directory:

<property>
<name>dfs.replication</name>
<value>2</value>
</property>

For files that are already uploaded, change their replication factor:

hadoop fs -setrep [-R] [-w] <rep> <path/file>: Set the replication level of a file.

The -R flag requests a recursive change of replication level for an entire tree.

For example: hadoop fs -setrep -R -w 2 /

This sets the replication factor of every file under the HDFS root directory to 2.

Manually kill a job on the cluster:

First run hadoop job -list to list the jobs and find the job ID (looking it up in the JobTracker web UI is more convenient), then run hadoop job -kill jobId.
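
When several jobs have to be killed, the job IDs can be pulled out of the listing text. This is a dry-run sketch: it prints the kill commands rather than running them. The function names and the sample listing are made up for illustration; real hadoop job -list output has more columns.

```shell
#!/bin/sh
# Extract job IDs (job_<timestamp>_<sequence>) from text on stdin.
job_ids() {
  grep -Eo 'job_[0-9]+_[0-9]+' | sort -u
}

# Dry run: print one kill command per job ID instead of executing it.
kill_commands() {
  job_ids | while read -r id; do
    echo "hadoop job -kill $id"
  done
}

# Hypothetical listing text standing in for real `hadoop job -list` output.
sample_listing='Total jobs:2
job_1376640000000_0001 RUNNING root
job_1376640000000_0002 PREP root'

printf '%s\n' "$sample_listing" | kill_commands
```

On a cluster the input would come from hadoop job -list instead of the sample text; review the printed commands before running any of them, since the filter matches every job in the listing.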

Return only the first 10 rows from HBase:

scan 'tb1', {LIMIT => 10}

Show the size of each file in a directory:

hadoop fs -du URI [URI …]

Show the total size of a directory (-dus is deprecated in Hadoop 2; hadoop fs -du -s is the equivalent):

hadoop fs -dus URI [URI …]

Run a Hadoop 2 MapReduce job on an ordinary machine:

After installing dpl-tools, run: java -classpath /opt/hugedata/dpl/lib/*:aaa.jar com.hugedata.dataanalysis.test.TestMR

Fix for AccessControlException Permission denied: user=root, access=WRITE

Exception message:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=root, access=WRITE, inode="/user/hive/warehouse":hive:supergroup:drwxr-xr-x

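Cause: the hive shell was started as root. HDFS in Hadoop 2 enforces permission checks, and by default only the hive user can write to /user/hive/warehouse. Changing the permissions on that directory lets a hive shell started as root work with Hive databases as well.

Fix: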

sudo -u hdfs hadoop fs -chmod -R 777 /user/hive/warehouse/

Then start hive again; create and other commands that write to the directory no longer fail.

Check the size of each HBase table (actual disk usage is this figure multiplied by the replication factor):

hadoop fs -du -h /hbase/data/default
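
To see the multiplication by the replication factor worked out, the sketch below reads "size path" lines and prints the size times the number of replicas. It assumes the two-column output of hadoop fs -du without -h (plain byte counts); some Hadoop versions print an extra column, in which case the parsing would need adjusting. The function name and the sample sizes are hypothetical.

```shell
#!/bin/sh
# Multiply each logical size by the replication factor.
# Input lines are assumed to look like: <bytes> <path>
with_replicas() {
  rep="$1"
  while read -r size path; do
    echo "$(( size * rep )) $path"
  done
}

# Hypothetical sizes standing in for `hadoop fs -du /hbase/data/default` output.
printf '%s\n' \
  '1024 /hbase/data/default/table1' \
  '2048 /hbase/data/default/tb1' | with_replicas 3
```

With a replication factor of 3, the two sample tables come out at 3072 and 6144 bytes of raw disk usage.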

As a Linux user other than root, switch to the hdfs user:

sudo su - hdfs

As a Linux user other than root, switch to the hbase user:

sudo su -lm hbase

List corrupt blocks:

hdfs fsck -list-corruptfileblocks

CDH 6.2 wordcount example

cd /usr/lib/hadoop-mapreduce/

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount /test/wd.txt /test/output
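
For a small input file, the job's result can be cross-checked locally with standard Unix tools. This is a sketch of the same computation, not the Hadoop job itself; the function name and the sample text are illustrative, and it assumes the job's output is one tab-separated word and count per line, sorted by word.

```shell
#!/bin/sh
# Local word count: split on whitespace, drop empty tokens,
# then count identical words and print "word<TAB>count".
local_wordcount() {
  tr -s '[:space:]' '\n' | grep -v '^$' | sort | uniq -c | awk '{print $2 "\t" $1}'
}

# Hypothetical sample text standing in for the contents of /test/wd.txt.
printf 'hello hadoop\nhello hbase\n' | local_wordcount
```

The local output can then be compared against the part files the job writes under /test/output.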

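1. Before running the test, first create a working directory in HDFS for the current system user and grant the appropriate permissions.

1.1 Because I installed as root, a working directory for root has to be created in HDFS and given permissions.

    (1) First, in HDFS, create a root folder under the user directory /user/ to serve as root's working directory. Run:

        sudo -u hdfs hadoop fs -mkdir /user/root

    (2) Grant permissions on /user/root

        1) First give ownership of the directory to root: sudo -u hdfs hadoop fs -chown root /user/root

        2) Then set the directory's group to root: sudo -u hdfs hadoop fs -chgrp root /user/root

        3) Finally set the directory's permissions: sudo -u hdfs hadoop fs -chmod 777 /user/root (owner, group and other users each get read, write and execute)

1.2 Create an HDFS working directory for an ordinary user and grant permissions. The steps are the same as for root, except that they have to be run as root.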
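
The same four commands apply to any user name. The sketch below prints them for a given user rather than running them; the function name and the user alice are hypothetical.

```shell
#!/bin/sh
# Print the commands that create and open up /user/<name> in HDFS.
# Nothing is executed; review the output, then run it as root.
hdfs_home_commands() {
  user="$1"
  echo "sudo -u hdfs hadoop fs -mkdir /user/$user"
  echo "sudo -u hdfs hadoop fs -chown $user /user/$user"
  echo "sudo -u hdfs hadoop fs -chgrp $user /user/$user"
  echo "sudo -u hdfs hadoop fs -chmod 777 /user/$user"
}

hdfs_home_commands alice
```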

2. Test the WordCount example.

package com.songguoliang.hbase;
 
import java.io.IOException;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
 
/**
 * Demo that combines HBase with WordCount
 * @date 2015-07-27 11:21:48
 * @author sgl
 */
public class WordCountHBase {
    /**
     * Map
     * @date 2015-07-27 11:24:04
     * @author sgl
     */
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable>{
        private IntWritable one=new IntWritable(1);
        /*
         * Overrides the map method
         * (non-Javadoc)
         * @see org.apache.hadoop.mapreduce.Mapper#map(KEYIN, VALUEIN, org.apache.hadoop.mapreduce.Mapper.Context)
         * @date 2015-07-27 11:29:48
         * @author sgl
         */
        @Override
        protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context) throws IOException, InterruptedException {
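            // Split each input line on whitespace
            String[] words=value.toString().trim().split("\\s+");
            for(String word:words){
                // Skip the empty token that a blank line produces
                if(word.isEmpty()){
                    continue;
                }
                context.write(new Text(word), one);
            }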
        }
    }
    /**
     * Reduce
     * @date 2015-07-27 11:36:03
     * @author sgl
     * @version $Id: WordCountHBase.java, v 0.1 2015-07-27 11:36:03 sgl Exp $
     */
    public static class Reduce extends TableReducer<Text, IntWritable, NullWritable>{
        /*
         * Overrides the reduce method
         * (non-Javadoc)
         * @see org.apache.hadoop.mapreduce.Reducer#reduce(KEYIN, java.lang.Iterable, org.apache.hadoop.mapreduce.Reducer.Context)
         * @date 2015-07-27 11:36:12
         * @author sgl
         */
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Reducer<Text, IntWritable, NullWritable, Writable>.Context context) throws IOException, InterruptedException {
            int sum=0;
            for(IntWritable value:values){
                sum+=value.get();
            }
            // Create the Put; each word is stored as one row
            Put put=new Put(Bytes.toBytes(key.toString()));
            // Column family "content", qualifier "count", value is the word's count
            put.add(Bytes.toBytes("content"), Bytes.toBytes("count"), Bytes.toBytes(String.valueOf(sum)));
            context.write(NullWritable.get(), put);
        }
        
    }
    /**
     * Creates the table in HBase
     * @date 2015-07-27 11:50:42
     * @author sgl
     * @param tableName table name
     * @throws IOException
     */
    public static void createHBaseTable(String tableName) throws IOException{
        HTableDescriptor tableDescriptor=new HTableDescriptor(tableName);
        HColumnDescriptor columnDescriptor=new HColumnDescriptor("content");
        tableDescriptor.addFamily(columnDescriptor);
        Configuration conf=HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "sdw1,sdw2");
        HBaseAdmin admin=new HBaseAdmin(conf);
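        if(admin.tableExists(tableName)){
            System.out.println("Table already exists; dropping it and creating it again.");
            admin.disableTable(tableName);
            admin.deleteTable(tableName);
        }
        System.out.println("Creating new table: "+tableName);
        admin.createTable(tableDescriptor);
        // Release the admin connection once the table is in place
        admin.close();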
    }
    
    public static void main(String[] args) {
        try {
            String tableName="wordcount";
            createHBaseTable(tableName);
            
            Configuration conf=new Configuration();
            conf.set(TableOutputFormat.OUTPUT_TABLE, tableName);
            conf.set("hbase.zookeeper.quorum", "sdw1,sdw2");
            String input=args[0];
            Job job=new Job(conf, "WordCount table with "+input);
            job.setJarByClass(WordCountHBase.class);
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TableOutputFormat.class);
            FileInputFormat.addInputPath(job, new Path(input));
            System.exit(job.waitForCompletion(true)?0:1);
        } catch (Exception e) {
            e.printStackTrace();
            // Report the failure to the caller instead of exiting with status 0
            System.exit(1);
        }
        
    }
    
}

Errors when configuring Hue to support HBase:

https://www.cnblogs.com/justinyang/p/8728630.html

https://blog.csdn.net/lhmood/article/details/106584051

If browsing HBase in Hue fails with: Hue - Hbase Api Error: TSocket read 0 bytes

- CM --> HBase --> Configuration --> search for hbase.regionserver.thrift.compact
- If it is checked, uncheck it and restart the HBase service

Another setting that needs to be changed: hbase.regionserver.thrift.framed

Next, change:

- CM --> HDFS --> Configuration --> search for core-site.xml

Add:

hadoop.proxyuser.hbase.hosts with value *

Then add:

hadoop.proxyuser.hbase.groups with value *

Save, then restart HDFS, HBase and Hue.

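When installing Hue on CDH 6, the database connection test fails and the server log (/var/log/cloudera-scm-server/cloudera-scm-server.log) reports ImportError: No module named useradmin. Fix:

Under /usr/lib/hue there is an app.reg file that is a symlink to /var/lib/hue/app.reg

but /var/lib has no hue directory at all, let alone that file.

Download the /var/lib/hue folder from a cluster running the same version and upload it to the node where Hue is installed. Then change the owner of the folder and everything in it to the hue user: chown -R hue:hue /var/lib/hue

Test the database connection again and it succeeds.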

Quickly count the rows of a large HBase table

hbase org.apache.hadoop.hbase.mapreduce.RowCounter "crawler:all_content_tb"

Pull CDH jars into a Maven project:

Add to the pom:

<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>

posted on 2013-08-17 16:32 sixiiweb
