Counting HBase data with Spark by reading HFiles directly, and checking Phoenix index status

1. Counting by having Spark read the HFiles directly (via a snapshot) is faster than counting through HBase itself.

   1) Create a snapshot of the hasgj table: ***Snapshot

   The HBase shell statement is: snapshot '***','***Snapshot'
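If you prefer to create the snapshot from code instead of the shell, a minimal sketch using the standard HBase Admin API could look like the following (the snapshot name and ZooKeeper quorum are placeholders, not the masked values above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreateSnapshot {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk1:2181,zk2:2181,zk3:2181"); // placeholder quorum
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // Equivalent to the shell command: snapshot 'hasgj','yourSnapshotName'
            admin.snapshot("yourSnapshotName", TableName.valueOf("hasgj"));
        }
    }
}

The full Spark job that reads the snapshot and counts the matching rows: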

import java.io.IOException;
import java.util.*;
import java.util.Map.Entry;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat;
import org.apache.hadoop.hbase.protobuf.ProtobufUtil;
import org.apache.hadoop.hbase.protobuf.generated.ClientProtos;
import org.apache.hadoop.hbase.util.Base64;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import scala.Tuple2;

public class SparkReadHFile {
    private static String convertScanToString(Scan scan) throws IOException {
        ClientProtos.Scan proto = ProtobufUtil.toScan(scan);
        return Base64.encodeBytes(proto.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        final String date=args[0];
      //  final String date="123";
        int max_versions = 1;
        SparkConf sparkConf = new SparkConf().setAppName("sparkReadHfile");//.setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);
        Configuration hconf = HBaseConfiguration.create();
        hconf.set("hbase.rootdir", "/hbase");
        hconf.set("hbase.zookeeper.quorum", "*:2181,*:2181,*:2181");
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("C"));
        scan.setMaxVersions(max_versions);
        hconf.set(TableInputFormat.SCAN, convertScanToString(scan));
        Job job = Job.getInstance(hconf);
        Path path = new Path("/snapshot");
        String snapName = "***Snapshot"; // snapshot name
        TableSnapshotInputFormat.setInput(job, snapName, path);
        JavaPairRDD<ImmutableBytesWritable, Result> newAPIHadoopRDD = sc.newAPIHadoopRDD(job.getConfiguration(), TableSnapshotInputFormat.class, ImmutableBytesWritable.class,Result.class);
        List<String> collect = newAPIHadoopRDD.map(new Function<Tuple2<ImmutableBytesWritable, Result>, String>(){
            private static final long serialVersionUID = 1L;
            public String call(Tuple2<ImmutableBytesWritable, Result> v1)
                    throws Exception {
                String newMac = null;
                Result result = v1._2();
                System.out.println("processing row...");
                if (result.isEmpty()) {
                    return null;
                }
                String rowKey = Bytes.toString(result.getRow());
                System.out.println("row key: " + rowKey);
                NavigableMap<byte[], byte[]> familyMap = result.getFamilyMap(Bytes.toBytes("C"));
                Set<Entry<byte[], byte[]>> entrySet = familyMap.entrySet();
                Iterator<Entry<byte[], byte[]>> it = entrySet.iterator();
                // Scan the column qualifiers of family C and keep the smallest one (the earliest date).
                String columnName = null;
                String minDate = "34561213"; // sentinel start value, replaced by any smaller qualifier
                while (it.hasNext()) {
                    columnName = Bytes.toString(it.next().getKey());
                    if (columnName.compareTo(minDate) < 0) {
                        minDate = columnName;
                    }
                }

                // The row counts as a new MAC when its earliest column date equals the given date.
                if (date.equals(minDate)) {
                    newMac = rowKey;
                }
                return newMac;
            }
        }).collect();
        // Drop the nulls returned for non-matching rows, then count what is left.
        ArrayList<String> newMacs = new ArrayList<String>();
        for (String mac : collect) {
            if (mac != null) {
                newMacs.add(mac);
            }
        }
        System.out.println("Number of new MACs: " + newMacs.size());

    }
}

2. This code cannot be run locally on Windows; it has to run on Linux, because at runtime it looks up the snapshot in HDFS. The submit command is:

[root@aaa-12 aaa]# spark2-submit  --master yarn  --deploy-mode cluster --driver-memory 2g  --executor-cores 3 --queue thequeue  --executor-memory 6g  --name "TestSC"  --jars /aaa/*,/aaa/json-lib-2.4-jdk15.jar --class SparkReadHFile  --conf spark.dynamicAllocation.enabled=false --conf spark.authenticate=true /aaa/aaa.jar 34561213

Reference: https://www.cnblogs.com/kwzblog/p/9007713.html
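As a side note, the driver-side collect() in the job above pulls every row key back to the driver just to count the non-null entries. If only the count is needed, the same logic can stay on the executors with filter() and count(). A rough, untested sketch along those lines (the snapshot name and restore path are placeholders):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat;
import org.apache.hadoop.hbase.protobuf.ProtobufUtil;
import org.apache.hadoop.hbase.util.Base64;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import scala.Tuple2;

public class SparkCountNewMacs {
    public static void main(String[] args) throws IOException {
        final String date = args[0];
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("sparkCountNewMacs"));
        Configuration hconf = HBaseConfiguration.create();
        hconf.set("hbase.rootdir", "/hbase");
        hconf.set("hbase.zookeeper.quorum", "zk1:2181,zk2:2181,zk3:2181"); // placeholder quorum
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("C"));
        scan.setMaxVersions(1);
        hconf.set(TableInputFormat.SCAN, Base64.encodeBytes(ProtobufUtil.toScan(scan).toByteArray()));
        Job job = Job.getInstance(hconf);
        // Snapshot name and restore directory are placeholders; use your own values.
        TableSnapshotInputFormat.setInput(job, "yourSnapshotName", new Path("/snapshot"));
        long newMacCount = sc.newAPIHadoopRDD(job.getConfiguration(), TableSnapshotInputFormat.class,
                ImmutableBytesWritable.class, Result.class)
            .filter(new Function<Tuple2<ImmutableBytesWritable, Result>, Boolean>() {
                private static final long serialVersionUID = 1L;
                public Boolean call(Tuple2<ImmutableBytesWritable, Result> v1) {
                    Result result = v1._2();
                    if (result.isEmpty()) {
                        return false;
                    }
                    // Find the smallest column qualifier (earliest date) in family C.
                    String minDate = "34561213";
                    for (byte[] qualifier : result.getFamilyMap(Bytes.toBytes("C")).keySet()) {
                        String column = Bytes.toString(qualifier);
                        if (column.compareTo(minDate) < 0) {
                            minDate = column;
                        }
                    }
                    // Keep only rows whose earliest date equals the requested date.
                    return date.equals(minDate);
                }
            })
            .count();
        System.out.println("Number of new MACs: " + newMacCount);
        sc.stop();
    }
}

On the cluster this variant would be submitted with the same kind of spark2-submit command shown above.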

 II. Checking the status of Phoenix secondary index tables (reposted from: https://www.cnblogs.com/hbase-community/p/8879848.html)

An index can be in one of the following states; the state information is stored in the SYSTEM.CATALOG table. All index tables can be listed with the following SQL:

select TABLE_NAME,DATA_TABLE_NAME,INDEX_TYPE,INDEX_STATE,INDEX_DISABLE_TIMESTAMP
from system.catalog where INDEX_TYPE is not null;


Columns in the result:

  • TABLE_NAME: the name of the index table
  • DATA_TABLE_NAME: the name of the underlying data table
  • INDEX_TYPE: the index type
      GLOBAL(1)
      LOCAL(2)
  • INDEX_STATE: the index state
      BUILDING("b")
      USABLE("e")
      UNUSABLE("d")
      ACTIVE("a")
      INACTIVE("i")
      DISABLE("x")
      REBUILD("r")

    • DISABLE means the index is in an unusable maintenance state and will not be used in queries.
    • REBUILD means the index will be rebuilt; once the rebuild completes, the index can be used in queries again.
    • BUILDING means the index will be rebuilt from the timestamp at which it became unusable until the rebuild completes.
    • INACTIVE/UNUSABLE means the index will not be used in queries, although it is still in the (unusable) maintenance state.
    • ACTIVE/USABLE means the index can be used in queries normally.
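
The same catalog query can also be run from Java through the Phoenix JDBC driver; below is a small sketch (the ZooKeeper quorum in the JDBC URL is a placeholder):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixIndexState {
    public static void main(String[] args) throws Exception {
        // The ZooKeeper quorum in the URL is a placeholder; point it at your cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk1,zk2,zk3:2181");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "select TABLE_NAME,DATA_TABLE_NAME,INDEX_TYPE,INDEX_STATE,INDEX_DISABLE_TIMESTAMP "
                   + "from system.catalog where INDEX_TYPE is not null")) {
            // Print one line per index: index table, data table, type, state, disable timestamp.
            while (rs.next()) {
                System.out.println(rs.getString("TABLE_NAME") + "\t"
                        + rs.getString("DATA_TABLE_NAME") + "\t"
                        + rs.getString("INDEX_TYPE") + "\t"
                        + rs.getString("INDEX_STATE") + "\t"
                        + rs.getString("INDEX_DISABLE_TIMESTAMP"));
            }
        }
    }
}

If an index is stuck in a disabled state, issuing ALTER INDEX index_name ON table_name REBUILD through the same connection triggers a full rebuild.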