Counting HBase records from HFiles with Spark, and viewing Phoenix index status
1. Having Spark read the HFiles directly is faster for counting than scanning HBase itself.
1) Create a snapshot of the hasgj table, named ***Snapshot.
The HBase shell statement is: snapshot '***','***Snapshot'
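For reference, the same snapshot can also be created from Java through the HBase Admin API. This is a minimal sketch, assuming an HBase 1.x+ client and an hbase-site.xml on the classpath; the snapshot name stays redacted as in the shell statement above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreateSnapshot {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml (ZooKeeper quorum etc.) from the classpath.
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // Equivalent to the shell statement: snapshot '***','***Snapshot'
            admin.snapshot("***Snapshot", TableName.valueOf("hasgj"));
        }
    }
}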
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;
import java.util.NavigableMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat;
import org.apache.hadoop.hbase.protobuf.ProtobufUtil;
import org.apache.hadoop.hbase.protobuf.generated.ClientProtos;
import org.apache.hadoop.hbase.util.Base64;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

import scala.Tuple2;

public class SparkReadHFile {

    // Serialize the Scan so the input format can pick it up from the configuration.
    private static String convertScanToString(Scan scan) throws IOException {
        ClientProtos.Scan proto = ProtobufUtil.toScan(scan);
        return Base64.encodeBytes(proto.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        final String date = args[0]; // target date, e.g. 34561213
        int maxVersions = 1;

        SparkConf sparkConf = new SparkConf().setAppName("sparkReadHfile");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        Configuration hconf = HBaseConfiguration.create();
        hconf.set("hbase.rootdir", "/hbase");
        hconf.set("hbase.zookeeper.quorum", "*:2181,*:2181,*:2181");

        // Scan only column family C, keeping a single version per cell.
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("C"));
        scan.setMaxVersions(maxVersions);
        hconf.set(TableInputFormat.SCAN, convertScanToString(scan));

        Job job = Job.getInstance(hconf);
        Path path = new Path("/snapshot"); // HDFS directory the snapshot is restored into
        String snapName = "***Snapshot";   // snapshot name
        TableSnapshotInputFormat.setInput(job, snapName, path);

        // Read the snapshot files directly, bypassing the region servers.
        JavaPairRDD<ImmutableBytesWritable, Result> newAPIHadoopRDD =
                sc.newAPIHadoopRDD(job.getConfiguration(), TableSnapshotInputFormat.class,
                        ImmutableBytesWritable.class, Result.class);

        List<String> collect = newAPIHadoopRDD.map(
                new Function<Tuple2<ImmutableBytesWritable, Result>, String>() {
                    private static final long serialVersionUID = 1L;

                    public String call(Tuple2<ImmutableBytesWritable, Result> v1) throws Exception {
                        Result result = v1._2();
                        if (result.isEmpty()) {
                            return null;
                        }
                        String rowKey = Bytes.toString(result.getRow());
                        // Find the smallest (earliest) column qualifier in family C.
                        NavigableMap<byte[], byte[]> familyMap = result.getFamilyMap(Bytes.toBytes("C"));
                        String minDate = "34561213"; // sentinel, larger than any real qualifier
                        for (Entry<byte[], byte[]> entry : familyMap.entrySet()) {
                            String columnName = new String(entry.getKey());
                            if (columnName.compareTo(minDate) < 0) {
                                minDate = columnName;
                            }
                        }
                        // A row counts as new when its earliest column equals the target date.
                        return date.equals(minDate) ? rowKey : null;
                    }
                }).collect();

        // Drop the nulls and count the remaining row keys.
        ArrayList<String> arrayList = new ArrayList<String>();
        for (String rowKey : collect) {
            if (rowKey != null) {
                arrayList.add(rowKey);
            }
        }
        System.out.println("Number of new MACs: " + arrayList.size());
    }
}
2) This code cannot be run locally on Windows; it has to run on Linux, because at runtime it looks for the snapshot on HDFS. The submit command is:
[root@aaa-12 aaa]# spark2-submit --master yarn --deploy-mode cluster --driver-memory 2g --executor-cores 3 --queue thequeue --executor-memory 6g --name "TestSC" --jars /aaa/*,/aaa/json-lib-2.4-jdk15.jar --class SparkReadHFile --conf spark.dynamicAllocation.enabled=false --conf spark.authenticate=true /aaa/aaa.jar 34561213
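If only the total row count of the snapshot is needed, rather than the per-row date filtering above, a minimal variant can count on the executors and skip collecting row keys to the driver. This is a sketch assuming the same snapshot name and restore directory as the job above; the class name SnapshotRowCount is hypothetical.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat;
import org.apache.hadoop.hbase.protobuf.ProtobufUtil;
import org.apache.hadoop.hbase.util.Base64;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SnapshotRowCount {
    public static void main(String[] args) throws IOException {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("snapshotRowCount"));

        Configuration hconf = HBaseConfiguration.create();
        // A plain full-table scan; no family or version restrictions needed for a count.
        hconf.set(TableInputFormat.SCAN,
                Base64.encodeBytes(ProtobufUtil.toScan(new Scan()).toByteArray()));

        Job job = Job.getInstance(hconf);
        // Same snapshot name and restore directory as the job above.
        TableSnapshotInputFormat.setInput(job, "***Snapshot", new Path("/snapshot"));

        // count() runs distributed on the executors; nothing is pulled to the driver.
        long rows = sc.newAPIHadoopRDD(job.getConfiguration(), TableSnapshotInputFormat.class,
                ImmutableBytesWritable.class, Result.class).count();
        System.out.println("Total rows in snapshot: " + rows);
        sc.stop();
    }
}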
Reference: https://www.cnblogs.com/kwzblog/p/9007713.html
2. Viewing the state of Phoenix secondary index tables (reposted from: https://www.cnblogs.com/hbase-community/p/8879848.html)
An index can be in the states listed below, and the state information is stored in the SYSTEM.CATALOG table. All index table information can be viewed with the following SQL:
select TABLE_NAME,DATA_TABLE_NAME,INDEX_TYPE,INDEX_STATE,INDEX_DISABLE_TIMESTAMP from system.catalog where INDEX_TYPE is not null;
Fields in the SQL:
- TABLE_NAME: the index table name
- DATA_TABLE_NAME: the name of the underlying data table
- INDEX_TYPE: the index type
  GLOBAL (1)
  LOCAL (2)
- INDEX_STATE: the index state
  BUILDING ("b")
  USABLE ("e")
  UNUSABLE ("d")
  ACTIVE ("a")
  INACTIVE ("i")
  DISABLE ("x")
REBUILD("r") - DISABLE 表示索引将处于不可用的维护状态,同时将不能用于查询中。
- REBUILD 表示索引将完成重建,同时一旦重建完成此索引将能被在此用于查询中。
- BUILDING 表示将从索引不可用的时间戳处重建索引直到重建完成。
- INACTIVE/UNUSABLE 表示索引将不能用于查询中,但索引仍然在不可用的维护状态。
- ACTIVE/USABLE 表示索引表能被正常用于查询中。
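As an illustration of acting on these states, the sketch below runs the same catalog query through the Phoenix JDBC driver and shows, commented out, how a rebuild could be triggered with Phoenix's ALTER INDEX statement. The ZooKeeper quorum and the MY_INDEX/MY_TABLE names are hypothetical placeholders; a phoenix-client jar is assumed on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixIndexState {
    public static void main(String[] args) throws Exception {
        // Hypothetical ZooKeeper quorum; replace with your own cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk1,zk2,zk3:2181");
             Statement stmt = conn.createStatement()) {
            // Same catalog query as above: list every index and its state.
            ResultSet rs = stmt.executeQuery(
                    "select TABLE_NAME, DATA_TABLE_NAME, INDEX_TYPE, INDEX_STATE, INDEX_DISABLE_TIMESTAMP"
                  + " from system.catalog where INDEX_TYPE is not null");
            while (rs.next()) {
                System.out.println(rs.getString("TABLE_NAME")
                        + " on " + rs.getString("DATA_TABLE_NAME")
                        + " state=" + rs.getString("INDEX_STATE"));
            }
            // If an index is stuck in DISABLE ("x"), a rebuild can be triggered
            // (hypothetical names MY_INDEX / MY_TABLE):
            // stmt.execute("ALTER INDEX MY_INDEX ON MY_TABLE REBUILD");
        }
    }
}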