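HBase itself provides aggregation methods that run the aggregation on the server side.
They are built on HBase's CoprocessorProtocol mechanism.
The principle behind CoprocessorProtocol is fairly simple and resembles a MapReduce framework: the client splits a scan into requests targeted at multiple regions, sends those requests to the regions in parallel, and then performs a reduce step on the client to obtain the final result.
Let's start with an example. HBase's AggregationClient supports simple statistics over a single column.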
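The scatter/gather flow described above can be sketched in plain Java. This is only an illustration under stated assumptions, not HBase code: the class `ScatterGatherMax`, the methods `regionMax` and `clientMax`, and the modelling of a region as a `List<Long>` are all hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the scatter/gather flow; no HBase classes are used.
public class ScatterGatherMax {

    // "Server side": each region computes a partial max over its own values.
    public static Long regionMax(List<Long> regionValues) {
        Long max = null;
        for (Long v : regionValues) {
            if (max == null || v > max) {
                max = v;
            }
        }
        return max; // null when the region holds no values
    }

    // "Client side": submit one task per region in parallel, then reduce the partials.
    public static Long clientMax(List<List<Long>> regions) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (List<Long> region : regions) {
                Callable<Long> task = () -> regionMax(region);
                partials.add(pool.submit(task));
            }
            Long max = null;
            for (Future<Long> f : partials) {
                Long partial = f.get(); // wait for this region's answer
                if (partial != null && (max == null || partial > max)) {
                    max = partial;
                }
            }
            return max;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<Long>> regions = Arrays.asList(
                Arrays.asList(20L, 100L),
                Arrays.asList(35L),
                new ArrayList<Long>()); // an empty region contributes nothing
        System.out.println(clientMax(regions)); // prints 100
    }
}
```

Only one small partial result per region travels back to the client, which is the main attraction compared with shipping every row over the network.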
    @Test
    public void testAggregationClient() throws Throwable {
        LongColumnInterpreter columnInterpreter = new LongColumnInterpreter();
        AggregationClient aggregationClient = new AggregationClient(
                CommonConfig.getConfiguration());
        Scan scan = new Scan();
        scan.addColumn(ColumnFamilyName, QName1);
        Long max = aggregationClient.max(TableNameBytes, columnInterpreter,
                scan);
        Assert.assertTrue(max.longValue() == 100);
        Long min = aggregationClient.min(TableNameBytes, columnInterpreter,
                scan);
        Assert.assertTrue(min.longValue() == 20);
        Long sum = aggregationClient.sum(TableNameBytes, columnInterpreter,
                scan);
        Assert.assertTrue(sum.longValue() == 120);
        Long count = aggregationClient.rowCount(TableNameBytes,
                columnInterpreter, scan);
        Assert.assertTrue(count.longValue() == 4);
    }
 
Now look at the HBase source code, in AggregateImplementation:
    @Override
    public <T, S> T getMax(ColumnInterpreter<T, S> ci, Scan scan)
            throws IOException {
        T temp;
        T max = null;
        InternalScanner scanner = ((RegionCoprocessorEnvironment) getEnvironment())
                .getRegion().getScanner(scan);
        List<KeyValue> results = new ArrayList<KeyValue>();
        byte[] colFamily = scan.getFamilies()[0];
        byte[] qualifier = scan.getFamilyMap().get(colFamily).pollFirst();
        // qualifier can be null.
        try {
            boolean hasMoreRows = false;
            do {
                hasMoreRows = scanner.next(results);
                for (KeyValue kv : results) {
                    temp = ci.getValue(colFamily, qualifier, kv);
                    max = (max == null || (temp != null && ci.compare(temp, max) > 0)) ? temp : max;
                }
                results.clear();
            } while (hasMoreRows);
        } finally {
            scanner.close();
        }
        log.info("Maximum from this region is "
                + ((RegionCoprocessorEnvironment) getEnvironment()).getRegion()
                        .getRegionNameAsString() + ": " + max);
        return max;
    }
 
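Note these two lines:
    byte[] colFamily = scan.getFamilies()[0];
    byte[] qualifier = scan.getFamilyMap().get(colFamily).pollFirst();

Only the first column family of the scan and the first qualifier in that family are read. As a result, the Aggregate functions that ship with HBase can only compute statistics over a single column.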
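The effect of `pollFirst()` can be reproduced with plain JDK collections. In this small sketch the class `FirstQualifierOnly` and the method `pickQualifier` are hypothetical, and `String` qualifiers stand in for the `byte[]` qualifiers held in the scan's family map.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical demo: String qualifiers stand in for HBase's byte[] qualifiers.
public class FirstQualifierOnly {

    // Mirrors the pattern in getMax: take the first qualifier and ignore the rest.
    public static String pickQualifier(NavigableSet<String> qualifiers) {
        // pollFirst() removes and returns the lowest element, or null if the set is empty.
        return qualifiers.pollFirst();
    }

    public static void main(String[] args) {
        NavigableSet<String> qualifiers = new TreeSet<>(Arrays.asList("q2", "q1", "q3"));
        String used = pickQualifier(qualifiers);
        System.out.println("aggregated qualifier: " + used);                 // q1
        System.out.println("never read by the aggregation: " + qualifiers);  // [q2, q3]
    }
}
```

Whatever other qualifiers were added to the scan, only the lowest-sorted one reaches the interpreter in this pattern.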
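When we want to aggregate over multiple columns and count rows at the same time, there are the following options.
1 Scan all the rows and let the application do the aggregation and counting itself.
2 Use AggregationClient and call it several times to collect all the results. Because these are separate calls, the results may not be consistent with one another.
3 Extend CoprocessorProtocol ourselves.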
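To make option 3 more concrete, here is a rough sketch of what a custom extension would compute: each region produces the sums for several qualifiers and a row count in one pass, and the client merges the partial results. It uses plain Java collections rather than HBase's coprocessor API, and every name in it (`MultiColumnAggregate`, `Partial`, `aggregateRegion`, `aggregate`, `row`) is hypothetical. For brevity the regions are processed sequentially.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: a row is a qualifier -> value map, a region is a list of rows.
public class MultiColumnAggregate {

    // Partial result returned by one region: per-qualifier sums plus a row count.
    public static class Partial {
        public final Map<String, Long> sums = new HashMap<>();
        public long rowCount = 0;

        // Client-side reduce step: fold another region's partial into this one.
        public void merge(Partial other) {
            for (Map.Entry<String, Long> e : other.sums.entrySet()) {
                sums.merge(e.getKey(), e.getValue(), Long::sum);
            }
            rowCount += other.rowCount;
        }
    }

    // "Server side": a single pass over the region covers every requested qualifier.
    public static Partial aggregateRegion(List<Map<String, Long>> rows, List<String> qualifiers) {
        Partial partial = new Partial();
        for (Map<String, Long> row : rows) {
            boolean matched = false;
            for (String q : qualifiers) {
                Long value = row.get(q);
                if (value != null) {
                    partial.sums.merge(q, value, Long::sum);
                    matched = true;
                }
            }
            if (matched) {
                partial.rowCount++; // count only rows holding at least one requested qualifier
            }
        }
        return partial;
    }

    // "Client side": collect one partial per region and merge them.
    public static Partial aggregate(List<List<Map<String, Long>>> regions, List<String> qualifiers) {
        Partial total = new Partial();
        for (List<Map<String, Long>> region : regions) {
            total.merge(aggregateRegion(region, qualifiers));
        }
        return total;
    }

    // Helper that builds a row with the two qualifiers used in this demo.
    public static Map<String, Long> row(long q1, long q2) {
        Map<String, Long> r = new HashMap<>();
        r.put("q1", q1);
        r.put("q2", q2);
        return r;
    }

    public static void main(String[] args) {
        List<List<Map<String, Long>>> regions = Arrays.asList(
                Arrays.asList(row(20, 1), row(100, 2)),
                Arrays.asList(row(5, 3)));
        Partial total = aggregate(regions, Arrays.asList("q1", "q2"));
        // prints: 125 6 3
        System.out.println(total.sums.get("q1") + " " + total.sums.get("q2") + " " + total.rowCount);
    }
}
```

Because every sum and the row count come out of the same pass over each region, this avoids the repeated calls that cause the consistency concern in option 2.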
This feature has been integrated into simplehbase, an HBase integration plugin hosted on GitHub:
https://github.com/zhang-xzhi/simplehbase