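Elasticsearch Aggregations API
Introduction: the aggregations framework returns aggregated data computed over the documents matched by a search query. The syntax is defined as follows: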
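"aggregations" : { // may be abbreviated to "aggs"
    "<aggregation_name>" : { // aggregation name, a unique identifier
        "<aggregation_type>" : { // aggregation type
            <aggregation_body> // aggregation body: which fields to aggregate on
        }
        [,"meta" : { [<meta_data_body>] } ]? // optional metadata
        [,"aggregations" : { [<sub_aggregation>]+ } ]? // sub-aggregations nested inside this aggregation
    }
    [,"<aggregation_name_2>" : { ... } ]* // further aggregations
}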
Note: setting size=0 returns only the aggregation results, without the matching source documents.
I. Metric Aggregations: compute statistics over the documents in a bucket
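1. Top Hits: returns the top matching documents, similar to LIMIT in MySQL
A. URL: POST /index/_search?size=0
B. Request parameters
from: offset of the first hit to return;
size: maximum number of matching hits to return, default 3;
sort: how the hits are sorted, by default by score.
C. Kibana query
D. Java implementation
TopHitsAggregationBuilder aggregationBuilder = AggregationBuilders.topHits("top_hits").sort("time", SortOrder.DESC).size(1);
// Attach the aggregation to a search on "index"; size(0) suppresses the ordinary hits
SearchRequest searchRequest = new SearchRequest("index").source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Guard against a missing result, mainly when the index does not exist
if (aggregations != null) {
    TopHits topHits = aggregations.get("top_hits");
}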
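2. Cardinality: counts the distinct values of a field (the count is approximate), similar to COUNT(DISTINCT column) in MySQL
A. URL: POST /index/_search?size=0
B. Request parameters
field: name of the field whose distinct values are counted;
script: a script.
C. Kibana query
D. Java implementation
CardinalityAggregationBuilder aggregationBuilder = AggregationBuilders.cardinality("cardinality").field("cid");
// Attach the aggregation to a search on "index"; size(0) suppresses the ordinary hits
SearchRequest searchRequest = new SearchRequest("index").source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Guard against a missing result, mainly when the index does not exist
if (aggregations != null) {
    Cardinality cardinality = aggregations.get("cardinality");
    long count = cardinality.getValue();
}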
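3. Max: returns the maximum value of a field
A. URL: POST /index/_search?size=0
B. Request parameters
field: name of the field to take the maximum of;
script: a script.
C. Kibana query
D. Java implementation
MaxAggregationBuilder aggregationBuilder = AggregationBuilders.max("max").field("timestamp");
// Attach the aggregation to a search on "index"; size(0) suppresses the ordinary hits
SearchRequest searchRequest = new SearchRequest("index").source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Guard against a missing result, mainly when the index does not exist
if (aggregations != null) {
    ParsedMax max = aggregations.get("max");
    String timestamp = max.getValueAsString();
}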
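4. Min: returns the minimum value of a field
A. URL: POST /index/_search?size=0
B. Request parameters
field: name of the field to take the minimum of;
script: a script.
C. Kibana query
D. Java implementation
MinAggregationBuilder aggregationBuilder = AggregationBuilders.min("min").field("timestamp");
// Attach the aggregation to a search on "index"; size(0) suppresses the ordinary hits
SearchRequest searchRequest = new SearchRequest("index").source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Guard against a missing result, mainly when the index does not exist
if (aggregations != null) {
    ParsedMin min = aggregations.get("min");
    String timestamp = min.getValueAsString();
}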
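5. Sum: returns the sum of the values of a field
A. URL: POST /index/_search?size=0
B. Request parameters
field: name of the field to sum;
script: a script.
C. Kibana query
D. Java implementation
SumAggregationBuilder aggregationBuilder = AggregationBuilders.sum("sum").field("low");
// Attach the aggregation to a search on "index"; size(0) suppresses the ordinary hits
SearchRequest searchRequest = new SearchRequest("index").source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Guard against a missing result, mainly when the index does not exist
if (aggregations != null) {
    // Look the result up by the aggregation name ("sum"), not by the field name
    Sum sum = aggregations.get("sum");
    double low = sum.getValue();
}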
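6. Avg: returns the average of the values of a field
A. URL: POST /index/_search?size=0
B. Request parameters
field: name of the field to average;
script: a script.
C. Kibana query
D. Java implementation
7. Stats: returns several statistics at once: count, Max, Min, Sum and Avg
A. URL: POST /index/_search?size=0
B. Request parameters
field: name of the field to compute statistics on;
script: a script.
C. Kibana query
D. Java implementation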
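The figures that Stats reports can be illustrated without a cluster. The following is a minimal sketch in plain JDK code, not the Elasticsearch client; the class name StatsSketch and the sample values are made up for illustration. It computes the same five figures (count, min, max, sum, average) in memory:

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

final class StatsSketch {
    /** Computes count, min, max, sum and average over the given field values. */
    static DoubleSummaryStatistics stats(double... values) {
        return DoubleStream.of(values).summaryStatistics();
    }

    public static void main(String[] args) {
        // Hypothetical values of a numeric field such as "low"
        DoubleSummaryStatistics s = stats(2.0, 4.0, 9.0);
        System.out.println(s.getCount() + " " + s.getMin() + " " + s.getMax()
                + " " + s.getSum() + " " + s.getAverage());
    }
}
```

Avg on its own corresponds to getAverage() here, and Max, Min and Sum to the matching getters.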
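8. Value Count: counts the values extracted from the matching documents; duplicate values are still counted
A. URL: POST /index/_search?size=0
B. Request parameters
field: name of the field whose values are counted;
script: a script.
C. Kibana query
D. Java implementation
ValueCountAggregationBuilder aggregationBuilder = AggregationBuilders.count("count").field("cid");
// Attach the aggregation to a search on "index"; size(0) suppresses the ordinary hits
SearchRequest searchRequest = new SearchRequest("index").source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Guard against a missing result, mainly when the index does not exist
if (aggregations != null) {
    ValueCount valueCount = aggregations.get("count");
    long count = valueCount.getValue();
}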
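The difference between Value Count and Cardinality is easiest to see on a small list of values. This is a minimal sketch in plain JDK code, with a made-up class name CountSketch. Note one deliberate simplification: the distinct count below is exact, whereas the Elasticsearch Cardinality aggregation is approximate.

```java
import java.util.List;

final class CountSketch {
    /** Like Value Count: every extracted value is counted, duplicates included. */
    static long valueCount(List<String> values) {
        return values.size();
    }

    /** Like Cardinality, but exact: only distinct values are counted. */
    static long distinctCount(List<String> values) {
        return values.stream().distinct().count();
    }

    public static void main(String[] args) {
        // Hypothetical values of the "cid" field across five documents
        List<String> cids = List.of("a", "b", "a", "c", "a");
        System.out.println(valueCount(cids) + " values, " + distinctCount(cids) + " distinct");
    }
}
```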
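II. Bucket Aggregations: group the documents that meet given criteria into buckets
1. Terms: groups documents by the values of a field and counts each group, similar to GROUP BY or SELECT DISTINCT column FROM table in MySQL. The document counts can be approximate, because each shard returns only its own top terms.
A. URL: GET /index/_search
B. Request parameters
field: name of the field to group by; only a single field is supported;
size: number of term buckets to return, default 10; a larger size gives more accurate counts at a higher cost;
order: how the returned buckets are sorted;
script: a script, for example to group by a combination of two fields; this performs poorly and is best avoided.
C. Kibana query
D. Java implementation
// Script script = new Script("doc['data.srcip'].value + '_' + doc['data.dstip'].value");
// TermsAggregationBuilder aggregationBuilder = AggregationBuilders.terms("terms").script(script).size(Integer.MAX_VALUE);
TermsAggregationBuilder aggregationBuilder = AggregationBuilders.terms("terms").field("data.ip").size(Integer.MAX_VALUE);
// Attach the aggregation to a search on "index"
SearchRequest searchRequest = new SearchRequest("index").source(new SearchSourceBuilder().aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Guard against a missing result, mainly when the index does not exist
if (aggregations != null) {
    Terms terms = aggregations.get("terms");
}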
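What Terms computes, group by value, count each group, keep the largest size groups, can be sketched in plain JDK code. The class name TermsSketch and the sample data are made up, and the sketch runs on one in-memory list, so it does not show the per-shard approximation of the real aggregation.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class TermsSketch {
    /** Groups values, counts each group, and keeps the `size` largest groups. */
    static Map<String, Long> topTerms(List<String> values, int size) {
        Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(v -> v, Collectors.counting()));
        return counts.entrySet().stream()
                // Highest count first; ties broken by term so the order is deterministic
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(size)
                // LinkedHashMap preserves the sorted order of the buckets
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        // Hypothetical values of the "data.ip" field
        System.out.println(topTerms(List.of("a", "b", "a", "c", "b", "a"), 2));
    }
}
```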
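2. Filter: narrows the documents matched by the query with an additional filter
A. URL: POST /index/_search?size=0
B. Request parameters: see the query DSL
C. Kibana query
D. Java implementation
FilterAggregationBuilder aggregationBuilder = AggregationBuilders.filter("filter", QueryBuilders.termsQuery("rule", new String[]{"login", "auth", "cca"}));
// Attach the aggregation to a search on "index"; size(0) suppresses the ordinary hits
SearchRequest searchRequest = new SearchRequest("index").source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Guard against a missing result, mainly when the index does not exist
if (aggregations != null) {
    Filter filter = aggregations.get("filter");
}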
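3. Range: counts documents per value range; each range includes its from value and excludes its to value
A. URL: GET /index/_search
B. Request parameters
field: name of the field the ranges apply to;
to value1: the range from * to value1, excluding value1;
from value1 - to value2: the range from value1 to value2, including value1 but excluding value2;
from value2: the range from value2 to *, including value2.
C. Kibana query
D. Java implementation
RangeAggregationBuilder aggregationBuilder = AggregationBuilders.range("range").field("level").addUnboundedTo("1", 6).addRange("2", 6, 11).addUnboundedFrom("3", 11);
// Attach the aggregation to a search on "index"
SearchRequest searchRequest = new SearchRequest("index").source(new SearchSourceBuilder().aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Guard against a missing result, mainly when the index does not exist
if (aggregations != null) {
    Range range = aggregations.get("range");
}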
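The boundary rule (from inclusive, to exclusive) decides where a value on the edge of two ranges lands. This is a minimal sketch in plain JDK code that mirrors the three ranges used above; the class name RangeSketch is made up for illustration.

```java
final class RangeSketch {
    /** Returns the bucket key for a value, using "1": [*, 6), "2": [6, 11), "3": [11, *). */
    static String bucketFor(double level) {
        if (level < 6) {
            return "1"; // the to value 6 is excluded
        }
        if (level < 11) {
            return "2"; // the from value 6 is included, the to value 11 is excluded
        }
        return "3"; // the from value 11 is included
    }

    public static void main(String[] args) {
        // Values on a boundary fall into the range that starts there
        System.out.println(bucketFor(6) + " " + bucketFor(11));
    }
}
```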
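4. Date histogram: builds a histogram of document counts per date interval; applies to date and date range fields
A. URL: POST /index/_search?size=0
B. Request parameters
field: name of the date field;
format: date format;
calendar_interval: calendar-aware interval, e.g. 1d; only single units are accepted, so a multiple such as 2d has to be expressed with fixed_interval;
fixed_interval: fixed-length interval, e.g. 1000ms;
min_doc_count: minimum document count; buckets with fewer documents are left out of the result.
C. Kibana query
D. Java implementation
DateHistogramAggregationBuilder aggregationBuilder = AggregationBuilders.dateHistogram("date_histogram")
    .field("timestamp")
    .format("yyyy-MM-dd")
    .calendarInterval(new DateHistogramInterval("1d"))
    .minDocCount(1);
// Attach the aggregation to a search on "index"; size(0) suppresses the ordinary hits
SearchRequest searchRequest = new SearchRequest("index").source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Guard against a missing result, mainly when the index does not exist
if (aggregations != null) {
    ParsedDateHistogram histogram = aggregations.get("date_histogram");
}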
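The effect of a one-day calendar interval combined with min_doc_count can be sketched in plain JDK code. The class name DateHistogramSketch is made up, timestamps are assumed to be epoch milliseconds, and days are assumed to be cut in UTC. Empty days never appear in this sketch, so it only models min_doc_count values of 1 or more.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class DateHistogramSketch {
    /** Buckets epoch-millisecond timestamps by UTC calendar day and drops buckets below minDocCount. */
    static Map<String, Long> perDay(List<Long> epochMillis, long minDocCount) {
        // TreeMap keeps the yyyy-MM-dd keys in chronological order
        TreeMap<String, Long> buckets = epochMillis.stream()
                .map(ms -> Instant.ofEpochMilli(ms).atZone(ZoneOffset.UTC).toLocalDate())
                .collect(Collectors.groupingBy(LocalDate::toString, TreeMap::new, Collectors.counting()));
        buckets.values().removeIf(count -> count < minDocCount);
        return buckets;
    }

    public static void main(String[] args) {
        List<Long> timestamps = List.of(
                Instant.parse("2020-02-03T01:00:00Z").toEpochMilli(),
                Instant.parse("2020-02-03T23:59:59Z").toEpochMilli(),
                Instant.parse("2020-02-04T00:00:00Z").toEpochMilli());
        System.out.println(perDay(timestamps, 1));
    }
}
```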
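5. Date range: counts documents per range of date values
A. URL: POST /index/_search?size=0
B. Request parameters
field: name of the date field the ranges apply to;
format: date format;
to value1: the range from * to value1, excluding value1;
from value1 - to value2: the range from value1 to value2, including value1 but excluding value2;
from value2: the range from value2 to *, including value2.
C. Kibana query
D. Java implementation
DateRangeAggregationBuilder dateRangeAggregationBuilder = AggregationBuilders.dateRange("day_range")
    .field("day")
    .format("yyyy-MM-dd")
    // With two String arguments, addRange takes (from, to) rather than (key, value), so open-ended ranges use addUnboundedTo / addUnboundedFrom
    .addUnboundedTo("1", "2020-02-03")
    .addRange("2", "2020-02-03", "2020-03-10")
    .addUnboundedFrom("3", "2020-03-10");
// Attach the aggregation to a search on "index"; size(0) suppresses the ordinary hits
SearchRequest searchRequest = new SearchRequest("index").source(new SearchSourceBuilder().size(0).aggregation(dateRangeAggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Guard against a missing result, mainly when the index does not exist
if (aggregations != null) {
    ParsedDateRange dateRange = aggregations.get("day_range");
}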
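III. Pipeline Aggregations: operate on the output of other aggregations rather than on document sets, similar to paginating the result of a GROUP BY in a database
1. Bucket Sort: sorts the buckets of its parent multi-bucket aggregation
A. URL: POST /sales/_search?size=0
B. Request parameters
from: buckets in positions before this value are truncated, default 0; when paging, from should be a multiple of size;
size: number of buckets to return, by default all buckets of the parent aggregation;
sort: the sort definition; several fields may be given.
C. Kibana query:
D. Java implementation: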
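The original gives no client code here. As a stand-in, the following plain JDK sketch (not the Elasticsearch client) shows what from, size and sort do to a list of buckets. The names BucketSortSketch and Bucket, the single metric, and the sample figures are all made up for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class BucketSortSketch {
    /** A hypothetical bucket: a key plus one metric value computed by a sub-aggregation. */
    static final class Bucket {
        final String key;
        final double metric;

        Bucket(String key, double metric) {
            this.key = key;
            this.metric = metric;
        }
    }

    /** Sorts buckets by metric (descending), drops the first `from`, and keeps at most `size`. */
    static List<String> sortAndPage(List<Bucket> buckets, int from, int size) {
        return buckets.stream()
                .sorted(Comparator.comparingDouble((Bucket b) -> b.metric).reversed())
                .skip(from)
                .limit(size)
                .map(b -> b.key)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Bucket> buckets = List.of(new Bucket("jan", 550), new Bucket("feb", 60),
                new Bucket("mar", 375), new Bucket("apr", 100));
        // Page 1 and page 2 with a page size of 2; from is a multiple of size
        System.out.println(sortAndPage(buckets, 0, 2) + " " + sortAndPage(buckets, 2, 2));
    }
}
```

Keeping from at a multiple of size, as in the two calls in main, makes consecutive pages line up without gaps or overlap.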
