Day 3: Elasticsearch aggregation queries
Thanks to the author of this post: https://juejin.im/post/6844904032398475278#heading-1
Aggregation basics:
https://juejin.im/post/6844904032398475278#heading-1
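Further reading on aggregations:
Elasticsearch: an introduction to aggregations
Elasticsearch: an introduction to pipeline aggregations
Elasticsearch: a thorough look at bucket aggregations in Elasticsearch
Bucketing documents by age range: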
GET twitter/_search
{
"size": 0,
"aggs": {
"age": {
"range": {
"field": "age",
"ranges": [{
"from": 20,
"to": 30
},
{
"from": 30,
"to": 40
},
{
"from": 40,
"to": 50
}
]
}
}
}
}
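This uses a range aggregation.
The query defines three age ranges, and Elasticsearch returns one bucket per range. Each range includes its from value and excludes its to value. Because size is 0, hits.hits is empty; the per-range document counts appear under aggregations.age.buckets. The result looks like this: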
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 5,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"age": {
"buckets": [{
"key": "20.0-30.0",
"from": 20.0,
"to": 30.0,
"doc_count": 0
},
{
"key": "30.0-40.0",
"from": 30.0,
"to": 40.0,
"doc_count": 3
},
{
"key": "40.0-50.0",
"from": 40.0,
"to": 50.0,
"doc_count": 0
}
]
}
}
}
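The bucketing rule of the range aggregation above can be sketched locally in Python. This is only an illustration under stated assumptions: the ages list is hypothetical (it is not data from the twitter index) and range_buckets is a made-up helper, not part of any Elasticsearch client.

```python
def range_buckets(values, ranges):
    """Count values per [from, to) range, mimicking the shape of range-aggregation buckets."""
    buckets = []
    for r in ranges:
        lo, hi = float(r["from"]), float(r["to"])
        # "from" is inclusive and "to" is exclusive, as in the range aggregation.
        count = sum(1 for v in values if lo <= v < hi)
        buckets.append({"key": f"{lo}-{hi}", "from": lo, "to": hi, "doc_count": count})
    return buckets

# Hypothetical ages for five documents.
ages = [30, 35, 39, 55, 60]
ranges = [{"from": 20, "to": 30}, {"from": 30, "to": 40}, {"from": 40, "to": 50}]
buckets = range_buckets(ages, ranges)
for b in buckets:
    print(b["key"], b["doc_count"])
```

With these made-up ages, the value 30 lands in the 30-40 bucket rather than 20-30, and documents outside every range are simply not counted in any bucket.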
Counting how often each keyword value occurs:
Built-in keywords used: aggs, terms, field, and the keyword sub-field of the text field.
curl -H 'Content-type: application/json' -XGET 'http://localhost:10290/apollo/_search?pretty' -d '{"aggs":{"number_of_cities":{"terms":{"field":"city.keyword"}}}, "size":0}'
{
"aggs": {
"number_of_cities": {
"terms": {
"field": "city.keyword"
}
}
},
"size": 0
}
The response:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 71150,
"max_score": 0.0,
"hits": []
},
"aggregations": {
"number_of_cities": {
"doc_count_error_upper_bound": 116,
"sum_other_doc_count": 16983,
"buckets": [{
"key": "合肥",
"doc_count": 30017
},
{
"key": "",
"doc_count": 16761
},
{
"key": "columbia",
"doc_count": 1546
}
]
}
}
}
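What the terms aggregation computes can be approximated locally with a counter. The sketch below is a single-process illustration only: terms_buckets is a hypothetical helper and the city list is made up. On a real cluster each shard reports its own top terms and the results are merged, which is why the response carries doc_count_error_upper_bound.

```python
from collections import Counter

def terms_buckets(values, size=10):
    """Return the `size` most frequent values plus the count of everything else."""
    counts = Counter(values)
    top = counts.most_common(size)
    buckets = [{"key": key, "doc_count": count} for key, count in top]
    # Documents whose value did not make it into the returned buckets.
    other = sum(counts.values()) - sum(count for _, count in top)
    return {"sum_other_doc_count": other, "buckets": buckets}

# Hypothetical city values; the empty string is a legal keyword value too.
cities = ["合肥", "合肥", "合肥", "", "", "columbia"]
result = terms_buckets(cities, size=2)
print(result)
```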
Counting distinct cities:
To find out how many different cities there are, use the built-in cardinality aggregation:
GET _search
{
"size": 0,
"aggs": {
"number_of_cities": {
"cardinality": {
"field": "city.keyword"
}
}
}
}
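The cardinality aggregation returns an approximate distinct count (it is based on the HyperLogLog++ algorithm), and its optional precision_threshold setting trades memory for accuracy. The sketch below builds such a request body and, for comparison, computes an exact distinct count locally. The helper names, the threshold of 1000, and the city list are assumptions for illustration.

```python
import json

def cardinality_request(field, precision_threshold=None):
    """Build a cardinality request body; precision_threshold is optional."""
    agg = {"field": field}
    if precision_threshold is not None:
        agg["precision_threshold"] = precision_threshold
    return {"size": 0, "aggs": {"number_of_cities": {"cardinality": agg}}}

def exact_distinct(values):
    """Exact distinct count, for comparison with the approximate server-side result."""
    return len(set(values))

body = cardinality_request("city.keyword", precision_threshold=1000)
print(json.dumps(body, ensure_ascii=False))

# Hypothetical city values, not taken from the real index.
cities = ["合肥", "合肥", "", "columbia", "columbia"]
distinct = exact_distinct(cities)
print(distinct)
```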
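Computing the average user age:
Use the built-in avg aggregation.
GET twitter/_search { "size": 0, "aggs": { "average_age": { "avg": { "field": "age" } } } }
The same pattern gives the average (avg), maximum (max), minimum (min), and total (sum) of a score field. The avg case: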
curl -H 'Content-type: application/json' -XGET 'http://localhost:10290/apollo/_search?pretty' -d '{"aggs":{"average_score":{"avg":{"field":"os_score"}}}, "size":0}'
{
"aggs": {
"average_score": {
"avg": {
"field": "os_score"
}
}
},
"size": 0
}
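Since avg, max, min, and sum all share the same request shape, several of them can go into one request as sibling aggregations. A minimal sketch of building such a body follows; metric_aggs and the generated aggregation names such as avg_os_score are hypothetical choices, not names from the original queries.

```python
import json

def metric_aggs(field, metrics=("avg", "max", "min", "sum")):
    """Build one request body with a separate single-value metric aggregation per metric."""
    aggs = {f"{metric}_{field}": {metric: {"field": field}} for metric in metrics}
    return {"size": 0, "aggs": aggs}

body = metric_aggs("os_score")
# The printed JSON can be passed to curl with -d, as in the examples above.
print(json.dumps(body, indent=2))
```

If all of these numbers are wanted together, the stats aggregation is an alternative worth checking, since it reports count, min, max, avg, and sum in a single aggregation.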
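Using a script to recompute the aggregation result:
For example, scale the maximum score by 0.8 with *. In the same way, divide by 2 with /, add a number with +, or subtract one with -. The script is applied to each field value (_value) before the maximum is taken.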
curl -H 'Content-type: application/json' -XGET 'http://localhost:10290/apollo/_search?pretty' -d '{"aggs":{"average_score":{"max":{"field":"os_score", "script":{"source":"_value * params.correction", "params":{"correction": 0.8}}}}}, "size":0}'
{
"size": 0,
"aggs": {
"average_score": {
"max": {
"field": "os_score",
"script": {
"source": "_value * params.correction",
"params": {
"correction": 0.8
}
}
}
}
}
}
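The effect of the value script can be reproduced locally to check expectations. This is a sketch with made-up scores; scripted_max is a hypothetical helper that mirrors "_value * params.correction" followed by max.

```python
def scripted_max(values, correction):
    """Apply the correction to every value first, then take the maximum."""
    return max(value * correction for value in values)

# Hypothetical os_score values.
scores = [90, 92, 100]
result = scripted_max(scores, correction=0.8)
print(result)
```

For a positive correction factor this equals the plain maximum times the factor; with a negative factor the per-value script would pick a different document, which is why the order of operations matters.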
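Aggregating with a script alone, without field:
This is meant to work like the field-plus-script form above, but my attempt did not succeed. Note that this request computes avg on the twitter index, whereas the os_score examples above query the apollo index.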
GET twitter/_search
{
"size": 0,
"aggs": {
"average_2_times_os_score": {
"avg": {
"script": {
"source": "doc['os_score'].value * params.times",
"params": {
"times": 2.0
}
}
}
}
}
}
Percentile aggregation
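The percentiles aggregation reports the value below which a given percentage of documents fall, which helps spot outliers in os_score. The request below returns the os_score values at the 25th, 50th, 75th, and 100th percentiles: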
{
"size": 0,
"aggs": {
"os_score_quartiles": {
"percentiles": {
"field": "os_score",
"percents": [
25,
50,
75,
100
]
}
}
}
}
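The result is below. It shows that:
25% of the scores are at or below 90
50% of the scores are at or below 92
75% of the scores are at or below 100
the highest score is 100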
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 71150,
"max_score": 0.0,
"hits": []
},
"aggregations": {
"os_score_quartiles": {
"values": {
"25.0": 90.0,
"50.0": 92.0,
"75.0": 100.0,
"100.0": 100.0
}
}
}
}
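One common way to turn quartiles into an outlier check is the 1.5 x IQR rule. That rule is not part of the original notes; it is swapped in here only as an illustration, and iqr_fences is a hypothetical helper. The input values are the percentiles from the response above.

```python
def iqr_fences(percentile_values, k=1.5):
    """Return (lower, upper) outlier fences from the 25th and 75th percentiles."""
    q1 = percentile_values["25.0"]
    q3 = percentile_values["75.0"]
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr

# The "values" object from the percentiles response above.
values = {"25.0": 90.0, "50.0": 92.0, "75.0": 100.0, "100.0": 100.0}
low, high = iqr_fences(values)
print(low, high)
```

Under this rule, scores below the lower fence would be treated as unusually low; the upper fence lies above the maximum of 100, so no score can be flagged as unusually high.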
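Analyzer
One reason Elasticsearch can search in under a second: documents are analyzed into tokens and indexed when they are stored. The _analyze API shows how an analyzer splits a piece of text: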
curl -H 'Content-type: application/json' -XGET 'http://localhost:10290/apollo/_analyze?pretty' -d '{"text":["我是一个兵"], "analyzer":"standard"}'
{
"text": ["我是一个兵"],
"analyzer": "standard"
}
The result is below: five tokens, one per character.
{
"tokens": [{
"token": "我",
"start_offset": 0,
"end_offset": 1,
"type": "<IDEOGRAPHIC>",
"position": 0
},
{
"token": "是",
"start_offset": 1,
"end_offset": 2,
"type": "<IDEOGRAPHIC>",
"position": 1
},
{
"token": "一",
"start_offset": 2,
"end_offset": 3,
"type": "<IDEOGRAPHIC>",
"position": 2
},
{
"token": "个",
"start_offset": 3,
"end_offset": 4,
"type": "<IDEOGRAPHIC>",
"position": 3
},
{
"token": "兵",
"start_offset": 4,
"end_offset": 5,
"type": "<IDEOGRAPHIC>",
"position": 4
}
]
}
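To see why indexing tokens at write time makes lookups fast, here is a toy inverted index in Python. It is a simplified sketch, not how Lucene stores data: build_inverted_index is a hypothetical helper, splitting text into single characters merely imitates the standard analyzer output shown above, and document 2 is made up.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each token to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}

# One token per character, like the five tokens in the _analyze result.
docs = {
    1: list("我是一个兵"),
    2: list("一个人"),  # hypothetical second document
}
index = build_inverted_index(docs)
print(index["兵"])
print(index["一"])
```

A search for a token then becomes a dictionary lookup instead of a scan over every stored document.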
