Cardinality Aggs
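Deduplication in Elasticsearch is done with the cardinality metric: for each bucket it deduplicates the values of the specified field and returns the number of distinct values, similar to count(distinct) in SQL.
Requirement: for the users born in each month, count the number of distinct salaries.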
GET /user/_search
{
  "aggs": {
    "month": {
      "date_histogram": {
        "field": "birthday",
        "interval": "month"
      },
      "aggs": {
        "distinct_salary": {
          "cardinality": {
            "field": "salary"
          }
        }
      }
    }
  },
  "size": 0
}

---------------------- Result -----------------------
{
  "took" : 29,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "month" : {
      "buckets" : [
        {
          "key_as_string" : "2020-01-01 00:00:00",
          "key" : 1577836800000,
          "doc_count" : 1,
          "distinct_salary" : { "value" : 1 }
        },
        {
          "key_as_string" : "2020-02-01 00:00:00",
          "key" : 1580515200000,
          "doc_count" : 1,
          "distinct_salary" : { "value" : 1 }
        },
        {
          "key_as_string" : "2020-03-01 00:00:00",
          "key" : 1583020800000,
          "doc_count" : 1,
          "distinct_salary" : { "value" : 1 }
        },
        {
          "key_as_string" : "2020-04-01 00:00:00",
          "key" : 1585699200000,
          "doc_count" : 0,
          "distinct_salary" : { "value" : 0 }
        },
        {
          "key_as_string" : "2020-05-01 00:00:00",
          "key" : 1588291200000,
          "doc_count" : 0,
          "distinct_salary" : { "value" : 0 }
        },
        {
          "key_as_string" : "2020-06-01 00:00:00",
          "key" : 1590969600000,
          "doc_count" : 3,
          "distinct_salary" : { "value" : 3 }
        }
      ]
    }
  }
}
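To make the semantics of the query above concrete, here is a minimal pure-Python sketch of what "distinct count per monthly bucket" means. The sample documents and the function name `distinct_salary_per_month` are made up for illustration. Unlike Elasticsearch, which estimates cardinality with an approximate algorithm, this sketch counts exactly with a set, and it does not emit empty buckets for months without documents.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample documents; only the field names mirror the query above.
users = [
    {"birthday": date(2020, 1, 15), "salary": 5000},
    {"birthday": date(2020, 1, 20), "salary": 5000},
    {"birthday": date(2020, 1, 28), "salary": 7000},
    {"birthday": date(2020, 3, 3), "salary": 6000},
]


def distinct_salary_per_month(docs):
    """Group docs by birth month, then count documents and distinct salaries."""
    buckets = defaultdict(lambda: {"doc_count": 0, "salaries": set()})
    for doc in docs:
        key = doc["birthday"].strftime("%Y-%m")  # bucket key, e.g. "2020-01"
        buckets[key]["doc_count"] += 1
        buckets[key]["salaries"].add(doc["salary"])  # the set deduplicates
    return {
        key: {
            "doc_count": bucket["doc_count"],
            "distinct_salary": len(bucket["salaries"]),
        }
        for key, bucket in sorted(buckets.items())
    }


result = distinct_salary_per_month(users)
for month, stats in result.items():
    print(month, stats)
```

In the sample data, January holds three documents but only two distinct salaries, which is exactly the difference between doc_count and the cardinality value.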
precision_threshold trades accuracy against memory overhead. Example: count how many distinct salaries there are
GET /user/_search
{
  "size": 0,
  "aggs": {
    "distinct_salary": {
      "cardinality": {
        "field": "salary",
        "precision_threshold": 100
      }
    }
  }
}
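If the request is built from application code, the same body can be assembled as a plain dictionary. The helper below is a sketch only: `cardinality_request` and its parameters are hypothetical names, it uses nothing but the standard library, and it only builds and prints the JSON body without sending it to a cluster.

```python
import json


def cardinality_request(field, precision_threshold=None, agg_name="distinct_salary"):
    """Build the search body for a top-level cardinality aggregation."""
    cardinality = {"field": field}
    # Only include precision_threshold when the caller sets it,
    # so Elasticsearch falls back to its own default otherwise.
    if precision_threshold is not None:
        cardinality["precision_threshold"] = precision_threshold
    return {"size": 0, "aggs": {agg_name: {"cardinality": cardinality}}}


body = cardinality_request("salary", precision_threshold=100)
print(json.dumps(body, indent=2))
```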
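The cardinality algorithm uses roughly precision_threshold * 8 bytes of memory, so 100 * 8 = 800 bytes, which is very little. If the number of unique values really is within the threshold, the count is expected to be close to exact (in practice it is almost always 100% accurate, although the algorithm does not strictly guarantee it).
The larger precision_threshold is, the more memory is used. With a value of 1000, that is 1000 * 8 = 8000 bytes, about 8 KB, and the count stays close to exact for fields with more unique values.
For example, when deduplicating and counting a field that has 10000 unique values, precision_threshold=10000 costs 10000 * 8 = 80000 bytes, about 80 KB.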
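The arithmetic above can be wrapped in a small helper for comparing candidate thresholds. This is a rough estimate only, assuming the precision_threshold * 8 bytes rule of thumb; the function name `estimated_memory_bytes` is hypothetical. The cap at 40000 reflects the maximum value documented for the cardinality aggregation, above which a larger threshold has no further effect.

```python
BYTES_PER_THRESHOLD_UNIT = 8      # rule of thumb: precision_threshold * 8 bytes
MAX_PRECISION_THRESHOLD = 40000   # documented maximum; larger values act like 40000


def estimated_memory_bytes(precision_threshold):
    """Rough memory estimate for one cardinality aggregation, in bytes."""
    if precision_threshold < 0:
        raise ValueError("precision_threshold must be non-negative")
    effective = min(precision_threshold, MAX_PRECISION_THRESHOLD)
    return effective * BYTES_PER_THRESHOLD_UNIT


for threshold in (100, 1000, 10000):
    size = estimated_memory_bytes(threshold)
    print(f"precision_threshold={threshold}: {size} bytes (~{size / 1000:g} KB)")
```

Keep in mind that this cost applies per aggregation instance, so a cardinality metric nested inside many buckets pays it once per bucket.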