Cardinality Aggs

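Deduplication in ES: the cardinality metric deduplicates the specified field within each bucket and returns the number of distinct values, similar to count(distinct) in SQL.

Requirement: bucket users by birth month and count the distinct salaries in each month.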

GET /user/_search
{
  "aggs":{
    "month":{
      "date_histogram":{
        "field":"birthday",
        "calendar_interval":"month"
      },
      "aggs":{
        "distinct_salary":{
          "cardinality":{
            "field":"salary"
          }
        }
      }
    }
  },
  "size":0
}


----------------------Result-----------------------
{
  "took" : 29,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "month" : {
      "buckets" : [
        {
          "key_as_string" : "2020-01-01 00:00:00",
          "key" : 1577836800000,
          "doc_count" : 1,
          "distinct_salary" : {
            "value" : 1
          }
        },
        {
          "key_as_string" : "2020-02-01 00:00:00",
          "key" : 1580515200000,
          "doc_count" : 1,
          "distinct_salary" : {
            "value" : 1
          }
        },
        {
          "key_as_string" : "2020-03-01 00:00:00",
          "key" : 1583020800000,
          "doc_count" : 1,
          "distinct_salary" : {
            "value" : 1
          }
        },
        {
          "key_as_string" : "2020-04-01 00:00:00",
          "key" : 1585699200000,
          "doc_count" : 0,
          "distinct_salary" : {
            "value" : 0
          }
        },
        {
          "key_as_string" : "2020-05-01 00:00:00",
          "key" : 1588291200000,
          "doc_count" : 0,
          "distinct_salary" : {
            "value" : 0
          }
        },
        {
          "key_as_string" : "2020-06-01 00:00:00",
          "key" : 1590969600000,
          "doc_count" : 3,
          "distinct_salary" : {
            "value" : 3
          }
        }
      ]
    }
  }
}
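
To make the semantics of the result above concrete, here is a minimal Python sketch of the same idea: group documents by birth month, then count distinct salaries per group. The sample documents and the function name `distinct_salary_per_month` are hypothetical (the post does not show the index contents), and the sketch counts exactly with a set, whereas Elasticsearch uses an approximate algorithm.

```python
from collections import defaultdict
from datetime import date

# Hypothetical documents; the real contents of the `user` index are not shown.
users = [
    {"birthday": date(2020, 1, 15), "salary": 3000},
    {"birthday": date(2020, 2, 3), "salary": 4000},
    {"birthday": date(2020, 6, 1), "salary": 5000},
    {"birthday": date(2020, 6, 9), "salary": 5000},
    {"birthday": date(2020, 6, 20), "salary": 6000},
]


def distinct_salary_per_month(docs):
    """Group docs by birth month and count distinct salaries in each group."""
    buckets = defaultdict(set)
    for doc in docs:
        month_key = doc["birthday"].strftime("%Y-%m")
        buckets[month_key].add(doc["salary"])  # a set keeps one copy per value
    return {month: len(salaries) for month, salaries in sorted(buckets.items())}


result = distinct_salary_per_month(users)
print(result)
```

Unlike the date_histogram output above, this sketch does not emit empty months such as 2020-04 and 2020-05; it only lists months that contain at least one document.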

precision_threshold trades accuracy against memory overhead. Example: count how many distinct salaries there are.

GET /user/_search
{
  "size": 0,
  "aggs":{
    "distinct_salary":{
      "cardinality":{
        "field":"salary",
        "precision_threshold":100
      }
    }
  }
}
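
When the same request has to be issued from application code, the body can be assembled programmatically. The following is a minimal sketch; the helper name `build_cardinality_query` is hypothetical, and it only builds the JSON body, which you would then send with whatever HTTP or Elasticsearch client you already use.

```python
import json


def build_cardinality_query(field, precision_threshold=None):
    """Return a search body with a single cardinality aggregation on `field`."""
    cardinality = {"field": field}
    if precision_threshold is not None:
        # Only set the option when asked, so the server default applies otherwise.
        cardinality["precision_threshold"] = precision_threshold
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {f"distinct_{field}": {"cardinality": cardinality}},
    }


body = build_cardinality_query("salary", precision_threshold=100)
print(json.dumps(body, indent=2))
```

Leaving `precision_threshold` unset keeps the request identical to a plain cardinality aggregation.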

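The cardinality algorithm uses roughly precision_threshold * 8 bytes of memory, so a threshold of 100 costs about 100 * 8 = 800 bytes, which is very little. If the number of unique values really is within the threshold, the count is expected to be accurate or very nearly so.

The larger precision_threshold is, the more memory is used. With a value of 1000, that is 1000 * 8 = 8000 bytes, or about 8 KB, and counts stay near-exact for fields with more unique values. The default is 3000, and values above 40000 are treated as 40000.

For a distinct count on a field with about 10000 unique values, set precision_threshold=10000: 10000 * 8 = 80000 bytes, about 80 KB.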
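
The arithmetic above can be checked with a small Python sketch. The function name `estimate_cardinality_memory` is hypothetical, the 8 bytes per unit is the rule of thumb from the text, and the cap at 40000 is an assumption taken from the documented maximum for precision_threshold; treat the numbers as estimates rather than measured usage.

```python
BYTES_PER_UNIT = 8      # rule of thumb from the text: threshold * 8 bytes
MAX_THRESHOLD = 40000   # assumption: larger thresholds behave like 40000


def estimate_cardinality_memory(precision_threshold):
    """Estimate the bytes used by one cardinality aggregation."""
    effective = min(precision_threshold, MAX_THRESHOLD)
    return effective * BYTES_PER_UNIT


for threshold in (100, 1000, 10000):
    size = estimate_cardinality_memory(threshold)
    print(f"precision_threshold={threshold}: {size} bytes (~{size / 1000:g} KB)")
```

The estimate depends only on the threshold, not on how many documents or unique values the field actually holds.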
  

       

posted on 2021-09-12 08:14  溪水静幽