ES 常用聚合函数

关于常用聚合函数,ES提供了很多,具体查看官方文档,本文在ES 聚合查询的基础上,相关测试数据也在ES 聚合查询中.

1、range聚合

1.1 统计各个价格范围的食品销售情况,代码如下:

GET food/_search?size=0
{
  "aggs": {
    "price_range": {
      "range": {
        "field": "Price",
        "ranges": [
          {
            "from": 0,
            "to": 100
          },
          {
            "from": 100,
            "to": 200
          },  
          {
            "from": 200,
            "to": 300
          }, 
          {
            "from": 300,
            "to": 400
          }
        ]
      }
    }
  }
}

 

1.2 统计每个月的食品销售情况

GET food/_search?size=0
{
  "aggs": {
    "price_range": {
      "range": {
        "field": "CreateTime",
        "ranges": [
          {
            "from": "2022-05-01 00:00:00",
            "to": "2022-06-01 00:00:00"
          },
          {
            "from": "2022-06-01 00:00:00",
            "to": "2022-07-01 00:00:00"
          },  
          {
            "from": "2022-07-01 00:00:00",
            "to": "2022-08-01 00:00:00"
          }, 
          {
            "from": "2022-08-01 00:00:00",
            "to": "2022-09-01 00:00:00"
          }
        ]
      }
    }
  }
}

 

2、Histogram  柱状图统计 官方文档

2.1 统计各个价位区间的食品销售数量 间隔是100效果和1.1类似

GET food/_search?size=0
{
  "aggs": {
    "price_histogram": {
      "histogram": {
        "field": "Price",
        "interval": 100
      }
    }
  }
}

 

2.2 统计各个价位区间的食品销售数量 间隔是100 要求过滤掉所有区间能销售量为0的桶结果

GET food/_search?size=0
{
  "aggs": {
    "price_histogram": {
      "histogram": {
        "field": "Price",
        "interval": 100,
        "min_doc_count": 1
      }
    }
  }
}

 

2.3 统计各个价位区间的食品销售数量 间隔是100 如果区间内存在空值,统一用250替代

这里需要新增一条价格为空的数据方便演示,代码如下:

PUT food/_doc/8
{
  "CreateTime":"2022-04-10 13:11:11",
  "Desc":"猕猴桃 对身体很有好处",
  "Level":"高级水果",
  "Name":"猕猴桃",
  "Price":"",
  "Tags":["性价比","水果","保健"],
  "Type":"水果"
}

查询代码如下:

GET food/_search?size=0
{
  "aggs": {
    "price_histogram": {
      "histogram": {
        "field": "Price",
        "interval": 100,
        "missing": 250
      }
    }
  }
}

搜索结果如下:

{
  "took" : 106,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "price_histogram" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 4
        },
        {
          "key" : 100.0,
          "doc_count" : 2
        },
        {
          "key" : 200.0,
          "doc_count" : 1
        },
        {
          "key" : 300.0,
          "doc_count" : 1
        }
      ]
    }
  }
}

注意原先结果里面200-300区间是没有数据的,这个时候插入了一条价格为空的数据,且制定了miss条件为250,es会将所有的价格为空的值用250替换,所以结果中200-300范围的count为1.

 

2.4 key关键字

这里key关键字的用法只是改变了桶聚合值得展示形式通过key value形式展示,这里不在赘述.

 

3、Date-Histogram  官方文档

 

3.1 按照日期进行聚合,统计每个月所有食品得销量

GET food/_search?size=0
{
  "aggs": {
    "adate_histogram": {
      "date_histogram": {
        "field": "CreateTime",
        "calendar_interval": "month", //每隔一个月进行统计
        "format": "yyyy-MM", //日期展示按照  年月展示,
        "min_doc_count": 1 //过滤掉count为0的数据
      }
    }
  }
}

这里用calendar_interval做时间间隔,但是需要注意其支持的单位如下:minute=>1m,hour=>1h,day=>1d,week=>1w,month=>1M,quarter=>1q,year=>1y 最小支持到分钟,最大支持到年.

 

3.2 按照时间进行聚合,统计没毫秒所有食品得销量

GET food/_search?size=0
{
  "aggs": {
    "adate_histogram": {
      "date_histogram": {
        "field": "CreateTime",
        "fixed_interval": "1ms", //每隔1毫秒进行统计
        "min_doc_count": 1 //过滤掉count为0的数据
      }
    }
  }
}

这里用fixed_interval做时间间隔,但是需要注意其支持的单位如下:ms,s,m,h,d 最小支持到毫秒,最大支持到天.

这里有个严重的问题,使用毫秒进行分桶时,会造成es检索出大量数据,造成es卡死,写入收到严重影响,所以要慎用,使用前必须用query或者filter等等进行时间限制

 

3.3 统计今年一年内的每个月食品的销售情况

这里注意,上面的按照月份统计,如果1月份没有数据,es进行分桶时并不会展示1月份的数据,那么显然不符合需求,所以需要让1月份以0显示出来,代码如下

GET food/_search?size=0
{
  "aggs": {
    "adate_histogram": {
      "date_histogram": {
        "field": "CreateTime",
        "calendar_interval": "month", //每隔一个月进行统计
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2022-01-01 00:00:00",
          "max": "2022-12-31 00:00:00"
        }
      }
    }
  }
}

搜索结果如下:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "adate_histogram" : {
      "buckets" : [
        {
          "key_as_string" : "2022-01-01 00:00:00",
          "key" : 1640995200000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2022-02-01 00:00:00",
          "key" : 1643673600000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2022-03-01 00:00:00",
          "key" : 1646092800000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2022-04-01 00:00:00",
          "key" : 1648771200000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2022-05-01 00:00:00",
          "key" : 1651363200000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2022-06-01 00:00:00",
          "key" : 1654041600000,
          "doc_count" : 3
        },
        {
          "key_as_string" : "2022-07-01 00:00:00",
          "key" : 1656633600000,
          "doc_count" : 4
        },
        {
          "key_as_string" : "2022-08-01 00:00:00",
          "key" : 1659312000000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2022-09-01 00:00:00",
          "key" : 1661990400000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2022-10-01 00:00:00",
          "key" : 1664582400000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2022-11-01 00:00:00",
          "key" : 1667260800000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2022-12-01 00:00:00",
          "key" : 1669852800000,
          "doc_count" : 0
        }
      ]
    }
  }
}

这里结果就是按照1月份到12月份,按照每个月份进行数量的统计.

注意:这里extended_bounds和min_doc_count的参数的混合使用,当使用extended_bounds进行间隔空白填充时,min_doc_count必须为0,上面说了min_doc_count是为了过滤count为0的风筒,如果min_doc_count为1就会过滤掉extended_bounds产生的空白填充,这就自相矛盾了.

 

3.4 统计今年一年内的每个月食品的销售情况,并按每个月的销售数量进行排序

GET food/_search?size=0
{
  "aggs": {
    "adate_histogram": {
      "date_histogram": {
        "field": "CreateTime",
        "calendar_interval": "month", //每隔一个月进行统计
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2022-01-01 00:00:00",
          "max": "2022-12-31 00:00:00"
        },
        "order": {
          "_count": "asc"
        }
      }
    }
  }
}

 

3.5  统计今年每个月的食物的销售数量,并且计算每个月相对于上一个月的累计值(商品价格的叠加)

GET food/_search?size=0
{
  "aggs": {
    "date_histogram": {
      "date_histogram": {
        "field": "CreateTime",
        "calendar_interval": "month", //每隔一个月进行统计
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2022-01-01 00:00:00",
          "max": "2022-12-31 00:00:00"
        }
      },
      "aggs": {
        "sum_agg": {
          "sum": {
            "field": "Price"
          }
        },
        "cumulative_sum_agg":{
          "cumulative_sum": {
            "buckets_path": "sum_agg"
          }
        }
      }
    }
  }
}

搜索结果如下:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "date_histogram" : {
      "buckets" : [
        {
          "key_as_string" : "2022-01-01 00:00:00",
          "key" : 1640995200000,
          "doc_count" : 0,
          "sum_agg" : {
            "value" : 0.0
          },
          "cumulative_sum_agg" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2022-02-01 00:00:00",
          "key" : 1643673600000,
          "doc_count" : 0,
          "sum_agg" : {
            "value" : 0.0
          },
          "cumulative_sum_agg" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2022-03-01 00:00:00",
          "key" : 1646092800000,
          "doc_count" : 0,
          "sum_agg" : {
            "value" : 0.0
          },
          "cumulative_sum_agg" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2022-04-01 00:00:00",
          "key" : 1648771200000,
          "doc_count" : 1,
          "sum_agg" : {
            "value" : 0.0
          },
          "cumulative_sum_agg" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2022-05-01 00:00:00",
          "key" : 1651363200000,
          "doc_count" : 0,
          "sum_agg" : {
            "value" : 0.0
          },
          "cumulative_sum_agg" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2022-06-01 00:00:00",
          "key" : 1654041600000,
          "doc_count" : 3,
          "sum_agg" : {
            "value" : 89.32999992370605
          },
          "cumulative_sum_agg" : {
            "value" : 89.32999992370605
          }
        },
        {
          "key_as_string" : "2022-07-01 00:00:00",
          "key" : 1656633600000,
          "doc_count" : 4,
          "sum_agg" : {
            "value" : 511.43998622894287
          },
          "cumulative_sum_agg" : {
            "value" : 600.7699861526489
          }
        },
        {
          "key_as_string" : "2022-08-01 00:00:00",
          "key" : 1659312000000,
          "doc_count" : 0,
          "sum_agg" : {
            "value" : 0.0
          },
          "cumulative_sum_agg" : {
            "value" : 600.7699861526489
          }
        },
        {
          "key_as_string" : "2022-09-01 00:00:00",
          "key" : 1661990400000,
          "doc_count" : 0,
          "sum_agg" : {
            "value" : 0.0
          },
          "cumulative_sum_agg" : {
            "value" : 600.7699861526489
          }
        },
        {
          "key_as_string" : "2022-10-01 00:00:00",
          "key" : 1664582400000,
          "doc_count" : 0,
          "sum_agg" : {
            "value" : 0.0
          },
          "cumulative_sum_agg" : {
            "value" : 600.7699861526489
          }
        },
        {
          "key_as_string" : "2022-11-01 00:00:00",
          "key" : 1667260800000,
          "doc_count" : 0,
          "sum_agg" : {
            "value" : 0.0
          },
          "cumulative_sum_agg" : {
            "value" : 600.7699861526489
          }
        },
        {
          "key_as_string" : "2022-12-01 00:00:00",
          "key" : 1669852800000,
          "doc_count" : 0,
          "sum_agg" : {
            "value" : 0.0
          },
          "cumulative_sum_agg" : {
            "value" : 600.7699861526489
          }
        }
      ]
    }
  }
}

从结果就可以看出,在计算出每个月销量的同时,计算出了每个月的销售额,并且通过cumulative_sum计算除了当前月份和前面所有月份的累计销售额.

 

4、Auto-interval date histogram 官方文档

自动直方图,自动直方图会按照指定的桶数量去计算interval,在某些场景下使用还是用方便的,比如统计今年每个月的食物的销售情况,就可以指定桶数量为180,代码如下:

GET food/_search?size=0
{
  "aggs": {
    "auto_date_histogram_aggs": {
      "auto_date_histogram": {
        "field": "CreateTime",
        "buckets":"12",
        "format":"yyyy-MM-dd "
      }
    }
  }
}

结果如下:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "auto_date_histogram_aggs" : {
      "buckets" : [
        {
          "key_as_string" : "2022-04-01",
          "key" : 1648771200000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2022-05-01",
          "key" : 1651363200000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2022-06-01",
          "key" : 1654041600000,
          "doc_count" : 3
        },
        {
          "key_as_string" : "2022-07-01",
          "key" : 1656633600000,
          "doc_count" : 4
        }
      ],
      "interval" : "1M"
    }
  }
}

注意结果中Interval为1M,就是es根据桶数量自动推算出来的.

 

4、Percentiles 饼图统计

指定百分比计算值的范围,分别统计百分之20、百分之40、百分之60、百分之80、百分之99的商品的价格在什么值

GET food/_search?size=0
{
  "aggs": {
    "percentiles_agg": {
      "percentiles": {
        "field": "Price",
        "percents": [
          20,
          40,
          60,
          80,
          99
        ]
      }
    }
  }
}

结果如下:

{
  "took" : 29,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "percentiles_agg" : {
      "values" : {
        "20.0" : 11.109999656677246,
        "40.0" : 28.309999942779555,
        "60.0" : 89.91000061035156,
        "80.0" : 120.10999908447278,
        "99.0" : 300.1099853515625
      }
    }
  }
}

结果显示百分之20的商品价格在11以内,百分之40的价格在28以内,百分之99的价格在300以内.

常用于计算接口的可靠性,假设接口相应在100ms以内,算合格,那么这里的百分之99对应的值,必须在100以内,才算达标,以此类推.

 

5、Percentile ranks 饼图统计

这个和Percentiles相反,两者都是饼图统计的一种,它可以计算指定范围所占的百分比,而Percentiles指定百分比计算范围

GET food/_search?size=0
{
  "aggs": {
    "percentile_ranks_agg": {
       "percentile_ranks": {
         "field": "Price",
         "values": [
           100,
           200,
           300,
           400
         ]
       }
    }
  }
}

结果如下:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "percentile_ranks_agg" : {
      "values" : {
        "100.0" : 71.33613394088104,
        "200.0" : 85.69857243009302,
        "300.0" : 100.0,
        "400.0" : 100.0
      }
    }
  }
}

这里就计算出100以内占百分之71.....

 

 

到这里结束,其余查阅官方文档.

posted @ 2022-08-11 22:59  郑小超  阅读(2078)  评论(0编辑  收藏  举报