Filter search results

Post filter

Data initialization:

PUT /shirts
{
  "mappings": {
    "properties": {
      "brand": { "type": "keyword"},
      "color": { "type": "keyword"},
      "model": { "type": "keyword"}
    }
  }
}
PUT /shirts/_doc/1?refresh
{
  "brand": "gucci",
  "color": "red",
  "model": "slim"
}
PUT /shirts/_doc/2?refresh
{
  "brand": "gucci",
  "color": "black",
  "model": "slim"
}
PUT /shirts/_doc/3?refresh
{
  "brand": "gucci",
  "color": "green",
  "model": "slim"
}
PUT /shirts/_doc/4?refresh
{
  "brand": "gucci",
  "color": "white",
  "model": "hat"
}
PUT /shirts/_doc/5?refresh
{
  "brand": "gucci",
  "color": "red",
  "model": "hat"
}


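We created an index named shirts with three mapped fields: brand, color, and model (slim shirt or hat). We then loaded 5 documents: 3 shirts (red, black, green) and 2 hats (white, red).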

Query:

GET /shirts/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": { "brand": "gucci" } 
      }
    }
  },
  "aggs": {
    "colors": {
      "terms": { "field": "color" } 
    },
    "color_red": {
      "filter": {
        "term": { "color": "red" } 
      },
      "aggs": {
        "models": {
          "terms": { "field": "model" } 
        }
      }
    }
  },
  "post_filter": { 
    "term": { "color": "red" }
  }
}


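In the query above:

1. We search for products with brand=gucci and use post_filter to narrow the returned hits to red products only. The result contains 2 products: 1 red shirt and 1 red hat.

2. We also define 2 aggregations. Both are calculated from the results of the main query, so the post_filter does not affect them:

colors: returns the number of products for each color.

color_red: keeps only the red products, groups them by model, and returns the count in each group.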

Response:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "shirts",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.0,
        "_source" : {
          "brand" : "gucci",
          "color" : "red",
          "model" : "slim"
        }
      },
      {
        "_index" : "shirts",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 0.0,
        "_source" : {
          "brand" : "gucci",
          "color" : "red",
          "model" : "hat"
        }
      }
    ]
  },
  "aggregations" : {
    "color_red" : {
      "doc_count" : 2,
      "models" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "hat",
            "doc_count" : 1
          },
          {
            "key" : "slim",
            "doc_count" : 1
          }
        ]
      }
    },
    "colors" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "red",
          "doc_count" : 2
        },
        {
          "key" : "black",
          "doc_count" : 1
        },
        {
          "key" : "green",
          "doc_count" : 1
        },
        {
          "key" : "white",
          "doc_count" : 1
        }
      ]
    }
  }
}


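The response contains:

Hits: 2 products (1 red shirt and 1 red hat).
The colors aggregation: 4 colors, with the number of products for each color.
The color_red aggregation: 2 models, with the number of red products for each model.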
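The order of evaluation is the key point here: aggregations see the main query's results, while post_filter only trims the hits afterwards. The sketch below models that order in plain Python over the five documents indexed above. It is not Elasticsearch code; the `search` function and its parameters are hypothetical names used only for illustration.

```python
from collections import Counter

# The five documents indexed above (id -> source).
DOCS = {
    "1": {"brand": "gucci", "color": "red", "model": "slim"},
    "2": {"brand": "gucci", "color": "black", "model": "slim"},
    "3": {"brand": "gucci", "color": "green", "model": "slim"},
    "4": {"brand": "gucci", "color": "white", "model": "hat"},
    "5": {"brand": "gucci", "color": "red", "model": "hat"},
}


def search(docs, brand, post_filter_color):
    """Hypothetical model of query -> aggregations -> post_filter."""
    # Step 1: the main query decides which documents the aggregations see.
    matched = {i: d for i, d in docs.items() if d["brand"] == brand}
    # Step 2: aggregations run on the query result, before post_filter.
    colors = Counter(d["color"] for d in matched.values())
    red_models = Counter(
        d["model"] for d in matched.values() if d["color"] == "red"
    )
    # Step 3: post_filter narrows only the hits that are returned.
    hits = sorted(
        i for i, d in matched.items() if d["color"] == post_filter_color
    )
    return hits, dict(colors), dict(red_models)


hits, colors, red_models = search(DOCS, "gucci", "red")
print(hits)        # ['1', '5']
print(colors)      # {'red': 2, 'black': 1, 'green': 1, 'white': 1}
print(red_models)  # {'slim': 1, 'hat': 1}
```

If the color condition were moved into the main query instead, the colors count would shrink to red only, which is why post_filter is used when the full color breakdown is still needed.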

Rescore filtered search results

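Rescoring reorders only the top documents returned by the query and post_filter phases, using a second (usually more expensive) query, which can improve the precision of the results. Because only that top slice is re-scored, it is cheaper than running the expensive query against every matching document. Elasticsearch executes the rescore request on each shard before the shard returns its results to the node handling the overall search request, which then sorts them. Currently the rescore API has only one implementation, the query rescorer, which uses a query to adjust the scores. An example: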

POST /_search
{
   "query" : {
      "match" : {
         "message" : {
            "operator" : "or",
            "query" : "the quick brown"
         }
      }
   },
   "rescore" : {
      "window_size" : 50,
      "query" : {
         "rescore_query" : {
            "match_phrase" : {
               "message" : {
                  "query" : "the quick brown",
                  "slop" : 2
               }
            }
         },
         "query_weight" : 0.7,
         "rescore_query_weight" : 1.2
      }
   }
}


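The query rescorer runs the second query only on the top-K results returned by the query and post_filter phases. The number of documents examined on each shard is controlled by the window_size parameter, which defaults to 10.

By default, the scores of the original query and the rescore query are combined linearly to produce the final _score for each document. The relative importance of the two is controlled by query_weight and rescore_query_weight respectively. Both default to 1.

The way the scores are combined can be controlled with score_mode: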

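Score Mode	Description
`total` (default)	Adds the original score and the rescore query score.
`multiply`	Multiplies the original score by the rescore query score. Useful for `function query` rescores.
`avg`	Averages the original score and the rescore query score.
`max`	Takes the maximum of the original score and the rescore query score.
`min`	Takes the minimum of the original score and the rescore query score.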
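To make the table concrete, here is a minimal Python sketch of how the two scores might be combined. It assumes each score is multiplied by its weight first and the result is then combined according to score_mode; that ordering is my reading of the "linear combination" described above, not something this article states explicitly. The function name `combine_scores` and the input scores 2.0 and 3.0 are hypothetical, and documents that do not match the rescore query are not modeled.

```python
def combine_scores(original, rescore, score_mode="total",
                   query_weight=1.0, rescore_query_weight=1.0):
    """Hypothetical model of the query rescorer's score combination."""
    # Assumption: weights are applied before the scores are combined.
    primary = original * query_weight
    secondary = rescore * rescore_query_weight
    if score_mode == "total":
        return primary + secondary
    if score_mode == "multiply":
        return primary * secondary
    if score_mode == "avg":
        return (primary + secondary) / 2
    if score_mode == "max":
        return max(primary, secondary)
    if score_mode == "min":
        return min(primary, secondary)
    raise ValueError(f"unknown score_mode: {score_mode}")


# Original score 2.0, rescore query score 3.0, default weights of 1.
print(combine_scores(2.0, 3.0, "total"))     # 5.0
print(combine_scores(2.0, 3.0, "multiply"))  # 6.0
print(combine_scores(2.0, 3.0, "avg"))       # 2.5
print(combine_scores(2.0, 3.0, "max"))       # 3.0
print(combine_scores(2.0, 3.0, "min"))       # 2.0

# The weights from the request above shift the balance toward the phrase match.
weighted = combine_scores(2.0, 3.0, "total",
                          query_weight=0.7, rescore_query_weight=1.2)
```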

Multiple rescores

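The first rescore operates on the results of the query, the second rescore operates on the results of the first, and so on. The second rescore "sees" the ordering produced by the first, so you can use a large window on the first rescore (for example, 100 documents) to pull documents into a smaller window for the second rescore (for example, 10 documents). An example: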

POST /_search
{
   "query" : {
      "match" : {
         "message" : {
            "operator" : "or",
            "query" : "the quick brown"
         }
      }
   },
   "rescore" : [ {
      "window_size" : 100,
      "query" : {
         "rescore_query" : {
            "match_phrase" : {
               "message" : {
                  "query" : "the quick brown",
                  "slop" : 2
               }
            }
         },
         "query_weight" : 0.7,
         "rescore_query_weight" : 1.2
      }
   }, {
      "window_size" : 10,
      "query" : {
         "score_mode": "multiply",
         "rescore_query" : {
            "function_score" : {
               "script_score": {
                  "script": {
                    "source": "Math.log10(doc.count.value + 2)"
                  }
               }
            }
         }
      }
   } ]
}
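The chaining of windows can be modeled in a few lines of Python. This is a simplified sketch, not Elasticsearch's implementation: it assumes documents outside a window keep their position and score, it works on a single ranked list rather than per shard, and the document ids, scores, `phrase_score`, and `counts` values are all made up for illustration.

```python
import math


def rescore_window(ranked, window_size, new_score):
    """Re-score only the first window_size entries of a ranked list.

    ranked is a list of (doc_id, score) pairs sorted by score, descending.
    Entries outside the window are left untouched (a simplification).
    """
    head = [(doc_id, new_score(doc_id, score))
            for doc_id, score in ranked[:window_size]]
    head.sort(key=lambda pair: pair[1], reverse=True)
    return head + ranked[window_size:]


# Hypothetical first-pass results and per-document inputs.
ranked = [("a", 3.0), ("b", 2.5), ("c", 2.0), ("d", 1.0)]
phrase_score = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 5.0}
counts = {"a": 8, "b": 98, "c": 8, "d": 998}

# First rescore: larger window, weighted sum (score_mode "total").
first = rescore_window(
    ranked, 3, lambda doc_id, s: 0.7 * s + 1.2 * phrase_score[doc_id]
)
# Second rescore: smaller window over the order left by the first,
# multiplying by log10(count + 2) (score_mode "multiply").
second = rescore_window(
    first, 2, lambda doc_id, s: s * math.log10(counts[doc_id] + 2)
)

final_ids = [doc_id for doc_id, _ in second]
print(final_ids)  # ['b', 'c', 'a', 'd']
```

Document d has the strongest phrase match and the highest count, yet it stays last because it never enters the first window. A rescore can only promote documents that the previous stage already ranked inside its window.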


Posted 2022-10-21 13:49 by 水果小虫
