Complex searches in Elasticsearch (sorting, pagination, highlighting, fuzzy queries, exact queries)

If you are not familiar with the basics of Elasticsearch, see the previous article: Basic operations on Elasticsearch indices and documents.

Before running the queries, you can bulk-insert the document data with the Bulk API (see the data source).

Querying data

match query

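A match query runs the query text through an analyzer: the text is split into terms first, and those terms are then used to search the documents.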

GET /student/_search
{
  "query": {
    "match": {
      "name": "山西"
    }
  }
}

The same search can also be written as:

GET /student/_search?q=name:"山西"

The result shows three students whose names contain "山西":

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.7133499,
    "hits" : [
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.7133499,
        "_source" : {
          "name" : "山西太原-张三",
          "age" : "23",
          "address" : {
            "city" : "太原",
            "province" : "山西"
          }
        }
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.7133499,
        "_source" : {
          "name" : "山西长治-李四",
          "age" : "24",
          "address" : {
            "city" : "长治",
            "province" : "山西"
          }
        }
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.7133499,
        "_source" : {
          "name" : "山西吕梁-王五",
          "age" : "25",
          "address" : {
            "city" : "吕梁",
            "province" : "山西"
          }
        }
      }
    ]
  }
}

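Description

query : the query to run.

match : the match condition.

name : the field to search, together with the text to look for.

hits --> total

  • value : the number of matching documents (3 here)
  • relation : eq, meaning value is the exact count

max_score : the highest score among the matching documents

hits : the matching documents themselves, each with its index, ID, score and source.

Use each document's _score to judge which document fits the query best.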
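To make the field descriptions above concrete, here is a minimal Python sketch that walks a response of this shape. The `response` dict is a hand-abbreviated copy of the result shown earlier, and `summarize` is a hypothetical helper name, not part of any Elasticsearch client.

```python
# Abbreviated copy of the search response shown above (only the fields we read).
response = {
    "hits": {
        "total": {"value": 3, "relation": "eq"},
        "max_score": 0.7133499,
        "hits": [
            {"_id": "1", "_score": 0.7133499, "_source": {"name": "山西太原-张三"}},
            {"_id": "2", "_score": 0.7133499, "_source": {"name": "山西长治-李四"}},
            {"_id": "3", "_score": 0.7133499, "_source": {"name": "山西吕梁-王五"}},
        ],
    }
}


def summarize(resp):
    """Return (total, max_score, [(id, score, name), ...]) from a search response."""
    hits = resp["hits"]
    rows = [(h["_id"], h["_score"], h["_source"]["name"]) for h in hits["hits"]]
    return hits["total"]["value"], hits["max_score"], rows


total, max_score, rows = summarize(response)
print(total, max_score)
for doc_id, score, name in rows:
    print(doc_id, score, name)
```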

In a match query the default operator is OR, so the following two queries return the same result:

GET student/_search
{
    "query": {
        "match": {
            "name": {
                "query": "山西长治",
                "operator": "or"
            }
        }
    }
}

GET student/_search
{
    "query": {
        "match": {
            "name": "山西长治"
        }
    }
}

Because operator defaults to OR in a match query, the queries above return every document that matches any one of the characters in "山西长治".

The minimum_should_match parameter sets the minimum number of terms that must match, for example:

GET student/_search
{
    "query": {
        "match": {
            "name": {
                "query": "山西长治",
                "operator": "or",
                "minimum_should_match": 3
            }
        }
    }
}

Only documents that match at least three of the four characters in "山西长治" are returned.

With the operator changed to and, only one document is returned:

GET student/_search
{
  "query": {
    "match": {
      "name": {
        "query": "山西长治",
        "operator": "and"
      }
    }
  }
}
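The effect of operator and minimum_should_match can be imitated in a few lines of Python. This is only a sketch of the matching rule, not of how Elasticsearch is implemented: `tokenize`, `match` and `search` are hypothetical helpers, the sample holds just the three 山西 documents shown earlier, and it assumes the analyzer emits one term per Chinese character and drops the "-".

```python
def tokenize(text):
    """Assumption: one term per CJK character; punctuation such as '-' is dropped."""
    return [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]


def match(doc_text, query_text, operator="or", minimum_should_match=1):
    """Decide whether a document matches, by counting shared terms."""
    doc_terms = set(tokenize(doc_text))
    query_terms = tokenize(query_text)
    matched = sum(1 for term in query_terms if term in doc_terms)
    if operator == "and":
        return matched == len(query_terms)  # every query term must be present
    return matched >= minimum_should_match  # "or": enough terms are present


names = {"1": "山西太原-张三", "2": "山西长治-李四", "3": "山西吕梁-王五"}


def search(query_text, **options):
    """Return the IDs of the sample documents that match."""
    return [doc_id for doc_id, name in names.items() if match(name, query_text, **options)]


print(search("山西长治"))                          # default operator "or"
print(search("山西长治", minimum_should_match=3))  # at least three terms
print(search("山西长治", operator="and"))          # all four terms
```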

Ids query

Fetch several documents at once by their IDs.

GET student/_search
{
  "query": {
    "ids": {
      "values": [1,2,3]
    }
  }
}

The query above returns the documents with IDs 1, 2 and 3.

multi_match

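The multi_match query builds on the match query and allows searching several fields at once.

The searches above each name a single field. Often you do not know which field contains the keyword you are looking for, and in that case multi_match can be used.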

GET student/_search
{
    "query": {
        "multi_match": {
            "query": "山西长治",
            "fields": [
                "name",
                "address.city^3",
                "address.province"
            ],
            "type": "best_fields"
        }
    }
}

This searches the name, address.city and address.province fields, and multiplies the score of matches in address.city by three. The result is:

{
    "took" : 0,
    "timed_out" : false,
    "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
    },
    "hits" : {
        "total" : {
            "value" : 3,
            "relation" : "eq"
        },
        "max_score" : 7.223837,
        "hits" : [
            {
                "_index" : "student",
                "_type" : "_doc",
                "_id" : "2",
                "_score" : 7.223837,
                "_source" : {
                    "name" : "山西长治-李四",
                    "age" : "24",
                    "address" : {
                        "city" : "长治",
                        "province" : "山西"
                    }
                }
            },
            {
                "_index" : "student",
                "_type" : "_doc",
                "_id" : "1",
                "_score" : 0.7133499,
                "_source" : {
                    "name" : "山西太原-张三",
                    "age" : "23",
                    "address" : {
                        "city" : "太原",
                        "province" : "山西"
                    }
                }
            },
            {
                "_index" : "student",
                "_type" : "_doc",
                "_id" : "3",
                "_score" : 0.7133499,
                "_source" : {
                    "name" : "山西吕梁-王五",
                    "age" : "25",
                    "address" : {
                        "city" : "吕梁",
                        "province" : "山西"
                    }
                }
            }
        ]
    }
}
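How best_fields combines the per-field scores can be sketched as follows. The per-field scores below are invented for illustration (they are not taken from the response above), and `best_fields_score` is a hypothetical helper. The sketch assumes the usual dis_max rule: take the best boosted field score, plus tie_breaker times the other fields' scores, with tie_breaker defaulting to 0.

```python
def best_fields_score(field_scores, boosts, tie_breaker=0.0):
    """Combine per-field scores the way a dis_max over the fields would."""
    boosted = [score * boosts.get(field, 1.0) for field, score in field_scores.items()]
    if not boosted:
        return 0.0
    best = max(boosted)
    # With tie_breaker == 0 only the best field counts.
    return best + tie_breaker * (sum(boosted) - best)


# Hypothetical raw scores for one document; "^3" in the query becomes a boost of 3.
scores = {"name": 2.0, "address.city": 1.5, "address.province": 0.5}
boosts = {"address.city": 3.0}

print(best_fields_score(scores, boosts))
print(best_fields_score(scores, boosts, tie_breaker=0.3))
```

With these made-up numbers the boosted address.city score wins even though the raw name score is higher, which is the point of the `^3` boost.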

Prefix query

Returns documents that contain a specific prefix in the given field.

GET student/_search
{
    "query": {
        "prefix": {
            "address.city": {
                "value": "吕"
            }
        }
    }
}

This finds the documents whose city starts with "吕":

{
    "took" : 2,
    "timed_out" : false,
    "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
    },
    "hits" : {
        "total" : {
            "value" : 1,
            "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
            {
                "_index" : "student",
                "_type" : "_doc",
                "_id" : "3",
                "_score" : 1.0,
                "_source" : {
                    "name" : "山西吕梁-王五",
                    "age" : "25",
                    "address" : {
                        "city" : "吕梁",
                        "province" : "山西"
                    }
                }
            }
        ]
    }
}

Term query

A term query matches the exact value in the given field, so the query value must be precise to get the right result.

GET /student/_search
{
    "query": {
        "term": {
            "name.keyword": "山西太原-张三"
        }
    }
}

Here name.keyword is used to match the value "山西太原-张三" exactly. The matching document:

{
    "took" : 0,
    "timed_out" : false,
    "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
    },
    "hits" : {
        "total" : {
            "value" : 1,
            "relation" : "eq"
        },
        "max_score" : 1.2039728,
        "hits" : [
            {
                "_index" : "student",
                "_type" : "_doc",
                "_id" : "1",
                "_score" : 1.2039728,
                "_source" : {
                    "name" : "山西太原-张三",
                    "age" : "23",
                    "address" : {
                        "city" : "太原",
                        "province" : "山西"
                    }
                }
            }
        ]
    }
}

Terms query

To match several exact values, use a terms query. It is similar to the IN syntax in SQL.

GET student/_search
{
    "query": {
        "terms": {
            "address.city.keyword": [
                "长治",
                "广州"
            ]
        }
    }
}

The query above returns every document whose address.city.keyword is either 长治 or 广州.
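The exact-match behaviour of term and terms on a keyword field can be pictured with plain equality and set membership. A minimal sketch, assuming the keyword sub-field stores the whole original string unchanged; `term` and `terms` here are hypothetical Python helpers, and the sample contains only the three documents shown in the responses above.

```python
docs = {
    "1": {"name": "山西太原-张三", "city": "太原"},
    "2": {"name": "山西长治-李四", "city": "长治"},
    "3": {"name": "山西吕梁-王五", "city": "吕梁"},
}


def term(field, value):
    """Exact match against the whole stored value (keyword-style)."""
    return [doc_id for doc_id, doc in docs.items() if doc[field] == value]


def terms(field, values):
    """Match any of several exact values, like SQL's IN."""
    wanted = set(values)
    return [doc_id for doc_id, doc in docs.items() if doc[field] in wanted]


print(term("name", "山西太原-张三"))   # the full value matches
print(term("name", "山西"))            # a fragment does not
print(terms("city", ["长治", "广州"]))
```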

Compound queries

A compound query combines the single queries above into a more complex query.

The general form is:

POST _search
{
    "query": {
        "bool" : {
            "must" : {
                "term" : { "user" : "kimchy" }
            },
            "filter": {
                "term" : { "tag" : "tech" }
            },
            "must_not" : {
                "range" : {
                    "age" : { "gte" : 10, "lte" : 20 }
                }
            },
            "should" : [
                { "term" : { "tag" : "wow" } },
                { "term" : { "tag" : "elasticsearch" } }
            ],
            "minimum_should_match" : 1,
            "boost" : 1.0
        }
    }
}

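A compound query is built from the must, filter, must_not and should clauses under bool. minimum_should_match specifies the number or percentage of should clauses a document must match. If the bool query contains at least one should clause and no must or filter clause, the default is 1. Otherwise the default is 0.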
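The way the four clause types and the minimum_should_match default interact can be sketched in Python. This models only whether a document matches, not scoring. `bool_matches` and `has_tag` are hypothetical helpers, each clause is represented by a plain predicate function, and the two sample documents are invented.

```python
def bool_matches(doc, must=(), filter_=(), must_not=(), should=(),
                 minimum_should_match=None):
    """Return True if doc satisfies the bool query described by the clauses."""
    if minimum_should_match is None:
        # Default: 1 when there are should clauses but no must/filter, else 0.
        minimum_should_match = 1 if should and not (must or filter_) else 0
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filter_):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    matched_should = sum(1 for clause in should if clause(doc))
    return matched_should >= minimum_should_match


def has_tag(tag):
    """Build a predicate that checks for a tag (stands in for a term clause)."""
    return lambda doc: tag in doc["tags"]


clauses = dict(
    must=[lambda doc: doc["user"] == "kimchy"],
    filter_=[has_tag("tech")],
    must_not=[lambda doc: 10 <= doc["age"] <= 20],
    should=[has_tag("wow"), has_tag("elasticsearch")],
)

doc_a = {"user": "kimchy", "tags": ["tech", "wow"], "age": 30}
doc_b = {"user": "kimchy", "tags": ["tech"], "age": 30}

print(bool_matches(doc_a, minimum_should_match=1, **clauses))
print(bool_matches(doc_b, minimum_should_match=1, **clauses))
print(bool_matches(doc_b, **clauses))  # default is 0 because must is present
```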

must

must is equivalent to AND in SQL.

Use a compound query to find the documents whose city is 长治 and whose age is 24:

GET student/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "address.city": "长治"
                    }
                },
                {
                    "match": {
                        "age": "24"
                    }
                }
            ]
        }
    }
}

must_not

Query all documents whose province is not 山西. The result contains only the one 广州 document:

GET student/_search
{
    "query": {
        "bool": {
            "must_not": [
                {
                    "match": {
                        "address.province": "山西"
                    }
                }
            ]
        }
    }
}

filter

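Use filter to keep only the documents whose age is between 24 and 25, inclusive. Filter clauses do not contribute to the score.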

GET student/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "age": {
              "gte": 24,
              "lte": 25
            }
          }
        }
      ]
    }
  }
}
  • gt : greater than
  • gte : greater than or equal to
  • lt : less than
  • lte : less than or equal to

should

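should means "or", like OR in SQL. When the bool query also has a must clause, as below, the should clauses are optional and only raise the score of the documents that match them.

Query the documents whose province is 山西. If the name contains 李四, the relevance is higher and the document ranks earlier in the result.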

GET student/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "address.province": "山西"
          }
        }
      ],
      "should": [
        {
          "match_phrase": {
            "name": "李四"
          }
        }
      ]
    }
  }
}

In the result, the document whose name is 山西长治-李四 comes first:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 3.1212955,
    "hits" : [
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 3.1212955,
        "_source" : {
          "name" : "山西长治-李四",
          "age" : "24",
          "address" : {
            "city" : "长治",
            "province" : "山西"
          }
        }
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.7133499,
        "_source" : {
          "name" : "山西太原-张三",
          "age" : "23",
          "address" : {
            "city" : "太原",
            "province" : "山西"
          }
        }
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.7133499,
        "_source" : {
          "name" : "山西吕梁-王五",
          "age" : "25",
          "address" : {
            "city" : "吕梁",
            "province" : "山西"
          }
        }
      }
    ]
  }
}
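The scores in this result are consistent with a bool query adding up the scores of its matching must and should clauses. Here is a small arithmetic check in Python that uses only numbers from the response above. Reading the difference as the should clause's contribution is an inference from those numbers, not something the response states.

```python
# Scores copied from the response above.
must_only_score = 0.7133499  # docs 1 and 3: only the must clause matched
top_score = 3.1212955        # doc 2: the must and the should clause both matched

# If the clause scores are summed, the remainder is what "should" added for doc 2.
should_contribution = round(top_score - must_only_score, 7)
print(should_contribution)
```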

Wildcard query

Use a wildcard query to match terms against a pattern, similar to LIKE in SQL.

GET student/_search
{
    "query": {
        "wildcard": {
            "name": {
                "value": "*王"
            }
        }
    }
}

The result is:

{
    "took" : 0,
    "timed_out" : false,
    "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
    },
    "hits" : {
        "total" : {
            "value" : 1,
            "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
            {
                "_index" : "student",
                "_type" : "_doc",
                "_id" : "3",
                "_score" : 1.0,
                "_source" : {
                    "name" : "山西吕梁-王五",
                    "age" : "25",
                    "address" : {
                        "city" : "吕梁",
                        "province" : "山西"
                    }
                }
            }
        ]
    }
}
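The pattern "*王" matches this document even though the full name does not end with 王. One plausible explanation, sketched below, is that on an analyzed text field the pattern is compared with each indexed term rather than with the whole string. The sketch assumes name is a text field analyzed into one term per Chinese character. `tokenize` and `wildcard` are hypothetical helpers, and Python's `fnmatchcase` only approximates the `*` and `?` wildcard syntax.

```python
from fnmatch import fnmatchcase


def tokenize(text):
    """Assumption: one term per CJK character; punctuation such as '-' is dropped."""
    return [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]


def wildcard(doc_text, pattern):
    """True if any single indexed term matches the wildcard pattern."""
    return any(fnmatchcase(term, pattern) for term in tokenize(doc_text))


print(wildcard("山西吕梁-王五", "*王"))     # the term "王" matches
print(fnmatchcase("山西吕梁-王五", "*王"))  # the whole string does not end with 王
print(wildcard("山西太原-张三", "*王"))     # no term matches
```

To match against the whole original value instead, the same pattern would have to be run on a field that stores it unanalyzed, such as the name.keyword field used in the term query earlier.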

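Pagination and sorting

Query the documents whose province is 山西, sort them by age in descending order, and return one page of the result: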

GET student/_search
{
    "query": {
        "match": {
            "address.province": "山西"
        }
    },
    "sort": [
        {
            "age.keyword": {
                "order": "desc"
            }
        }
    ],
    "from": 2,
    "size": 2
}

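from : the offset of the first hit to return, counted from 0. It is a number of hits to skip, not a page number.

size : the number of hits to return per page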
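The combination of sort, from and size behaves like sorting a list and then slicing it. A minimal Python sketch under two assumptions: the three 山西 documents from the earlier responses are the only hits, and sorting on age.keyword compares the ages as strings. `page` and `offset_for_page` are hypothetical helpers.

```python
hits = [
    {"_id": "1", "age": "23"},
    {"_id": "2", "age": "24"},
    {"_id": "3", "age": "25"},
]


def page(all_hits, from_, size):
    """Sort by age descending (string comparison), then take one slice."""
    ordered = sorted(all_hits, key=lambda hit: hit["age"], reverse=True)
    return ordered[from_:from_ + size]


def offset_for_page(page_number, size):
    """Convert a 1-based page number into the value to pass as from."""
    return (page_number - 1) * size


print(page(hits, from_=2, size=2))      # skips two hits, so only one is left
print(offset_for_page(2, size=2))       # page 2 with size 2 starts at offset 2
print(sorted(["9", "10"], reverse=True))  # string order puts "9" before "10"
```

The last line shows why sorting numbers stored as strings can surprise: it works for these two-digit ages, but a numeric field mapping would be needed for ages of mixed length.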

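Highlighting

Use highlight to highlight the matches and choose which fields are highlighted. The pre_tags and post_tags options change the text inserted before and after each highlighted fragment.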

GET student/_search
{
    "query": {
        "match": {
            "name": "张三"
        }
    },
    "highlight": {
        "pre_tags": "<br>", 
        "post_tags": "</br>", 
        "fields": {
            "name": {}
        }
    }
}

The result:

{
    "took" : 0,
    "timed_out" : false,
    "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
    },
    "hits" : {
        "total" : {
            "value" : 1,
            "relation" : "eq"
        },
        "max_score" : 2.4079456,
        "hits" : [
            {
                "_index" : "student",
                "_type" : "_doc",
                "_id" : "1",
                "_score" : 2.4079456,
                "_source" : {
                    "name" : "山西太原-张三",
                    "age" : 23,
                    "address" : {
                        "city" : "太原",
                        "province" : "山西"
                    }
                },
                "highlight" : {
                    "name" : [
                        "山西太原-<br>张</br><br>三</br>"
                    ]
                }
            }
        ]
    }
}
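The highlight value above wraps each matched term separately, which is why 张 and 三 get their own tag pair. The idea can be sketched in Python as follows. This is an illustration of the output shape only: `tokenize` and `highlight` are hypothetical helpers, and one term per Chinese character is assumed. The `<em>` defaults mirror the tags Elasticsearch uses when pre_tags and post_tags are not set.

```python
def tokenize(text):
    """Assumption: one term per CJK character; punctuation such as '-' is dropped."""
    return [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]


def highlight(text, query, pre_tag="<em>", post_tag="</em>"):
    """Wrap every character of text that is also a query term in the tags."""
    query_terms = set(tokenize(query))
    return "".join(
        f"{pre_tag}{ch}{post_tag}" if ch in query_terms else ch
        for ch in text
    )


print(highlight("山西太原-张三", "张三", pre_tag="<br>", post_tag="</br>"))
print(highlight("山西太原-张三", "张三"))
```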
If you’re going to reuse code, you need to understand that code!
posted @ 2021-05-20 17:18  不颓废青年