This article walks through practical Elasticsearch full-text search examples, covering index creation, test data preparation, and several query types.

1. Full-text search

1.1 Preparing the test data

Create an index.

PUT /products
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "price": { "type": "double" },
      "category": { "type": "keyword" },
      "tags": { "type": "keyword" },
      "description": { "type": "text" },
      "stock": { "type": "integer" },
      "sku": { "type": "keyword" },
      "created_at": { "type": "date" },
      "metadata": { "type": "object" }
    }
  }
}

Insert the test data. The _bulk API expects newline-delimited JSON (NDJSON): every action line and every document must sit on a single line, and the body must end with a newline.

POST /products/_bulk
{"index":{}}
{"name":"Laptop X1","price":1299.99,"category":"electronics","tags":["new","sale"],"description":"High performance laptop","stock":50,"sku":"LP-X1-2023","created_at":"2023-01-15","metadata":{"weight":1.5,"color":"silver"}}
{"index":{}}
{"name":"Smartphone S10","price":899.99,"category":"electronics","tags":["new","popular"],"description":"Latest smartphone model","stock":120,"sku":"SP-S10-2023","created_at":"2023-02-20","metadata":{"weight":0.3,"color":"black"}}
{"index":{}}
{"name":"Wireless Headphones","price":199.99,"category":"audio","tags":["sale","popular"],"description":"Noise cancelling headphones","stock":75,"sku":"WH-200-2022","created_at":"2022-11-10","metadata":{"weight":0.25,"color":"white"}}
{"index":{}}
{"name":"Smart Watch","price":249.99,"category":"wearables","tags":["new","featured"],"description":"Fitness tracking smartwatch","stock":30,"sku":"SW-500-2023","created_at":"2023-03-05","metadata":{"weight":0.1,"color":"black"}}
{"index":{}}
{"name":"4K TV","price":1499.99,"category":"electronics","tags":["premium","large"],"description":"55-inch 4K television","stock":15,"sku":"TV-4K-55-2023","created_at":"2023-01-25","metadata":{"weight":18.5,"color":"black"}}
{"index":{}}
{"name":"Bluetooth Speaker","price":129.99,"category":"audio","tags":["portable"],"description":"Waterproof bluetooth speaker","stock":60,"sku":"BS-100-2022","created_at":"2022-12-15","metadata":{"weight":0.8,"color":"blue"}}
{"index":{}}
{"name":"Gaming Mouse","price":79.99,"category":"accessories","tags":["gaming"],"description":"High DPI gaming mouse","stock":90,"sku":"GM-X200","created_at":"2023-02-01","metadata":{"weight":0.12,"color":"rgb"}}
{"index":{}}
{"name":"External SSD 1TB","price":159.99,"category":"storage","tags":["fast","reliable"],"description":"Portable SSD drive","stock":45,"sku":"ESSD-1TB-2023","created_at":"2023-03-10","metadata":{"weight":0.05,"color":"gray"}}
{"index":{}}
{"name":"Keyboard Pro","price":109.99,"category":"accessories","tags":["ergonomic"],"description":"Mechanical keyboard","stock":25,"sku":"KB-PRO-2023","created_at":"2023-03-15","metadata":{"weight":1.1,"color":"black"}}
{"index":{}}
{"name":"Tablet T8","price":499.99,"category":"electronics","tags":["new","portable"],"description":"10-inch tablet","stock":40,"sku":"TAB-T8-2023","created_at":"2023-02-28","metadata":{"weight":0.5,"color":"silver"}}
{"index":{}}
{"name":"Camera DSLR","price":899.99,"category":"photography","tags":["professional"],"description":"24MP DSLR camera","stock":20,"sku":"CAM-DSLR-24","created_at":"2023-01-10","metadata":{"weight":0.7,"color":"black"}}
{"index":{}}
{"name":"Monitor 27\"","price":299.99,"category":"electronics","tags":["office"],"description":"27-inch office monitor","stock":35,"sku":"MON-27-2023","created_at":"2023-02-15","metadata":{"weight":4.2,"color":"black"}}
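When the test data lives in application code rather than in the console, the bulk body has to be serialized as NDJSON by hand. The following is a minimal Python sketch of that serialization, not an official client API: build_bulk_body and sample_docs are hypothetical names, and only two abbreviated documents are shown.

```python
import json


def build_bulk_body(docs):
    """Serialize docs into an NDJSON body for the _bulk endpoint (illustrative helper)."""
    lines = []
    for doc in docs:
        # Action line: index the document and let Elasticsearch assign the _id.
        lines.append(json.dumps({"index": {}}))
        # Source line: json.dumps without indent never emits raw newlines,
        # so each document stays on a single line.
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"


# Abbreviated sample documents (assumption: a subset of the fields used above).
sample_docs = [
    {"name": "Laptop X1", "price": 1299.99, "metadata": {"weight": 1.5, "color": "silver"}},
    {"name": "Monitor 27\"", "price": 299.99, "metadata": {"weight": 4.2, "color": "black"}},
]

body = build_bulk_body(sample_docs)
print(body, end="")
```

The resulting string can be sent as the request body of POST /products/_bulk with the Content-Type header set to application/x-ndjson.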

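1.2 Case studies

1.2.1 match (analyzed full-text search)

Analyzes the query text into terms and matches them against the field. Supports fuzzy matching and the operator parameter (OR by default, or AND).

GET /products/_search
{
  "query": {
    "match": {
      "description": {
        "query": "niose cancelling", // "noise" is deliberately misspelled to exercise fuzzy matching
        "fuzziness": "AUTO"
      }
    }
  }
}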
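To see why the misspelled niose still matches noise, it helps to look at how "fuzziness": "AUTO" picks the allowed edit distance: terms of 1 to 2 characters must match exactly, terms of 3 to 5 characters may differ by one edit, and longer terms by two. By default a swap of two adjacent characters counts as a single edit. The Python sketch below reproduces that rule with an optimal-string-alignment distance; it is an illustration of the idea, not how Lucene implements it internally, and the function names are made up for this example.

```python
def osa_distance(a, b):
    """Edit distance where an adjacent transposition counts as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Adjacent transposition, e.g. "io" -> "oi".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def auto_fuzziness(term):
    """Allowed edits for "fuzziness": "AUTO", based on term length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_match(query_term, indexed_term):
    return osa_distance(query_term, indexed_term) <= auto_fuzziness(query_term)


print(osa_distance("niose", "noise"))  # 1: a single transposition
print(fuzzy_match("niose", "noise"))   # True
```

Because niose has five characters, AUTO allows exactly one edit, which is just enough for the transposed letters.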

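1.2.2 match_phrase (phrase search)

Requires the terms to appear together and in order. The slop parameter relaxes this by allowing other terms in between.

GET /products/_search
{
  "query": {
    "match_phrase": {
      "description": {
        "query": "high laptop",
        "slop": 1 // allow one other term in between
      }
    }
  }
}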
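The effect of slop can be approximated with a small position model: each query term is mapped to its position in the document, and the slop needed is the spread between how far the individual terms have to shift relative to the query. The Python sketch below follows that simplified model. It assumes whitespace tokenization and no repeated query terms, and min_slop and phrase_matches are hypothetical helper names, not Elasticsearch APIs.

```python
from itertools import product


def min_slop(doc_tokens, query_tokens):
    """Smallest slop at which the phrase matches, or None if a term is missing."""
    positions = []
    for term in query_tokens:
        found = [i for i, tok in enumerate(doc_tokens) if tok == term]
        if not found:
            return None
        positions.append(found)
    best = None
    for combo in product(*positions):
        # Shift of each term: document position minus position in the query.
        shifts = [pos - offset for offset, pos in enumerate(combo)]
        needed = max(shifts) - min(shifts)
        if best is None or needed < best:
            best = needed
    return best


def phrase_matches(doc_tokens, query_tokens, slop=0):
    needed = min_slop(doc_tokens, query_tokens)
    return needed is not None and needed <= slop


doc = "high performance laptop".split()
print(min_slop(doc, ["high", "laptop"]))           # 1
print(phrase_matches(doc, ["high", "laptop"], 0))  # False
print(phrase_matches(doc, ["high", "laptop"], 1))  # True
```

In this model, swapping two adjacent terms (for example performance high) needs a slop of 2, which is why reversed phrases require a larger slop than gaps do.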

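1.2.3 match_phrase_prefix (phrase prefix matching)

Works like a phrase match, except that the last term is treated as a prefix.

GET /products/_search
{
  "query": {
    "match_phrase_prefix": {
      "name": {
        "query": "Smart Wa", // matches "Smart Watch" and similar
        "max_expansions": 10 // limit the number of terms the prefix expands to
      }
    }
  }
}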
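The role of max_expansions becomes clearer when the prefix step is written out: the last query term is expanded into at most max_expansions indexed terms that start with it, and each expansion is then tried as the final word of the phrase. The Python sketch below shows only the expansion step. The term list is an assumption (lowercase tokens that the default analyzer would plausibly produce from a few of the product names), and expand_prefix is a hypothetical helper.

```python
def expand_prefix(index_terms, prefix, max_expansions):
    """Return up to max_expansions indexed terms that start with prefix, in sorted order."""
    candidates = sorted(t for t in set(index_terms) if t.startswith(prefix))
    return candidates[:max_expansions]


# Assumed terms from the name field of a few test documents.
name_terms = ["laptop", "x1", "smartphone", "s10", "wireless",
              "headphones", "smart", "watch", "4k", "tv"]

print(expand_prefix(name_terms, "wa", 10))  # ['watch']
print(expand_prefix(name_terms, "s", 2))    # ['s10', 'smart']
```

The second call shows the trade-off: with a short prefix and a small max_expansions, some candidate terms (here smartphone) are never considered, so a low limit can hide valid matches.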

1.2.4 multi_match (multi-field matching)

multi_match runs a match query against several fields at once. It is a convenient way to search the same keywords across multiple fields without writing a separate match clause for each one.

GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "portable",
      "fields": ["name", "description", "tags"],
      "type": "best_fields"
    }
  }
}

Because more than one field is involved, multi_match needs a rule for combining the per-field scores. The type parameter selects that rule: best_fields, most_fields, cross_fields, and others.

To make the tags field count more in the results, append ^3 to raise its weight. Documents that match on tags then receive a higher relevance score.

GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "portable",
      "fields": ["name", "description", "tags^3"],
      "type": "best_fields"
    }
  }
}
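How the type and the ^3 boost interact can be sketched in a few lines of Python. The model below is deliberately simplified: best_fields takes the highest per-field score (plus tie_breaker times the others), most_fields adds the per-field scores, and a boost multiplies the score of its field. The per-field scores are invented numbers for illustration, not real BM25 output, and combine_scores is a hypothetical helper.

```python
def combine_scores(field_scores, boosts=None, mode="best_fields", tie_breaker=0.0):
    """Combine per-field scores the way multi_match roughly does (simplified model)."""
    boosts = boosts or {}
    boosted = [score * boosts.get(field, 1.0) for field, score in field_scores.items()]
    if not boosted:
        return 0.0
    if mode == "best_fields":
        best = max(boosted)
        # The other fields only count through the tie_breaker.
        return best + tie_breaker * (sum(boosted) - best)
    if mode == "most_fields":
        return sum(boosted)
    raise ValueError(f"unsupported mode: {mode}")


# Hypothetical per-field scores for one document.
scores = {"name": 0.0, "description": 1.2, "tags": 0.8}

plain = combine_scores(scores)                          # description wins
boosted = combine_scores(scores, boosts={"tags": 3.0})  # tags wins after the boost
summed = combine_scores(scores, mode="most_fields")     # all fields add up

print(plain, boosted, summed)
```

With the boost applied, the tags match overtakes the description match, which is the ranking shift the ^3 suffix is meant to produce.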

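1.2.5 query_string (advanced query syntax)

Supports the Lucene query syntax. It is powerful but more complex, and a malformed query string returns an error.

For example, find all product documents that contain laptop or smartphone in the name or description field and whose price is between 100 and 1000.

GET /products/_search
{
  "query": {
    "query_string": {
      "query": "(laptop OR smartphone) AND price:[100 TO 1000]",
      "fields": ["name", "description"],
      "default_operator": "AND"
    }
  }
}

With the test data above, Laptop X1 is excluded because its price (1299.99) falls outside the range.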

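1.2.6 simple_query_string

A simpler syntax that is friendlier to raw user input and more fault tolerant: invalid parts of the query are ignored instead of raising an error.

For example, search for products that meet all of the following conditions:

  • Search only the name and description fields.
  • Must contain speaker (written as +speaker).
  • Must not contain blue (written as -blue).
  • Must contain waterproof: the term has no prefix, but because default_operator is AND it is required rather than optional.

GET /products/_search
{
  "query": {
    "simple_query_string": {
      "query": "waterproof +speaker -blue",
      "fields": ["name", "description"],
      "default_operator": "AND"
    }
  }
}

  • + means AND (the term must be present); - negates a term (it must be absent).
  • "default_operator": "AND" means that terms without an explicit operator are combined with AND.
    • AND: higher precision (fewer but more relevant results).
    • OR: higher recall (more results, possibly including irrelevant ones).

Expressed in SQL, the logic is roughly as follows. The analogy is loose: LIKE matches substrings, whereas Elasticsearch matches whole analyzed terms, so blue does not match bluetooth.

SELECT * FROM products
WHERE (name LIKE '%speaker%' OR description LIKE '%speaker%')
AND (name NOT LIKE '%blue%' AND description NOT LIKE '%blue%')
AND (name LIKE '%waterproof%' OR description LIKE '%waterproof%')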

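Note: the metadata.color of Bluetooth Speaker is blue, but the query only searches name and description and never inspects metadata.color, so the product is still returned.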
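The reason -blue does not exclude Bluetooth Speaker on the name and description fields either is that full-text queries compare whole terms, not substrings. The Python sketch below makes this visible with a rough stand-in for the default analyzer (lowercasing and splitting on non-alphanumeric characters); the analyze function is an approximation written for this example, not the real analyzer.

```python
import re


def analyze(text):
    """Rough stand-in for the default analyzer: lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


doc = {"name": "Bluetooth Speaker", "description": "Waterproof bluetooth speaker"}
tokens = analyze(doc["name"]) + analyze(doc["description"])

print(tokens)
print("blue" in tokens)               # False: no indexed term equals "blue"
print("blue" in doc["name"].lower())  # True: but it is a substring of "bluetooth"
```

A term-level negation therefore only removes documents that contain blue as a standalone word in the searched fields.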

If the real goal is to exclude blue products, query like this instead:

GET /products/_search
{
  "query": {
    "bool": {
      "must": {
        "simple_query_string": {
          "query": "waterproof +speaker",
          "fields": ["name", "description"]
        }
      },
      "must_not": {
        "term": {
          "metadata.color": "blue"
        }
      }
    }
  }
}

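1.3 Comparison summary

Query type             Characteristics                                 Typical use
match                  Basic analyzed matching, supports fuzziness     General search
match_phrase           Exact phrase matching                           Quoted searches, fixed phrases
match_phrase_prefix    Phrase plus prefix on the last term             Autocomplete
multi_match            Multi-field search                              Cross-field search
query_string           Full query syntax                               Advanced search interfaces
simple_query_string    Simplified syntax                               Direct user input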

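2. Compound queries

A bool query combines clauses of four types:

  • must: results must satisfy the condition, and the clause contributes to the score.
  • must_not: results must not satisfy the condition. The clause runs in filter context, so it is ignored for scoring.
  • filter: results must satisfy the condition, but the clause is likewise ignored for scoring, and Elasticsearch can cache it to speed up queries. A bool query that contains only filter or must_not clauses returns a score of 0 for every hit.
  • should: optional conditions that raise the score when they match. The minimum number that must match is controlled by minimum_should_match.

In Elasticsearch queries, query and filter serve different purposes.

  • query evaluates how relevant each document is and scores the results; it is typically used for search.
  • filter only includes or excludes documents and does not score them; it is typically used for filtering.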
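The clause semantics above can be condensed into a small Python model. It is a sketch under simplifying assumptions: every clause is a function that returns a score for a matching document or None otherwise, scores are simply added, and the default for minimum_should_match follows the documented rule (1 when there are should clauses but no must or filter clauses, otherwise 0). The names bool_score and term are hypothetical.

```python
def term(field, value, score=1.0):
    """Build a clause that matches when field equals value (or contains it, for lists)."""
    def clause(doc):
        actual = doc.get(field)
        values = actual if isinstance(actual, list) else [actual]
        return score if value in values else None
    return clause


def bool_score(doc, must=(), filters=(), must_not=(), should=(), minimum_should_match=None):
    """Return the combined score for doc, or None if the bool query rejects it."""
    if minimum_should_match is None:
        minimum_should_match = 1 if should and not (must or filters) else 0
    # must_not and filter decide inclusion only; they never add to the score.
    if any(clause(doc) is not None for clause in must_not):
        return None
    if any(clause(doc) is None for clause in filters):
        return None
    score = 0.0
    for clause in must:
        s = clause(doc)
        if s is None:
            return None
        score += s
    should_scores = [s for s in (clause(doc) for clause in should) if s is not None]
    if len(should_scores) < minimum_should_match:
        return None
    return score + sum(should_scores)


phone = {"category": "electronics", "tags": ["new", "popular"]}
tv = {"category": "electronics", "tags": ["premium", "large"]}
must = [term("category", "electronics")]
should = [term("tags", "popular")]

print(bool_score(phone, must=must, should=should))                       # 2.0
print(bool_score(tv, must=must, should=should))                          # 1.0
print(bool_score(tv, must=must, should=should, minimum_should_match=1))  # None
print(bool_score(tv, filters=must))                                      # 0.0
```

The third call illustrates a common surprise: once minimum_should_match is 1, a should clause stops being a pure ranking bonus and starts filtering documents out.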

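Business requirement: find relevant products that satisfy the following conditions:

  • category or description must contain electronics.
  • The following products should rank higher:
    • Products whose description mentions high performance.
    • Products tagged popular.
    • Products that satisfy several of these bonus conditions rank even higher.

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "electronics",
            "fields": ["category^2", "description"],
            "type": "most_fields"
          }
        }
      ],
      "should": [
        {
          "match_phrase": {
            "description": {
              "query": "high performance",
              "slop": 2
            }
          }
        },
        {
          "match": {
            "tags": {
              "query": "popular"
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

Note that "minimum_should_match": 1 makes at least one should clause mandatory, so electronics products that match neither bonus condition (for example 4K TV) are filtered out rather than merely ranked lower. To treat the should clauses purely as score boosts, set it to 0 or omit it; 0 is the default when a must or filter clause is present.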

posted @ 2025-07-28 17:36  yjbjingcha