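DSL: Structured Queries

Queries and Filters

Empty query

An empty query body, {}, is functionally equivalent to a match_all query clause, which, as its name suggests, matches every document: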

GET devicelog_01/_search
{
  "query": {
    "match_all": {}
  }
}

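Response (hits.total reports 14 matching documents, but only 10 are listed because a search returns the first 10 hits unless size is set):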

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 14,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "devicelog_01",
        "_type" : "log",
        "_id" : "0001",
        "_score" : 1.0,
        "_source" : {
          "systemId" : "000001",
          "OperationDateTime" : 1583321643000,
          "Items" : [
            {
              "name" : "kk",
              "value" : "12"
            },
            {
              "name" : "gg",
              "value" : "24"
            }
          ]
        }
      },
      {
        "_index" : "devicelog_01",
        "_type" : "log",
        "_id" : "0004",
        "_score" : 1.0,
        "_source" : {
          "systemId" : "000004",
          "OperationDateTime" : 1583494443000,
          "Items" : [
            {
              "name" : "kk",
              "value" : "44"
            },
            {
              "name" : "gg",
              "value" : "44"
            }
          ]
        }
      },
      {
        "_index" : "devicelog_01",
        "_type" : "log",
        "_id" : "1583037243000",
        "_score" : 1.0,
        "_source" : {
          "systemId" : "2020-03-01, 12:34:03",
          "OperationDateTime" : 1583037243000,
          "Items" : [
            {
              "name" : "kk",
              "value" : "2020-03-06, 22:34:03"
            },
            {
              "name" : "gg",
              "value" : "11"
            },
            {
              "dd" : "111",
              "value" : "22"
            }
          ]
        }
      },
      {
        "_index" : "devicelog_01",
        "_type" : "log",
        "_id" : "0002",
        "_score" : 1.0,
        "_source" : {
          "systemId" : "000002",
          "OperationDateTime" : 1583148843000,
          "Items" : [
            {
              "name" : "kk",
              "value" : "11"
            },
            {
              "name" : "gg",
              "value" : "11"
            }
          ]
        }
      },
      {
        "_index" : "devicelog_01",
        "_type" : "log",
        "_id" : "0009",
        "_score" : 1.0,
        "_source" : {
          "systemId" : "000009",
          "OperationDateTime" : 1583148843000,
          "Items" : [
            {
              "name" : "kk",
              "value" : "11"
            },
            {
              "name" : "gg",
              "value" : "11"
            },
            {
              "name" : "yan",
              "value" : "22"
            }
          ]
        }
      },
      {
        "_index" : "devicelog_01",
        "_type" : "log",
        "_id" : "0010",
        "_score" : 1.0,
        "_source" : {
          "systemId" : "000010",
          "OperationDateTime" : 1583148843000,
          "Items" : [
            {
              "name" : "kk",
              "value" : "11"
            },
            {
              "name" : "gg",
              "value" : "11"
            },
            {
              "dd" : "yan",
              "value" : "22"
            }
          ]
        }
      },
      {
        "_index" : "devicelog_01",
        "_type" : "log",
        "_id" : "1582997643000",
        "_score" : 1.0,
        "_source" : {
          "systemId" : "2020-03-01, 01:34:03",
          "OperationDateTime" : 1582997643000,
          "Items" : [
            {
              "name" : "kk",
              "value" : "2020-03-06, 22:34:03"
            },
            {
              "name" : "gg",
              "value" : "11"
            },
            {
              "dd" : "111",
              "value" : "22"
            }
          ]
        }
      },
      {
        "_index" : "devicelog_01",
        "_type" : "log",
        "_id" : "1583004843000",
        "_score" : 1.0,
        "_source" : {
          "systemId" : "2020-03-01, 03:34:03",
          "OperationDateTime" : 1583004843000,
          "Items" : [
            {
              "name" : "kk",
              "value" : "2020-03-06, 22:34:03"
            },
            {
              "name" : "gg",
              "value" : "11"
            },
            {
              "dd" : "111",
              "value" : "22"
            }
          ]
        }
      },
      {
        "_index" : "devicelog_01",
        "_type" : "log",
        "_id" : "0003",
        "_score" : 1.0,
        "_source" : {
          "systemId" : "000003",
          "OperationDateTime" : 1583408043000,
          "Items" : [
            {
              "name" : "kk",
              "value" : "55"
            },
            {
              "name" : "gg",
              "value" : "55"
            }
          ]
        }
      },
      {
        "_index" : "devicelog_01",
        "_type" : "log",
        "_id" : "0005",
        "_score" : 1.0,
        "_source" : {
          "systemId" : "000005",
          "OperationDateTime" : 1583580843000,
          "Items" : [
            {
              "name" : "kk",
              "value" : "66"
            },
            {
              "name" : "gg",
              "value" : "66"
            }
          ]
        }
      }
    ]
  }
}

Query clauses

Check whether the value of Items.dd is yan:

GET devicelog_01/_search
{
  "query": {
    "match": {
      "Items.dd": "yan"
    }
  }
}

Response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.9808292,
    "hits" : [
      {
        "_index" : "devicelog_01",
        "_type" : "log",
        "_id" : "0010",
        "_score" : 0.9808292,
        "_source" : {
          "systemId" : "000010",
          "OperationDateTime" : 1583148843000,
          "Items" : [
            {
              "name" : "kk",
              "value" : "11"
            },
            {
              "name" : "gg",
              "value" : "11"
            },
            {
              "dd" : "yan",
              "value" : "22"
            }
          ]
        }
      }
    ]
  }
}

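Combining multiple clauses

Query clauses work like building blocks: simple clauses can be combined into one complex query.

Leaf clauses (such as match) compare a query string with a field (or several fields).

Compound clauses combine other clauses. For example, a bool clause lets you combine any other valid clauses, whether must, must_not or should.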
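The building-block idea can be sketched in Python by composing plain dicts that serialize to Query DSL JSON. leaf_match and bool_clause are hypothetical helpers made up for this illustration. The JSON they produce is ordinary match/bool syntax that you would send as the body of GET devicelog_01/_search.

```python
import json

def leaf_match(field, text):
    """Hypothetical helper: a leaf clause that compares a query string with one field."""
    return {"match": {field: text}}

def bool_clause(must=(), must_not=(), should=()):
    """Hypothetical helper: a compound clause that wraps other clauses."""
    body = {}
    for name, clauses in (("must", must), ("must_not", must_not), ("should", should)):
        if clauses:  # leave out empty operators rather than sending empty clauses
            body[name] = list(clauses)
    return {"bool": body}

# Leaf clauses are the bricks; bool stacks them into one query.
inner = bool_clause(must=[leaf_match("Items.name", "kk")],
                    must_not=[leaf_match("Items.dd", "yan")])

# A compound clause is itself a clause, so it can be nested inside another bool.
request = {"query": bool_clause(should=[inner, leaf_match("Items.name", "yan")])}

print(json.dumps(request, indent=2))
```

Because every clause is just a dict, nesting is unlimited: the inner bool is used exactly like a leaf clause.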

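Queries vs. filters

So far we have discussed structured queries. In fact there are two kinds of structured clauses: structured queries (Query DSL) and structured filters (Filter DSL). Starting with Elasticsearch 2.0 the two were merged into a single Query DSL. The same clause now behaves as a query or a filter depending on whether it runs in query context or filter context.

Query and filter clauses look very much alike, but they differ slightly because they serve different purposes.

A filter asks a yes-or-no question of every document: does this field contain this particular value?

Is the created date in the range 2013 to 2014?
Does the status field contain the term "published"?
Is the geo-point in the lat_lon field within 10km of a given point?

A query is similar to a filter, but it asks a different question:

How well does this document match the given value?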

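A typical use of a query is to find documents:

Best matching the words full text search
Containing the word run, but possibly also matching runs, running, jog or sprint
Containing the words quick, brown and fox: the closer together they are, the more relevant the document
Tagged with lucene, search or java: the more tags, the more relevant the document

A query calculates how relevant each document is to the query and assigns it a relevance score, _score. It then sorts the matching documents by relevance.

This notion of relevance is well suited to full-text search, where there is seldom a single completely "correct" answer.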
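The difference between the two questions can be shown with a toy model in Python. The documents are made up. The "score" here is just a count of matching query words, an assumption chosen for readability; Elasticsearch's real relevance scoring is far more involved.

```python
# Toy illustration only: Elasticsearch does not score by counting words.
docs = {
    1: {"status": "published", "body": "quick brown fox"},
    2: {"status": "draft", "body": "quick fox"},
    3: {"status": "published", "body": "lazy dog"},
}

def filter_docs(docs, field, value):
    """A filter answers yes or no per document: does the field equal the value?"""
    return {doc_id for doc_id, doc in docs.items() if doc[field] == value}

def query_docs(docs, field, text):
    """A query scores how well each document matches, then sorts by score."""
    terms = set(text.split())
    scored = []
    for doc_id, doc in docs.items():
        score = len(terms & set(doc[field].split()))  # toy relevance
        if score > 0:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

print(filter_docs(docs, "status", "published"))     # {1, 3}
print(query_docs(docs, "body", "quick brown fox"))  # [(1, 3), (2, 2)]
```

The filter returns an unordered set. The query returns a ranked list, in which the document containing all three words comes first.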

Performance differences

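The output of a filter is a simple list of matching documents. It is quick to compute and easy to cache in memory, using only 1 bit per document. These cached filter results can then be reused very efficiently by later requests.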
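The "1 bit per document" idea can be sketched with a Python int used as a bitset. This only illustrates the principle; Lucene's actual cache structures are different, and the folder and author values are made up.

```python
# Toy model of a cached filter: one bit per document, stored in a Python int.

def build_bitset(doc_values, predicate):
    """Set bit i when document i matches the filter predicate."""
    bits = 0
    for doc_index, value in enumerate(doc_values):
        if predicate(value):
            bits |= 1 << doc_index
    return bits

def matching_docs(bits):
    """Decode a bitset back into the list of matching document indexes."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]

folders = ["inbox", "spam", "inbox", "sent", "inbox"]
authors = ["alice", "bob", "bob", "bob", "alice"]

# Each cached filter costs one bit per document, however complex its condition.
in_inbox = build_bitset(folders, lambda f: f == "inbox")  # 0b10101
from_bob = build_bitset(authors, lambda a: a == "bob")    # 0b01110

# Reusing two cached filters in a later request is just a bitwise AND.
print(matching_docs(in_inbox & from_bob))  # [2]
```

Combining cached results needs no further look at the documents themselves, which is why reusing them is cheap.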

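A query has to find the matching documents and also calculate how relevant each one is. Queries are therefore generally slower than filters, and query results are not cacheable.

Thanks to the inverted index, a simple query that matches only a few documents among millions may perform as well as a cached filter, or even slightly better. In general, however, a cached filter will outperform a query.

The goal of a filter is to reduce the number of documents the query has to examine, so choose filter conditions carefully.

As a rule, use query clauses for full-text search or for anything that should affect the relevance score, and use filters for everything else.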

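The most important queries and filters

term filter

term is mainly used to match exact values: numbers, dates, booleans, or not_analyzed strings (unanalyzed text, mapped as keyword in Elasticsearch 5.0 and later):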

GET devicelog_01/_search
{
  "query": {
    "term": {
      "Items.name": {
        "value": "yan"
      }
    }
  }
}
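Why term suits exact values can be sketched with a toy analyzer. It only lowercases and splits on non-word characters, a rough stand-in for the standard analyzer. The sketch rests on one assumption: if devicelog_01 was created by dynamic mapping on Elasticsearch 5.0 or later, Items.name is probably a text field with an Items.name.keyword subfield, and term behaves differently on the two.

```python
import re

def toy_analyze(text):
    """Rough stand-in for the standard analyzer: split on non-word characters, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

def term_matches(indexed_tokens, value):
    """A term query looks up the value exactly as given, without analyzing it."""
    return value in indexed_tokens

stored = "Published Post"
text_field = toy_analyze(stored)  # a text field indexes ['published', 'post']
keyword_field = [stored]          # a keyword field indexes ['Published Post']

print(term_matches(text_field, "Published"))          # False: the token was lowercased
print(term_matches(text_field, "published"))          # True
print(term_matches(keyword_field, "Published Post"))  # True: exact whole value
```

The lookup value is never analyzed, so it must match the indexed token character for character.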

terms filter

terms is similar to term but lets you specify several values to match. If the field contains any of the specified values, the document matches:

GET devicelog_01/_search
{
  "query": {
    "terms": {
      "Items.name": [
        "kk",
        "yan"
      ]
    }
  }
}
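The "any of" rule can be sketched against trimmed copies of documents from the match_all response. The sketch assumes Items is a plain (non-nested) object array. In that case every Items.name value in a document is treated as one multi-valued field, which the helper field_values (a toy, handling only one level of dotted path) imitates.

```python
# Trimmed copies of documents 0001, 0009 and 0010 from the match_all response.
docs = {
    "0001": {"Items": [{"name": "kk", "value": "12"}, {"name": "gg", "value": "24"}]},
    "0009": {"Items": [{"name": "kk", "value": "11"}, {"name": "gg", "value": "11"},
                       {"name": "yan", "value": "22"}]},
    "0010": {"Items": [{"name": "kk", "value": "11"}, {"name": "gg", "value": "11"},
                       {"dd": "yan", "value": "22"}]},
}

def field_values(doc, path):
    """Collect every value of a path like 'Items.name' across the object array,
    as if the array were flattened into one multi-valued field."""
    parent, child = path.split(".")
    return {item[child] for item in doc.get(parent, []) if child in item}

def terms_filter(docs, path, values):
    """A document matches if the field holds ANY of the listed values."""
    wanted = set(values)
    return sorted(doc_id for doc_id, doc in docs.items() if field_values(doc, path) & wanted)

print(terms_filter(docs, "Items.name", ["kk", "yan"]))  # ['0001', '0009', '0010']
print(terms_filter(docs, "Items.name", ["yan", "zz"]))  # ['0009']
```

Document 0010 does not match the second call: its yan sits under dd, not name.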

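range filter

The range filter finds documents whose values fall within a given range. OperationDateTime in this index holds epoch milliseconds (for example 1583321643000), so the bounds must use the same unit. The request below covers 2020-03-01 00:00:00 UTC up to, but not including, 2020-03-05 00:00:00 UTC:

GET devicelog_01/_search
{
  "query": {
    "range": {
      "OperationDateTime": {
        "gte": 1583020800000,
        "lt": 1583366400000
      }
    }
  }
}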

The range operators are: gt :: greater than, gte :: greater than or equal to, lt :: less than, lte :: less than or equal to.
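The four operators, and the conversion from a calendar date to the epoch milliseconds stored in OperationDateTime, can be sketched in Python. in_range is a toy helper that checks every operator present in a range spec.

```python
from datetime import datetime, timezone

def to_epoch_millis(dt):
    """Convert an aware datetime to epoch milliseconds, the unit OperationDateTime uses."""
    return int(dt.timestamp() * 1000)

OPERATORS = {
    "gt": lambda value, bound: value > bound,
    "gte": lambda value, bound: value >= bound,
    "lt": lambda value, bound: value < bound,
    "lte": lambda value, bound: value <= bound,
}

def in_range(value, spec):
    """True if value satisfies every operator in a spec such as {"gte": a, "lt": b}."""
    return all(OPERATORS[op](value, bound) for op, bound in spec.items())

start = to_epoch_millis(datetime(2020, 3, 1, tzinfo=timezone.utc))
end = to_epoch_millis(datetime(2020, 3, 5, tzinfo=timezone.utc))
spec = {"gte": start, "lt": end}

print(start, end)                     # 1583020800000 1583366400000
print(in_range(1583321643000, spec))  # True  (document 0001)
print(in_range(1583494443000, spec))  # False (document 0004)
print(in_range(10, {"gte": 10, "lte": 20}), in_range(10, {"gt": 10}))  # True False
```

The last line shows the only difference between the inclusive and exclusive forms: a value equal to the bound.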

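exists and missing filters

The exists and missing filters find documents that do or do not contain a given field, similar to IS NOT NULL (exists) and IS NULL (missing) in SQL. The missing filter was removed in Elasticsearch 5.0. On newer versions, wrap exists in a bool must_not clause instead.

These filters are typically used after a set of documents has already been narrowed down, to separate the documents that have a particular field from those that do not.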

GET devicelog_01/_search
{
  "query": {
    "exists": {
      "field": "Items.dd"
    }
  }
}
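exists and its opposite can be sketched over trimmed documents from the match_all response, plus one hypothetical document, 0011, that has no Items array at all. MISSING_QUERY shows the bool must_not exists form used in place of missing on Elasticsearch 5.0 and later.

```python
# Trimmed copies of documents 0001 and 0010, plus hypothetical document 0011.
docs = {
    "0001": {"Items": [{"name": "kk", "value": "12"}, {"name": "gg", "value": "24"}]},
    "0010": {"Items": [{"name": "kk", "value": "11"}, {"dd": "yan", "value": "22"}]},
    "0011": {"systemId": "000011"},
}

def has_field(doc, path):
    """True if any object in the array under 'parent' has a non-null 'child' value."""
    parent, child = path.split(".")
    return any(item.get(child) is not None for item in doc.get(parent, []))

def exists(docs, path):
    """exists: keep documents that have the field (SQL: IS NOT NULL)."""
    return sorted(doc_id for doc_id, doc in docs.items() if has_field(doc, path))

def missing(docs, path):
    """missing: keep documents without the field (SQL: IS NULL)."""
    return sorted(doc_id for doc_id, doc in docs.items() if not has_field(doc, path))

# On Elasticsearch 5.0+, "missing" is written as bool -> must_not -> exists:
MISSING_QUERY = {"query": {"bool": {"must_not": {"exists": {"field": "Items.dd"}}}}}

print(exists(docs, "Items.dd"))   # ['0010']
print(missing(docs, "Items.dd"))  # ['0001', '0011']
```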

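bool filter

The bool filter combines the results of several filter clauses using Boolean logic. It accepts the following operators:

must :: all of these clauses must match, equivalent to and.

must_not :: none of these clauses may match, equivalent to not.

should :: at least one of these clauses must match, equivalent to or.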

GET devicelog_01/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {
          "Items.name": {
            "value": "kk"
          }
        }}
      ],
      "must_not": [
        {"term": {
          "Items.dd": {
            "value": "yan"
          }
        }}
      ],
      "should": [
        {"term": {
          "Items.name": {
            "value": "yan"
          }
        }}
      ]
    }
  }
}
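The and / not / or semantics of the three operators can be sketched with a toy recursive evaluator over trimmed copies of documents 0001, 0009 and 0010. evaluate is a hypothetical helper that handles only bool and term leaves written as {"term": {"Parent.child": value}}. It applies the should rule exactly as listed: at least one should clause has to match.

```python
# Trimmed copies of documents 0001, 0009 and 0010 from the match_all response.
docs = {
    "0001": {"Items": [{"name": "kk"}, {"name": "gg"}]},
    "0009": {"Items": [{"name": "kk"}, {"name": "gg"}, {"name": "yan"}]},
    "0010": {"Items": [{"name": "kk"}, {"name": "gg"}, {"dd": "yan"}]},
}

def evaluate(clause, doc):
    """Toy evaluator for a bool filter whose leaves are {"term": {"Parent.child": value}}."""
    if "bool" in clause:
        body = clause["bool"]
        must_ok = all(evaluate(c, doc) for c in body.get("must", []))
        must_not_ok = not any(evaluate(c, doc) for c in body.get("must_not", []))
        should = body.get("should", [])
        # should: at least one clause has to match, if any are given.
        should_ok = any(evaluate(c, doc) for c in should) if should else True
        return must_ok and must_not_ok and should_ok
    ((path, value),) = clause["term"].items()
    parent, child = path.split(".")
    return any(item.get(child) == value for item in doc.get(parent, []))

bool_filter = {"bool": {
    "must": [{"term": {"Items.name": "kk"}}],     # and
    "must_not": [{"term": {"Items.dd": "yan"}}],  # not
    "should": [{"term": {"Items.name": "yan"}}],  # or
}}

print(sorted(d for d, doc in docs.items() if evaluate(bool_filter, doc)))  # ['0009']
```

0001 fails the should clause and 0010 fails the must_not clause, so only 0009 remains.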

match_all query

match_all matches all documents. It is the default query when no query is specified.

GET devicelog_01/_search
{
  "query": {
    "match_all": {}
  }
}

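This query is often combined with a filter, for example to retrieve all emails in the inbox folder. All documents are considered equally relevant, so each receives a _score of 1.

match query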

GET devicelog_01/_search
{
  "query": {
    "match": {
      "Items.dd": "yan"
    }
  }
}

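The match query is the standard query: whether you need full-text search or exact-value search, you will almost always use it.

If you run a match query against a full-text field, it first analyzes the query string with the field's analyzer and then executes the search.

If you use match on a field holding an exact value, such as a number, date, boolean, or not_analyzed string, it searches for exactly the value you give.

Tip: for exact-value searches you are better off using a filter, because filter results can be cached.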
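The "analyze first, then search" behaviour on a full-text field can be sketched with the same kind of toy analyzer: lowercase and split on non-word characters, a rough stand-in for the standard analyzer. The documents are made up. The operator argument mirrors match's real operator parameter, whose default is or.

```python
import re

def toy_analyze(text):
    """Rough stand-in for the standard analyzer: split on non-word characters, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

def match_query(docs, field, query_text, operator="or"):
    """Analyze the query string first, then look up each resulting token.
    operator="or" (the default) needs any token; "and" needs all of them."""
    query_tokens = set(toy_analyze(query_text))
    hits = []
    for doc_id, doc in docs.items():
        doc_tokens = set(toy_analyze(doc.get(field, "")))
        matched = query_tokens & doc_tokens
        ok = matched == query_tokens if operator == "and" else bool(matched)
        if ok:
            hits.append(doc_id)
    return sorted(hits)

docs = {
    "a": {"title": "Quick Brown Fox"},
    "b": {"title": "quick rabbit"},
    "c": {"title": "lazy dog"},
}

print(match_query(docs, "title", "QUICK fox"))                  # ['a', 'b']
print(match_query(docs, "title", "QUICK fox", operator="and"))  # ['a']
```

The query string "QUICK fox" still finds "Quick Brown Fox", because both sides go through the same analysis before being compared.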

multi_match query

The multi_match query builds on the match query and lets you search several fields at the same time.
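The sketch below shows how multi_match combines fields under its default type, best_fields: it runs the same match on each field and keeps the best field's score (with the default tie_breaker of 0). The per-field "score" is a toy count of matching tokens and the documents are made up; real scores come from Elasticsearch's similarity model.

```python
import re

def toy_analyze(text):
    """Rough stand-in for the standard analyzer: split on non-word characters, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

def field_score(text, query_tokens):
    """Toy per-field score: how many query tokens the field contains."""
    return len(query_tokens & set(toy_analyze(text)))

def multi_match_best_fields(doc, fields, query_text):
    """Run the same match against every field and keep the best field's score."""
    query_tokens = set(toy_analyze(query_text))
    return max(field_score(doc.get(field, ""), query_tokens) for field in fields)

docs = {
    "d1": {"title": "Quick brown fox", "body": "the fox"},
    "d2": {"title": "Dog", "body": "a quick dog"},
    "d3": {"title": "Cat", "body": "sleeping"},
}

scores = {d: multi_match_best_fields(doc, ["title", "body"], "quick fox") for d, doc in docs.items()}
print(scores)  # {'d1': 2, 'd2': 1, 'd3': 0}
```

d1 scores 2 because its best field, title, contains both words; matching fox again in body adds nothing.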

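bool query

The bool query is similar to the bool filter and is used to combine multiple query clauses. The difference is that the bool filter only reports whether a document matches, whereas the bool query also computes the relevance _score contributed by each query clause.

must :: documents must match these clauses to be included.

must_not :: documents must not match these clauses.

should :: matching is optional, but documents that match these clauses get a higher relevance score.

The following query finds documents whose title field contains "how to make millions" and whose tag field is not marked spam. Documents that are tagged "starred" or dated 2014 onward rank higher than those that are not. (title, tag and date are illustrative fields; they do not exist in the devicelog_01 documents shown above.)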

GET devicelog_01/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": "how to make millions"
        }
      },
      "must_not": {
        "match": {
          "tag": "spam"
        }
      },
      "should": [
        {
          "match": {
            "tag": "starred"
          }
        },
        {
          "range": {
            "date": {
              "gte": "2014-01-01"
            }
          }
        }
      ]
    }
  }
}

Tip: if a bool query has no must clause, at least one should clause has to match. If it has at least one must clause, none of the should clauses is required to match.
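The tip can be sketched with a toy bool query over a document's set of words. toy_bool_query is a hypothetical helper. It returns None for a non-match, otherwise a toy score of one point per matching must or should clause; Elasticsearch's real scoring is far more involved, and this toy ignores the filter clause entirely.

```python
def toy_bool_query(doc_tokens, must=(), must_not=(), should=()):
    """Toy bool query: None when the document does not match, otherwise
    one point per matching must clause plus one per matching should clause."""
    if not all(token in doc_tokens for token in must):
        return None
    if any(token in doc_tokens for token in must_not):
        return None
    should_hits = sum(1 for token in should if token in doc_tokens)
    # With no must clause, at least one should clause has to match;
    # with a must clause, should clauses only add to the score.
    if not must and should and should_hits == 0:
        return None
    return len(must) + should_hits

q = dict(must=["millions"], must_not=["spam"], should=["starred"])
print(toy_bool_query({"millions"}, **q))             # 1: should is optional here
print(toy_bool_query({"millions", "starred"}, **q))  # 2: should adds to the score
print(toy_bool_query({"millions", "spam"}, **q))     # None: excluded by must_not
print(toy_bool_query({"millions"}, should=["starred", "2014"]))  # None: no must, no should hit
print(toy_bool_query({"starred"}, should=["starred", "2014"]))   # 1
```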

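Combining queries and filters

Query clauses and filter clauses each live in their own context. In the Elasticsearch API you will see many parameters named query or filter.

Each such parameter takes a single query clause or a single filter clause respectively. In other words, it sets up the surrounding context as either query context or filter context.

A compound query clause can wrap other query clauses, and a compound filter clause can wrap other filter clauses. In practice, apart from pure full-text search, a query usually needs the help of a filter.

So query clauses can contain filter clauses and vice versa, which lets us switch between query and filter context. Writing correct, efficient requests therefore means understanding both the requirement and the context each clause runs in.

Filtering a query

Suppose we have a match query that searches the email field for "business opportunity", and we want to add a term filter so that only emails in the inbox folder are returned.

The search API accepts only a query parameter. We therefore use the filtered query, which holds both a "query" and a "filter" clause, inside the outer query context. filtered belongs to Elasticsearch 1.x/2.x: it was deprecated in 2.0 and removed in 5.0. On newer versions the same request is written as a bool query, with the match under must and the term under filter. In the 1.x/2.x syntax: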

GET devicelog_01/_search
{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "email": "business opportunity"
        }
      },
      "filter": {
        "term": {
          "folder": "inbox"
        }
      }
    }
  }
}
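Because filtered no longer exists on Elasticsearch 5.0 and later, here is a sketch of the mechanical rewrite into a bool query: the inner query moves under must (scored) and the filter under filter (not scored). filtered_to_bool is a hypothetical helper written for this illustration. The bool/must/filter JSON it emits is standard Query DSL.

```python
import json

def filtered_to_bool(request):
    """Rewrite {"query": {"filtered": {"query": Q, "filter": F}}} as
    {"query": {"bool": {"must": Q, "filter": F}}}. If the filtered query had no
    inner query, must is left out: a bool with only a filter clause matches every
    document that passes the filter."""
    filtered = request["query"]["filtered"]
    bool_body = {}
    if "query" in filtered:
        bool_body["must"] = filtered["query"]
    if "filter" in filtered:
        bool_body["filter"] = filtered["filter"]
    return {"query": {"bool": bool_body}}

old = {"query": {"filtered": {
    "query": {"match": {"email": "business opportunity"}},
    "filter": {"term": {"folder": "inbox"}},
}}}

print(json.dumps(filtered_to_bool(old), indent=2))
```

The printed body can be sent to GET devicelog_01/_search on a 5.0+ cluster in place of the filtered request.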

A single filter

In query context, if all you need is a filter, for example to match every email in the inbox, you can omit the query clause:

GET devicelog_01/_search
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "folder": "inbox"
        }
      }
    }
  }
}

If no query is given, filtered defaults to a match_all query, so the complete form of the request above is:

GET devicelog_01/_search
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "folder": "inbox"
        }
      }
    }
  }
}

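Using a query as a filter

Sometimes you need to use a query clause inside a filter context. For example, a filter can contain a match query clause in order to exclude documents that look like spam.

Validating queries

Queries can become very complex, especially when combined with different analyzers and field mappings, which makes them hard to reason about.

The validate API checks whether a query is valid: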

GET devicelog_01/_validate/query
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "folder": "inbox"
        }
      }
    }
  }
}

To see the specific reason a query is invalid, add the explain parameter:

GET devicelog_01/_validate/query?explain
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "folder": "inbox"
        }
      }
    }
  }
}

For a valid query, explain returns a human-readable description of the query, which helps you understand how Elasticsearch executes it.

Posted 2020-04-05 10:38 by 弱水三千12138

Motto: What is learned from books always feels shallow; to truly understand something, you must do it yourself.