以bank account 数据为例,认识elasticsearch query 和 filter

Elasticsearch 查询语言(Query DSL)认识(一)

这里写图片描述

一、基本认识

查询子句的行为取决于

  • query context
  • filter context

也就是执行的是查询(query)还是过滤(filter)

  • query context 描述的是:被搜索的文档和查询子句的匹配程度

  • filter context 描述的是: 被搜索的文档和查询子句是否匹配

一个是匹配程度问题,一个是是否匹配的问题

二、实例

  1. 导入数据 bank account data download
  2. 将数据导入到elasticsearch
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@accounts.json"
curl 'localhost:9200/_cat/indices?v'

这里有两个地方需要注意,1.host要改成符合自己的。2.早期版本中下载的数据可以能是'accounts.json?raw=true'
大概如下 curl -XPOST 'wbelk:9200/bank/account/_bulk?pretty' --data-binary "@accounts.json?raw=true"

  1. 参数认识

为了便捷操作,可以安装一个kiabna sense

$./bin/kibana plugin --install elastic/sense

$./bin/kibana  
sudo -i service restart kibana(或者用这个启动kibana)

match_all 搜索,直接返回所有文档

GET /bank/_search
{
  "query": {
    "match_all": {}
  }
}

返回大致如下:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 1,
    "hits": [
      {
        "_index": "bank",
        "_type": "account",
        "_id": "25",
        "_score": 1,
        "_source": {
          "account_number": 25,
          "balance": 40540,
          "firstname": "Virginia",
          "lastname": "Ayala",
          "age": 39,
          "gender": "F",
          "address": "171 Putnam Avenue",
          "employer": "Filodyne",
          "email": "virginiaayala@filodyne.com",
          "city": "Nicholson",
          "state": "PA"
        }
      },

参数大致解释:

  • took: 执行搜索耗时,毫秒为单位,也就是本文我1ms
  • time_out: 搜索是否超时
  • _shards: 多少分片被搜索,成功多少,失败多少
  • hits: 搜索结果展示
  • hits.total: 匹配条件的文档总数
  • hits.hits: 返回结果展示,默认返回十个
  • hits.max_score:最大匹配得分
  • hits._score: 返回文档的匹配得分(得分越高,匹配程度越高,越靠前)
  • _index _type _id 作为剥层定位到特定的文档
  • _source 文档源
  1. 查询语言之 执行查询
  • 只显示account_number 和 balance
POST /bank/_search
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"]
}
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 1,
    "hits": [
      {
        "_index": "bank",
        "_type": "account",
        "_id": "25",
        "_score": 1,
        "_source": {
          "account_number": 25,
          "balance": 40540
        }
      },
      {
        "_index": "bank",
        "_type": "account",
        "_id": "44",
        "_score": 1,
        "_source": {
          "account_number": 44,
          "balance": 34487
        }
      },
      {
        "_index": "bank",
        "_type": "account",
        "_id": "99",
        "_score": 1,
        "_source": {
          "account_number": 99,
          "balance": 47159
        }
      },
  • 返回accountu_number 为20的document
POST /bank/_search
{
  "query": { "match": { "account_number": 20 } }
}
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 5.6587105,
    "hits": [
      {
        "_index": "bank",
        "_type": "account",
        "_id": "20",
        "_score": 5.6587105,
        "_source": {
          "account_number": 20,
          "balance": 16418,
          "firstname": "Elinor",
          "lastname": "Ratliff",
          "age": 36,
          "gender": "M",
          "address": "282 Kings Place",
          "employer": "Scentric",
          "email": "elinorratliff@scentric.com",
          "city": "Ribera",
          "state": "WA"
        }
      }
    ]
  }
}
  • 返回地址中包含(term)mill的所有账户
POST /bank/_search
{
  "query": { "match": { "address": "mill" } }
}
  • 返回地址中包含term 'mill'或者 'lane'的所有账户
POST /bank/_search
{
  "query": { "match": { "address": "mill lane" } }
}
  • 匹配phrase 'mill lane'
POST /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
  • 返回address包含'mill'和'lane'的所有账户 (AND)
POST /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
  • 返回address包含'mill'或'lane'的所有账户 (OR)
POST /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
  • 返回address既不包含'mill'也不包含'lane'的所有账户 (NO)
POST /bank/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
  • 返回age为40,并且state不是ID的所有账户 (组合)
POST /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
  1. 查询语言之 执行过滤

过滤不会进行相关度得分的计算

  • 在所有账户中寻找balance 在29900到30000之间(闭区间)的所有账户
    (先查询到所有的账户,然后进行过滤)
POST /bank/_search
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 29900,
            "lte": 30000
          }
        }
      }
    }
  }
}
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1,
    "hits": [
      {
        "_index": "bank",
        "_type": "account",
        "_id": "243",
        "_score": 1,
        "_source": {
          "account_number": 243,
          "balance": 29902,
          "firstname": "Evangelina",
          "lastname": "Perez",
          "age": 20,
          "gender": "M",
          "address": "787 Joval Court",
          "employer": "Keengen",
          "email": "evangelinaperez@keengen.com",
          "city": "Mulberry",
          "state": "SD"
        }
      },
      {
        "_index": "bank",
        "_type": "account",
        "_id": "781",
        "_score": 1,
        "_source": {
          "account_number": 781,
          "balance": 29961,
          "firstname": "Sanford",
          "lastname": "Mullen",
          "age": 26,
          "gender": "F",
          "address": "879 Dover Street",
          "employer": "Zanity",
          "email": "sanfordmullen@zanity.com",
          "city": "Martinez",
          "state": "TX"
        }
      },
      ...

根据返回结果我们可以看到filter得到的_score为1.不存在程度上的问题。是0和1的问题

三、query和filter效率

一般认为filter的速度快于query的速度

  • filter不会计算相关度得分,效率高
  • filter的结果可以缓存到内存中,方便再用
posted @ 2017-01-06 15:44  杨文波  阅读(1055)  评论(0编辑  收藏  举报