Elasticsearch之索引、文档、组合查询、排序查询、filter过滤操作

Elasticsearch之-索引操作

# es的倒排索引(扩展阅读.md)
-把文章进行分词,对每个词建立索引

具体操作可以查看官方文档

https://www.elastic.co/guide/en/elasticsearch/reference/7.5/indices.html>

官方2版本的中文文档

https://www.elastic.co/guide/cn/elasticsearch/guide/current/index-settings.html

一 索引初始化

#新建一个lqz2的索引,索引分片数量为5,索引副本数量为1
PUT lqz2
{
  "settings": {
    "index":{
      "number_of_shards":5,
      "number_of_replicas":1
    }
  }
}
'''
number_of_shards
每个索引的主分片数,默认值是 5 。这个配置在索引创建后不能修改。
number_of_replicas
每个主分片的副本数,默认值是 1 。对于活动的索引库,这个配置可以随时修改。
'''

二 查询索引配置

#获取lqz2索引的配置信息
GET lqz2/_settings
#获取所有索引的配置信息
GET _all/_settings
#同上
GET _settings
#获取lqz和lqz2索引的配置信息
GET lqz,lqz2/_settings

三 更新索引

#修改索引副本数量为2
PUT lqz/_settings
{
  "number_of_replicas": 2
}
#如遇到报错:cluster_block_exception,因为这是由于ES新节点的数据目录data存储空间不足,导致从master主节点接收同步数据的时候失败,此时ES集群为了保护数据,会自动把索引分片index置为只读read-only
PUT  _all/_settings
{
"index": {
  "blocks": {
    "read_only_allow_delete": false
    }
  }
}

四 删除索引

#删除lqz索引
DELETE lqz

Elasticsearch之-文档操作

一 新增文档

#新增一个id为1的书籍(POST和PUT都可以)
POST lqz/_doc/1/_create
#POST lqz/_doc/1
#POST lqz/_doc 会自动创建id,必须用Post
{
  "title":"红楼梦",
  "price":12,
  "publish_addr":{
    "province":"黑龙江",
    "city":"鹤岗"
  },
  "publish_date":"2013-11-11",
  "read_num":199,
  "tag":["古典","名著"]
}

二 查询文档

#查询lqz索引下id为7的文档
GET lqz/_doc/1
#查询lqz索引下id为7的文档,只要title字段
GET lqz/_doc/7?_source=title
#查询lqz索引下id为7的文档,只要title和price字段
GET lqz/_doc/7?_source=title,price
#查询lqz索引下id为7的文档,要全部字段
GET lqz/_doc/7?_source

三 修改文档

#修改文档(覆盖修改,原来的字段就没有了)
PUT lqz/_doc/1
{
  "title":"xxxx",
  "price":333,
  "publish_addr":{
    "province":"黑龙江",
    "city":"福州"
  }
}
#修改文档,增量修改,只修改某个字段(注意是post)(一定要注意包在doc中)
POST lqz/_update/1
{
  "doc":{
    "title":"修改"
  }
}

四 删除文档

#删除文档id为10的
DELETE lqz/_doc/10

五 批量操作之_mget

#批量获取lqz索引_doc类型下id为2的数据和lqz2索引_doc类型下id为1的数据
GET _mget
{
  "docs":[
    {
      "_index":"lqz",
      "_type":"_doc",
      "_id":2
    },
    {
      "_index":"lqz2",
      "_type":"_doc",
      "_id":1
    }
    ]
}
​
#批量获取lqz索引下id为1和2的数据
GET lqz/_mget
{
  "docs":[
    {
      "_id":2
    },
    {
      "_id":1
    }
    ]
}
#同上
GET lqz/_mget
{
  "ids":[1,2]
}

六 批量操作之 bulk

PUT test/_doc/2/_create
{
  "field1" : "value22"
}
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }

Elasticsearch之查询的两种方式

一 前言

elasticsearch提供两种查询方式:

  • 查询字符串(query string),简单查询,就像是像传递URL参数一样去传递查询语句,被称为简单搜索或查询字符串(query string)搜索。

  • 另外一种是通过DSL语句来进行查询,被称为DSL查询(Query DSL),DSL是Elasticsearch提供的一种丰富且灵活的查询语言,该语言以json请求体的形式出现,通过restful请求与Elasticsearch进行交互。

二 准备数据

PUT lqz/doc/1
{
  "name":"顾老二",
  "age":30,
  "from": "gu",
  "desc": "皮肤黑、武器长、性格直",
  "tags": ["", "", ""]
}
​
PUT lqz/doc/2
{
  "name":"大娘子",
  "age":18,
  "from":"sheng",
  "desc":"肤白貌美,娇憨可爱",
  "tags":["", "",""]
}
​
PUT lqz/doc/3
{
  "name":"龙套偏房",
  "age":22,
  "from":"gu",
  "desc":"mmp,没怎么看,不知道怎么形容",
  "tags":["造数据", "",""]
}
​
PUT lqz/doc/4
{
  "name":"石头",
  "age":29,
  "from":"gu",
  "desc":"粗中有细,狐假虎威",
  "tags":["", "",""]
}
​
PUT lqz/doc/5
{
  "name":"魏行首",
  "age":25,
  "from":"广云台",
  "desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
  "tags":["闭月","羞花"]
}
View Code

三 查询字符串

GET lqz/doc/_search?q=from:gu

还是使用GET命令,通过_serarch查询,查询条件是什么呢?条件是from属性是gu家的人都有哪些。

结果如下

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "石头",
          "age" : 29,
          "from" : "gu",
          "desc" : "粗中有细,狐假虎威",
          "tags" : [
            "",
            "",
            ""
          ]
        }
      },
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "顾老二",
          "age" : 30,
          "from" : "gu",
          "desc" : "皮肤黑、武器长、性格直",
          "tags" : [
            "",
            "",
            ""
          ]
        }
      },
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "龙套偏房",
          "age" : 22,
          "from" : "gu",
          "desc" : "mmp,没怎么看,不知道怎么形容",
          "tags" : [
            "造数据",
            "",
            ""
          ]
        }
      }
    ]
  }
}
结果如下

我们来重点说下hitshits是返回的结果集——所有from属性为gu的结果集。重点中的重点是_score得分,得分是什么呢?根据算法算出跟查询条件的匹配度,匹配度高得分就高。后面再说这个算法是怎么回事。

四 结构化查询

我们现在使用DSL方式,来完成刚才的查询,查看来自顾家的都有哪些人。

GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  }
}

上例,查询条件是一步步构建出来的,将查询条件添加到match中即可,而match则是查询所有from字段的值中含有gu的结果就会返回。  当然结果没啥变化:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "石头",
          "age" : 29,
          "from" : "gu",
          "desc" : "粗中有细,狐假虎威",
          "tags" : [
            "",
            "",
            ""
          ]
        }
      },
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "顾老二",
          "age" : 30,
          "from" : "gu",
          "desc" : "皮肤黑、武器长、性格直",
          "tags" : [
            "",
            "",
            ""
          ]
        }
      },
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "龙套偏房",
          "age" : 22,
          "from" : "gu",
          "desc" : "mmp,没怎么看,不知道怎么形容",
          "tags" : [
            "造数据",
            "",
            ""
          ]
        }
      }
    ]
  }
}
结果如下

Elasticsearch之排序查询sort

降序:desc

比如我们查询顾府都有哪些人,并根据age字段按照降序,并且,我只想看nameage字段:

GET lqz/doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ]
}

上例,在条件查询的基础上,我们又通过sort来做排序,根据age字段排序,由order字段控制,desc是降序。

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "顾老二",
          "age" : 30,
          "from" : "gu",
          "desc" : "皮肤黑、武器长、性格直",
          "tags" : [
            "",
            "",
            ""
          ]
        },
        "sort" : [
          30
        ]
      },
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "name" : "石头",
          "age" : 29,
          "from" : "gu",
          "desc" : "粗中有细,狐假虎威",
          "tags" : [
            "",
            "",
            ""
          ]
        },
        "sort" : [
          29
        ]
      },
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "name" : "龙套偏房",
          "age" : 22,
          "from" : "gu",
          "desc" : "mmp,没怎么看,不知道怎么形容",
          "tags" : [
            "造数据",
            "",
            ""
          ]
        },
        "sort" : [
          22
        ]
      }
    ]
  }
}
结果如下

上例中,结果是以降序排列方式返回的。

2.2 升序:asc

GET lqz/doc/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ]
}
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "name" : "大娘子",
          "age" : 18,
          "from" : "sheng",
          "desc" : "肤白貌美,娇憨可爱",
          "tags" : [
            "",
            "",
            ""
          ]
        },
        "sort" : [
          18
        ]
      },
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "name" : "龙套偏房",
          "age" : 22,
          "from" : "gu",
          "desc" : "mmp,没怎么看,不知道怎么形容",
          "tags" : [
            "造数据",
            "",
            ""
          ]
        },
        "sort" : [
          22
        ]
      },
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "5",
        "_score" : null,
        "_source" : {
          "name" : "魏行首",
          "age" : 25,
          "from" : "广云台",
          "desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
          "tags" : [
            "闭月",
            "羞花"
          ]
        },
        "sort" : [
          25
        ]
      },
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "name" : "石头",
          "age" : 29,
          "from" : "gu",
          "desc" : "粗中有细,狐假虎威",
          "tags" : [
            "",
            "",
            ""
          ]
        },
        "sort" : [
          29
        ]
      },
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "顾老二",
          "age" : 30,
          "from" : "gu",
          "desc" : "皮肤黑、武器长、性格直",
          "tags" : [
            "",
            "",
            ""
          ]
        },
        "sort" : [
          30
        ]
      }
    ]
  }
}
结果如下

注意:不是什么数据类型都能排序,只有数字,日期可以排序,其他都不行!!

Elasticsearch之分页查询

分页查询:from/size

GET lqz/doc/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ], 
  "from": 2,
  "size": 1
}

#上例,首先以`age`降序排序,查询所有。并且在查询的时候,添加两个属性`from`和`size`来控制查询结果集的数据条数。

- from:从哪开始查
- size:返回几条结果

# 有了这个查询,如何分页?
一页有10条数据
第一页:
  "from": 0,
  "size": 10
第二页:
  "from": 10,
  "size": 10
第三页:
  "from": 20,
  "size": 10
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "5",
        "_score" : null,
        "_source" : {
          "name" : "魏行首",
          "age" : 25,
          "from" : "广云台",
          "desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
          "tags" : [
            "闭月",
            "羞花"
          ]
        },
        "sort" : [
          25
        ]
      }
    ]
  }
}
查询结果

Elasticsearch之布尔(组合)查询

多个条件

- must(and- should(or- must_not(not- filter

组合查询之must

# 查询form gu和age=30的数据
GET lqz/doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "from": "gu"
          }
        },
        {
          "match": {
            "age": "30"
          }
        }
      ]
    }
  }
}
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.287682,
    "hits" : [
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 1.287682,
        "_source" : {
          "name" : "顾老二",
          "age" : 30,
          "from" : "gu",
          "desc" : "皮肤黑、武器长、性格直",
          "tags" : [
            "",
            "",
            ""
          ]
        }
      }
    ]
  }
}
查询结果

注意:所有属性值为列表的,都可以实现多个条件并列存在

组合查询之should

#查询`from`为`gu`或者`tags`为`闭月`的数据
GET lqz/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "from": "gu"
          }
        },
        {
          "match": {
            "tags": "闭月"
          }
        }
      ]
    }
  }
}
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "石头",
          "age" : 29,
          "from" : "gu",
          "desc" : "粗中有细,狐假虎威",
          "tags" : [
            "",
            "",
            ""
          ]
        }
      },
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "5",
        "_score" : 0.5753642,
        "_source" : {
          "name" : "魏行首",
          "age" : 25,
          "from" : "广云台",
          "desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!",
          "tags" : [
            "闭月",
            "羞花"
          ]
        }
      },
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "顾老二",
          "age" : 30,
          "from" : "gu",
          "desc" : "皮肤黑、武器长、性格直",
          "tags" : [
            "",
            "",
            ""
          ]
        }
      },
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "龙套偏房",
          "age" : 22,
          "from" : "gu",
          "desc" : "mmp,没怎么看,不知道怎么形容",
          "tags" : [
            "造数据",
            "",
            ""
          ]
        }
      }
    ]
  }
}
查询结果

组合查询之must_not

#查询`from`既不是`gu`并且`tags`也不是`可爱`,还有`age`不是`18`的数据

GET lqz/doc/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "from": "gu"
          }
        },
        {
          "match": {
            "tags": "可爱"
          }
        },
        {
          "match": {
            "age": 18
          }
        }
      ]
    }
  }
}

filter查询

filter条件过滤查询,过滤条件的范围用`range`表示,`gt`表示大于
gt:大于   lt:小于     get:大于等于      let:小于等于

#查询`from`为`gu`,`age`大于`25`的数据

GET lqz/doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "from": "gu"
          }
        }
      ],
      "filter": {
        "range": {
          "age": {
            "gt": 25
          }
        }
      }
    }
  }
}
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "石头",
          "age" : 29,
          "from" : "gu",
          "desc" : "粗中有细,狐假虎威",
          "tags" : [
            "",
            "",
            ""
          ]
        }
      },
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "顾老二",
          "age" : 30,
          "from" : "gu",
          "desc" : "皮肤黑、武器长、性格直",
          "tags" : [
            "",
            "",
            ""
          ]
        }
      }
    ]
  }
}
查询结果

小结:

  • must:与关系,相当于关系型数据库中的and

  • should:或关系,相当于关系型数据库中的or

  • must_not:非关系,相当于关系型数据库中的not

  • filter:过滤条件。

  • range:条件筛选范围。

  • gt:大于,相当于关系型数据库中的>

  • gte:大于等于,相当于关系型数据库中的>=

  • lt:小于,相当于关系型数据库中的<

  • lte:小于等于,相当于关系型数据库中的<=

Elasticsearch之查询结果过滤

一 前言

在未来,一篇文档可能有很多的字段,每次查询都默认给我们返回全部,在数据量很大的时候,是的,比如我只想查姑娘的手机号,你一并给我个喜好啊、三围什么的算什么?  所以,我们对结果做一些过滤,清清白白的告诉elasticsearch

二 准备数据

PUT lqz/doc/1
{
  "name":"顾老二",
  "age":30,
  "from": "gu",
  "desc": "皮肤黑、武器长、性格直",
  "tags": ["", "", ""]
}

三 结果过滤:_source

现在,在所有的结果中,我只需要查看nameage两个属性,其他的不要怎么办?

GET lqz/doc/_search
{
  "query": {
    "match": {
      "name": "顾老二"
    }
  },
  "_source": ["name", "age"]
}
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.8630463,
    "hits" : [
      {
        "_index" : "lqz",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.8630463,
        "_source" : {
          "name" : "顾老二",
          "age" : 30
        }
      }
    ]
  }
}
查询结果

在数据量很大的时候,我们需要什么字段,就返回什么字段就好了,提高查询效率

 

posted @ 2020-05-07 15:09  Hank·Paul  阅读(2117)  评论(0编辑  收藏  举报