Basic Elasticsearch Operations

CRUD

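Initialize an index, specifying the number of shards and replicas. The number of primary shards cannot be changed once the index is created; the number of replicas can.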
PUT lagou2
{
"settings":{
"index":{
"number_of_shards":5,
"number_of_replicas":1
}
}
}
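
Why the shard count is fixed can be sketched in a few lines. Elasticsearch routes a document to a shard by hashing its routing value (the ID by default) and taking the result modulo the number of primary shards. The sketch below uses zlib.crc32 as a stand-in hash, which is an assumption (Elasticsearch itself uses Murmur3), and shard_for is a hypothetical helper, not an Elasticsearch API.

```python
import zlib

def shard_for(doc_id: str, number_of_shards: int) -> int:
    """Route a document: hash(routing value) % number of primary shards.

    zlib.crc32 is only a stand-in for the real hash function.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards

ids = [str(i) for i in range(1, 21)]
with_5 = {doc_id: shard_for(doc_id, 5) for doc_id in ids}
with_6 = {doc_id: shard_for(doc_id, 6) for doc_id in ids}

# Documents that would be looked up on a different shard if 5 became 6.
moved = [doc_id for doc_id in ids if with_5[doc_id] != with_6[doc_id]]
print(f"{len(moved)} of {len(ids)} documents would change shard")
```

Because the modulus is part of the lookup, changing it would make existing documents unreachable at their computed location, which is why a different shard count requires a new index.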

Ways to retrieve settings; the second and fourth forms are equivalent.
GET lagou2/_settings
GET _all/_settings
GET lagou,lagou2/_settings
GET _settings

Insert a document; an ID is generated automatically.
POST lagou2/job
{
"title":"python django开发工程师",
"salary":20000,
"city":"ShangHai",
"company":{
"name":"美团科技",
"company_addr":"软件科技园"
},
"publish_time":"2019-4-16",
"comments":20
}
Insert a document with the ID set to 1
POST lagou2/job/1
{
"title":"python 分布式爬虫开发",
"salary":15000,
"city":"北京",
"company":{
"name":"百度科技",
"company_addr":"软件科技园"
},
"publish_time":"2019-4-16",
"comments":15
}

Retrieve specific fields
GET lagou2/job/1?_source=title
GET lagou2/job/1?_source=title,city
Retrieve all fields
GET lagou2/job/1?_source

Update a document. PUT replaces the whole document, so any field left out of the body (city here) is removed.
PUT lagou2/job/1
{
"title":"python 分布式爬虫开发",
"salary":15000,
"company":{
"name":"百度科技",
"company_addr":"软件科技园"
},
"publish_time":"2019-4-16",
"comments":15
}

Update only the given fields without overwriting the rest (recommended)
POST lagou2/job/1/_update
{
"doc":{
"comments":2000
}
}
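
The difference between the two update styles can be sketched with plain dictionaries. This is only a model of the behaviour described above, not client code; full_replace and partial_update are hypothetical helpers.

```python
import copy

def full_replace(stored: dict, body: dict) -> dict:
    """PUT index/type/id: the new body replaces the stored document entirely."""
    return copy.deepcopy(body)

def partial_update(stored: dict, doc: dict) -> dict:
    """POST .../_update with {"doc": ...}: merge doc into the stored document."""
    merged = copy.deepcopy(stored)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = partial_update(merged[key], value)  # inner objects merge recursively
        else:
            merged[key] = copy.deepcopy(value)
    return merged

stored = {"title": "python 分布式爬虫开发", "city": "北京", "comments": 15}

replaced = full_replace(stored, {"title": "python 分布式爬虫开发", "comments": 15})
updated = partial_update(stored, {"comments": 2000})

print("city" in replaced)  # False: the field was not in the new body
print(updated)             # city is kept, comments becomes 2000
```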

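Delete a document, a type, or a whole index. Deleting a type on its own is not supported from Elasticsearch 2.0 onwards; delete the index instead.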
DELETE lagou2/job/1
DELETE lagou2/job
DELETE lagou2

Batch operations (bulk)

Add test data
POST testdb/job1/1
{
"title":"job1_1"
}

POST testdb/job1/2
{
"title":"job1_2"
}

POST testdb/job2/1
{
"title":"job2_1"
}

POST testdb/job2/2
{
"title":"job2_2"
}

Basic use of _mget:
Fetch two IDs from two types (tables); the _index values may also differ.
GET _mget
{
"docs":[
{"_index":"testdb",
"_type":"job1",
"_id":1
},
{
"_index":"testdb",
"_type":"job2",
"_id":2
}
]
}

When the documents are in the same index (database), _index can be left out of the body
GET testdb/_mget
{
"docs":[
{
"_type":"job1",
"_id":1
},
{
"_type":"job2",
"_id":2
}
]
}

When the type is the same as well and only the IDs differ
GET testdb/job1/_mget
{
"docs":[
{
"_id":1
},
{
"_id":2
}
]
}
If only IDs need to be passed, this can be simplified further
GET testdb/job1/_mget
{
"ids":[1,2]
}
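
The relationship between the short and the explicit request bodies can be sketched as a small function. expand_ids is a hypothetical helper used for illustration only; it just builds the JSON body and sends nothing.

```python
def expand_ids(ids, index=None, doc_type=None):
    """Expand the {"ids": [...]} shorthand into the explicit {"docs": [...]} form."""
    docs = []
    for doc_id in ids:
        entry = {}
        if index is not None:
            entry["_index"] = index
        if doc_type is not None:
            entry["_type"] = doc_type
        entry["_id"] = doc_id
        docs.append(entry)
    return {"docs": docs}

# Body for GET testdb/job1/_mget: index and type come from the URL.
print(expand_ids([1, 2]))
# Body for GET _mget: everything is spelled out per document.
print(expand_ids([1, 2], index="testdb", doc_type="job1"))
```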

Bulk operations:
A bulk request can combine several operations, such as index, delete, update and create. A single request may also write to more than one index.
Format:
action_and_meta_data\n
optional_source\n
...
action_and_meta_data\n
optional_source\n

Example: the metadata line of each action names the index (database), the type (table) and the document ID it applies to
{"index":{"_index":"test","_type":"type1","_id":"1"}}
{"field1":"value1"}

{"delete":{"_index":"test","_type":"type1","_id":"2"}}

{"create":{"_index":"test","_type":"type1","_id":"3"}}
{"field1":"value3"}

{"update":{"_id":"1","_type":"type1","_index":"index1"}}
{"doc":{"field2":"value2"}}

Now run a bulk request with two documents. Each JSON object must stay on a single line, otherwise ES returns an error.
POST _bulk
{"index":{"_index":"lagou2","_type":"job","_id":"1"}}
{"title":"python分布式爬虫开发","salary_min":15000,"city":"北京","company":{"name":"百度","company_addr":"北京软件园"},"publish_date":"2017-4-16","comments":20}
{"index":{"_index":"lagou2","_type":"job2","_id":"2"}}
{"title":"python django开发","salary_min":20000,"city":"成都","company":{"name":"阿里巴巴","company_addr":"软件园"},"publish_date":"2017-4-16","comments":20}
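
Since every object has to sit on its own line, it is convenient to generate the body instead of typing it. The following is a minimal sketch that only builds the request text; build_bulk_body is a hypothetical helper, and the two actions are sample values.

```python
import json

def build_bulk_body(actions):
    """Serialize (metadata, source) pairs into the newline-delimited bulk format.

    source may be None for actions such as delete that carry no document.
    """
    lines = []
    for metadata, source in actions:
        # json.dumps without indent always produces a single line.
        lines.append(json.dumps(metadata, ensure_ascii=False))
        if source is not None:
            lines.append(json.dumps(source, ensure_ascii=False))
    # The body must end with a newline.
    return "\n".join(lines) + "\n"

actions = [
    ({"index": {"_index": "lagou2", "_type": "job", "_id": "1"}},
     {"title": "python分布式爬虫开发", "city": "北京"}),
    ({"delete": {"_index": "lagou2", "_type": "job", "_id": "2"}}, None),
]
body = build_bulk_body(actions)
print(body, end="")
```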

Mapping

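A mapping lets you define the types and related properties of fields in advance, when the index is created.
Mappings are defined on a type (table).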

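ES guesses the field mapping from the basic JSON type of the source data.
It turns the input data into searchable index terms.
A mapping is the set of field data types we define ourselves; it also tells ES how to index the data and whether a field can be searched.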
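
How the guessing works can be sketched roughly as follows. This is a simplified model of the default dynamic mapping rules, not Elasticsearch code: guess_field_type is a hypothetical helper, and date detection on strings is deliberately left out.

```python
def guess_field_type(value):
    """Roughly mimic the default dynamic mapping for one JSON value."""
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"  # ES also adds a keyword sub-field; date detection is ignored here
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first non-null element.
        for item in value:
            if item is not None:
                return guess_field_type(item)
    return None  # null or empty array: no mapping is added yet

doc = {"title": "python", "salary": 15000, "score": 4.5,
       "remote": True, "company": {"name": "百度科技"}, "tags": []}
print({field: guess_field_type(value) for field, value in doc.items()})
```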

Kinds of mapping: static and dynamic

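Built-in types
String types: text, keyword (keyword values are not analyzed; the whole value is indexed as a single term)
Numeric types: long, short, integer, byte, double, float
Date type: date
Boolean type: boolean
Binary type: binary
Complex types: object, nested
Geo types: geo_point, geo_shape
Specialized types: ip, completion (for search suggestions)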

For example, company in this document is an object type
{
"title":"python 分布式爬虫开发",
"salary":15000,
"company":{
"name":"百度科技",
"company_addr":"软件科技园"
},
"publish_time":"2019-4-16",
"comments":15
}
If a list of employees is added, the field is an array of objects, which can be mapped as the nested type
{
"title":"python 分布式爬虫开发",
"salary":15000,
"company":{
"name":"百度科技",
"company_addr":"软件科技园",
"employee":[
{"name":"zhangsan","age":18},
{"name":"lisi","age":20}
]
},
"publish_time":"2019-4-16",
"comments":15
}
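
The practical difference between object and nested shows up when searching an array of inner objects. The sketch below models it with plain Python; flatten, match_flattened and match_nested are hypothetical helpers, and the model ignores everything except the association between the fields of one inner object.

```python
employees = [{"name": "zhangsan", "age": 18}, {"name": "lisi", "age": 20}]

def flatten(objects):
    """object mapping: each inner field becomes one multi-valued field."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(key, []).append(value)
    return flat

def match_flattened(objects, name, age):
    """name and age are checked independently, so values from different objects can combine."""
    flat = flatten(objects)
    return name in flat.get("name", []) and age in flat.get("age", [])

def match_nested(objects, name, age):
    """nested mapping: both conditions must hold inside the same inner object."""
    return any(obj.get("name") == name and obj.get("age") == age for obj in objects)

print(flatten(employees))
print(match_flattened(employees, "zhangsan", 20))  # True, a false positive
print(match_nested(employees, "zhangsan", 20))     # False
```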

Common properties and the types they apply to

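Property	Description	Applies to
store	true stores the field value separately, false does not; the default is false	all
index	true means the field is indexed and searchable, false means it is not; the default is true	string
null_value	A default value to use when the field is null, for example "NA"	all
analyzer	The analyzer used at index and search time. The default is standard; whitespace, simple and english are also available. The ik analyzers are for Chinese	text
include_in_all	By default ES defines a special _all field for every document so that all fields can be searched through it; set this to false to keep a field out of _all	all
format	The pattern string for the date format	date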

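Example: a text value is analyzed into terms, which then go into the inverted index; a keyword value is not analyzed. Once the type of a field has been set, it cannot be changed.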

Create the mappings (on a fresh index; the PUT fails if lagou2 already exists)
PUT lagou2
{
"mappings":{
"job":{
"properties":{
"title":{
"type":"text"
},
"salary_min":{
"type":"integer"
},
"city":{
"type":"keyword"
},
"company":{
"properties":{
"name":{
"type":"text"
},
"company_addr":{
"type":"text"
},
"employee_count":{
"type":"integer"
}
}
},
"publish_date":{
"type":"date",
"format":"yyyy-MM-dd"
},
"comments":{
"type":"integer"
}
}
}
}
}

# Insert data
PUT lagou2/job/2
{
"title":"python分布式爬虫开发",
"salary_min":15000,
"city":"北京",
"company":{
"name":"百度",
"company_addr":"北京软件园",
"employee_count":50
},
"publish_date":"2017-4-18",
"comments":15
}

# Retrieve the mappings
GET lagou2/_mapping/job
GET _all/_mapping

ES Queries

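Kinds of query:
Basic queries: search with the query conditions built into ES (scored)
Compound queries: combine several queries into one (scored)
Filters: narrow the results with filter conditions while querying, without affecting the score (not scored)

Note: match and range are the queries used most often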

Create the mapping (delete the existing lagou2 index first; the PUT fails if the index already exists)
PUT lagou2
{
"mappings":{
"job":{
"properties":{
"title":{
"store":true,
"type":"text",
"analyzer":"ik_max_word"
},
"company_name":{
"store":true,
"type":"keyword"
},
"desc":{
"type":"text"
},
"comments":{
"type":"integer"
},
"add_time":{
"type":"date",
"format":"yyyy-MM-dd"
}
}
}
}
}

Insert the practice data
POST lagou2/job/
{
"title":"es打造搜索引擎",
"company_name":"阿里巴巴科技有限公司",
"desc":"熟悉数据结构和算法，熟悉python基本开发",
"comments":15,
"add_time":"2017-4-27"
}

POST lagou2/job/
{
"title":"python打造搜索引擎系统",
"company_name":"阿里巴巴科技有限公司",
"desc":"熟悉推荐引擎原理和算法，掌握c语言",
"comments":60,
"add_time":"2017-10-20"
}

POST lagou2/job/
{
"title":"python django 开发工程师",
"company_name":"美团科技有限公司",
"desc":"熟悉django概念，熟悉python基本知识",
"comments":15,
"add_time":"2017-4-2"
}

POST lagou2/job/
{
"title":"python scrapy redis分布式爬虫",
"company_name":"百度科技有限公司",
"desc":"熟悉scrapy的概念，熟悉redis基本操作",
"comments":5,
"add_time":"2017-4-27"
}

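match query: finds documents whose title contains python or Python. ES analyzes the query string first, so searching for "python网站" also returns results: it is split into "python" and "网站", and a document matches if it contains either term.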
GET lagou2/job/_search
{
"query":{
"match":{
"title":"python"
}
}
}

A match query on company_name finds nothing here: the field is a keyword, so its value is not analyzed and only the complete value would match.
GET lagou2/job/_search
{
"query":{
"match":{
"company_name":"百度"
}
}
}

term query: the search value is treated as a single term and is not analyzed
GET lagou2/_search
{
"query":{
"term":{
"company_name":"阿里巴巴科技有限公司"
}
}
}

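Searching the text field title for "python爬虫" with term finds nothing, because no single indexed term equals the whole string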
GET lagou2/_search
{
"query":{
"term":{
"title":"python爬虫"
}
}
}

terms query: a document is returned if any one of ["工程师", "django", "系统"] matches
GET lagou2/_search
{
"query":{
"terms":{
"title":["工程师","django","系统"]
}
}
}

Control how many hits are returned with from and size; this is what pagination uses
GET lagou2/_search
{
"query":{
"match":{
"title":"python"
}
},
"from":0,
"size":2
}
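
For pagination, from is usually derived from a page number. A minimal sketch, assuming 1-based page numbers; page_params is a hypothetical helper that only builds the request body.

```python
def page_params(page: int, size: int) -> dict:
    """Translate a 1-based page number into the from/size pair."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    return {"from": (page - 1) * size, "size": size}

query = {"query": {"match": {"title": "python"}}}
for page in (1, 2, 3):
    body = {**query, **page_params(page, size=2)}
    print(body["from"], body["size"])
```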

match_all query
GET lagou2/_search
{
"query":{
"match_all":{}
}
}

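match_phrase query: both terms ["python", "系统"] must be present for a document to match. The slop parameter is the maximum distance allowed between the terms, here at most 6 positions.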
GET lagou2/_search
{
"query":{
"match_phrase":{
"title":{
"query": "python系统",
"slop": 6
}
}
}
}
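
The effect of slop can be illustrated with a deliberately simplified model. The sketch handles only two terms that appear in query order and counts the tokens between them; real slop also allows reordering, which is ignored here. phrase_matches is a hypothetical helper, and the token list is an assumed tokenization rather than actual ik_max_word output.

```python
def phrase_matches(tokens, first, second, slop=0):
    """Simplified match_phrase for two terms that must appear in this order.

    The terms match when `second` follows `first` with at most `slop`
    other tokens in between.
    """
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    return any(0 <= q - p - 1 <= slop
               for p in first_positions for q in second_positions)

# Assumed token list; the real analyzer output may differ.
tokens = ["python", "打造", "搜索引擎", "系统"]
print(phrase_matches(tokens, "python", "系统", slop=0))  # False: two tokens in between
print(phrase_matches(tokens, "python", "系统", slop=6))  # True
print(phrase_matches(tokens, "python", "系统", slop=1))  # False: the gap of 2 exceeds slop
```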

multi_match query: search several fields at once. Here a document is returned if python appears in either title or desc.
GET lagou2/_search
{
"query":{
"multi_match":{
"query": "python",
"fields": ["title","desc"]
}
}
}

title^3 gives title three times the weight of desc.
GET lagou2/_search
{
"query":{
"multi_match":{
"query": "python",
"fields": ["title^3","desc"]
}
}
}

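Return only the listed fields title and company_name; stored_fields returns only fields mapped with store set to true.
If desc is added, as in ["title","company_name","desc"], it is still not returned, because it is not a stored field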
GET lagou2/_search
{
"stored_fields":["title","company_name"],
"query":{
"match":{
"title":"python"
}
}
}

Sort the results with sort, which takes an array; here the hits are sorted by comments in descending order
GET lagou2/_search
{
"query":{
"match_all":{}
},
"sort":[{
"comments":{
"order":"desc"
}
}]
}

Range queries
range query on the comments field; boost sets the weight
GET lagou2/_search
{
"query":{
"range":{
"comments":{
"gte":10,
"lte":20,
"boost":2.0
}
}
}
}
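
The bounds used by range are gte, gt, lte and lt, where the forms ending in e are inclusive. A small sketch of the comparison, applied to the comments values of the four practice documents; in_range is a hypothetical helper.

```python
def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Evaluate the four range operators; omitted bounds are not checked."""
    if gte is not None and not value >= gte:
        return False
    if gt is not None and not value > gt:
        return False
    if lte is not None and not value <= lte:
        return False
    if lt is not None and not value < lt:
        return False
    return True

comments = [15, 60, 15, 5]  # the comments values of the practice documents
print([c for c in comments if in_range(c, gte=10, lte=20)])  # [15, 15]
print(in_range(20, gte=10, lte=20), in_range(20, gte=10, lt=20))  # True False
```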

Ranges also work on date strings: here add_time is on or after "2017-4-27" and no later than the current time
GET lagou2/_search
{
"query":{
"range":{
"add_time":{
"gte":"2017-4-27",
"lte":"now"
}
}
}
}

wildcard query: pattern matching, where the * in pyth*n is the wildcard
GET lagou2/_search
{
"query":{
"wildcard":{
"title":{"value":"pyth*n","boost":2.0}
}
}
}

ES Compound Queries

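The bool query is used very often.
It combines must, should, must_not and filter clauses, in the following form
bool:{
"filter":[], # filters only; does not contribute to the score
"must":[], # every condition here must match
"should":[], # one or more of these conditions should match
"must_not":[] # the opposite of must: none of these may match
}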

Insert the test data:
POST lagou/testjob/_bulk
{"index":{"_id":1}}
{"salary":10,"title":"Python"}
{"index":{"_id":2}}
{"salary":20,"title":"Scrapy"}
{"index":{"_id":3}}
{"salary":30,"title":"Django"}
{"index":{"_id":4}}
{"salary":30,"title":"Elasticsearch"}

I. Simple filter queries
Example 1.
The simplest filter query
In SQL: select * from testjob where salary=20
The equivalent bool query, which finds the documents whose salary is 20
GET lagou/testjob/_search
{
"query":{
"bool":{
"must":{
"match_all":{}
},
"filter":{
"term":{
"salary":20
}
}
}
}
}
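The "must" clause can be dropped here, because match_all does not restrict the result. salary is an integer and is never analyzed, so replacing "term" with "match" also works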
GET lagou/testjob/_search
{
"query":{
"bool":{
"filter":{
"term":{
"salary":20
}
}
}
}
}

Example 2.
Several values can be given at once with terms
GET lagou/testjob/_search
{
"query":{
"bool":{
"must":{
"match_all":{}
},
"filter":{
"terms":{
"salary":[10,20]
}
}
}
}
}
The "must" clause can be dropped here too, because match_all does not restrict the result; salary is an integer and is never analyzed.
GET lagou/testjob/_search
{
"query":{
"bool":{
"filter":{
"terms":{
"salary":[10,20]
}
}
}
}
}

Example 3.
# select * from testjob where title="python"
The equivalent bool query:
GET lagou/testjob/_search
{
"query":{
"bool":{
"must":{
"match_all":{}
},
"filter":{
"term":{ # term does no preprocessing; it searches for Python exactly as written
"title":"Python" # note the capital P
}
}
}
}
}
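Note: this query returns nothing, even though the test document contains "Python" with a capital P. title is a text field, so at index time the value is tokenized and then lowercased; the test document {"salary":10,"title":"Python"} is therefore indexed as the term python. term does no preprocessing and searches for Python exactly as written.
Changing term to match finds the document.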
GET lagou/testjob/_search
{
"query":{
"bool":{
"must":{
"match_all":{}
},
"filter":{
"match":{
"title":"Python"
}
}
}
}
}
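
The behaviour above can be reproduced with a toy inverted index. This is a sketch under strong simplifications: analyze only lowercases and splits on whitespace, whereas a real analyzer does much more, and all four functions are hypothetical helpers.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def build_index(docs):
    """Inverted index: term -> set of document IDs containing it."""
    index = {}
    for doc_id, title in docs.items():
        for term in analyze(title):
            index.setdefault(term, set()).add(doc_id)
    return index

def term_query(index, value):
    """term: look the value up exactly as given, with no analysis."""
    return index.get(value, set())

def match_query(index, text):
    """match: analyze the text, then return documents containing any resulting term."""
    hits = set()
    for term in analyze(text):
        hits |= index.get(term, set())
    return hits

titles = {1: "Python", 2: "Scrapy", 3: "Django", 4: "Elasticsearch"}
index = build_index(titles)
print(term_query(index, "Python"))   # set(): only the lowercased term is indexed
print(term_query(index, "python"))   # {1}
print(match_query(index, "Python"))  # {1}
```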

Inspect the analyzer output to see how "Python网络开发工程师" is tokenized
POST lagou/testjob/_bulk
{"index":{"_id":6}}
{"salary":10,"title":"Python网络开发工程师"}

GET /_analyze
{
"analyzer":"ik_max_word",
"text":"Python网络开发工程师"
}
This shows the tokens the analyzer returns: python, 网络, 络, 开发, 发, 工程师, 工程, 师

Switch the analyzer to ik_smart and compare the result
GET /_analyze
{
"analyzer":"ik_smart",
"text":"Python网络开发工程师"
}
This time the tokens are: python, 网络, 开发, 工程师

II. Complex compound filter queries
For example, take this SQL: find jobs whose salary is 20 or whose title is Python, excluding those whose salary is 30
select * from testjob where (salary=20 OR title="Python") AND (salary != 30)
As a bool query:
GET lagou/testjob/_search
{
"query":{
"bool":{
"should":[
{"term":{"salary":20}},
{"term":{"title":"python"}} # python must be lowercase here, because the indexed term is lowercase
],
"must_not":{ # salary is not 30
"term":{
"salary":30
}
}
}
}
}

# salary is neither 30 nor 10
GET lagou/testjob/_search
{
"query":{
"bool":{
"should":[
{"term":{"salary":20}},
{"term":{"title":"python"}}
],
"must_not":[
{"term":{"salary":30}},
{"term":{"salary":10}}
]
}
}
}
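
The logic of the two queries can be checked against the four original test documents with ordinary Python conditions. This is a sketch, not a query engine: titles are written in lowercase because that is how they are indexed, and document 6, which is added later, is left out. With no must clause present, at least one should clause has to match, which is what the or expresses.

```python
docs = {
    1: {"salary": 10, "title": "python"},
    2: {"salary": 20, "title": "scrapy"},
    3: {"salary": 30, "title": "django"},
    4: {"salary": 30, "title": "elasticsearch"},
}

def first_query(doc):
    # (salary = 20 OR title = python) AND salary != 30
    return (doc["salary"] == 20 or doc["title"] == "python") and doc["salary"] != 30

def second_query(doc):
    # same should clauses, but salary must be neither 30 nor 10
    return (doc["salary"] == 20 or doc["title"] == "python") and doc["salary"] not in (30, 10)

print([i for i, d in docs.items() if first_query(d)])   # [1, 2]
print([i for i, d in docs.items() if second_query(d)])  # [2]
```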

Nested bool queries
For more complex logic a bool can be nested inside another bool, as in this SQL:
select * from testjob where title="python" or (title="django" AND salary=30)
GET lagou/testjob/_search
{
"query":{
"bool":{
"should":[
{"term":{"title":"python"}},
{"bool":{ # the parenthesized part after or in the SQL
"must":[
{"term":{"title":"django"}},
{"term":{"salary":30}}
]
}
}
]
}
}
}
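
How nesting is evaluated can be sketched with a small recursive function over the query structure. It covers only term and bool, ignores scoring and analysis, and is not how Elasticsearch is implemented; matches and as_list are hypothetical helpers, and the documents are the four original test documents with lowercase titles.

```python
def as_list(clause):
    """A clause may be given as a single query or as a list of queries."""
    return clause if isinstance(clause, list) else [clause]

def matches(query, doc):
    """Evaluate a small subset of the query DSL (term and bool) against one document."""
    kind, body = next(iter(query.items()))
    if kind == "term":
        field, value = next(iter(body.items()))
        return doc.get(field) == value
    if kind == "bool":
        required = as_list(body.get("must", [])) + as_list(body.get("filter", []))
        optional = as_list(body.get("should", []))
        excluded = as_list(body.get("must_not", []))
        if not all(matches(q, doc) for q in required):
            return False
        if any(matches(q, doc) for q in excluded):
            return False
        # Without must or filter clauses, at least one should clause has to match.
        if optional and not required:
            return any(matches(q, doc) for q in optional)
        return True
    raise ValueError(f"unsupported query type: {kind}")

docs = {
    1: {"salary": 10, "title": "python"},
    2: {"salary": 20, "title": "scrapy"},
    3: {"salary": 30, "title": "django"},
    4: {"salary": 30, "title": "elasticsearch"},
}
query = {"bool": {"should": [
    {"term": {"title": "python"}},
    {"bool": {"must": [{"term": {"title": "django"}}, {"term": {"salary": 30}}]}},
]}}
print([i for i, d in docs.items() if matches(query, d)])  # [1, 3]
```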

Filtering null and non-null values
Create test data
POST lagou/testjob2/_bulk
{"index":{"_id":1}}
{"tags":["search"]}
{"index":{"_id":2}}
{"tags":["search","python"]}
{"index":{"_id":3}}
{"other_field":["some data"]} # no tags field is passed
{"index":{"_id":4}}
{"tags":null}
{"index":{"_id":5}}
{"tags":["search",null]}

The SQL equivalent for handling null values
select tags from testjob2 where tags is not NULL
GET lagou/testjob2/_search
{
"query":{
"bool":{
"filter":{
"exists":{
"field":"tags" # field is a fixed keyword and cannot be renamed
}
}
}
}
}
This returns the three documents with IDs 1, 2 and 5: the tags field has to be present, and it has to hold at least one non-null value

Find the documents where tags is null or missing
GET lagou/testjob2/_search
{
"query":{
"bool":{
"must_not":{
"exists":{
"field":"tags"
}
}
}
}
}
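
The rule that exists applies can be written down in a few lines and checked against the five test documents. This is a sketch of the semantics only; field_exists is a hypothetical helper.

```python
def field_exists(doc, field):
    """Mimic exists: the field must hold at least one non-null value."""
    value = doc.get(field)
    if value is None:
        return False
    if isinstance(value, list):
        return any(item is not None for item in value)
    return True

docs = {
    1: {"tags": ["search"]},
    2: {"tags": ["search", "python"]},
    3: {"other_field": ["some data"]},
    4: {"tags": None},
    5: {"tags": ["search", None]},
}
print([i for i, d in docs.items() if field_exists(d, "tags")])      # [1, 2, 5]
print([i for i, d in docs.items() if not field_exists(d, "tags")])  # [3, 4]
```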

posted on 2020-04-20 15:18 KD_131