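Elasticsearch Basic Operations
1. Creating an index
1.1 Create an index with the default settings (5 primary shards and 1 replica in Elasticsearch 6.x and earlier; 7.0 and later default to 1 primary shard)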
PUT test
The index name (test here) must be lowercase.
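The lowercase rule is one of several naming constraints, so a client-side check can catch a bad name before the request is sent. The Python sketch below is only an illustration: the helper name is_valid_index_name is hypothetical, and the rule set is a partial summary of the documented restrictions, so Elasticsearch itself remains the final authority.

```python
# Characters that Elasticsearch rejects in index names (a partial list taken
# from the documented naming rules; treat it as an assumption for your version).
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')


def is_valid_index_name(name: str) -> bool:
    """Return True if `name` passes a basic index-name check (hypothetical helper)."""
    if not name or name in (".", ".."):
        return False
    if name != name.lower():              # index names must be lowercase
        return False
    if name[0] in "-_+":                  # must not start with -, _ or +
        return False
    if any(ch in FORBIDDEN_CHARS for ch in name):
        return False
    return len(name.encode("utf-8")) <= 255   # at most 255 bytes


print(is_valid_index_name("test"))   # True
print(is_valid_index_name("Test"))   # False
```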
1.2 Specify the number of shards and replicas:
PUT mytest
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
2. Viewing an index
2.1 View basic information:
GET mytest
Return only the settings:
GET mytest/_settings
2.2 View several indices at once:
GET bus,home,blog,mytest/_settings
GET bus,home,blog,mytest
3. Deleting an index
DELETE mytest
4. Closing and opening an index
Close:
POST mytest/_close
Open:
POST mytest/_open
A closed index cannot be written to or searched; any such request fails with an error:
{
  "error": {
    "root_cause": [
      {
        "type": "index_closed_exception",
        "reason": "closed",
        "index_uuid": "9LpmSP7mR3KlXXZ1oD-YFw",
        "index": "mytest"
      }
    ],
    "type": "index_closed_exception",
    "reason": "closed",
    "index_uuid": "9LpmSP7mR3KlXXZ1oD-YFw",
    "index": "mytest"
  },
  "status": 400
}
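5. Viewing cluster indices and health
5.1 View the status of specific indices:
Status of the four indices bus, home, blog, and mytest:
GET /_cat/indices/bus,home,blog,mytest?v
Indices whose names start with bus: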
GET /_cat/indices/bus*?v
5.2 View all indices:
GET _cat/indices?v
5.3 View cluster health:
GET /_cat/health?v
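6. Basic document operations
Document path format:
index/type/id
6.1 Add a document:
PUT /bus/product/1
{
  "name" : "公交车1路",
  "desc" : "从东站到西站",
  "price" : 2,
  "producer" : "东部公交",
  "tags": [ "空调", "普通", "单层" ]
}
Or, with POST:
POST /bus/product/5
{
  "name" : "机场大巴A2线",
  "desc" : "机场到B酒店来回",
  "price" : 25,
  "producer" : "机场大巴",
  "tags": [ "单层", "空调", "大巴" ]
}
Create the document only if its id does not exist yet (put-if-absent); if the id already exists, the request fails. The two forms below are equivalent: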
PUT twitter/_doc/1?op_type=create
{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}
PUT twitter/_doc/1/_create
{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}
If the document already exists, the create request fails:
{
"error": {
"root_cause": [
{
"type": "version_conflict_engine_exception",
"reason": "[product][1]: version conflict, document already exists (current version [7])",
"index_uuid": "G4DrNdPhRWK_rBuEaluwsA",
"shard": "2",
"index": "bus"
}
],
"type": "version_conflict_engine_exception",
"reason": "[product][1]: version conflict, document already exists (current version [7])",
"index_uuid": "G4DrNdPhRWK_rBuEaluwsA",
"shard": "2",
"index": "bus"
},
"status": 409
}
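Set how long a write waits for the primary shard to become available; the default is 1 minute.
The request below waits up to 5 minutes: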
PUT twitter/_doc/1?timeout=5m
{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}
6.2 Get a document:
GET bus/product/1
Response:
{
  "_index" : "bus",
  "_type" : "product",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "公交车1路",
    "desc" : "从东站到西站",
    "price" : 2,
    "producer" : "东部公交",
    "tags" : [ "空调", "普通", "单层" ]
  }
}
Return only selected _source fields:
GET bus/product/122?_source=name,price
{
"_index" : "bus",
"_type" : "product",
"_id" : "122",
"_version" : 1,
"found" : true,
"_source" : {
"price" : 5,
"name" : "公交车1路"
}
}
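Omit _source from the response:
GET bus/product/122?_source=false
Return only the _source:
GET bus/product/122/_source
Check whether a document exists:
HEAD bus/product/1
Disable _source retrieval, or include and exclude specific fields: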
GET twitter/_doc/0?_source=false
GET twitter/_doc/0?_source_include=*.id&_source_exclude=entities
GET twitter/_doc/0?_source=*.id,retweeted
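The stored_fields parameter returns only fields that are marked as stored in the mapping.
Create an index in which counter is not stored and tags is: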
PUT twitter
{
"mappings": {
"_doc": {
"properties": {
"counter": {
"type": "integer",
"store": false
},
"tags": {
"type": "keyword",
"store": true
}
}
}
}
}
Add a document:
PUT twitter/_doc/1
{
"counter" : 1,
"tags" : ["red"]
}
Request both tags and counter:
GET twitter/_doc/1?stored_fields=tags,counter
Only tags appears in the response, because counter is not stored:
{
"_index": "twitter",
"_type": "_doc",
"_id": "1",
"_version": 1,
"found": true,
"fields": {
"tags": [
"red"
]
}
}
6.3 Get multiple documents:
Return the documents with ids 1 and 2:
GET bus/product/_mget
{
  "ids": [1, 2]
}
Fetch documents from different indices:
GET /_mget
{
  "docs": [
    { "_index": "bus", "_type": "product", "_id": 1 },
    { "_index": "mytest", "_type": "product", "_id": 1 }
  ]
}
6.4 Replace a document (full update):
PUT /bus/product/1
{
  "name" : "公交车1路",
  "desc" : "从东站到西站",
  "price" : 5,
  "producer" : "东部公交",
  "tags": [ "空调", "普通", "单层" ]
}
GET /bus/product/1
Response:
{
  "_index" : "bus",
  "_type" : "product",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "name" : "公交车1路",
    "desc" : "从东站到西站",
    "price" : 5,
    "producer" : "东部公交",
    "tags" : [ "空调", "普通", "单层" ]
  }
}
POST works as well.
Update conditionally on the version: if the document's current version differs from the one supplied, the update fails with a 409 version conflict, as in the response that follows the request.
PUT bus/product/1?version=5
{
"name":"公交车5路(version5)"
}
{
"error": {
"root_cause": [
{
"type": "version_conflict_engine_exception",
"reason": "[product][1]: version conflict, current version [7] is different than the one provided [5]",
"index_uuid": "G4DrNdPhRWK_rBuEaluwsA",
"shard": "2",
"index": "bus"
}
],
"type": "version_conflict_engine_exception",
"reason": "[product][1]: version conflict, current version [7] is different than the one provided [5]",
"index_uuid": "G4DrNdPhRWK_rBuEaluwsA",
"shard": "2",
"index": "bus"
},
"status": 409
}
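The put-if-absent and version-checked writes above both rely on the per-document version counter. The Python sketch below models that behavior in memory. It is a toy model, not an Elasticsearch client: the ToyIndex class and VersionConflict exception are hypothetical names, and details such as version handling after deletes are left out.

```python
class VersionConflict(Exception):
    """Stands in for Elasticsearch's 409 version_conflict_engine_exception."""


class ToyIndex:
    """In-memory model of per-document versioning (not an Elasticsearch client)."""

    def __init__(self):
        self.docs = {}  # id -> (version, source)

    def index(self, doc_id, source, version=None, op_type="index"):
        current = self.docs.get(doc_id)
        # op_type=create: refuse to overwrite an existing document
        if op_type == "create" and current is not None:
            raise VersionConflict(
                f"[{doc_id}]: document already exists (current version [{current[0]}])")
        # ?version=N: only write if the stored version is exactly N
        if version is not None:
            current_version = current[0] if current else None
            if current_version != version:
                raise VersionConflict(
                    f"[{doc_id}]: current version [{current_version}] "
                    f"is different than the one provided [{version}]")
        new_version = current[0] + 1 if current else 1
        self.docs[doc_id] = (new_version, source)  # full replace of the source
        return new_version


store = ToyIndex()
store.index("1", {"price": 2})                 # version 1
store.index("1", {"price": 5})                 # full replace, version 2
try:
    store.index("1", {"price": 9}, version=1)  # stale version, so a conflict
except VersionConflict as err:
    print("409:", err)
```

The caller that loses the race is expected to re-read the document and retry with the fresh version, which is the usual optimistic-concurrency pattern.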
6.5 Update a document (partial update):
POST /bus/product/1/_update
{
  "doc": {
    "price": 10
  }
}
GET /bus/product/1
Response:
{
  "_index" : "bus",
  "_type" : "product",
  "_id" : "1",
  "_version" : 4,
  "found" : true,
  "_source" : {
    "name" : "公交车1路",
    "desc" : "从东站到西站",
    "price" : 10,
    "producer" : "东部公交",
    "tags" : [ "空调", "普通", "单层" ]
  }
}
6.6 Delete a document:
DELETE /bus/product/1
Query it again:
GET /bus/product/1
Response:
{
  "_index" : "bus",
  "_type" : "product",
  "_id" : "1",
  "found" : false
}
A version can be given when deleting, to make sure the document being deleted has not changed in the meantime. Every write operation on a document, including a delete, increments its version.
DELETE bus/product/100?version=6
Delete by query. Use with care: it is very easy to delete more than intended.
POST twitter/_delete_by_query
{
"query": {
"match": {
"message": "some message"
}
}
}
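7. Searching documents
7.1 Search all documents
GET bus/product/_search
7.2 term query
A term query is an exact match: the search value is not analyzed, so it must equal one of the tokens produced when the document was analyzed. Without an analysis plugin for Chinese, text is split into one token per character.
This returns nothing:
GET bus/product/_search
{
  "query": {
    "term": {
      "producer": "公交"
    }
  }
}
This returns every document whose producer contains the character 公:
GET bus/product/_search
{
  "query": {
    "term": {
      "producer": "公"
    }
  }
}
7.3 match query
A match query analyzes the search text first and then matches each resulting token, so unlike the exact term query it matches on analyzed tokens.
Documents whose desc contains any combination of the four characters 机场酒店 are returned:
GET bus/product/_search
{
  "query": {
    "match": {
      "desc": "机场酒店"
    }
  }
}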
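The difference between term and match described in 7.2 and 7.3 can be reproduced with a small inverted index. The Python sketch below is a simplified model: analyze is only a rough stand-in for the default analyzer (one token per Chinese character, one per run of ASCII letters or digits), the function names are hypothetical, and scoring is ignored. The two producer values are those of documents 1 and 5 from section 6.1.

```python
import re
from collections import defaultdict


def analyze(text):
    """Rough stand-in for the default analyzer: one token per Chinese
    character, one token per run of ASCII letters or digits."""
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", text.lower())


docs = {1: "东部公交", 5: "机场大巴"}    # producer field of documents 1 and 5

index = defaultdict(set)                # token -> ids of documents containing it
for doc_id, text in docs.items():
    for token in analyze(text):
        index[token].add(doc_id)


def term_query(value):
    # term: the value is looked up as-is, with no analysis
    return set(index.get(value, set()))


def match_query(text):
    # match: analyze the text, then OR the per-token results together
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits


print(term_query("公交"))    # set(): no single token equals "公交"
print(term_query("公"))      # {1}
print(match_query("公交"))   # {1}
```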
7.4 Pagination
GET bus/_search
{
  "from": 0,
  "size": 3,
  "query": {
    "match": {
      "desc": "机场酒店"
    }
  }
}
GET bus/_search
{
"from": 0,
"size": 5,
"query": {
"match_all": {}
}
}
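The from and size values in the two requests above are usually derived from a page number. A minimal Python sketch follows, with a hypothetical helper page_params. The limit of 10000 reflects the default of the index.max_result_window setting; a cluster may be configured differently.

```python
def page_params(page, size, max_result_window=10000):
    """Translate a 1-based page number into from/size (hypothetical helper)."""
    if page < 1 or size < 0:
        raise ValueError("page must be >= 1 and size must be >= 0")
    start = (page - 1) * size
    # from + size may not exceed index.max_result_window (10000 by default)
    if start + size > max_result_window:
        raise ValueError("from + size exceeds index.max_result_window")
    return {"from": start, "size": size}


print(page_params(1, 3))   # {'from': 0, 'size': 3}
print(page_params(3, 5))   # {'from': 10, 'size': 5}
```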
7.5 Source filtering, like the a, b in select a, b from table
GET bus/_search
{
  "_source": ["name", "desc"],
  "query": {
    "match": {
      "desc": "机场"
    }
  }
}
Response:
{
"took" : 14,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 12,
"max_score" : 2.1208954,
"hits" : [
{
"_index" : "bus",
"_type" : "product",
"_id" : "9",
"_score" : 2.1208954,
"_source" : {
"name" : "机场大巴A2线",
"desc" : "机机场场"
}
},
{
"_index" : "bus",
"_type" : "product",
"_id" : "10",
"_score" : 2.1208954,
"_source" : {
"name" : "机场大巴A2线",
"desc" : "机机场场"
}
},
{
"_index" : "bus",
"_type" : "product",
"_id" : "6",
"_score" : 0.62362677,
"_source" : {
"name" : "机场大巴A2线",
"desc" : "机机场场"
}
}
]
}
}
7.6 Show versions
GET bus/_search
{
  "version": true,
  "from": 0,
  "size": 3,
  "query": {
    "match": {
      "desc": "机场酒店"
    }
  }
}
7.7 Minimum score
Only hits whose _score is at least 2.3 are returned:
GET bus/_search
{
  "version": true,
  "min_score": 2.3,
  "from": 0,
  "size": 3,
  "query": {
    "match": {
      "desc": "机场酒店"
    }
  }
}
7.8 Highlight matched keywords
GET bus/_search
{
  "version": true,
  "from": 0,
  "size": 3,
  "query": {
    "match": {
      "desc": "机场酒店"
    }
  },
  "highlight": {
    "fields": {
      "desc": {}
    }
  }
}
7.9 Phrase matching: match_phrase
Similar to a match query but for exact phrases: every token of the analyzed query must appear in the field, and in the same order.
GET bus/_search
{
  "query": {
    "match_phrase": {
      "name": "公交车122"
    }
  }
}
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 3.4102418,
"hits" : [
{
"_index" : "bus",
"_type" : "product",
"_id" : "3",
"_score" : 3.4102418,
"_source" : {
"name" : "公交车122路",
"desc" : "从前兴路枢纽到东站",
"price" : 2,
"producer" : "公交集团",
"tags" : [
"单层",
"空调"
]
}
}
]
}
}
Compare with match:
GET bus/_search
{
"query": {
"match": {
"name": "公交车122"
}
}
}
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 5.3417225,
"hits" : [
{
"_index" : "bus",
"_type" : "product",
"_id" : "2",
"_score" : 5.3417225,
"_source" : {
"name" : "公交车5路",
"desc" : "从巫家坝到梁家河",
"price" : 1,
"producer" : "公交集团",
"tags" : [
"双层",
"普通",
"热门"
]
}
},
{
"_index" : "bus",
"_type" : "product",
"_id" : "3",
"_score" : 3.4102418,
"_source" : {
"name" : "公交车122路",
"desc" : "从前兴路枢纽到东站",
"price" : 2,
"producer" : "公交集团",
"tags" : [
"单层",
"空调"
]
}
},
{
"_index" : "bus",
"_type" : "product",
"_id" : "1",
"_score" : 2.1597636,
"_source" : {
"name" : "公交车5路",
"desc" : "从巫家坝到梁家河",
"price" : 1,
"producer" : "公交集团",
"tags" : [
"双层",
"普通",
"热门"
]
}
}
]
}
}
7.10 Prefix query: match_phrase_prefix
match_phrase_prefix behaves like match_phrase, except that the last term of the query text is matched as a prefix.
GET bus/_search
{
  "query": {
    "match_phrase_prefix": {
      "name": "公交车1"
    }
  }
}
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 6.8204837,
    "hits" : [
      {
        "_index" : "bus",
        "_type" : "product",
        "_id" : "3",
        "_score" : 6.8204837,
        "_source" : {
          "name" : "公交车122路",
          "desc" : "从前兴路枢纽到东站",
          "price" : 2,
          "producer" : "公交集团",
          "tags" : [ "单层", "空调" ]
        }
      }
    ]
  }
}
Compare with match_phrase:
GET bus/_search
{
  "query": {
    "match_phrase": {
      "name": "公交车1"
    }
  }
}
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
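The results in 7.9 and 7.10 follow from how the name is tokenized: 公交车122路 becomes 公, 交, 车, 122, 路, so the query token 1 is not a token of the document but is a prefix of 122. The Python sketch below reproduces that logic. It is a simplified model: analyze approximates the default analyzer, the function names are hypothetical, and the real queries also handle slop, expansion limits, and scoring.

```python
import re


def analyze(text):
    """Rough stand-in for the default analyzer: one token per Chinese
    character, one token per run of ASCII letters or digits."""
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", text.lower())


def match_phrase(field_text, query, prefix_last=False):
    """True if the query tokens occur consecutively and in order in the field.
    With prefix_last=True the final query token only needs to be a prefix."""
    doc, q = analyze(field_text), analyze(query)
    if not q:
        return False
    for start in range(len(doc) - len(q) + 1):
        window = doc[start:start + len(q)]
        if window[:-1] != q[:-1]:          # all but the last token must be equal
            continue
        if window[-1] == q[-1]:
            return True
        if prefix_last and window[-1].startswith(q[-1]):
            return True
    return False


name = "公交车122路"
print(analyze(name))                                     # ['公', '交', '车', '122', '路']
print(match_phrase(name, "公交车122"))                   # True
print(match_phrase(name, "公交车1"))                     # False
print(match_phrase(name, "公交车1", prefix_last=True))   # True
```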
7.11 Multi-field query: multi_match
GET bus/_search
{
  "query": {
    "multi_match": {
      "query": "空港",
      "fields": ["desc", "name"]
    }
  }
}
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 3.6836727,
    "hits" : [
      {
        "_index" : "bus",
        "_type" : "product",
        "_id" : "16",
        "_score" : 3.6836727,
        "_source" : {
          "name" : "机场大巴A2线",
          "desc" : "空港",
          "price" : 21,
          "producer" : "大巴",
          "tags" : [ "单层", "空调", "大巴" ]
        }
      },
      {
        "_index" : "bus",
        "_type" : "product",
        "_id" : "18",
        "_score" : 3.5525968,
        "_source" : {
          "name" : "空港大巴A2线",
          "desc" : "机场",
          "price" : 21,
          "producer" : "大巴",
          "tags" : [ "单层", "空调", "大巴" ]
        }
      },
      {
        "_index" : "bus",
        "_type" : "product",
        "_id" : "19",
        "_score" : 3.1757839,
        "_source" : {
          "name" : "空港大巴A2线",
          "desc" : "空港快线",
          "price" : 21,
          "producer" : "大巴",
          "tags" : [ "单层", "空调", "大巴" ]
        }
      }
    ]
  }
}
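8. Routing
Routing is tied directly to sharding. Elasticsearch hashes a routing value and places every document with the same hash in the same primary shard, much like hash-based load balancing.
By default the routing value is the document id, which spreads data roughly evenly across the shards and avoids hot spots.
A custom routing value keeps related data together on one shard, but if it is chosen badly a single shard can end up overloaded.
Index a document with a custom routing value, then fetch it without and with that value. Because the routing value decides which shard is consulted, the GET without routing may not find the document:
PUT mytest/product/4?routing=weapon
{
  "name" : "手枪",
  "desc" : "增加100点攻击",
  "price" : 15400,
  "producer" : "神秘商店",
  "tags": [ "机械", "穿透" ]
}
GET mytest/product/4
GET mytest/product/4?routing=weapon
Using routing in a search:
GET mytest/_search
{
  "query": {
    "match": {
      "_routing": "weapon"
    }
  }
}
GET mytest/_search
{
  "query": {
    "term": {
      "_routing": "weapon"
    }
  }
}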
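The shard selection described in this section comes down to hashing the routing value and taking the result modulo the number of primary shards. The Python sketch below illustrates the idea only. The function pick_shard is hypothetical, and zlib.crc32 is a stand-in: Elasticsearch uses its own Murmur3-based hash, so the shard numbers printed here will not match a real cluster.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """shard = hash(routing) % number_of_primary_shards.

    zlib.crc32 is a stand-in hash chosen for this sketch; it is not the
    hash function Elasticsearch uses."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# Default routing: the document id is the routing value.
for doc_id in ["1", "2", "3", "4"]:
    print(doc_id, "->", pick_shard(doc_id, 3))

# Custom routing: every document indexed with routing=weapon shares a shard,
# regardless of its id.
print("weapon ->", pick_shard("weapon", 3))
```

The modulo also shows why the number of primary shards cannot simply be changed after index creation: a different divisor would send existing ids to different shards.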
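9. Mapping
A mapping is comparable to a table schema in a database. If no mapping is given when the index is created, Elasticsearch infers field types as documents are indexed (dynamic mapping); a mapping can also be defined by hand (static mapping).
PUT bus
{
  "mappings": {
    "product": {
      "properties": {
        "name": { "type": "text" },
        "desc": { "type": "text" },
        "price": { "type": "long" },
        "producer": { "type": "text" },
        "tags": { "type": "text" }
      }
    }
  },
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 3
  }
}
A date field with an explicit format: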
PUT bus4
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
}
, "mappings": {
"product":{
"properties":{
"name":{"type":"text"},
"updateDate":{
"type":"date",
"format":"yyyy-MM-dd"
}
}
}
}
}
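Fields that already exist in a mapping generally cannot be updated, with a few exceptions:
- New properties can be added to object fields.
- New fields can be added.
- The ignore_above parameter can be updated.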
PUT my_index
{
"mappings": {
"_doc": {
"properties": {
"name": {
"properties": {
"first": {
"type": "text"
}
}
},
"user_id": {
"type": "keyword"
}
}
}
}
}
PUT my_index/_mapping/_doc
{
"properties": {
"name": {
"properties": {
"last": {
"type": "text"
}
}
},
"user_id": {
"type": "keyword",
"ignore_above": 100
}
}
}
The first request creates an index whose first field, name, is an object with a first property. The mapping update then adds a last property under name and sets ignore_above on user_id to 100.
After a static mapping has been defined, fields can still be added dynamically.
Submit an update containing a field that is not in the mapping; the type of memo is then inferred:
POST /bus/product/1/_update
{
  "doc": {
    "memo": "a test"
  }
}
Check the result with GET bus/_mapping:
{
  "bus" : {
    "mappings" : {
      "product" : {
        "properties" : {
          "desc" : { "type" : "text" },
          "memo" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "name" : { "type" : "text" },
          "price" : { "type" : "long" },
          "producer" : { "type" : "text" },
          "tags" : { "type" : "text" }
        }
      }
    }
  }
}
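10. Bulk operations
_bulk performs several operations in one request. A failure in one operation does not affect the others; the response reports the error for each failed item.
The bulk API is strict about its format: every JSON object must sit on a single line with no line breaks inside it, and consecutive JSON objects must be separated by a newline.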
POST /_bulk
{ "delete": { "_index": "home", "_type": "product", "_id": "1" }}
{ "create": { "_index": "home", "_type": "product", "_id": "1" }}
{ "title": "My first post2","memo":"a test2","date":"2018-12-12" }
{ "update": { "_index": "home", "_type": "product", "_id": "2"} }
{ "doc" : {"title" : "My updated post2"} }
{ "delete": { "_index": "home", "_type": "product", "_id": "3" }}
{ "create": { "_index": "home", "_type": "product", "_id": "3" }}
{ "title": "My first post3","memo":"a test23","date":"2018-12-13" }
POST /_bulk
{ "index":{ "_index": "home", "_type": "product" ,"_id":1}}
{ "title":"My post1" ,"memo":"a test1","date":"2018-12-01"}
{ "index":{ "_index": "home", "_type": "product" ,"_id":2}}
{ "title":"My post2" ,"memo":"a test2","date":"2018-12-02"}
{ "index":{ "_index": "home", "_type": "product" ,"_id":3}}
{ "title":"My post3" ,"memo":"a test3","date":"2018-12-03"}
The index and type can also be given in the URL: POST /home/product/_bulk or POST /home/_bulk
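Because of the one-object-per-line rule, bulk bodies are easier to generate than to write by hand. The Python sketch below builds such a body with the standard json module; the helper name bulk_body and its tuple layout are hypothetical, and sending the body to the cluster is left out.

```python
import json


def bulk_body(operations):
    """Serialize (action, metadata, source) tuples into a _bulk request body.

    `source` is None for actions without a payload line, such as delete."""
    lines = []
    for action, metadata, source in operations:
        # json.dumps emits no line breaks, so each object stays on one line
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if source is not None:
            lines.append(json.dumps(source, ensure_ascii=False))
    return "\n".join(lines) + "\n"   # the body must end with a newline


body = bulk_body([
    ("delete", {"_index": "home", "_type": "product", "_id": "1"}, None),
    ("index", {"_index": "home", "_type": "product", "_id": "2"},
     {"title": "My post2", "memo": "a test2", "date": "2018-12-02"}),
])
print(body, end="")
```

For an update action, the payload line is the partial document wrapped in "doc", as in the first example of this section.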
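11. Reindexing with _reindex
11.1 _reindex does not set up the destination index and does not copy the settings of the source index. Configure the destination index, including mappings, shard count, and replicas, before running _reindex.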
PUT bus_bak
{
"settings": {
"number_of_shards": 1
, "number_of_replicas": 0
}
}
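POST _reindex
{
  "source": {
    "index": "bus"
  },
  "dest": {
    "index": "bus_bak"
  }
}
11.2 Versions. After reindexing, document versions in the destination index start counting again by default; to keep the same versions as the source index, set version_type to external.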
POST _reindex
{
"source": {
"index": "bus"
}
, "dest": {
"index": "bus_bak",
"version_type": "external"
}
}
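11.3 Create only the documents that are missing from the destination index; a document whose id already exists there causes a version conflict.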
POST _reindex
{
"source": {
"index": "bus"
}
, "dest": {
"index": "bus_bak",
"op_type": "create"
}
}
By default a version conflict aborts the _reindex process; setting "conflicts": "proceed" counts the conflicts instead of aborting.
POST _reindex
{
"conflicts": "proceed",
"source": {
"index": "bus"
}
, "dest": {
"index": "bus_bak",
"op_type": "create"
}
}
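The interaction between op_type and conflicts in 11.3 can be modelled with two dictionaries. The Python sketch below is a toy model of the semantics only: toy_reindex is a hypothetical function, it ignores batching and versions, and it is not how _reindex is implemented.

```python
def toy_reindex(source, dest, op_type="index", conflicts="abort"):
    """Copy docs from `source` into `dest` (both plain dicts of id -> doc).

    A toy model of the semantics, not the real _reindex implementation."""
    result = {"created": 0, "updated": 0, "version_conflicts": 0}
    for doc_id, doc in source.items():
        if doc_id in dest:
            if op_type == "create":
                # create never overwrites: count the conflict or stop
                result["version_conflicts"] += 1
                if conflicts != "proceed":
                    raise RuntimeError(f"version conflict on id {doc_id}")
                continue
            result["updated"] += 1
        else:
            result["created"] += 1
        dest[doc_id] = doc
    return result


bus = {"1": {"name": "公交车1路"}, "2": {"name": "公交车5路"}}
bus_bak = {"1": {"name": "old copy"}}

print(toy_reindex(bus, bus_bak, op_type="create", conflicts="proceed"))
# {'created': 1, 'updated': 0, 'version_conflicts': 1}
print(bus_bak["1"])   # {'name': 'old copy'}: the existing document is untouched
```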
11.4 Reindex the results of a query
POST _reindex
{
"source": {
"index": "bus",
"type": "product",
"query": {
"match": {
"name": "公交"
}
}
}
, "dest": {
"index": "bus_bak"
}
}
{
"took" : 26,
"timed_out" : false,
"total" : 5,
"updated" : 0,
"created" : 5,
"deleted" : 0,
"batches" : 1,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0,
"failures" : [ ]
}
Reindex only selected _source fields:
POST _reindex
{
"source": {
"index": "twitter",
"_source": ["user", "_doc"]
},
"dest": {
"index": "new_twitter"
}
}
11.5 Reindex several indices into a single index
POST _reindex
{
"source": {
"index": ["bus","user"],
"type": ["product","info"]
}
, "dest": {
"index": "blog",
"type":"_doc"
}
}
11.6 Limit the number of documents reindexed
POST _reindex
{
"size": 1,
"source": {
"index": "twitter"
},
"dest": {
"index": "new_twitter"
}
}
POST _reindex
{
"size": 10000,
"source": {
"index": "twitter",
"sort": { "date": "desc" }
},
"dest": {
"index": "new_twitter"
}
}
