Elasticsearch Routing

1. Default routing rule

By default, documents are routed by the _routing field, whose value defaults to the document's _id:

shard_num = hash(_routing) % num_primary_shards
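The formula above can be sketched in a few lines of Python. This is only an illustration: zlib.crc32 is a stand-in hash chosen here for convenience, and the function name shard_for is hypothetical. Elasticsearch uses its own hash function, so real shard numbers will differ.

```python
import zlib

def shard_for(routing: str, num_primary_shards: int) -> int:
    """Return the shard number for a routing value.

    zlib.crc32 is only a stand-in hash for illustration (assumption);
    Elasticsearch uses its own hash function.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards

# With default routing, the routing value is the document _id.
for doc_id in ["1", "2", "3", "user1"]:
    print(doc_id, "->", shard_for(doc_id, num_primary_shards=3))
```

Because the result depends on num_primary_shards, the same routing value would map to a different shard if the shard count changed.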

2. Custom routing value

PUT my_index/_doc/1?routing=user1&refresh=true 
{
  "title": "This is a document"
}

GET my_index/_doc/1?routing=user1

3. The _routing field can be used in queries

GET my_index/_search
{
  "query": {
    "terms": {
      "_routing": [ "user1" ] 
    }
  }
}

Specifying routing at search time

GET my_index/_search?routing=user1,user2 
{
  "query": {
    "match": {
      "title": "document"
    }
  }
}
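A search with a routing parameter only needs to visit the shards that the given routing values map to. The sketch below shows that mapping under the simple modulo formula from section 1. The hash (zlib.crc32) and the helper names shard_for and shards_to_search are assumptions made for illustration, not Elasticsearch internals.

```python
import zlib
from typing import Set

def shard_for(routing: str, num_primary_shards: int) -> int:
    # Stand-in hash (assumption); Elasticsearch uses a different hash function.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards

def shards_to_search(routing_param: str, num_primary_shards: int) -> Set[int]:
    """Map a comma-separated routing parameter to the set of shards to visit."""
    values = [v.strip() for v in routing_param.split(",") if v.strip()]
    return {shard_for(v, num_primary_shards) for v in values}

# Two routing values map to at most two shards, however many shards the index has.
print(shards_to_search("user1,user2", num_primary_shards=5))
```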

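4. Requiring a routing parameter on CRUD operations

Once custom routing is in use, every CRUD operation should carry the routing parameter; otherwise the same document can end up stored on more than one shard. The mapping can be configured to make routing mandatory for all CRUD operations, after which any operation without routing fails with a routing_missing_exception. The setting is shown below, followed by an index request that omits routing and is therefore rejected: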

PUT my_index2
{
  "mappings": {
    "_doc": {
      "_routing": {
        "required": true 
      }
    }
  }
}

PUT my_index2/_doc/1 
{
  "text": "No routing value provided"
}

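5. With custom routing, the uniqueness of _id across the index is no longer guaranteed: documents with the same _id can exist on different shards. When using custom routing, it is therefore best to assign _id values yourself so that they remain unique.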
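A toy model makes the duplicate _id problem concrete. Each shard is modeled as an independent dict keyed by _id, and the routing value alone decides which shard receives a document. The shard count, the stand-in hash (zlib.crc32), and all helper names are assumptions for illustration only.

```python
import zlib
from collections import defaultdict

NUM_PRIMARY_SHARDS = 5  # hypothetical shard count for this sketch

def shard_for(routing: str) -> int:
    # Stand-in hash (assumption); Elasticsearch uses a different hash function.
    return zlib.crc32(routing.encode("utf-8")) % NUM_PRIMARY_SHARDS

# Each shard is its own dict keyed by _id,
# so _id uniqueness is only enforced within a single shard.
shards = defaultdict(dict)

def index_doc(doc_id: str, body: dict, routing: str) -> int:
    """Store the document on the shard chosen by its routing value."""
    shard = shard_for(routing)
    shards[shard][doc_id] = body
    return shard

def copies_of(doc_id: str) -> list:
    """Return (shard, body) pairs for every shard holding this _id."""
    return [(n, docs[doc_id]) for n, docs in sorted(shards.items()) if doc_id in docs]

first_routing = "user1"
# Pick another routing value that happens to land on a different shard.
second_routing = next(
    f"user{i}" for i in range(2, 100)
    if shard_for(f"user{i}") != shard_for(first_routing)
)

index_doc("1", {"title": "first copy"}, routing=first_routing)
index_doc("1", {"title": "second copy"}, routing=second_routing)

print(copies_of("1"))  # the same _id now exists on two different shards
```

Indexing the second document with the same routing value as the first would instead overwrite it, which is why consistent routing per _id matters.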

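6. Mitigating the single-shard hot spot caused by routing

Custom routing can distribute data unevenly: if a large number of documents are routed to a single shard, indexing and query performance on that shard degrades. To mitigate this, set the routing_partition_size index setting. (Note that this is an index-level setting and can only be set when the index is created.) The routing value then selects a group of shards, and the _id field decides which shard within that group stores the document. For this to work, routing_partition_size must be an integer greater than 1 and less than number_of_shards. The formula is: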

shard_num = (hash(_routing) + hash(_id) % routing_partition_size) % num_primary_shards
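The effect of the partition formula can be checked with a small sketch: documents that share one routing value are spread over at most routing_partition_size shards instead of one. The hash (zlib.crc32), the helper names, and the sample values (10 shards, partition size 3) are assumptions chosen for illustration.

```python
import zlib

def h(value: str) -> int:
    # Stand-in hash (assumption); Elasticsearch uses a different hash function.
    return zlib.crc32(value.encode("utf-8"))

def partitioned_shard(routing: str, doc_id: str,
                      routing_partition_size: int,
                      num_primary_shards: int) -> int:
    """Apply the routing-partition formula from the text."""
    offset = h(doc_id) % routing_partition_size  # 0 .. routing_partition_size - 1
    return (h(routing) + offset) % num_primary_shards

# 1000 documents with the same routing value but different _id values.
used = {partitioned_shard("user1", str(i), 3, 10) for i in range(1000)}
print(sorted(used))  # never more than 3 distinct shards
```

A search with routing=user1 would then have to visit that small group of shards rather than a single shard, which is the trade-off for the more even distribution.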

Note: once routing_partition_size is set on an index, join fields can no longer be created, and every mapping in the index must mark _routing as required.

Checking the number of documents on each shard

GET _cat/shards/test07

test07 2 p STARTED 0  261b 192.168.0.16 testshshs_core
test07 1 p STARTED 1 3.2kb 192.168.0.15 testshshs_first
test07 0 p STARTED 1 3.2kb 192.168.0.16 testshshs_core
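To spot skew quickly, the output above can be reduced to a per-shard document count. The sketch below assumes the default _cat/shards column order (index, shard, prirep, state, docs, store, ip, node) and parses the sample output as plain text; the function name docs_per_shard is hypothetical.

```python
SAMPLE = """\
test07 2 p STARTED 0  261b 192.168.0.16 testshshs_core
test07 1 p STARTED 1 3.2kb 192.168.0.15 testshshs_first
test07 0 p STARTED 1 3.2kb 192.168.0.16 testshshs_core
"""

def docs_per_shard(cat_output: str) -> dict:
    """Return {shard_number: doc_count} for primary shards."""
    counts = {}
    for line in cat_output.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # e.g. unassigned shards report no docs column
        _index, shard, prirep, _state, docs = fields[:5]
        if prirep == "p" and docs.isdigit():
            counts[int(shard)] = int(docs)
    return counts

print(docs_per_shard(SAMPLE))
```

Here two of the three shards hold one document each and shard 2 is empty.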

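Note: according to what I have read online, the choice of routing field matters a great deal. Check that it actually produces the intended distribution and does not concentrate data too heavily, or it may backfire; this needs testing against the real workload.

I don't have much experience with how to choose it. If you have good ideas, please share them; thanks very much.

References: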

https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-routing-field.html

https://blog.csdn.net/weizg/article/details/79162221

posted @ 2019-04-04 10:48 粒子先生
