Elasticsearch Mapping

Dynamic mapping

Elasticsearch can create indices and fields automatically. For example, indexing this document:

PUT data/_doc/1   
{ "count": 5 }

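This automatically creates the data index (if it does not already exist) and a mapping for the count field, which is mapped as long.

By default, when Elasticsearch detects a new field in a document, it dynamically adds the field to the mapping.

Setting the dynamic parameter to true or runtime explicitly instructs Elasticsearch to create fields dynamically based on incoming documents.

Only the field data types in the table below are detected by dynamic mapping; fields of any other type must be mapped explicitly.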

JSON data type	`"dynamic":"true"`	`"dynamic":"runtime"`
`null`	No field added	No field added
`true` or `false`	`boolean`	`boolean`
`double`	`float`	`double`
`integer`	`long`	`long`
`object`	`object`	No field added
`array`	Depends on the first non-`null` value in the array	Depends on the first non-`null` value in the array
`string` that passes date detection	`date`	`date`
`string` that passes numeric detection	`float` or `long`	`double` or `long`
`string` that doesn't pass `date` or `numeric` detection	`text` with a `.keyword` sub-field	`keyword`
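The table can be read as a small type-inference function. The following Python sketch is a simplified model of it, not Elasticsearch code: the function name dynamic_type is hypothetical, and string detection (date and numeric) is left out, so every string falls through to the last row.

```python
import json
from typing import Any, Optional

def dynamic_type(value: Any, dynamic: str = "true") -> Optional[str]:
    """Type chosen for a new field per the table above; None means no field is added.

    Simplified: strings are treated as failing both date and numeric detection.
    """
    if dynamic not in ("true", "runtime"):
        raise ValueError("only dynamic=true or dynamic=runtime create fields")
    runtime = dynamic == "runtime"
    if value is None:
        return None
    if isinstance(value, bool):          # must come before int: bool subclasses int in Python
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double" if runtime else "float"
    if isinstance(value, dict):
        return None if runtime else "object"
    if isinstance(value, list):          # the first non-null element decides
        first = next((v for v in value if v is not None), None)
        return dynamic_type(first, dynamic)
    if isinstance(value, str):
        return "keyword" if runtime else "text (+ .keyword)"
    raise TypeError(f"not a JSON value: {value!r}")

doc = json.loads('{"count": 5}')
print(dynamic_type(doc["count"]))  # long
```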

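The dynamic parameter accepts the following values:

`true`	Default. New fields are added to the mapping.
`runtime`	New fields are added to the mapping as runtime fields. They are not indexed and are loaded from _source at query time.
`false`	New fields are ignored: they are not added to the mapping, not indexed, and not searchable, but they still appear in _source.
`strict`	If a new field is detected, the document is rejected; new fields must be added to the mapping explicitly.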
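To make the four values concrete, here is a hypothetical Python model of what happens to a field that is not yet in the mapping. The names NewFieldOutcome, StrictMappingError and on_new_field are illustrative only; Elasticsearch itself simply rejects the indexing request in the strict case.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NewFieldOutcome:
    added_to: str      # "properties", "runtime" or "nothing"
    indexed: bool
    in_source: bool

class StrictMappingError(Exception):
    """Hypothetical stand-in for the rejection that dynamic: strict produces."""

def on_new_field(dynamic: str) -> NewFieldOutcome:
    """Model what happens to a field that is not in the mapping yet."""
    if dynamic == "true":
        return NewFieldOutcome("properties", indexed=True, in_source=True)
    if dynamic == "runtime":
        return NewFieldOutcome("runtime", indexed=False, in_source=True)
    if dynamic == "false":
        return NewFieldOutcome("nothing", indexed=False, in_source=True)
    if dynamic == "strict":
        raise StrictMappingError("new field is not in the mapping; document rejected")
    raise ValueError(f"unknown dynamic value: {dynamic!r}")

for value in ("true", "runtime", "false"):
    print(value, on_new_field(value))
```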

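dynamic can be set at the mapping level and on individual inner objects; an inner object inherits the setting from its parent unless it sets its own. The setting can also be changed with the update mapping API, for example: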

PUT /user/_mapping
{
  "dynamic": false,
  "properties": {
    "name": {
      "type": "keyword"
    },
    "groups": {
      "dynamic": false,
      "properties": {
        "name": {
          "type": "keyword"
        }
      }
    }
  }
}
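The inheritance rule can be sketched as a walk down the object path, where each inner object may override the value inherited from its parent. This is an illustrative model: effective_dynamic is a hypothetical helper, the "tags" object is made up, and true is assumed as the default when nothing is set.

```python
from typing import Union

Dynamic = Union[bool, str]  # True/False, or "runtime"/"strict"

def effective_dynamic(mapping: dict, object_path: str) -> Dynamic:
    """Return the dynamic setting that applies to new fields inside object_path."""
    setting: Dynamic = mapping.get("dynamic", True)  # root default is true
    node = mapping
    parts = object_path.split(".") if object_path else []
    for part in parts:
        node = node.get("properties", {}).get(part, {})
        # an inner object inherits the parent's setting unless it sets its own
        setting = node.get("dynamic", setting)
    return setting

# Same shape as the PUT /user/_mapping body above, plus a hypothetical "tags" object.
user_mapping = {
    "dynamic": False,
    "properties": {
        "name": {"type": "keyword"},
        "groups": {"dynamic": False, "properties": {"name": {"type": "keyword"}}},
        "tags": {"dynamic": "strict", "properties": {}},
    },
}
print(effective_dynamic(user_mapping, "tags"))  # strict
```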

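Date detection

If date_detection is enabled (the default is true), then whenever a new string field is encountered, ES checks its value against the date formats listed in dynamic_date_formats. If the value matches one of them, the field is added as a date field with the corresponding format.

The default dynamic_date_formats is [ "strict_date_optional_time","yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"]

Date detection can be disabled by setting date_detection to false: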

PUT my-index-000001
{
  "mappings": {
    "date_detection": false
  }
}

You can also customize the formats used for date detection with dynamic_date_formats:

PUT my-index-000001
{
  "mappings": {
    "dynamic_date_formats": ["MM/dd/yyyy"]
  }
}
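Date detection can be approximated in Python with datetime.strptime. The format table below is an assumption: it only roughly translates the Java-style patterns (strptime is more lenient, for example it accepts single-digit months), and detect_date is a hypothetical helper, not how Elasticsearch parses dates.

```python
from datetime import datetime
from typing import Dict, List, Optional

Formats = Dict[str, List[str]]

# Rough strptime equivalents of the default dynamic_date_formats (assumption).
DEFAULT_FORMATS: Formats = {
    "strict_date_optional_time": ["%Y-%m-%d", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%dT%H:%M:%S%z"],
    "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z": ["%Y/%m/%d %H:%M:%S %z", "%Y/%m/%d %z"],
}
# Rough equivalent of "dynamic_date_formats": ["MM/dd/yyyy"]
CUSTOM_FORMATS: Formats = {"MM/dd/yyyy": ["%m/%d/%Y"]}

def detect_date(value: str, date_detection: bool = True,
                formats: Optional[Formats] = None) -> Optional[str]:
    """Return the first date format the string matches, or None (not a date field)."""
    if not date_detection:
        return None
    for name, patterns in (DEFAULT_FORMATS if formats is None else formats).items():
        for pattern in patterns:
            try:
                datetime.strptime(value, pattern)
            except ValueError:
                continue
            return name
    return None

print(detect_date("2015-09-02"))                           # strict_date_optional_time
print(detect_date("09/25/2015", formats=CUSTOM_FORMATS))   # MM/dd/yyyy
```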

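Numeric detection

Although JSON has native floating-point and integer types, some applications send numbers as strings. numeric_detection is false by default so that ES does not unexpectedly map such strings as numbers; for numeric data it is usually better to specify the mapping explicitly.

To use it, set numeric_detection to true. With the mapping below, my_float is mapped as float and my_integer as long: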

PUT my-index-000001
{
  "mappings": {
    "numeric_detection": true
  }
}

PUT my-index-000001/_doc/1
{
  "my_float":   "1.0", 
  "my_integer": "1" 
}
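Numeric detection can be modelled the same way. This hypothetical detect_numeric helper tries an integer parse first and then a floating-point parse. Python's int() and float() are more lenient than Elasticsearch (they accept surrounding whitespace, underscores, "nan" and so on), so treat it as an approximation.

```python
from typing import Optional

def detect_numeric(value: str, numeric_detection: bool = False,
                   dynamic: str = "true") -> Optional[str]:
    """Return the numeric type a numeric-looking string would get, or None."""
    if not numeric_detection:
        return None                  # disabled by default
    try:
        int(value)
        return "long"
    except ValueError:
        pass
    try:
        float(value)
    except ValueError:
        return None                  # not numeric: normal string handling applies
    return "double" if dynamic == "runtime" else "float"

print(detect_numeric("1.0", numeric_detection=True))  # float
print(detect_numeric("1", numeric_detection=True))    # long
print(detect_numeric("1"))                            # None
```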

Dynamic templates

Dynamic templates let you define custom mappings for fields that are added dynamically. Their format is:

  "dynamic_templates": [
    {
      "my_template_name": { 
        ... match conditions ... 
        "mapping": { ... } 
      }
    },
    ...
  ]

The supported match conditions are match_mapping_type, match, match_pattern, unmatch, path_match, and path_unmatch.

For example:

PUT my-index-000001
{
  "mappings": {
    "dynamic_templates": [
      {
        "longs_as_strings": {
          "match_mapping_type": "string",
          "match":   "long_*",
          "unmatch": "*_text",
          "mapping": {
            "type": "long"
          }
        }
      }
    ]
  }
}

PUT my-index-000001/_doc/1
{
  "long_num": "5", 
  "long_text": "foo" 
}

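long_num is mapped as long. long_text matches the unmatch pattern, so it falls back to the default dynamic mapping for strings (text with a .keyword sub-field).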
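The selection logic for the longs_as_strings example can be sketched as follows. Templates are tried in order and the first one whose conditions all match wins. fnmatch is used here as an assumption to approximate Elasticsearch's simple * wildcards (it also understands ? and [...]), and template_applies and pick_mapping are hypothetical names.

```python
from fnmatch import fnmatchcase
from typing import List, Optional

def template_applies(conditions: dict, field_name: str, detected_type: str) -> bool:
    """Check match_mapping_type, match and unmatch for one template."""
    mapping_type = conditions.get("match_mapping_type", "*")
    if mapping_type not in ("*", detected_type):
        return False
    if "match" in conditions and not fnmatchcase(field_name, conditions["match"]):
        return False
    if "unmatch" in conditions and fnmatchcase(field_name, conditions["unmatch"]):
        return False
    return True

def pick_mapping(templates: List[dict], field_name: str, detected_type: str) -> Optional[dict]:
    """Return the mapping of the first matching template, or None for the default mapping."""
    for entry in templates:
        conditions = next(iter(entry.values()))  # each entry is {"template_name": {...}}
        if template_applies(conditions, field_name, detected_type):
            return conditions["mapping"]
    return None

templates = [{
    "longs_as_strings": {
        "match_mapping_type": "string",
        "match": "long_*",
        "unmatch": "*_text",
        "mapping": {"type": "long"},
    }
}]
print(pick_mapping(templates, "long_num", "string"))   # {'type': 'long'}
print(pick_mapping(templates, "long_text", "string"))  # None
```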

For the data types that match_mapping_type can match on, see the dynamic mapping table above.

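match and unmatch specify patterns that the field name must or must not match. match_pattern can be combined with them: setting it to regex makes match use Java regular expressions instead of simple wildcards:

  "match_pattern": "regex",
  "match": "^profit_\\d+$"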

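The two match_pattern modes can be contrasted with a short Python sketch. It assumes re.fullmatch as a stand-in for Java's whole-string regex matching and fnmatch for the default wildcard mode; name_matches is a hypothetical helper, and the regex used is ^profit_\d+$.

```python
import re
from fnmatch import fnmatchcase

def name_matches(field_name: str, match: str, match_pattern: str = "simple") -> bool:
    """match_pattern "regex": full regular expression; otherwise: simple wildcard."""
    if match_pattern == "regex":
        return re.fullmatch(match, field_name) is not None
    return fnmatchcase(field_name, match)

print(name_matches("profit_42", r"^profit_\d+$", "regex"))   # True
print(name_matches("profit_abc", r"^profit_\d+$", "regex"))  # False
print(name_matches("profit_abc", "profit_*"))                # True
```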
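path_match and path_unmatch work like match and unmatch, but they match against the full dotted path of the field (such as name.first) rather than only its name. In the following example, name.first and name.last are mapped as text and copied into full_name, while name.middle does not match the template: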

PUT my-index-000001
{
  "mappings": {
    "dynamic_templates": [
      {
        "full_name": {
          "path_match":   "name.*",
          "path_unmatch": "*.middle",
          "mapping": {
            "type":       "text",
            "copy_to":    "full_name"
          }
        }
      }
    ]
  }
}

PUT my-index-000001/_doc/1
{
  "name": {
    "first":  "John",
    "middle": "Winston",
    "last":   "Lennon"
  }
}
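To see which values end up in full_name, the following Python sketch flattens the document into dotted paths and applies path_match and path_unmatch to each one. fnmatch again approximates the wildcard matching, and flatten and path_applies are hypothetical helpers.

```python
from fnmatch import fnmatchcase
from typing import Iterator, Optional, Tuple

def flatten(obj: dict, prefix: str = "") -> Iterator[Tuple[str, object]]:
    """Yield (dotted_path, value) for every leaf value of a nested document."""
    for key, value in obj.items():
        path = prefix + key
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value

def path_applies(path: str, path_match: str, path_unmatch: Optional[str] = None) -> bool:
    """True if the full path matches path_match and does not match path_unmatch."""
    if not fnmatchcase(path, path_match):
        return False
    return path_unmatch is None or not fnmatchcase(path, path_unmatch)

doc = {"name": {"first": "John", "middle": "Winston", "last": "Lennon"}}
copied_to_full_name = [v for p, v in flatten(doc) if path_applies(p, "name.*", "*.middle")]
print(copied_to_full_name)  # ['John', 'Lennon']
```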

Template variables

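In a template's mapping you can use the {name} and {dynamic_type} placeholders: {name} is replaced with the field name and {dynamic_type} with the data type detected by dynamic mapping. For example, "analyzer": "{name}" gives each field the analyzer that has the same name as the field.

For usage examples, see: https://www.elastic.co/guide/en/elasticsearch/reference/7.15/dynamic-templates.html#template-examples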
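Placeholder substitution amounts to a string replacement over the template's mapping. The sketch below (render_mapping is a hypothetical helper, and the field names are only illustrations) shows both placeholders being filled in:

```python
from typing import Any

def render_mapping(mapping: Any, field_name: str, dynamic_type: str) -> Any:
    """Recursively replace {name} and {dynamic_type} in a template's mapping values."""
    if isinstance(mapping, dict):
        return {k: render_mapping(v, field_name, dynamic_type) for k, v in mapping.items()}
    if isinstance(mapping, list):
        return [render_mapping(v, field_name, dynamic_type) for v in mapping]
    if isinstance(mapping, str):
        return mapping.replace("{name}", field_name).replace("{dynamic_type}", dynamic_type)
    return mapping

print(render_mapping({"type": "text", "analyzer": "{name}"}, "english", "string"))
# {'type': 'text', 'analyzer': 'english'}
print(render_mapping({"type": "{dynamic_type}", "index": False}, "count", "long"))
# {'type': 'long', 'index': False}
```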

Explicit mapping

Creating an index with an explicit mapping:

PUT /my-index-000001
{
  "mappings": {
    "properties": {
      "age":    { "type": "integer" },  
      "email":  { "type": "keyword"  }, 
      "name":   { "type": "text"  }     
    }
  }
}

Adding a field to an existing mapping:

PUT /my-index-000001/_mapping
{
  "properties": {
    "employee-id": {
      "type": "keyword",
      "index": false
    }
  }
}

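Note that this adds a field to the existing mapping; it does not replace the whole index mapping with this configuration. Because index is false, employee-id is kept in _source but is not indexed, so it cannot be searched.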
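The "add, don't replace" behavior can be modelled as a merge of the properties objects. This is a simplified, hypothetical sketch: add_fields only checks the type, whereas Elasticsearch also restricts changes to most other parameters of an existing field.

```python
from typing import Dict

Properties = Dict[str, dict]

def add_fields(existing: Properties, update: Properties) -> Properties:
    """Merge new field mappings into existing ones without dropping old fields."""
    merged = {name: dict(spec) for name, spec in existing.items()}
    for name, spec in update.items():
        old = merged.get(name)
        if old is not None and "type" in spec and old.get("type") != spec["type"]:
            raise ValueError(f"cannot change the type of existing field {name!r}")
        merged[name] = {**(old or {}), **spec}
    return merged

existing = {"age": {"type": "integer"}, "email": {"type": "keyword"}, "name": {"type": "text"}}
merged = add_fields(existing, {"employee-id": {"type": "keyword", "index": False}})
print(sorted(merged))  # ['age', 'email', 'employee-id', 'name']
```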

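Runtime fields

A runtime field is evaluated at query time according to rules you define, whereas fields created through dynamic or explicit mapping have a fixed mapping and are indexed.

Runtime fields are not indexed, so they do not increase the size of the index. They can still be used in searches, but they are usually somewhat slower than indexed fields because a script has to run at query time.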

Map a runtime field

Here is an example of how to map a runtime field:

PUT my-index-000001/
{
  "mappings": {
    "runtime": {
      "day_of_week": {
        "type": "keyword",
        "script": {
          "source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
        }
      }
    },
    "properties": {
      "@timestamp": {"type": "date"}
    }
  }
}

This maps a runtime field, day_of_week, whose value is computed from the @timestamp field.
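For readers more at home in Python than Painless, the script above computes roughly the following. This is an illustrative translation, not something Elasticsearch runs: DAY_NAMES assumes the English full day names that the script's getDisplayName call produces, and day_of_week takes an ISO-8601 string as a stand-in for doc['@timestamp'].value.

```python
from datetime import datetime

# Assumed to match the English full day names produced by the Painless script.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

def day_of_week(timestamp: str) -> str:
    """Python analogue of the day_of_week runtime field script."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11, so normalise it first
    dt = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    return DAY_NAMES[dt.weekday()]  # weekday(): Monday == 0

print(day_of_week("2021-10-22T14:51:00Z"))  # Friday
```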

The following runtime field types are supported:

boolean
composite
date
double
geo_point
ip
keyword
long

As mentioned in the list of dynamic values above, dynamic can be set to runtime. What happens in that case?

With a mapping like the following, new fields are automatically added as runtime fields:

PUT my-index-000001
{
  "mappings": {
    "dynamic": "runtime",
    "properties": {
      "@timestamp": {
        "type": "date"
      }
    }
  }
}

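A runtime field can also be defined without a script. In that case, Elasticsearch looks in _source at query time for a field with the same name and returns its value if one exists: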

PUT my-index-000001/
{
  "mappings": {
    "runtime": {
      "day_of_week": {
        "type": "keyword"
      }
    }
  }
}

In all of the examples above, GET {index}/_mapping lists the runtime-mapped fields under the runtime section.

Runtime fields also let you change a field's type or remove the field:

PUT my-index-000001/_mapping
{
  "runtime": {
    "day_of_week": {
      "type": "long"
    }
  }
}

PUT my-index-000001/_mapping
{
  "runtime": {
    "day_of_week": null
  }
}

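This differs from indexed fields, whose type cannot be changed once they are mapped. Even so, check that the new type is compatible with the existing data before changing it, to avoid unexpected errors.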

Define runtime fields in a search request

Runtime fields can be defined not only in the mapping but also in the search request itself, using runtime_mappings:

GET my-index-000001/_search
{
  "runtime_mappings": {
    "day_of_week": {
      "type": "keyword",
      "script": {
        "source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
      }
    }
  },
  "aggs": {
    "day_of_week": {
      "terms": {
        "field": "day_of_week"
      }
    }
  }
}
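Conceptually, the request above evaluates the runtime field for every matching document and then buckets the results. A minimal in-memory Python model of that flow, using collections.Counter as a stand-in for the terms aggregation; terms_on_runtime_field is a hypothetical helper and the documents are made up:

```python
from collections import Counter
from datetime import datetime
from typing import Callable, Dict, List

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

# field name -> function computing the value from a document's _source
RuntimeMappings = Dict[str, Callable[[dict], object]]

def terms_on_runtime_field(docs: List[dict], runtime_mappings: RuntimeMappings, field: str) -> Counter:
    """Evaluate the runtime field per document at 'query time', then count the values."""
    buckets: Counter = Counter()
    for source in docs:
        value = runtime_mappings[field](source)  # nothing was indexed for this field
        if value is not None:
            buckets[value] += 1
    return buckets

runtime_mappings: RuntimeMappings = {
    "day_of_week": lambda src: DAY_NAMES[datetime.fromisoformat(src["@timestamp"]).weekday()],
}
docs = [  # made-up sample documents
    {"@timestamp": "2021-10-22T14:51:00"},
    {"@timestamp": "2021-10-23T09:00:00"},
    {"@timestamp": "2021-10-29T18:30:00"},
]
print(terms_on_runtime_field(docs, runtime_mappings, "day_of_week"))
# Counter({'Friday': 2, 'Saturday': 1})
```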

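Override field values at query time

If you define a runtime field in the search request with the same name as a field that already exists in the mapping, the runtime field shadows the mapped field. This lets you rewrite the values returned for that field according to your own rules, without reindexing.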
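The shadowing rule can be sketched like this. field_value is a hypothetical helper, the document is made up, and a runtime definition of None stands for a runtime field without a script, which falls back to the same-named value in _source.

```python
from typing import Callable, Dict, Optional

Script = Optional[Callable[[dict], object]]

def field_value(source: dict, name: str, runtime_mappings: Dict[str, Script]) -> object:
    """Return the value a search sees for `name`; runtime fields shadow mapped fields."""
    if name in runtime_mappings:
        script = runtime_mappings[name]
        # without a script, the runtime field reads the same-named field from _source
        return source.get(name) if script is None else script(source)
    return source.get(name)

doc = {"level": "warn"}  # made-up document with an inconsistently formatted value
print(field_value(doc, "level", {}))                                        # warn
print(field_value(doc, "level", {"level": lambda s: s["level"].upper()}))   # WARN
print(field_value(doc, "level", {"level": None}))                           # warn
```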

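Retrieve a runtime field

Runtime field values are not part of _source; retrieve them with the fields parameter of the search request, for example "fields": ["day_of_week"].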

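Index a runtime field

A runtime field can be turned into an indexed field simply by moving its definition from the runtime section of the mapping to the properties section.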
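As a mapping transformation, that move looks like the following Python sketch. promote_runtime_field is a hypothetical helper; keeping the script in properties assumes your Elasticsearch version accepts a script on that indexed field type.

```python
import copy

SCRIPT = "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"

def promote_runtime_field(mapping: dict, field: str) -> dict:
    """Return a copy of `mapping` with `field` moved from "runtime" to "properties"."""
    new_mapping = copy.deepcopy(mapping)     # leave the input untouched
    runtime = new_mapping.get("runtime", {})
    definition = runtime.pop(field)          # KeyError if it is not a runtime field
    new_mapping.setdefault("properties", {})[field] = definition
    if "runtime" in new_mapping and not runtime:
        del new_mapping["runtime"]           # drop the now-empty section
    return new_mapping

mapping = {
    "runtime": {"day_of_week": {"type": "keyword", "script": {"source": SCRIPT}}},
    "properties": {"@timestamp": {"type": "date"}},
}
promoted = promote_runtime_field(mapping, "day_of_week")
print(sorted(promoted["properties"]))  # ['@timestamp', 'day_of_week']
```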

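Explore your data with runtime fields

For example, you may want to extract a few key values from a very large log field into separately named fields that you can query.

Indexing all of that data would consume a lot of disk space; runtime fields let you extract and query those values on demand instead.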
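As a rough analogue, a runtime field over a log message is just a function applied to each document's raw text at search time. In this hypothetical Python sketch the log lines are made up, and the regular expression only pulls out a leading IPv4 address (a simplified stand-in for the grok patterns often used in Painless for this):

```python
import re
from typing import List, Optional

# Leading IPv4 address followed by a space (simplified: no range checks).
CLIENT_IP = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3}) ")

def client_ip(message: str) -> Optional[str]:
    """Extract the client IP from a raw log line, or None if it does not match."""
    m = CLIENT_IP.match(message)
    return m.group(1) if m else None

def search_by_ip(docs: List[dict], ip: str) -> List[dict]:
    """Filter documents on the extracted value without having indexed it."""
    return [d for d in docs if client_ip(d["message"]) == ip]

docs = [  # made-up log documents
    {"message": '40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] "GET /images/hm_bg.jpg HTTP/1.0" 200 24736'},
    {"message": '232.0.0.0 - - [30/Apr/2020:14:30:17 -0500] "GET /images/hm_bg.jpg HTTP/1.0" 200 24736'},
]
print(len(search_by_ip(docs, "40.135.0.0")))  # 1
```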

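Metadata fields

Each document has metadata associated with it, such as _index, _type, and _id.

When creating a mapping, you can customize the behavior of some of these metadata fields.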

_id

GET my-index-000001/_search
{
  "query": {
    "terms": {
      "_id": [ "1", "2" ] 
    }
  }
}
Querying _id with a terms query

_source

PUT my-index-000001
{
  "mappings": {
    "_source": {
      "enabled": false
    }
  }
}
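Disabling _source to save storage. Without _source, features such as the update and reindex APIs and highlighting no longer work.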

PUT logs
{
  "mappings": {
    "_source": {
      "includes": [
        "*.count",
        "meta.*"
      ],
      "excludes": [
        "meta.description",
        "meta.other.*"
      ]
    }
  }
}
Configuring includes and excludes so that only part of each document is stored in _source
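The effect of includes and excludes on a stored document can be approximated as below. filter_source is a hypothetical helper: it matches the patterns against dotted leaf paths with fnmatch, applies includes before excludes, and ignores details such as arrays of objects. The sample document is made up.

```python
from fnmatch import fnmatchcase
from typing import Iterator, Sequence, Tuple

def _leaves(obj: dict, prefix: str = "") -> Iterator[Tuple[str, object]]:
    """Yield (dotted_path, value) for every leaf value."""
    for key, value in obj.items():
        path = prefix + key
        if isinstance(value, dict):
            yield from _leaves(value, path + ".")
        else:
            yield path, value

def filter_source(source: dict, includes: Sequence[str] = (), excludes: Sequence[str] = ()) -> dict:
    """Keep leaves matching an include pattern (if any) and no exclude pattern."""
    result: dict = {}
    for path, value in _leaves(source):
        if includes and not any(fnmatchcase(path, p) for p in includes):
            continue
        if any(fnmatchcase(path, p) for p in excludes):
            continue
        *parents, leaf = path.split(".")
        node = result
        for part in parents:                 # rebuild the nested structure
            node = node.setdefault(part, {})
        node[leaf] = value
    return result

doc = {
    "requests": {"count": 10, "foo": "bar"},
    "meta": {"name": "Some metric", "description": "Some metric description",
             "other": {"foo": "one", "baz": "two"}},
}
print(filter_source(doc, ["*.count", "meta.*"], ["meta.description", "meta.other.*"]))
# {'requests': {'count': 10}, 'meta': {'name': 'Some metric'}}
```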

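Mapping limit settings

index.mapping.total_fields.limit
    Maximum number of fields in an index. Default 1000.
index.mapping.depth.limit
    Maximum depth of a field, measured as the number of inner objects. Default 20.

index.mapping.nested_fields.limit
    Maximum number of distinct nested mappings in an index. Default 50.
    Each nested object is indexed as a separate Lucene document, so indexing a single document that contains 100 user objects creates 101 Lucene documents: one for the parent document and one for each nested object. Because of this cost, ES provides these limits to prevent performance problems.

index.mapping.nested_objects.limit
    Maximum number of nested objects a single document can contain across all nested types. Default 10000.
index.mapping.field_name_length.limit
    Maximum length of a field name. Unlimited by default; usually there is no need to set it.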
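A rough way to check a mapping against the first two limits, and to see where the 101 comes from in the nested example. mapping_stats and lucene_docs_for are hypothetical helpers; the count here includes object fields but ignores multi-fields, runtime fields and field aliases, which Elasticsearch may also count toward index.mapping.total_fields.limit.

```python
from typing import Tuple

def mapping_stats(properties: dict, depth: int = 1) -> Tuple[int, int]:
    """Return (number of fields, maximum depth) for a mapping's properties."""
    count, max_depth = 0, 0
    for spec in properties.values():
        count += 1                          # every field, including object fields
        max_depth = max(max_depth, depth)   # root-level fields have depth 1
        inner = spec.get("properties")
        if inner:
            inner_count, inner_depth = mapping_stats(inner, depth + 1)
            count += inner_count
            max_depth = max(max_depth, inner_depth)
    return count, max_depth

def lucene_docs_for(nested_objects: int) -> int:
    """One Lucene document for the parent plus one per nested object."""
    return 1 + nested_objects

props = {
    "name": {"type": "keyword"},
    "groups": {"properties": {"name": {"type": "keyword"}}},
}
print(mapping_stats(props))   # (3, 2)
print(lucene_docs_for(100))   # 101
```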

posted on 2021-10-22 14:51 icodegarden