(11) Elasticsearch mapping explained

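In Elasticsearch (es), a PUT that indexes a document automatically creates the index and the type under it, and es also creates a mapping. A mapping defines the data type of every field in a type, along with related properties such as how the field is analyzed. You can also define field types and properties up front when creating the index, so that date fields are handled as dates, numeric fields as numbers, string fields as strings, and so on. To explore mappings, start by creating a document: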

PUT /myindex/article/1
{
  "post_date":"2018-05-10",
  "title":"Java",
  "content":"java is the best language",
  "author_id":119
}

To view the mapping, run GET /myindex/article/_mapping. The result:

{
  "myindex": {
    "mappings": {
      "article": {
        "properties": {
          "author_id": {
            "type": "long"
          },
          "content": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "post_date": {
            "type": "date"
          },
          "title": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

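The response shows that the index is myindex and the type is article.

The author_id field has type long, content has type text, post_date has type date, and title has type text. es detected these field types automatically.

es fields are typed, and a mapping that es creates automatically like this is called a dynamic mapping.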
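
The detection rules behind the mapping above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: infer_field_mapping and infer_mapping are hypothetical helper names, not part of any es client, and the single yyyy-MM-dd pattern stands in for es's date detection, which accepts more formats.

```python
import re

# Assumption: one yyyy-MM-dd pattern stands in for es's default date detection.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def infer_field_mapping(value):
    """Return the mapping a dynamic-mapping-like rule would pick for one JSON value."""
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        if DATE_PATTERN.match(value):
            return {"type": "date"}
        # Other strings become analyzed text plus an exact-value keyword sub-field.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    raise TypeError(f"unsupported value: {value!r}")


def infer_mapping(document):
    """Build a properties block for a flat document (hypothetical helper)."""
    return {"properties": {name: infer_field_mapping(v) for name, v in document.items()}}


article = {
    "post_date": "2018-05-10",
    "title": "Java",
    "content": "java is the best language",
    "author_id": 119,
}
mapping = infer_mapping(article)
```

For the article document, mapping["properties"] assigns the same four types as the response shown earlier: long, date, and text with a keyword sub-field for title and content.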

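es supports the following data types:

(1) Core datatypes

String: text and keyword (these replaced the older string type). The text type is for indexing long text. Before indexing, the text is analyzed, that is, split into individual terms, and es indexes those terms so that they can be searched. By default a text field cannot be used for sorting or aggregations. A keyword field is not analyzed; it can be used for filtering, sorting, and aggregations, and it can only be matched by its exact, whole value.

Numeric: long, integer, short, byte, double, float

Date: date

Boolean: boolean

Binary: binary

Date and numeric values are not analyzed, so a query term is compared against the whole value; strings are analyzed, so a query can match individual words in them. For example:

Add the following two documents, which together with the first one gives three documents: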

PUT /myindex/article/2
{
  "post_date":"2018-05-12",
  "title":"html",
  "content":"i like html",
  "author_id":120
}

PUT /myindex/article/3
{
  "post_date":"2018-05-16",
  "title":"es",
  "content":"Es is distributed document store",
  "author_id":110
}

Running the following queries gives these results:

GET /myindex/article/_search?q=post_date:2018    (no hits)

GET /myindex/article/_search?q=post_date:2018-05    (no hits)

GET /myindex/article/_search?q=post_date:2018-05-10    (returns hits)

GET /myindex/article/_search?q=html    (returns hits)

GET /myindex/article/_search?q=java    (returns hits)
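
To make the text versus keyword distinction concrete, here is a toy inverted index in Python. It is a rough sketch, not how es is implemented: analyze_text only lowercases and splits on non-alphanumeric characters as a stand-in for the standard analyzer, and all function names are hypothetical.

```python
import re
from collections import defaultdict


def analyze_text(value):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", value.lower()) if token]


def analyze_keyword(value):
    """A keyword field is indexed as one unmodified term."""
    return [value]


def build_index(docs, field, analyze):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in analyze(doc[field]):
            index[term].add(doc_id)
    return index


docs = {
    1: {"content": "java is the best language"},
    2: {"content": "i like html"},
    3: {"content": "Es is distributed document store"},
}

text_index = build_index(docs, "content", analyze_text)
keyword_index = build_index(docs, "content", analyze_keyword)

html_in_text = sorted(text_index["html"])            # the single word is a term
html_in_keyword = sorted(keyword_index["html"])      # no such whole-value term
exact_in_keyword = sorted(keyword_index["i like html"])
```

Looking up html finds document 2 in the analyzed index but nothing in the keyword-style index, where only the full string "i like html" is a term.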

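(2) Complex datatypes

Array datatype: an array needs no dedicated mapping type; every element takes the type of the field, so all elements must share one type. For example:

String array: ["one","two"]

Integer array: [1,2]

Array of arrays: [1,[2,3]], which is equivalent to [1,2,3]

Array of objects: [{"name":"Mary","age":12},{"name":"John","age":10}]

Object datatype: object is used for a single JSON object

Nested datatype: nested is used for arrays of JSON objects

For example: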

PUT /lib/person/1
{
    "name":"Tom",
    "age":25,
    "birthday":"1985-12-12",
    "address":{
        "country":"china",
        "province":"guangdong",
        "city":"shenzhen"
    }
}

Internally this is stored in a flattened form:

{
    "name":["Tom"],
    "age":[25],
    "birthday":["1985-12-12"],
    "address.country":["china"],
    "address.province":["guangdong"],
    "address.city":["shenzhen"]
}

PUT /lib/person/2
{
    "persons":[
        {"name":"lisi","age":27},
        {"name":"wangwu","age":26},
        {"name":"zhangsan","age":23}
    ]
}

Internally this is stored in a flattened form:

{
    "persons.name":["lisi","wangwu","zhangsan"],
    "persons.age":[27,26,23]
}
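
The flattening shown in the two storage examples can be reproduced with a short recursive function. This is an illustrative Python sketch; flatten and matches_all are hypothetical names, and the matching logic is a simplification of what a real query does.

```python
def flatten(value, prefix="", out=None):
    """Collect every leaf value of a JSON-like document under its dotted field path."""
    if out is None:
        out = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flatten(child, f"{prefix}.{key}" if prefix else key, out)
    elif isinstance(value, list):
        # Arrays add no path segment, so nested arrays collapse into one value list.
        for item in value:
            flatten(item, prefix, out)
    else:
        out.setdefault(prefix, []).append(value)
    return out


def matches_all(flat_doc, conditions):
    """True if every field holds the expected value somewhere in its value list."""
    return all(expected in flat_doc.get(field, []) for field, expected in conditions.items())


doc = {
    "persons": [
        {"name": "lisi", "age": 27},
        {"name": "wangwu", "age": 26},
        {"name": "zhangsan", "age": 23},
    ]
}
flat = flatten(doc)

# No single person is named lisi AND aged 26, yet the flattened document matches.
cross_match = matches_all(flat, {"persons.name": "lisi", "persons.age": 26})
```

Because the link between each name and its age is lost, cross_match is True. Avoiding this kind of cross-object match is the reason to map an array of objects as nested rather than object.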

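(3) Geo datatypes

Geo-point datatype: geo_point is used for latitude/longitude coordinates

Geo-shape datatype: geo_shape is used for complex shapes such as polygons

(4) Specialised datatypes

IP datatype: ip is used for IPv4 addresses (and, since es 5.0, IPv6 addresses)

Completion datatype: completion provides auto-complete suggestions

Token count datatype: token_count analyzes a string value and indexes the number of tokens it contains. By default it counts position increments, so tokens removed by a token filter (stop words, for example) are still included in the count.

mapper-murmur3 datatype: with the mapper-murmur3 plugin, a murmur3 field computes a hash of the field's values at index time and stores it in the index.

Attachment datatype: with the mapper-attachments plugin, the attachment type indexes documents in formats such as Microsoft Office, Open Document, ePub, and HTML.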

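Properties supported by fields:

"store": whether the field value is stored on its own, separately from _source; default false. A field that is not stored can still be searched, and its value is normally still returned as part of _source.

"index": true // the field is indexed and searchable; false // the field is not indexed and cannot be searched

"analyzer": "ik" // the analyzer to use; the default is the standard analyzer

"boost": 1.23 // field-level relevance boost; default 1.0

"ignore_above": 100 // strings longer than 100 characters are ignored and not indexed

"search_analyzer": "ik" // the analyzer used at search time; defaults to the same value as analyzer. For example, index with standard plus ngram and search with standard to implement auto-complete suggestions.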

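The reason for pairing a different analyzer and search_analyzer can be sketched in Python. The text above mentions ngram; this sketch substitutes the simpler edge n-gram variant (prefixes only), and every function name here is a hypothetical stand-in rather than an es API.

```python
import re


def standard_tokens(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def edge_ngrams(token, min_gram=1, max_gram=10):
    """All prefixes of a token between min_gram and max_gram characters long."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def index_analyzer(text):
    """Index time: standard-like tokens expanded into their prefixes."""
    return {gram for token in standard_tokens(text) for gram in edge_ngrams(token)}


def search_analyzer(text):
    """Search time: standard-like tokens only, so the typed prefix is matched as-is."""
    return set(standard_tokens(text))


titles = {1: "Java", 2: "html", 3: "es"}
indexed = {doc_id: index_analyzer(title) for doc_id, title in titles.items()}


def suggest(prefix):
    """Return ids of documents whose indexed terms contain every search term."""
    wanted = search_analyzer(prefix)
    return sorted(doc_id for doc_id, terms in indexed.items() if wanted and wanted <= terms)
```

Calling suggest("ja") finds document 1 because the prefix ja was generated at index time. If the search side also produced n-grams, typing ja would additionally look up j, which matches far more documents than intended.
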
Creating a mapping manually

PUT /lib
{
    "settings":{
        "number_of_shards":3,
        "number_of_replicas":0
    },
    "mappings":{
        "books":{
            "properties":{
                "title":{"type":"text"},
                "name":{"type":"text","analyzer":"standard"},
                "publish_date":{"type":"date","index":false},
                "price":{"type":"double"},
                "number":{"type":"integer"}
            }
        }
    }
}

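This sets the type to books, the analyzer of the name field to standard, and publish_date to not be indexed. If a document introduces a new field, that field is created with the default properties, as with the mark field in the following document: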

PUT /lib/books/1
{
  "title":"java is good",
  "name":"java",
  "publish_date":"2019-01-12",
  "price":23,
  "number":46,
  "mark":"no"
}

Check the resulting mapping:

GET /lib/books/_mapping

{
  "lib": {
    "mappings": {
      "books": {
        "properties": {
          "mark": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "name": {
            "type": "text",
            "analyzer": "standard"
          },
          "number": {
            "type": "integer"
          },
          "price": {
            "type": "double"
          },
          "publish_date": {
            "type": "date",
            "index": false
          },
          "title": {
            "type": "text"
          }
        }
      }
    }
  }
}

posted @ 2019-08-31 16:25 雷雨客