Restricting automatic index and field creation; schema-less vs. custom index structure mappings; faceted search

By default, Elasticsearch creates indices automatically.

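When you run a large system with many processes feeding data into Elasticsearch, a simple typo in an index name can wreck hours of scripted work and hundreds of gigabytes of data.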

Change the setting in elasticsearch.yml:

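action.auto_create_index:

true (default): when a document is indexed into an index that does not exist yet, the index is created automatically
false: indices are never created automatically
-an*,+a*,-*: allow automatic creation of indices whose names start with a, except those starting with an, and forbid it for every other name (pattern order matters: the patterns are checked from left to right and the first one that matches decides)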
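To make the ordering rule concrete, here is a simplified Python model of how such a pattern list could be evaluated. It is a sketch under stated assumptions, not Elasticsearch's own code: patterns are checked left to right, the first match decides, `*` is the only wildcard, and a name that matches no pattern is not auto-created. The function name `should_auto_create` is hypothetical.

```python
import re


def should_auto_create(setting: str, index: str) -> bool:
    """Simplified model of action.auto_create_index (assumed semantics:
    ordered patterns, first match wins, no match means "do not create")."""
    setting = setting.strip()
    if setting == "true":
        return True
    if setting == "false":
        return False
    for raw in setting.split(","):
        raw = raw.strip()
        if not raw:
            continue
        # A leading '-' forbids; a leading '+' (or no prefix) allows.
        allow = not raw.startswith("-")
        pattern = raw[1:] if raw[0] in "+-" else raw
        # Treat only '*' as a wildcard; every other character is literal.
        regex = re.escape(pattern).replace(r"\*", ".*")
        if re.fullmatch(regex, index):
            return allow  # the first matching pattern decides
    return False  # nothing matched: the index is not created


for name in ["apple", "analytics", "blog"]:
    print(name, should_auto_create("-an*,+a*,-*", name))
# apple True
# analytics False
# blog False
```

Swapping the order to `+a*,-an*,-*` would make `analytics` auto-creatable in this model, because `+a*` now matches before `-an*` is ever checked.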

Setting the number of shards and replicas

curl -XPUT http://localhost:9200/blog/ -d '{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 2
  }
}'

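This gives 3 Lucene indices: every shard copy is a separate Lucene index, so 1 primary shard plus 2 replicas makes 3.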
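The count above is simple arithmetic; this small Python helper (its name is hypothetical) spells it out as primaries × (1 + replicas).

```python
def lucene_index_count(number_of_shards: int, number_of_replicas: int) -> int:
    """Every primary shard and every replica copy is its own Lucene index."""
    return number_of_shards * (1 + number_of_replicas)


print(lucene_index_count(1, 2))  # 3, the blog index above
print(lucene_index_count(5, 1))  # 10, the old default of 5 shards and 1 replica
```

This assumes the cluster has enough nodes for all copies to be allocated: a replica is never placed on the same node as its primary, so on a single-node cluster the 2 replicas would stay unassigned.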

Deleting the index created above (which has no mapping configuration):

curl -XDELETE http://localhost:9200/blog

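【Schema mapping】

SQL databases: a schema describing the data must be created before any data is stored.

【Lessons from practice】

For data whose field structure cannot be known in advance, such as responses from third-party APIs (e.g. Amazon's order-related APIs), where the names and number of the returned fields vary, use a schema-less engine such as MongoDB, RabbitMQ or Elasticsearch.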

Restricting automatic field creation

curl -XPUT http://localhost:9200/blog/ -d @j.json

{
  "mappings": {
    "articles": {
      "dynamic": "false",
      "properties": {
        "id": {
          "type": "string"
        },
        "content": {
          "type": "string"
        },
        "author": {
          "type": "string"
        }
      }
    }
  }
}

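Note: the JSON file holding the mapping can have any name.

In the articles type of the blog index, fields not listed in the properties section are ignored by Elasticsearch: they are not added to the mapping and are not indexed (so they cannot be searched), although they are still kept in _source.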
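A toy Python model of what `"dynamic": "false"` means for one incoming document: mapped fields are indexed, unmapped ones are skipped for indexing but survive in `_source`. The helper `split_fields` and the sample document are hypothetical; this only illustrates the rule, it is not how Elasticsearch is implemented.

```python
import json
from typing import Dict, Set, Tuple

# Field names from the "articles" mapping above.
MAPPED_FIELDS: Set[str] = {"id", "content", "author"}


def split_fields(doc: Dict, mapped: Set[str]) -> Tuple[Dict, Dict]:
    """Mapped fields get indexed; the rest are not indexed (but stay in _source)."""
    indexed = {k: v for k, v in doc.items() if k in mapped}
    ignored = {k: v for k, v in doc.items() if k not in mapped}
    return indexed, ignored


doc = {"id": "1", "content": "hello", "author": "alice", "tags": ["es"]}
indexed, ignored = split_fields(doc, MAPPED_FIELDS)
print("searchable:", sorted(indexed))   # ['author', 'content', 'id']
print("not indexed:", sorted(ignored))  # ['tags']
print("_source keeps:", json.dumps(doc))
```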


How Elasticsearch determines field types:

{"field1":10,"field2":"10"}: field1 is detected as a number (type long), field2 as a string.

{
  "mappings": {
    "articles": {
      "numeric_detection": true
    }
  }
}

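With numeric_detection enabled, Elasticsearch tries to parse string values as numbers, so numeric data sent as text is not detected as a string (field2 above would be mapped as a long).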
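The detection rule can be sketched in Python as below. This is a deliberately reduced model, assuming only booleans, integers and strings; it ignores floating-point values and date detection, and `detect_type` is a hypothetical name, not an Elasticsearch API.

```python
def detect_type(value, numeric_detection: bool = False) -> str:
    """Simplified guess at the dynamic type assigned to a new field."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, str):
        if numeric_detection:
            try:
                int(value)  # numeric text is treated as a number
                return "long"
            except ValueError:
                pass
        return "string"
    raise TypeError(f"not covered by this sketch: {type(value).__name__}")


doc = {"field1": 10, "field2": "10"}
print({k: detect_type(v) for k, v in doc.items()})
# {'field1': 'long', 'field2': 'string'}
print({k: detect_type(v, numeric_detection=True) for k, v in doc.items()})
# {'field1': 'long', 'field2': 'long'}
```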

Create a posts index to store blog post data, with the following structure:

unique identifier, name, publication date, contents.

Mapping file:

{
  "mappings": {
    "posts": {
      "dynamic": "false",
      "properties": {
        "id": {
          "type": "long",
          "store": "yes",
          "precision_step": "0"
        },
        "name": {
          "type": "string",
          "store": "yes",
          "index": "analyzed"
        },
        "published": {
          "type": "date",
          "store": "yes",
          "precision_step": "0"
        },
        "contents": {
          "type": "string",
          "store": "no",
          "index": "analyzed"
        }
      }
    }
  }
}


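Field types
string, number (byte, short, integer, long, float, double), date, boolean, binary
Common attributes shared by all field types
index_name: the name under which the field is stored in the index; if it is not defined, the field is named after the object it is defined in
index
  analyzed: the default; the field is analyzed and indexed so it can be searched
  no: the field is not indexed and cannot be searched
  not_analyzed: the field is indexed without analysis, using its original value, so a search must match the whole value exactly
store: whether the field's original value is written to the index
  no: the default; the field cannot be returned on its own in the results (if the _source field is enabled, the value can still be returned from it even though it is not stored), and as long as the field is indexed you can still search on it
  yes: the original value is written to the index and can be returned in the results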
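The three `index` options can be illustrated with a toy Python model. `toy_analyze` is a crude stand-in for an analyzer (split on word characters, then lowercase), not Lucene's standard analyzer, and the query is lowercased to mimic analyzing it; both helper names are hypothetical.

```python
import re
from typing import List


def toy_analyze(text: str) -> List[str]:
    """Rough analyzer stand-in: split into word characters and lowercase."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def matches(field_value: str, query: str, index: str) -> bool:
    """Does a single-term query hit the field under each "index" option?"""
    if index == "no":
        return False  # not indexed, so it can never match
    if index == "not_analyzed":
        return field_value == query  # the whole raw value must match
    if index == "analyzed":
        return query.lower() in toy_analyze(field_value)  # any one token matches
    raise ValueError(f"unknown index option: {index}")


title = "Elasticsearch Server"
print(matches(title, "server", "analyzed"))                    # True
print(matches(title, "server", "not_analyzed"))                # False
print(matches(title, "Elasticsearch Server", "not_analyzed"))  # True
print(matches(title, "server", "no"))                          # False
```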



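A mapping file that also defines a custom analyzer in settings (a field uses it by setting "analyzer": "my_custom_analyzer" in its mapping)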

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "posts": {
      "dynamic": "false",
      "properties": {
        "id": {
          "type": "long",
          "store": "yes",
          "precision_step": "0"
        },
        "name": {
          "type": "string",
          "store": "yes",
          "index": "analyzed"
        },
        "published": {
          "type": "date",
          "store": "yes",
          "precision_step": "0"
        },
        "contents": {
          "type": "string",
          "store": "no",
          "index": "analyzed"
        }
      }
    }
  }
}
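The analysis chain of `my_custom_analyzer` runs in a fixed order: character filters, then the tokenizer, then token filters in the listed order. The Python functions below are hypothetical approximations of each stage (regex tag removal for `html_strip`, a word-character split for `standard`, Unicode decomposition for `asciifolding`), meant only to show that order, not to reproduce Lucene exactly.

```python
import re
import unicodedata
from typing import List


def html_strip(text: str) -> str:
    # char_filter approximation: drop tags (the real filter also decodes entities)
    return re.sub(r"<[^>]+>", " ", text)


def standard_tokenize(text: str) -> List[str]:
    # tokenizer approximation: runs of word characters
    return re.findall(r"\w+", text)


def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def asciifolding(tokens: List[str]) -> List[str]:
    # decompose accented characters and drop the combining marks
    return [
        "".join(c for c in unicodedata.normalize("NFKD", t) if not unicodedata.combining(c))
        for t in tokens
    ]


def my_custom_analyzer(text: str) -> List[str]:
    # char_filter -> tokenizer -> filters, as in the settings above
    return asciifolding(lowercase(standard_tokenize(html_strip(text))))


print(my_custom_analyzer("<p>Café <b>Déjà</b> Vu</p>"))
# ['cafe', 'deja', 'vu']
```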



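Per-document analyzer: the _analyzer field
Scenario: 1. the application detects the language of each document before indexing it and stores that information in the document's language field; 2. that information is then used to pick the appropriate analyzer.
First create an analyzer for each language in settings, then write the mapping: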

{
  "mappings": {
    "posts": {
      "_analyzer": {
        "path": "language"
      },
      "properties": {
        "id": {
          "type": "long",
          "store": "yes",
          "precision_step": "0"
        },
        "name": {
          "type": "string",
          "store": "yes",
          "index": "analyzed"
        },
        "language": {
          "type": "string",
          "store": "yes",
          "index": "not_analyzed"
        }
      }
    }
  }
}

In this example, an analyzer must be defined under every name that can appear in the language field; otherwise indexing will fail.
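A toy Python model of `"_analyzer": {"path": "language"}`: the analyzer name is read from the document itself, and an unknown name makes indexing fail. The `ANALYZERS` registry, its two trivial analyzers and `index_document` are all hypothetical stand-ins for analyzers you would define in settings.

```python
from typing import Callable, Dict, List

# Hypothetical analyzers, keyed by the exact values the language field may hold.
ANALYZERS: Dict[str, Callable[[str], List[str]]] = {
    "english": lambda text: text.lower().split(),
    "german": lambda text: text.lower().replace("ß", "ss").split(),
}


def index_document(doc: Dict, text_field: str = "name", path: str = "language") -> List[str]:
    """Pick the analyzer named by doc[path]; fail if no such analyzer exists."""
    analyzer_name = doc[path]
    if analyzer_name not in ANALYZERS:
        raise LookupError(f"analyzer [{analyzer_name}] not found")
    return ANALYZERS[analyzer_name](doc[text_field])


print(index_document({"name": "Große Straße", "language": "german"}))
# ['grosse', 'strasse']
```

A document with `"language": "french"` would raise `LookupError` here, which is the toy counterpart of the indexing failure described above.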


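Doc values (doc_values_format)
In the blog index example, add a new field holding the number of votes (likes) each post has received, which we want to sort on. Because sorting is required, the votes field is a good candidate for doc values: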

{
  "mappings": {
    "posts": {
      "properties": {
        "id": {
          "type": "long",
          "store": "yes",
          "precision_step": "0"
        },
        "contents": {
          "type": "string",
          "store": "no",
          "index": "analyzed"
        },
        "votes": {
          "type": "integer",
          "doc_values_format": "memory"
        }
      }
    }
  }
}

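Doc values formats
doc_values_format
  default: the default; uses little memory and performs well
  disk: stores the data on disk and needs almost no memory; use it when you need faceting or sorting but memory is short
  memory: faceting and sorting perform on par with regular inverted-index fields; because the data structure lives in memory, index refreshes are faster, which helps with indices that change quickly and with short refresh intervals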
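Why doc values suit sorting can be shown with a toy model: an inverted index maps value → document ids, which answers "which documents have 40 votes?", whereas sorting needs document id → value, which is exactly the column that doc values store at index time. The data and names below are hypothetical, and the sketch does not model the default/disk/memory formats themselves.

```python
from typing import Dict, List

# Toy segment: three documents (doc ids 0..2) with a "votes" field.
docs = [{"votes": 12}, {"votes": 40}, {"votes": 7}]

# Inverted index: value -> doc ids.
inverted: Dict[int, List[int]] = {}
for doc_id, d in enumerate(docs):
    inverted.setdefault(d["votes"], []).append(doc_id)

# Doc values: one column entry per document, addressed by doc id.
votes_column: List[int] = [d["votes"] for d in docs]


def sort_by_votes_desc(doc_ids: List[int]) -> List[int]:
    # Sorting reads each hit's value straight from the column.
    return sorted(doc_ids, key=lambda i: votes_column[i], reverse=True)


print(inverted)                       # {12: [0], 40: [1], 7: [2]}
print(sort_by_votes_desc([0, 1, 2]))  # [1, 0, 2]
```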

posted @ 2018-09-21 14:40 papering
