Elasticsearch Index and Document Management APIs in Practice: Production Best Practices

[Figure: Elasticsearch cluster architecture diagram]

I. Index Management in Practice

1. Index Creation Strategy

Basic index creation (recommended production settings)

PUT http://10.0.0.91:9200/prod-logs-2024
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}

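Production recommendations

  • Estimate the shard count from the expected data volume (aim for roughly 30-50 GB per shard).
  • Set the replica count according to the number of nodes in the cluster (at least 1 for high availability).
  • Name indices with a date suffix to simplify lifecycle management.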
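
The sizing and naming recommendations above can be turned into simple arithmetic. The following is a minimal sketch, not an official formula: the function names, the 40 GB default (the midpoint of the 30-50 GB range), and the `%Y.%m.%d` suffix format are assumptions made for illustration.

```python
import math
from datetime import date


def estimate_primary_shards(expected_gb: float, target_shard_gb: float = 40.0) -> int:
    """Return how many primary shards keep each shard near target_shard_gb."""
    if expected_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    # Round up so that no shard exceeds the target size; always use at least one shard.
    return max(1, math.ceil(expected_gb / target_shard_gb))


def dated_index_name(prefix: str, day: date) -> str:
    """Build an index name with a date suffix (the suffix format is an assumption)."""
    return f"{prefix}-{day:%Y.%m.%d}"


if __name__ == "__main__":
    print(estimate_primary_shards(200))
    print(dated_index_name("prod-logs", date(2024, 7, 15)))
```

For an index expected to hold about 200 GB, this suggests 5 primary shards, which matches the value used in the basic example.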

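Custom shard configuration (with 2 replicas, at least 3 data nodes are needed for every shard copy to be allocated)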

PUT http://10.0.0.91:9200/transaction-records
{
  "settings": {
    "number_of_shards": 10,
    "number_of_replicas": 2
  }
}

2. Viewing and Monitoring Indices

List indices whose health status is yellow

GET /_cat/indices?v&health=yellow

View shard allocation

GET /_cat/shards/oldboyedu-linux92

3. Adjusting Indices Dynamically

Changing the replica count on a live index

PUT /oldboyedu-linux92/_settings
{
  "index" : {
    "number_of_replicas" : 2
  }
}

Index alias management

POST /_aliases
{
  "actions" : [
    { "add" : { "index" : "oldboyedu-linux92", "alias" : "current_logs" } }
  ]
}
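
An alias is most useful when it is moved from one index to another without downtime. The sketch below builds an `_aliases` request body that removes the alias from the old index and adds it to the new one in a single atomic request. The function name and the index name `oldboyedu-linux93` are hypothetical and used only for illustration.

```python
import json


def build_alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Build a POST /_aliases body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            # Both actions are applied atomically, so readers never see a missing alias.
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # "oldboyedu-linux93" is a made-up successor index for this example.
    body = build_alias_swap("current_logs", "oldboyedu-linux92", "oldboyedu-linux93")
    print(json.dumps(body, indent=2))
```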

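4. Protecting Against Accidental Index Deletion

With action.destructive_requires_name set to true, destructive operations such as deleting an index must name the index explicitly; wildcard expressions and _all are rejected.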

PUT /_cluster/settings
{
  "persistent": {
    "action.destructive_requires_name": true
  }
}

II. Document Management in Practice

1. Document CRUD Operations

Creating a document safely (auto-generated ID)

POST /orders/_doc
{
  "order_id": "20240715-001",
  "amount": 2999.00,
  "products": ["手机", "保护壳"]
}

Scripted update (increments the quantity field in place; the document must already contain that field)

POST /orders/_update/1
{
  "script" : {
    "source": "ctx._source.quantity += params.quantity",
    "lang": "painless",
    "params" : {
      "quantity" : 4
    }
  }
}
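
The script above fails if the document does not exist or has no `quantity` field. One possible way to make the request more tolerant is to guard against the missing field in the script and add an `upsert` section, which Elasticsearch indexes as a new document when the ID is not found. This is a sketch under those assumptions; the function name is hypothetical.

```python
import json


def build_increment_update(quantity: int) -> dict:
    """Build a POST /<index>/_update/<id> body that increments `quantity` safely."""
    script = (
        # Initialise the field when it is missing, otherwise add to it.
        "if (ctx._source.quantity == null) { ctx._source.quantity = params.quantity; } "
        "else { ctx._source.quantity += params.quantity; }"
    )
    return {
        "script": {
            "source": script,
            "lang": "painless",
            "params": {"quantity": quantity},
        },
        # Used as the document body when the target ID does not exist yet.
        "upsert": {"quantity": quantity},
    }


if __name__ == "__main__":
    print(json.dumps(build_increment_update(4), indent=2))
```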

2. Bulk APIs

Efficient bulk writes

POST /_bulk
{ "index" : { "_index" : "user_behavior", "_id" : "20240715001" } }
{ "user_id": 1001, "action": "click", "timestamp": "2024-07-15T14:30:00" }
{ "create" : { "_index" : "user_behavior", "_id" : "20240715002" } }
{ "user_id": 1002, "action": "purchase", "amount": 150 }
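
The `_bulk` body is newline-delimited JSON: each action line is followed by its source line, every line is a single-line JSON object, and the body must end with a newline. A minimal sketch of a helper that serializes such a body is shown here; the function name is an assumption, and the result would be sent with the `Content-Type: application/x-ndjson` header.

```python
import json
from typing import Iterable, Tuple


def build_bulk_body(operations: Iterable[Tuple[dict, dict]]) -> str:
    """Serialize (action, source) pairs into an NDJSON body for POST /_bulk."""
    lines = []
    for action, source in operations:
        # json.dumps without indent never emits newlines, so each object stays on one line.
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(source, ensure_ascii=False))
    # Every line, including the last one, is terminated by a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    body = build_bulk_body([
        ({"index": {"_index": "user_behavior", "_id": "20240715001"}},
         {"user_id": 1001, "action": "click", "timestamp": "2024-07-15T14:30:00"}),
        ({"create": {"_index": "user_behavior", "_id": "20240715002"}},
         {"user_id": 1002, "action": "purchase", "amount": 150}),
    ])
    print(body, end="")
```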

Batch reads with multi-search

GET /_msearch
{"index" : "user_behavior"}
{"query" : {"match" : {"action" : "click"}}}
{"index" : "orders"}
{"query" : {"range" : {"amount" : {"gte" : 1000}}}}

3. Production Optimization Recommendations

  1. Index design conventions

    • Manage index settings with index templates
    PUT /_index_template/logs_template
    {
      "index_patterns": ["logs-*"],
      "template": {
        "settings": {
          "number_of_shards": 5,
          "codec": "best_compression"
        }
      }
    }
    
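  2. Write performance tuning

    • Keep each bulk request at roughly 5-15 MB
    • Use auto-generated document IDs
    • Increase refresh_interval where near-real-time search is not required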
    PUT /logs/_settings
    {
      "index" : {
        "refresh_interval" : "30s"
      }
    }
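
To keep bulk requests within the 5-15 MB range mentioned above, a client can split its operations by serialized size before sending them. The generator below is one way to do that, assuming operations arrive as (action, source) pairs; the function name and the 10 MB default are illustrative choices, not part of any Elasticsearch client.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def chunk_bulk_operations(
    operations: Iterable[Tuple[dict, dict]],
    max_bytes: int = 10 * 1024 * 1024,
) -> Iterator[str]:
    """Yield NDJSON bulk bodies whose UTF-8 size stays at or below max_bytes.

    A single pair larger than max_bytes is still emitted, alone in its own body.
    """
    chunk: List[str] = []
    size = 0
    for action, source in operations:
        pair = (
            json.dumps(action, ensure_ascii=False) + "\n"
            + json.dumps(source, ensure_ascii=False) + "\n"
        )
        pair_size = len(pair.encode("utf-8"))
        # Flush the current body before it would grow past the limit.
        if chunk and size + pair_size > max_bytes:
            yield "".join(chunk)
            chunk, size = [], 0
        chunk.append(pair)
        size += pair_size
    if chunk:
        yield "".join(chunk)


if __name__ == "__main__":
    ops = [({"index": {"_index": "logs"}}, {"n": i}) for i in range(10)]
    for body in chunk_bulk_operations(ops, max_bytes=100):
        print(len(body.encode("utf-8")), "bytes")
```

Each yielded string can be posted to `/_bulk` as a separate request.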
    
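  3. Query optimization tips

    • Use filter context (the filter clause of a bool query) instead of query context for yes/no conditions; filters skip relevance scoring and can be cached
    • Avoid deep pagination (search_after is recommended)
    • Choose field mappings deliberately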
    PUT /products
    {
      "mappings": {
        "properties": {
          "price": { "type": "scaled_float", "scaling_factor": 100 }
        }
      }
    }
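
The first two tips can be combined in one request body: a range condition placed in filter context, plus `search_after` for paging. The sketch below assumes the index has a unique keyword field to act as a sort tiebreaker; the field name `sku` and the function name are hypothetical.

```python
import json
from typing import Optional, Sequence


def build_products_page(
    min_price: float,
    page_size: int = 100,
    search_after: Optional[Sequence] = None,
) -> dict:
    """Build a GET /products/_search body using filter context and search_after."""
    body = {
        "size": page_size,
        "query": {
            "bool": {
                # Filter context: no relevance scoring is computed for this clause.
                "filter": [{"range": {"price": {"gte": min_price}}}]
            }
        },
        # search_after needs a deterministic sort; "sku" is an assumed unique keyword field.
        "sort": [{"price": "asc"}, {"sku": "asc"}],
    }
    if search_after is not None:
        # Pass the "sort" values of the last hit from the previous page.
        body["search_after"] = list(search_after)
    return body


if __name__ == "__main__":
    print(json.dumps(build_products_page(1000), indent=2))
    print(json.dumps(build_products_page(1000, search_after=[3999.0, "SKU-0042"]), indent=2))
```

The first call fetches page one; each later call passes the `sort` array returned with the last hit of the previous page.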
    

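III. Hands-on Exercise: E-commerce Data Analysis

1. Create the e-commerce index

The ik_max_word analyzer used in this mapping comes from the IK analysis plugin, which must be installed on every node before the index is created.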

PUT /ecommerce-products
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "title": { "type": "text", "analyzer": "ik_max_word" },
      "price": { "type": "scaled_float", "scaling_factor": 100 },
      "category": { "type": "keyword" },
      "attributes": { "type": "nested" }
    }
  }
}

2. Bulk-import the data

POST /_bulk
{"index":{"_index":"ecommerce-products"}}
{"title":"智能手机 X1 Pro","price":3999.00,"category":"电子产品","brand":"X-Mobile"}
{"index":{"_index":"ecommerce-products"}}
{"title":"智能手表 S2","price":899.00,"category":"穿戴设备","brand":"TechLife"}

3. Complex query example

GET /ecommerce-products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "智能" } },
        { "range": { "price": { "gte": 1000 } } }
      ]
    }
  },
  "aggs": {
    "category_stats": {
      "terms": { "field": "category" },
      "aggs": { "avg_price": { "avg": { "field": "price" } } }
    }
  }
}
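
The response to this query nests the average price inside each bucket of the terms aggregation. The sketch below shows one way to flatten that structure into a category-to-average mapping. The sample response is hand-written to illustrate the shape only; it is not output captured from a cluster, and the function name is hypothetical.

```python
from typing import Dict


def average_price_by_category(response: dict) -> Dict[str, float]:
    """Map each category bucket key to the value of its avg_price sub-aggregation."""
    buckets = response["aggregations"]["category_stats"]["buckets"]
    return {bucket["key"]: bucket["avg_price"]["value"] for bucket in buckets}


# Hand-written sample showing only the parts of the response that are read here.
sample_response = {
    "aggregations": {
        "category_stats": {
            "buckets": [
                {"key": "电子产品", "doc_count": 1, "avg_price": {"value": 3999.0}}
            ]
        }
    }
}

if __name__ == "__main__":
    print(average_price_by_category(sample_response))
```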

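IV. Security and Maintenance

1. Index Protection with Snapshots

The request below assumes that a snapshot repository named my_backup has already been registered.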

PUT /_snapshot/my_backup/snapshot_1
{
  "indices": "important_data",
  "ignore_unavailable": true,
  "include_global_state": false
}

2. Access Control Example

# elasticsearch.yml configuration
xpack.security.authc:
  realms:
    native:
      native1:
        order: 0

Appendix: JSON Quick Reference

Data type    Example
---------    -------
String       "server log"
Number       2024, 3.1415
Boolean      true, false
Array        ["error", 500, true]
Object       {"error": {"code": 500}}

[Figure: Elasticsearch data flow diagram]

Best-practice tip: check index status regularly with _cat/indices?v, and combine it with ILM policies to automate index lifecycle management.
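
As an illustration of the ILM part of this tip, the sketch below builds a policy body that rolls an index over in the hot phase and deletes it after a retention period. All thresholds and the function name are assumptions chosen for the example, and the `max_primary_shard_size` rollover condition assumes Elasticsearch 7.13 or later. The resulting body would be sent with `PUT /_ilm/policy/<policy-name>`.

```python
import json


def build_logs_ilm_policy(
    max_shard_size: str = "50gb",
    max_age: str = "1d",
    delete_after: str = "30d",
) -> dict:
    """Build an ILM policy body: roll over in the hot phase, then delete after a delay."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over when either condition is met.
                        "rollover": {
                            "max_primary_shard_size": max_shard_size,
                            "max_age": max_age,
                        }
                    }
                },
                "delete": {
                    # min_age is measured from rollover for indices managed by rollover.
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_logs_ilm_policy(), indent=2))
```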

Posted on 2025-03-26 09:45 by Leo_Yide