Elasticsearch: A Detailed Usage Guide

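1. Core Concepts

1.1 Basic Terminology

Index

  • A collection of documents, traditionally compared to a "database" in a relational system (now that mapping types are gone, a "table" is arguably the closer analogy)
  • Index names must be lowercase, must not contain \, /, *, ?, ", <, >, |, spaces, commas, or #, and must not start with -, _, or +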
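
The naming rule above can be checked on the client before a request is sent. The following is a minimal sketch, assuming the restrictions listed in the Elasticsearch create-index documentation (including the ban on ":" introduced in 7.0 and the 255-byte limit); the helper name `is_valid_index_name` is hypothetical and not part of any client library.

```python
# Characters Elasticsearch rejects anywhere in an index name.
_FORBIDDEN = set('\\/*?"<>| ,#:')


def is_valid_index_name(name: str) -> bool:
    """Return True if `name` passes the index-naming rules sketched here."""
    if not name or name in (".", ".."):
        return False
    if name != name.lower():          # must be lowercase
        return False
    if name[0] in "-_+":              # must not start with -, _ or +
        return False
    if any(ch in _FORBIDDEN for ch in name):
        return False
    # The limit is 255 bytes, not characters, so measure the UTF-8 encoding.
    return len(name.encode("utf-8")) <= 255


print(is_valid_index_name("products"))   # a plain lowercase name
print(is_valid_index_name("My Index"))   # uppercase and a space
```

The server remains the authority on naming; a client-side check like this only saves a round trip for obviously bad names.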

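Type (deprecated in 7.x, removed in 8.x)

  • A logical category of documents within an index
  • In Elasticsearch 7.x the default (and only) type is _doc

Document

  • The basic unit of information that can be indexed, expressed as JSON
  • Each document has an ID that is unique within its index

Field

  • An attribute of a document, comparable to a "column" in a relational database

Mapping

  • Defines the structure of an index, including each field's data type, format, and related options

Shard

  • A subset of an index, used to partition data horizontally
  • Comes in two kinds: primary shards and replica shards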

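1.2 Data Types

Common data types:

{
  "text": "full-text field; analyzed into tokens",
  "keyword": "exact-value field; not analyzed",
  "long": "64-bit integer",
  "integer": "32-bit integer",
  "short": "16-bit integer",
  "byte": "8-bit integer",
  "double": "double-precision floating point",
  "float": "single-precision floating point",
  "date": "date",
  "boolean": "boolean",
  "binary": "binary value (Base64-encoded string)",
  "geo_point": "latitude/longitude point",
  "ip": "IPv4 or IPv6 address",
  "nested": "array of objects, each indexed and queryable independently",
  "object": "JSON object"
}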

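2. REST API Basics

2.1 Common HTTP Methods

  • GET: retrieve data
  • PUT: create or replace data
  • POST: create data or run an operation
  • DELETE: delete data
  • HEAD: check whether a resource exists

2.2 Basic URL Format

From 7.x onward the type segment of the URL is the fixed _doc endpoint (6.x and earlier used a custom type name in that position):

http://localhost:9200/<index>/_doc/<document ID>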
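
As an illustration of the URL layout, the sketch below assembles a typeless document URL in Python. The function name `doc_url` is hypothetical; percent-encoding the index and ID is an assumption made here so that an ID containing "/" stays a single path segment.

```python
from urllib.parse import quote


def doc_url(base: str, index: str, doc_id: str) -> str:
    """Build a typeless (7.x+) document URL: <base>/<index>/_doc/<id>."""
    # quote(..., safe="") percent-encodes "/" and other reserved characters.
    return f"{base.rstrip('/')}/{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}"


print(doc_url("http://localhost:9200", "products", "1"))
```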

3. Index Operations

3.1 Creating an Index

Basic creation:

# Create an index without specifying a mapping
PUT /my_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
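
The same request can be prepared from a script. This is a minimal sketch using only the Python standard library; the helper `es_request` and the base URL `http://localhost:9200` (an unsecured local node) are assumptions for illustration. The request is built but not sent, so the snippet runs without a cluster.

```python
import json
from typing import Optional
from urllib.request import Request


def es_request(method: str, path: str, body: Optional[dict] = None,
               base: str = "http://localhost:9200") -> Request:
    """Prepare (but do not send) a JSON request to Elasticsearch."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(base + path, data=data, headers=headers, method=method)


create_index = es_request("PUT", "/my_index", {
    "settings": {"number_of_shards": 3, "number_of_replicas": 1}
})

# To send it against a running cluster:
#   from urllib.request import urlopen
#   print(urlopen(create_index).read().decode())
```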

Creating an index with a mapping:

PUT /products
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "price": {
        "type": "double"
      },
      "description": {
        "type": "text"
      },
      "category": {
        "type": "keyword"
      },
      "stock": {
        "type": "integer"
      },
      "created_at": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}

3.2 Viewing Index Information

# View index settings
GET /my_index/_settings

# View index mappings
GET /my_index/_mapping

# List all indices
GET /_cat/indices?v

# View index statistics
GET /my_index/_stats
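3.3 Modifying an Index

# Update the number of replicas
PUT /my_index/_settings
{
  "number_of_replicas": 2
}

# Close the index (blocks reads and writes)
POST /my_index/_close

# Open the index
POST /my_index/_open

# Refresh the index (makes recently indexed documents searchable immediately)
POST /my_index/_refresh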
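3.4 Deleting an Index

# Delete a single index
DELETE /my_index

# Delete multiple indices
DELETE /index1,index2

# Delete all indices (use with extreme caution; rejected by default on 8.x,
# where action.destructive_requires_name defaults to true)
DELETE /_all
DELETE /*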
4. Document Operations

4.1 Creating Documents

Create with an explicit ID:

PUT /products/_doc/1
{
  "title": "iPhone 13 Pro",
  "price": 999.99,
  "description": "最新款苹果手机",
  "category": "electronics",
  "stock": 100,
  "created_at": "2023-10-01 10:00:00",
  "tags": ["phone", "apple", "smartphone"]
}

Create with an auto-generated ID:

POST /products/_doc
{
  "title": "Samsung Galaxy S21",
  "price": 799.99,
  "description": "三星旗舰手机",
  "category": "electronics",
  "stock": 50,
  "created_at": "2023-10-02 14:30:00"
}

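4.2 Retrieving Documents

Retrieve a single document:

# Get a specific document
GET /products/_doc/1

# Return only selected fields
GET /products/_doc/1?_source=title,price

# Exclude certain fields
GET /products/_doc/1?_source_excludes=description

# Return only the _source
GET /products/_source/1

Retrieve multiple documents: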

GET /_mget
{
  "docs": [
    {
      "_index": "products",
      "_id": "1"
    },
    {
      "_index": "products",
      "_id": "2"
    }
  ]
}

4.3 Updating Documents

Full replacement:

PUT /products/_doc/1
{
  "title": "iPhone 13 Pro Updated",
  "price": 899.99,
  "description": "降价促销",
  "category": "electronics",
  "stock": 80,
  "created_at": "2023-10-01 10:00:00"
}

Partial update:

POST /products/_update/1
{
  "doc": {
    "price": 899.99,
    "stock": 80
  }
}

Scripted update:

POST /products/_update/1
{
  "script": {
    "source": "ctx._source.stock += params.quantity",
    "params": {
      "quantity": 10
    }
  }
}

Upsert (if the document does not exist, the upsert body is indexed as-is and the script is not run):

POST /products/_update/3
{
  "script": {
    "source": "ctx._source.stock += params.quantity",
    "params": {
      "quantity": 5
    }
  },
  "upsert": {
    "title": "New Product",
    "price": 199.99,
    "stock": 5,
    "category": "electronics"
  }
}

4.4 Deleting Documents

# Delete a single document
DELETE /products/_doc/1

# Delete by query
POST /products/_delete_by_query
{
  "query": {
    "match": {
      "category": "old_products"
    }
  }
}

4.5 Bulk Operations

POST /_bulk
{ "index" : { "_index" : "products", "_id" : "10" } }
{ "title": "Product 10", "price": 100, "category": "A" }
{ "create" : { "_index" : "products", "_id" : "11" } }
{ "title": "Product 11", "price": 200, "category": "B" }
{ "update" : { "_index" : "products", "_id" : "10" } }
{ "doc" : { "price": 150 } }
{ "delete" : { "_index" : "products", "_id" : "11" } }
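
The bulk body is newline-delimited JSON: one action line, optionally followed by one source line, with a newline after every line including the last. The sketch below builds such a body from Python data; the helper name `bulk_body` is hypothetical. When sent over HTTP the body is normally accompanied by the header `Content-Type: application/x-ndjson`.

```python
import json
from typing import Iterable, Optional, Tuple


def bulk_body(actions: Iterable[Tuple[dict, Optional[dict]]]) -> str:
    """Serialize (action, source) pairs into the NDJSON format _bulk expects."""
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        if source is not None:  # "delete" actions carry no source line
            lines.append(json.dumps(source))
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body([
    ({"index": {"_index": "products", "_id": "10"}},
     {"title": "Product 10", "price": 100, "category": "A"}),
    ({"update": {"_index": "products", "_id": "10"}}, {"doc": {"price": 150}}),
    ({"delete": {"_index": "products", "_id": "11"}}, None),
])
print(body, end="")
```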

5. Search Operations

5.1 Basic Search

Search all documents:

GET /products/_search
{
  "query": {
    "match_all": {}
  }
}

Paginated search:

GET /products/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "match_all": {}
  }
}
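
For page-based interfaces, "from" is derived from the page number and page size. The sketch below is one way to compute it, assuming 1-based page numbers and the default `index.max_result_window` of 10,000, beyond which Elasticsearch rejects from/size requests. The helper name `page_params` is hypothetical.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def page_params(page: int, size: int = 10) -> dict:
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    start = (page - 1) * size
    if start + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds index.max_result_window; use search_after")
    return {"from": start, "size": size}


print(page_params(3, 20))
```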

Sorting:

GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    },
    {
      "stock": {
        "order": "asc"
      }
    }
  ]
}

5.2 Query Types

Match query:

GET /products/_search
{
  "query": {
    "match": {
      "title": "iPhone 13"
    }
  }
}

Multi-field match:

GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "苹果手机",
      "fields": ["title", "description"]
    }
  }
}

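Exact match (term query; category is mapped as keyword in section 3.1, so it is queried directly and has no .keyword sub-field):

GET /products/_search
{
  "query": {
    "term": {
      "category": "electronics"
    }
  }
}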

Range query:

GET /products/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 500,
        "lte": 1000
      }
    }
  }
}

Bool query:

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "手机"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "price": {
              "gte": 500
            }
          }
        },
        {
          "term": {
            "category": "electronics"
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "title.keyword": "旧款手机"
          }
        }
      ],
      "should": [
        {
          "match": {
            "description": "最新"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
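
In a bool query, must and should clauses contribute to the relevance score, while filter and must_not only include or exclude documents. When such queries are assembled in application code, a small builder keeps empty clauses out of the request. The following is a sketch; `bool_query` is a hypothetical helper, and the `category` field is queried directly on the assumption that it is mapped as keyword, as in section 3.1.

```python
from typing import Iterable, Optional


def bool_query(must: Iterable[dict] = (), filter_: Iterable[dict] = (),
               must_not: Iterable[dict] = (), should: Iterable[dict] = (),
               minimum_should_match: Optional[int] = None) -> dict:
    """Assemble a bool query, omitting clause types that are empty."""
    clauses = {
        "must": list(must),
        "filter": list(filter_),
        "must_not": list(must_not),
        "should": list(should),
    }
    body: dict = {name: items for name, items in clauses.items() if items}
    if minimum_should_match is not None:
        body["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": body}}


query = bool_query(
    must=[{"match": {"title": "手机"}}],
    filter_=[{"range": {"price": {"gte": 500}}},
             {"term": {"category": "electronics"}}],
    should=[{"match": {"description": "最新"}}],
    minimum_should_match=1,
)
```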

Wildcard query (patterns that begin with * are expensive because every term has to be scanned):

GET /products/_search
{
  "query": {
    "wildcard": {
      "title.keyword": "*Phone*"
    }
  }
}

Prefix query:

GET /products/_search
{
  "query": {
    "prefix": {
      "title.keyword": "iPhone"
    }
  }
}

Regular expression query:

GET /products/_search
{
  "query": {
    "regexp": {
      "title.keyword": "iP.*e.*"
    }
  }
}

5.3 Aggregations

Metric aggregations:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    },
    "max_price": {
      "max": {
        "field": "price"
      }
    },
    "min_price": {
      "min": {
        "field": "price"
      }
    },
    "sum_stock": {
      "sum": {
        "field": "stock"
      }
    },
    "total_products": {
      "value_count": {
        "field": "title.keyword"
      }
    }
  }
}

Bucket aggregations:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

Range aggregation:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 100 },
          { "from": 100, "to": 500 },
          { "from": 500, "to": 1000 },
          { "from": 1000 }
        ]
      }
    }
  }
}
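
Each range in this aggregation includes its "from" value and excludes its "to" value, so a price of exactly 500 lands in the 500 to 1000 bucket. The sketch below simulates that rule locally on made-up prices; it illustrates the bucketing semantics only and is not how Elasticsearch computes the aggregation.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

Range = Tuple[Optional[float], Optional[float]]  # (from, to); None means unbounded

RANGES: Sequence[Range] = [(None, 100), (100, 500), (500, 1000), (1000, None)]


def bucket_counts(values, ranges: Sequence[Range] = RANGES) -> Counter:
    """Count values per range; `from` is inclusive and `to` is exclusive."""
    counts = Counter({r: 0 for r in ranges})
    for v in values:
        # No early exit: ranges may overlap, and a value counts in each match.
        for lo, hi in ranges:
            if (lo is None or v >= lo) and (hi is None or v < hi):
                counts[(lo, hi)] += 1
    return counts


# Hypothetical prices chosen to sit on the bucket boundaries.
counts = bucket_counts([99.99, 100, 499.99, 500, 799.99, 999.99, 1000])
```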

Date histogram aggregation:

GET /products/_search
{
  "size": 0,
  "query": {
    "range": {
      "created_at": {
        "gte": "now-30d/d",
        "lte": "now/d"
      }
    }
  },
  "aggs": {
    "sales_over_time": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "day",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "total_sales": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

5.4 Highlighting

GET /products/_search
{
  "query": {
    "match": {
      "description": "手机"
    }
  },
  "highlight": {
    "fields": {
      "description": {
        "pre_tags": ["<em>"],
        "post_tags": ["</em>"]
      }
    }
  }
}

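5.5 Suggesters

Completion suggester (requires a field of type completion, here named suggest, which the products mapping in section 3.1 does not define):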

GET /products/_search
{
  "suggest": {
    "product_suggest": {
      "prefix": "iPh",
      "completion": {
        "field": "suggest",
        "skip_duplicates": true
      }
    }
  }
}

Spell checking (term suggester):

GET /products/_search
{
  "suggest": {
    "text": "appel phone",
    "my_suggestion": {
      "term": {
        "field": "title"
      }
    }
  }
}

6. Advanced Features

6.1 Alias Management

Create an alias:

POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "products",
        "alias": "latest_products"
      }
    }
  ]
}

Alias with a filter:

POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "products",
        "alias": "in_stock_products",
        "filter": {
          "range": {
            "stock": {
              "gt": 0
            }
          }
        }
      }
    }
  ]
}

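6.2 Index Templates

The example uses the legacy _template API, which has been deprecated since 7.8 in favor of composable templates (PUT /_index_template/<name>, with settings and mappings nested under a "template" key):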

PUT /_template/product_template
{
  "index_patterns": ["product-*"],
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "price": {
        "type": "double"
      },
      "created_at": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}

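6.3 Index Lifecycle Management (ILM)

Create an ILM policy (the freeze action used in the cold phase is deprecated in later 7.x releases and has no effect on 8.x):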

PUT /_ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50GB",
            "max_age": "30d"
          }
        }
      },
      "warm": {
        "min_age": "60d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "cold": {
        "min_age": "90d",
        "actions": {
          "freeze": {}
        }
      },
      "delete": {
        "min_age": "365d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

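6.4 Snapshot and Restore

Register a repository (for type fs, the location must be listed under path.repo in elasticsearch.yml on every node):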

PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/path/to/backup",
    "compress": true
  }
}

Create a snapshot:

PUT /_snapshot/my_backup/snapshot_1
{
  "indices": "products,orders",
  "ignore_unavailable": true,
  "include_global_state": false
}

Restore a snapshot:

POST /_snapshot/my_backup/snapshot_1/_restore
{
  "indices": "products",
  "rename_pattern": "products",
  "rename_replacement": "restored_products"
}
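
The rename_pattern is a regular expression and rename_replacement may refer to capture groups as $1, $2, and so on. The sketch below approximates the effect with Python's `re` module, assuming the rename behaves like a regex replace-all over the index name. The helper `restored_name` is hypothetical and only translates `$n` group references, so it is a rough preview rather than a faithful reimplementation.

```python
import re


def restored_name(index: str, rename_pattern: str, rename_replacement: str) -> str:
    """Preview the index name a restore would produce (approximation)."""
    # Java-style "$1" group references become Python-style "\1".
    python_replacement = re.sub(r"\$(\d+)", r"\\\1", rename_replacement)
    return re.sub(rename_pattern, python_replacement, index)


print(restored_name("products", "products", "restored_products"))
print(restored_name("products", "(.+)", "restored_$1"))
# An unanchored pattern also matches inside longer names:
print(restored_name("old_products_v2", "products", "restored_products"))
```

Anchoring the pattern or capturing the whole name with (.+) avoids surprising renames when several indices are restored at once.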

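7. Monitoring and Management

7.1 Cluster Health

# Cluster health status
GET /_cluster/health

# Detailed health down to the shard level
GET /_cluster/health?level=shards

# Wait until the cluster status is at least yellow
GET /_cluster/health?wait_for_status=yellow&timeout=50s

7.2 Node Information

# Node statistics
GET /_nodes/stats

# Node information
GET /_nodes

# Hot threads
GET /_nodes/hot_threads

7.3 Index Maintenance

# Clear caches
POST /products/_cache/clear

# Force merge (best reserved for indices that no longer receive writes)
POST /products/_forcemerge?max_num_segments=1

# Refresh the index
POST /products/_refresh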

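8. Performance Optimization Tips

8.1 Query Optimization

Use filter context instead of query context when relevance scoring is not needed (filters skip scoring and their results can be cached):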

GET /products/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "category": "electronics"
          }
        }
      ]
    }
  }
}

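Deep pagination:

# Use search_after instead of from/size for deep pagination.
# The search_after values are the "sort" values of the last hit on the previous page:
# here the epoch milliseconds for 2023-10-01 10:00:00 UTC, plus a placeholder document ID.
# Sorting on _id may be disabled on newer versions; a unique keyword field is a safer tiebreaker.
GET /products/_search
{
  "size": 10,
  "sort": [
    {
      "created_at": "desc"
    },
    {
      "_id": "asc"
    }
  ],
  "search_after": [1696154400000, "product_id"]
}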
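
Paging with search_after is a loop: run the search, take the "sort" array of the last hit, and pass it as search_after on the next request until a page comes back empty. The sketch below shows that loop with the actual HTTP call abstracted behind a `search` callable. The names `scan` and `fake_search`, and the in-memory stand-in data, are hypothetical and exist only so the example runs without a cluster.

```python
from typing import Callable, Iterator, List, Optional


def scan(search: Callable[[dict], dict], sort: List[dict], size: int = 10) -> Iterator[dict]:
    """Yield every hit by feeding each page's last sort values into search_after."""
    after: Optional[list] = None
    while True:
        body = {"size": size, "sort": sort}
        if after is not None:
            body["search_after"] = after
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        after = hits[-1]["sort"]  # sort values of the last hit on this page


# In-memory stand-in for a cluster: 25 documents sorted by an integer field "n".
DOCS = [{"_id": str(i), "sort": [i]} for i in range(25)]


def fake_search(body: dict) -> dict:
    after = body.get("search_after")
    start = 0 if after is None else after[0] + 1
    return {"hits": {"hits": DOCS[start:start + body["size"]]}}


ids = [hit["_id"] for hit in scan(fake_search, [{"n": "asc"}], size=10)]
```

The sort must include a tiebreaker that is unique per document; otherwise hits that share the same sort values can be skipped or repeated across pages.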

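8.2 Index Optimization

Choose a sensible number of shards:

  • Aim for roughly 20-50 GB per shard
  • Keep the number of shards per node low: 20-30 as a rule of thumb here, while Elastic's older guidance is at most about 20 shards per GB of JVM heap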
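
The per-shard size target translates into a primary shard count by simple division. The sketch below assumes a target of 30 GB per shard, a value picked from within the range above rather than a recommendation; `primary_shards` is a hypothetical helper.

```python
import math


def primary_shards(total_gb: float, target_shard_gb: float = 30.0) -> int:
    """Smallest shard count that keeps each primary at or below the target size."""
    if total_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(total_gb / target_shard_gb))


print(primary_shards(100))      # 100 GB of primary data at 30 GB per shard
print(primary_shards(600, 50))  # 600 GB at 50 GB per shard
```

Because the number of primary shards is fixed at index creation, the estimate should use the expected size of the index over its lifetime, not its size on day one.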

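Use an index template (with "durability": "async", operations acknowledged since the last translog sync can be lost if a node crashes):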

PUT /_template/optimized_template
{
  "index_patterns": ["logs-*"],
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "30s",
    "translog": {
      "flush_threshold_size": "1gb",
      "sync_interval": "30s",
      "durability": "async"
    }
  }
}

8.3 Bulk Operation Optimization

# Use the bulk API; a payload of roughly 5-15 MB per request is a common starting point
POST /_bulk
{"index":{"_index":"products"}}
{"title":"Product 1","price":100}
{"index":{"_index":"products"}}
{"title":"Product 2","price":200}

9. Security Practices

9.1 Access Control

# Use an API key
curl -H "Authorization: ApiKey <your-api-key>" http://localhost:9200/

# Use basic authentication
curl -u username:password http://localhost:9200/
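
Both schemes reduce to an Authorization header. Basic authentication sends Base64 of `username:password`, and the ApiKey scheme sends Base64 of `id:api_key` (the "encoded" value returned when the key is created). The sketch below builds these headers with the standard library; the function names are hypothetical and the credentials are placeholders.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Authorization header equivalent to `curl -u username:password`."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def encode_api_key(key_id: str, api_key: str) -> str:
    """Base64 of "id:api_key", the credential form the ApiKey scheme expects."""
    return base64.b64encode(f"{key_id}:{api_key}".encode("utf-8")).decode("ascii")


def api_key_header(encoded_key: str) -> dict:
    return {"Authorization": f"ApiKey {encoded_key}"}


print(basic_auth_header("username", "password"))
```

Base64 is an encoding, not encryption, so either header should only travel over HTTPS.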

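9.2 Index Privileges

# Restrict index access
# Define a role in the Elasticsearch security configuration,
# for example as the body of PUT /_security/role/<role-name>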
{
  "indices": [
    {
      "names": ["products-*"],
      "privileges": ["read", "write"]
    }
  ]
}

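10. Troubleshooting

10.1 Handling Common Errors

Read-only index:

# Check disk usage per node (the read-only block is applied when the flood-stage disk watermark is exceeded)
GET /_cat/allocation?v

# Make the index writable again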
PUT /products/_settings
{
  "index.blocks.read_only_allow_delete": null
}

Shard allocation problems:

# Explain why a shard is unassigned or where it is allocated
GET /_cluster/allocation/explain

# Move a shard manually
POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "products",
        "shard": 0,
        "from_node": "node1",
        "to_node": "node2"
      }
    }
  ]
}

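10.2 Log Analysis

# Profile a query to see where time is spent
# (the slow log itself is written to the node's log files, not returned by an API)
GET /products/_search?q=test
{
  "profile": true
}

# Enable the search slow log for the index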
PUT /products/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s"
}

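11. Best Practices

  1. Index design
    • Choose shard and replica counts deliberately
    • Use appropriate mappings and data types
    • Avoid indexing an excessive number of very small documents
  2. Query optimization
    • Use filter context for filtering
    • Avoid deep pagination
    • Make good use of caching
  3. Monitoring and maintenance
    • Monitor cluster health regularly
    • Set up index lifecycle policies
    • Back up important data regularly
  4. Performance tuning
    • Size the JVM heap to the hardware (commonly no more than half of RAM, and below about 32 GB so compressed object pointers stay enabled)
    • Size thread pools sensibly
    • Use SSDs to improve I/O performance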
posted @ 2025-12-25 01:52  binlicoder