Installing the IK Analyzer for ES

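ES can be extended with plugins, so you add only the ones you need. For Chinese word segmentation we use the IK analyzer. The steps below show how to install the IK analyzer plugin on a Docker-based ES setup.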

Download the IK analyzer

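The IK analyzer is open source on GitHub, so you can simply download the release zip. Note: the IK analyzer version must match the ES version exactly.
Download: https://github.com/medcl/elasticsearch-analysis-ik/releases (pick the release that matches your ES version).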
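
Because the plugin version has to match the ES version, it can help to derive the download URL from a single version string. The following is a minimal sketch, assuming the release assets follow the `v<version>/elasticsearch-analysis-ik-<version>.zip` naming used by the 7.x releases. The function name `ik_release_url` is made up for illustration.

```shell
# Hypothetical helper: build the IK release URL for a given ES version.
# Assumes the asset naming used by the 7.x releases on GitHub.
ik_release_url() {
  printf 'https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v%s/elasticsearch-analysis-ik-%s.zip\n' "$1" "$1"
}

# Example (needs network access, so it is left commented out):
# wget "$(ik_release_url 7.1.0)"
```

If the URL does not resolve for your version, fall back to picking the asset by hand on the releases page.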

Unzip it on Linux

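cd es_kibana
# ES expects each plugin in its own subdirectory under plugins/
mkdir -p plugin/ik
# upload the zip from your local machine (rz is part of the lrzsz package)
rz elasticsearch-analysis-ik-7.1.0.zip
sudo unzip elasticsearch-analysis-ik-7.1.0.zip -d ./plugin/ik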
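
Before mounting the directory, you may want to confirm that the unzipped plugin really targets your ES version. This sketch reads the `elasticsearch.version` key from the plugin's `plugin-descriptor.properties` file. The helper name `plugin_es_version` is hypothetical, and the descriptor path is passed in because it depends on where you unzipped the archive.

```shell
# Hypothetical helper: print the ES version a plugin was built for,
# read from its plugin-descriptor.properties file.
# $1 = path to plugin-descriptor.properties
plugin_es_version() {
  sed -n 's/^elasticsearch\.version=//p' "$1" | tr -d '\r'
}

# Example: locate the descriptor under ./plugin and print its target version;
# the result should equal the image tag used in docker-compose.yml (7.1.0).
# plugin_es_version "$(find ./plugin -name plugin-descriptor.properties | head -n 1)"
```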

Edit docker-compose.yml to mount the plugin directory into the container

version: "3.8"
volumes:
  data:
  config:
  plugin:
networks:
  es:
services:
  kibana:
    image: kibana:7.1.0
    ports:
      - "5601:5601"
    networks:
      - "es"
    volumes:
      - ./kibana.yml:/usr/share/kibana/config/kibana.yml
  elasticsearch:
    image: elasticsearch:7.1.0
    environment:
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "discovery.type=single-node"
    volumes:
      - data:/usr/share/elasticsearch/data
      - config:/usr/share/elasticsearch/config
      - ./plugin:/usr/share/elasticsearch/plugins
    ports:
      - "9200:9200"
      - "9500:9300"
    networks:
      - "es"

Restart and verify

sudo docker compose down
sudo docker compose up -d
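
Once the containers are back up, a quick check from the host can confirm that the plugin was loaded before you open Kibana. This is a sketch that assumes ES is reachable on localhost:9200 (the port mapped in docker-compose.yml) and that the plugin registers under the name `analysis-ik`. The helper `has_ik_plugin` is hypothetical and only filters the `_cat/plugins` listing piped into it.

```shell
# Hypothetical helper: succeed if the _cat/plugins listing on stdin
# contains the IK plugin (assumed to register as analysis-ik).
has_ik_plugin() {
  grep -q 'analysis-ik'
}

# Example (requires the stack from docker-compose.yml to be running):
# curl -s 'http://localhost:9200/_cat/plugins' | has_ik_plugin && echo "IK loaded"
```

If the check fails, `sudo docker compose logs elasticsearch` usually shows why the plugin could not be loaded.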

## Test from the Kibana console
POST /_analyze
{
  "analyzer": "ik_smart",
  "text":"中华人民共和国国歌"
}

## Output
{
  "tokens" : [
    {
      "token" : "中华人民共和国",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "国歌",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 1
    }
  ]
}
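
The same request can be sent without Kibana. The sketch below builds the `_analyze` request body in a small helper and shows a curl call in a comment. It assumes ES is reachable on localhost:9200; `analyze_body` is a hypothetical name, and it does no JSON escaping, so the text must not contain quotes or backslashes.

```shell
# Hypothetical helper: build the JSON body for POST /_analyze.
# $1 = analyzer name, $2 = text (assumed to contain no quotes or backslashes)
analyze_body() {
  printf '{"analyzer":"%s","text":"%s"}\n' "$1" "$2"
}

# Example (requires Elasticsearch on localhost:9200):
# curl -s -X POST 'http://localhost:9200/_analyze?pretty' \
#   -H 'Content-Type: application/json' \
#   -d "$(analyze_body ik_smart '中华人民共和国国歌')"
```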


POST /_analyze
{
  "analyzer": "ik_max_word",
  "text":"中华人民共和国国歌"
}


## Output
{
  "tokens" : [
    {
      "token" : "中华人民共和国",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "中华人民",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "中华",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "华人",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "人民共和国",
      "start_offset" : 2,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "人民",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "共和国",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "共和",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "国",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "CN_CHAR",
      "position" : 8
    },
    {
      "token" : "国歌",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 9
    }
  ]
}
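
The two results show the difference between the analyzers: ik_smart returns the coarsest split (2 tokens here), while ik_max_word returns every word it can find (10 tokens). One common way to combine them, suggested in the IK README, is to index with ik_max_word and search with ik_smart. The sketch below only builds such a mapping body. The helper name `ik_mapping_body`, the index name `articles`, and the field name `content` are made up for illustration.

```shell
# Hypothetical helper: build a create-index body whose text field is
# indexed with ik_max_word and searched with ik_smart.
# $1 = field name
ik_mapping_body() {
  printf '{"mappings":{"properties":{"%s":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_smart"}}}}\n' "$1"
}

# Example (requires Elasticsearch on localhost:9200; "articles" is a made-up index):
# curl -s -X PUT 'http://localhost:9200/articles' \
#   -H 'Content-Type: application/json' \
#   -d "$(ik_mapping_body content)"
```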
posted @ 2022-10-16 11:52  Tenic