Installing the IK Analyzer for Elasticsearch in a Docker Environment

Download IK

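Open https://github.com/medcl/elasticsearch-analysis-ik/releases in a browser, find the version you need (the plugin version should match your Elasticsearch version), and download the ZIP archive. I chose elasticsearch-analysis-ik-7.6.1.zip.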
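
If you prefer the command line to a browser, the download can be sketched as below. The URL pattern (releases/download/v<version>/...) is an assumption based on how GitHub usually serves release assets, and the function names ik_release_url and download_ik are made up for this example; check the releases page if the request fails.

```shell
# Build the download URL for a given IK release.
# Assumption: assets follow GitHub's usual pattern
#   <repo>/releases/download/v<version>/elasticsearch-analysis-ik-<version>.zip
ik_release_url() {
  local version="$1"
  echo "https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v${version}/elasticsearch-analysis-ik-${version}.zip"
}

# Download the archive into the current directory.
download_ik() {
  local version="$1"
  # -f: fail on HTTP errors, -L: follow redirects, -O: keep the remote file name
  curl -fLO "$(ik_release_url "$version")"
}

# Usage:
#   download_ik 7.6.1
```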

Install

Start the Elasticsearch container (named es here), open a shell inside it, go to the plugins directory, and create an ik folder:

docker exec -it es /bin/bash
cd plugins/
mkdir ik
cd ik

Upload the ZIP you just downloaded into the ik directory.
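
One way to do the upload is docker cp, run on the host rather than inside the container. This is only a sketch: it assumes the container is named es and that Elasticsearch lives in /usr/share/elasticsearch (the default in the official image), and copy_ik_zip is a made-up helper name.

```shell
# Copy a locally downloaded plugin archive into the container's plugins/ik directory.
# Assumptions: container name "es", Elasticsearch home /usr/share/elasticsearch,
# and the plugins/ik directory already exists inside the container.
copy_ik_zip() {
  local zip_path="$1"
  local container="${2:-es}"
  local es_home="${3:-/usr/share/elasticsearch}"
  docker cp "$zip_path" "$container:$es_home/plugins/ik/"
}

# Usage (on the host):
#   copy_ik_zip ./elasticsearch-analysis-ik-7.6.1.zip
```
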
Unzip the archive:

unzip elasticsearch-analysis-ik-7.6.1.zip

Delete the ZIP file:

rm -f elasticsearch-analysis-ik-7.6.1.zip

Exit the es container and restart it so that Elasticsearch loads the plugin on startup.
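
The restart, plus a quick check that the plugin is visible, might look like the sketch below. It assumes the container is named es and that the container's working directory is the Elasticsearch home, so that bin/elasticsearch-plugin resolves; restart_and_list_plugins is a made-up helper name.

```shell
# Restart the container, then list the plugins Elasticsearch can see.
# Assumptions: container name "es"; the working directory inside the
# container is the Elasticsearch home, so bin/elasticsearch-plugin resolves.
restart_and_list_plugins() {
  local container="${1:-es}"
  docker restart "$container"
  # "elasticsearch-plugin list" prints one installed plugin per line
  docker exec "$container" bin/elasticsearch-plugin list
}

# Usage (on the host):
#   restart_and_list_plugins
```

If the listing includes an entry for the ik directory, the plugin files are in place.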

Verify the analyzer

IK provides two analyzers, ik_smart and ik_max_word. Each is demonstrated below.

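ik_smart: the coarsest-grained segmentation. Each character ends up in only one token, so the tokens do not overlap. It is the usual choice at search time.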

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "今天天气实在不错"
}

The result:

{
  "tokens" : [
    {
      "token" : "今天天气",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "实在",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "不错",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 2
    }
  ]
}
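
The request above is written for the Kibana console. Without Kibana, the same _analyze call can be sent with curl. This is a sketch under assumptions: Elasticsearch is reachable at http://localhost:9200 without authentication, and analyze_text is a made-up helper name.

```shell
# Send text to the _analyze API with the given analyzer.
# Assumption: Elasticsearch listens on http://localhost:9200 without authentication.
# Note: the text is pasted into the JSON body as-is, so it must not contain
# double quotes or backslashes.
analyze_text() {
  local analyzer="$1"
  local text="$2"
  local es_url="${3:-http://localhost:9200}"
  curl -s -X GET "$es_url/_analyze?pretty" \
    -H 'Content-Type: application/json' \
    -d "{\"analyzer\": \"$analyzer\", \"text\": \"$text\"}"
}

# Usage:
#   analyze_text ik_smart "今天天气实在不错"
#   analyze_text ik_max_word "今天天气实在不错"
```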

ik_max_word: the finest-grained segmentation. It emits every word it can find in the text, so the tokens may overlap.

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "今天天气实在不错"
}

The result:

{
  "tokens" : [
    {
      "token" : "今天天气",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "今天",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "天天",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "天气",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "实在",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "不错",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 5
    }
  ]
}
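
A common way to combine the two analyzers is to index with ik_max_word and search with ik_smart. The sketch below creates an index with such a mapping. The index name passed in, the field name content, the helper name create_ik_index, and the http://localhost:9200 address are all assumptions made for illustration.

```shell
# Create an index whose "content" field is indexed with ik_max_word
# and queried with ik_smart.
# Assumptions: Elasticsearch 7.x on http://localhost:9200 without authentication;
# the field name "content" is only an example.
create_ik_index() {
  local index="$1"
  local es_url="${2:-http://localhost:9200}"
  curl -s -X PUT "$es_url/$index?pretty" \
    -H 'Content-Type: application/json' \
    -d '{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}'
}

# Usage:
#   create_ik_index ik_demo
```

Indexing with the fine-grained analyzer stores more candidate terms, while the coarse-grained analyzer keeps the query from matching on every overlapping fragment.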

Posted on 2020-08-20 14:00 by 风停了，雨来了