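Download IK
Open https://github.com/medcl/elasticsearch-analysis-ik/releases in a browser, find the version you need, and download the ZIP archive. The IK version must match your Elasticsearch version exactly; here I use elasticsearch-analysis-ik-7.6.1.zip.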
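If you prefer to script the download, the sketch below derives the asset URL from the Elasticsearch version, since the two versions have to match. The URL pattern (tag `v{version}`, file `elasticsearch-analysis-ik-{version}.zip`) is an assumption based on the file name used above; confirm it against the releases page before relying on it. `ik_zip_url` and `download_ik` are hypothetical helper names.

```python
import urllib.request
from typing import Optional

# Assumption: release assets follow this naming pattern; verify on the releases page.
RELEASE_URL = (
    "https://github.com/medcl/elasticsearch-analysis-ik/releases/download/"
    "v{version}/elasticsearch-analysis-ik-{version}.zip"
)


def ik_zip_url(es_version: str) -> str:
    """Build the IK zip URL; the IK version must equal the Elasticsearch version."""
    return RELEASE_URL.format(version=es_version)


def download_ik(es_version: str, dest: Optional[str] = None) -> str:
    """Download the IK zip to dest (defaults to the release file name) and return the path."""
    url = ik_zip_url(es_version)
    dest = dest or url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, dest)
    return dest
```

For example, `download_ik("7.6.1")` should save elasticsearch-analysis-ik-7.6.1.zip in the current directory, provided the assumed URL pattern holds.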
Installation
- Start the elasticsearch container, enter its plugins directory, and create an ik folder
docker exec -it es /bin/bash
cd plugins/
mkdir ik
cd ik
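- Copy the zip you downloaded into the ik directory, for example from the host with:
docker cp elasticsearch-analysis-ik-7.6.1.zip es:/usr/share/elasticsearch/plugins/ik/
- Unzip it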
unzip elasticsearch-analysis-ik-7.6.1.zip
- Delete the zip file
rm -f elasticsearch-analysis-ik-7.6.1.zip
- Exit the es container and restart it so the plugin is loaded
exit
docker restart es
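To confirm the plugin was loaded after the restart, you can ask the cluster which plugins each node reports through the _cat/plugins API. The sketch below assumes an unsecured node at http://localhost:9200 (`ES_URL` is a hypothetical default) and assumes IK reports itself under the component name `analysis-ik`; check the raw `GET _cat/plugins` output on your own cluster if it differs.

```python
import json
import urllib.request

# Assumption: local Elasticsearch without authentication; adjust for your setup.
ES_URL = "http://localhost:9200"


def fetch_plugins(base_url: str = ES_URL) -> list:
    """Return the parsed output of GET _cat/plugins?format=json (one entry per node and plugin)."""
    with urllib.request.urlopen(f"{base_url}/_cat/plugins?format=json") as resp:
        return json.load(resp)


def has_plugin(plugins: list, component: str = "analysis-ik") -> bool:
    """True if any entry reports the given plugin component name."""
    return any(p.get("component") == component for p in plugins)
```

Once the restarted node has picked up IK, `has_plugin(fetch_plugins())` should return True.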
Verify tokenization
IK provides two analyzers, ik_smart and ik_max_word; each is demonstrated below.
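The _analyze requests in this section are issued from Kibana Dev Tools, but the same API can be called from a script. A minimal sketch, assuming an unsecured node at http://localhost:9200 (`ES_URL`, `analyze`, and `token_texts` are hypothetical names); the request body is the same `analyzer`/`text` pair used in the console examples.

```python
import json
import urllib.request

# Assumption: local Elasticsearch without authentication; adjust for your setup.
ES_URL = "http://localhost:9200"


def analyze(text: str, analyzer: str, base_url: str = ES_URL) -> dict:
    """POST {"analyzer": ..., "text": ...} to the _analyze API and return the parsed JSON reply."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def token_texts(response: dict) -> list:
    """Keep only the token strings from an _analyze response."""
    return [t["token"] for t in response.get("tokens", [])]
```

For example, `token_texts(analyze("今天天气实在不错", "ik_smart"))` gives just the token strings, which is handy for comparing the two analyzers side by side.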
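ik_smart: the coarsest-grained split. Each character is assigned to exactly one token, so tokens never overlap; ik_smart is the usual choice for search-time analysis.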
GET _analyze
{
  "analyzer": "ik_smart",
  "text": "今天天气实在不错"
}
Result:
{
  "tokens" : [
    {
      "token" : "今天天气",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "实在",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "不错",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 2
    }
  ]
}
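To make the "each character is used once" behavior concrete, here is a toy approximation of what ik_smart produced for this sentence. It swaps in plain forward maximum matching over a hand-picked dictionary (`TOY_DICT` is an assumption containing only the words seen in this section's outputs); IK's real ik_smart uses its own dictionaries and additional disambiguation, so this is an illustration, not IK's algorithm.

```python
# Assumption: a tiny hand-picked dictionary, not IK's real dictionaries.
TOY_DICT = {"今天天气", "今天", "天天", "天气", "实在", "不错"}


def smart_like(text: str, words: set) -> list:
    """Forward maximum matching: at each position take the longest dictionary word,
    so every character lands in exactly one (start, end) token."""
    max_len = max(map(len, words))
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in words or j == i + 1:
                tokens.append((text[i:j], i, j))
                i = j
                break
    return tokens
```

On "今天天气实在不错" this toy yields (今天天气, 0, 4), (实在, 4, 6), (不错, 6, 8), the same tokens and offsets as the ik_smart result above.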
ik_max_word: the finest-grained split. It exhaustively emits every word it can find, so tokens may overlap.
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "今天天气实在不错"
}
Result:
{
  "tokens" : [
    {
      "token" : "今天天气",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "今天",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "天天",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "天气",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "实在",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "不错",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 5
    }
  ]
}
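The overlapping output of ik_max_word can likewise be imitated with a toy: at every start position, emit every dictionary word that begins there, longest first. `TOY_DICT` is again an assumed hand-picked dictionary, and this exhaustive enumeration is a swapped-in illustration rather than IK's actual implementation (for instance, this toy simply drops characters that match no word).

```python
# Assumption: a tiny hand-picked dictionary, not IK's real dictionaries.
TOY_DICT = {"今天天气", "今天", "天天", "天气", "实在", "不错"}


def max_word_like(text: str, words: set) -> list:
    """Emit every dictionary word at every start position (longest first),
    so (start, end) spans may overlap."""
    max_len = max(map(len, words))
    tokens = []
    for i in range(len(text)):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in words:
                tokens.append((text[i:j], i, j))
    return tokens
```

On "今天天气实在不错" this toy produces the six tokens above in the same order with the same offsets, which shows why ik_max_word suits index-time analysis: more overlapping terms give more ways for a query to match.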