Elasticsearch: differences between the ik_smart and ik_max_word analyzers of the IK Chinese analysis plugin
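1. The ik_smart and ik_max_word analyzers
The IK analysis plugin provides two analyzers, ik_smart and ik_max_word.
ik_smart - coarse-grained segmentation
ik_max_word - enumerates as many candidate words as possible, so segmentation is finer-grained and produces more tokens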
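To try both analyzers on the same text from a script, something like the following could work. It is a minimal sketch using only the Python standard library; the helper names build_analyze_request and analyze are made up for illustration, the host is assumed to be a local node on port 9200 with the IK plugin installed, and POST is used because the _analyze API accepts it as well as GET.

```python
import json
import urllib.request


def build_analyze_request(analyzer, text, host="http://localhost:9200"):
    """Build an _analyze request for the given analyzer and text (hypothetical helper)."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(analyzer, text, host="http://localhost:9200"):
    """Send the request and return just the token strings from the response."""
    request = build_analyze_request(analyzer, text, host)
    with urllib.request.urlopen(request) as response:
        return [entry["token"] for entry in json.load(response)["tokens"]]
```

With a running node, analyze("ik_smart", text) and analyze("ik_max_word", text) can then be called one after the other on the same text to compare the two token lists.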
2. A side-by-side example:
ik_smart
$ curl -X GET "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
"analyzer": "ik_smart",
"text": "这是一碗海鲜味方便面"
}'
{
"tokens" : [
{
"token" : "这是",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "一碗",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "海",
"start_offset" : 4,
"end_offset" : 5,
"type" : "CN_CHAR",
"position" : 2
},
{
"token" : "鲜味",
"start_offset" : 5,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "方便面",
"start_offset" : 7,
"end_offset" : 10,
"type" : "CN_WORD",
"position" : 4
}
]
}
ik_max_word
$ curl -X GET "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
"analyzer": "ik_max_word",
"text": "这是一碗海鲜味方便面"
}'
{
"tokens" : [
{
"token" : "这是",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "一碗",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "一",
"start_offset" : 2,
"end_offset" : 3,
"type" : "TYPE_CNUM",
"position" : 2
},
{
"token" : "碗",
"start_offset" : 3,
"end_offset" : 4,
"type" : "COUNT",
"position" : 3
},
{
"token" : "海鲜",
"start_offset" : 4,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 4
},
{
"token" : "鲜味",
"start_offset" : 5,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "方便面",
"start_offset" : 7,
"end_offset" : 10,
"type" : "CN_WORD",
"position" : 6
},
{
"token" : "方便",
"start_offset" : 7,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 7
},
{
"token" : "面",
"start_offset" : 9,
"end_offset" : 10,
"type" : "CN_CHAR",
"position" : 8
}
]
}
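The difference between the two responses above can be made explicit with a small offline comparison. This is only a sketch: the token texts and offsets are copied by hand from the output above, and the helpers only_in and overlapping_pairs are illustrative names, not part of Elasticsearch or the IK plugin.

```python
# (token, start_offset, end_offset) copied from the two _analyze responses above.
smart = [("这是", 0, 2), ("一碗", 2, 4), ("海", 4, 5), ("鲜味", 5, 7), ("方便面", 7, 10)]
max_word = [
    ("这是", 0, 2), ("一碗", 2, 4), ("一", 2, 3), ("碗", 3, 4), ("海鲜", 4, 6),
    ("鲜味", 5, 7), ("方便面", 7, 10), ("方便", 7, 9), ("面", 9, 10),
]


def only_in(a, b):
    """Tokens of a whose text does not appear in b, in original order."""
    seen = {token for token, _, _ in b}
    return [token for token, _, _ in a if token not in seen]


def overlapping_pairs(tokens):
    """Pairs of tokens whose [start, end) character ranges intersect."""
    pairs = []
    for i, (t1, s1, e1) in enumerate(tokens):
        for t2, s2, e2 in tokens[i + 1:]:
            if s1 < e2 and s2 < e1:
                pairs.append((t1, t2))
    return pairs


print(only_in(smart, max_word))     # ['海']
print(only_in(max_word, smart))     # ['一', '碗', '海鲜', '方便', '面']
print(overlapping_pairs(smart))     # []
print(overlapping_pairs(max_word))  # five overlapping pairs, e.g. ('海鲜', '鲜味')
```

For this sentence, ik_smart returns 5 tokens that do not overlap, while ik_max_word returns 9 tokens, several of which cover the same characters (for example 方便面, 方便 and 面). Note also that ik_smart splits 海鲜味 into 海 and 鲜味, whereas ik_max_word emits both 海鲜 and 鲜味.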