es: ik中文分词ik_smart 和 ik_max_word 两种分词器的不同

一,ik_smart 和 ik_max_word 两种分词器

ik分词插件支持 ik_smart 和 ik_max_word 两种分词器

ik_smart - 粗粒度的分词
ik_max_word - 会尽可能的枚举可能的关键词,就是分词比较细致一些,会分解出更多的关键词

 

二,实际比较的例子:

ik_smart

$ curl -X GET "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "ik_smart",
  "text": "这是一碗海鲜味方便面"
}
> '
{
  "tokens" : [
    {
      "token" : "这是",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "一碗",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "海",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "鲜味",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "方便面",
      "start_offset" : 7,
      "end_offset" : 10,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}

ik_max_word

$ curl -X GET "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "ik_max_word",
  "text": "这是一碗海鲜味方便面"
}'
{
  "tokens" : [
    {
      "token" : "这是",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "一碗",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "一",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "TYPE_CNUM",
      "position" : 2
    },
    {
      "token" : "碗",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "COUNT",
      "position" : 3
    },
    {
      "token" : "海鲜",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "鲜味",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "方便面",
      "start_offset" : 7,
      "end_offset" : 10,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "方便",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "面",
      "start_offset" : 9,
      "end_offset" : 10,
      "type" : "CN_CHAR",
      "position" : 8
    }
  ]
}

 

posted @ 2025-12-17 22:17  刘宏缔的架构森林  阅读(3)  评论(0)    收藏  举报