Upgrading autocomplete with Elasticsearch

 

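The autocomplete feature cannot be built on the phrase and term suggesters: their correction works for English or pinyin input, but they perform poorly on Chinese.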


 


 


Elasticsearch's completion suggester provides the autocomplete feature.


 


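It stores the indexed content together with an FST (finite state transducer) in the .tip file and loads it into memory, so it responds faster. An FST is very well suited to prefix lookups.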
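To show why a prefix lookup over an in-memory sorted structure is cheap, here is a minimal sketch. It is not an FST and not what Lucene does internally: it swaps in a plain sorted list with a binary search, and the function name `prefix_matches` is made up for this illustration. The sample strings are the ones indexed later in this post.

```python
from bisect import bisect_left


def prefix_matches(sorted_terms, prefix, limit=5):
    """Return up to `limit` terms that start with `prefix`.

    `sorted_terms` must already be sorted. This is a stand-in for an FST,
    used only to illustrate the idea of prefix lookup.
    """
    # Terms sharing a prefix are adjacent in sorted order, so a single
    # binary search finds where that run begins.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # left the run of terms sharing the prefix
        matches.append(term)
        if len(matches) == limit:
            break
    return matches


terms = sorted(t.lower() for t in [
    "Lucene is cool",
    "Elasticsearch rocks",
    "Elastic is the company behind ELK stack",
    "elasticsearch is rock solid",
])
print(prefix_matches(terms, "elastics"))
```

The lookup cost depends on the length of the matching run rather than on scanning every term, which is the property that makes prefix-oriented structures attractive for autocomplete.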


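The phrase and term suggesters, by contrast, first have to break the input into tokens and then look each token up in the term dictionary, which is somewhat less efficient.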


 


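To use it, the field must be mapped with type: completion, and the query goes through a suggest request to deliver "autocomplete"; only prefix search is supported, though. If results need to be restricted to a category, a category field can be added to specify it, but the category words in the input set are hard to maintain, so this is not recommended.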
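One way to read the category restriction above is Elasticsearch's category context on a completion field. The sketch below builds the mapping, a document, and a suggest request body as plain Python dicts, under that assumption. The context name `category` and the helper names are hypothetical; the field `body`, the type `tech`, and the suggestion name `blog-suggest` come from the example requests in this post.

```python
import json

CATEGORY_CONTEXT = "category"  # hypothetical context name, not from the post


def completion_mapping():
    """Mapping for a completion field that carries a category context."""
    return {
        "mappings": {
            "tech": {
                "properties": {
                    "body": {
                        "type": "completion",
                        "contexts": [
                            {"name": CATEGORY_CONTEXT, "type": "category"}
                        ],
                    }
                }
            }
        }
    }


def suggestion_document(inputs, categories):
    """Document body: every input set must list its categories explicitly."""
    return {
        "body": {
            "input": list(inputs),
            "contexts": {CATEGORY_CONTEXT: list(categories)},
        }
    }


def suggest_body(prefix, categories, size=10):
    """Suggest request restricted to the given categories."""
    return {
        "size": 0,
        "suggest": {
            "blog-suggest": {
                "prefix": prefix,
                "completion": {
                    "field": "body",
                    "size": size,
                    "contexts": {CATEGORY_CONTEXT: list(categories)},
                },
            }
        },
    }


print(json.dumps(suggest_body("elas", ["search"]), indent=2))
```

The document helper makes the maintenance cost visible: each indexed input has to carry its category list, and that list has to be kept in sync whenever categories change.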


 


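If nothing is recalled, we can analyze the input into tokens, match the tokens to a category, search within that category, sort by popularity in descending order, and return the top 20.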
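The fallback steps can be sketched as follows. This is only an illustration under several assumptions: the field names `category` and `hot` (a numeric popularity score), the token-to-category dictionary, and all helper names are hypothetical. The shape of the `_analyze` response (a `tokens` list whose items have a `token` key) is trimmed to the part used here.

```python
def tokens_from_analyze(analyze_response):
    """Pull the token strings out of an _analyze response."""
    return [item["token"] for item in analyze_response.get("tokens", [])]


def match_category(tokens, category_keywords):
    """Return the category of the first token found in the dictionary."""
    for token in tokens:
        category = category_keywords.get(token)
        if category is not None:
            return category
    return None


def top_by_popularity_body(category, size=20):
    """Search body: filter by category, sort by popularity descending."""
    return {
        "size": size,
        "query": {"bool": {"filter": [{"term": {"category": category}}]}},
        "sort": [{"hot": {"order": "desc"}}],  # "hot" is an assumed field
    }


# Hand-written stand-ins for a real _analyze response and keyword dictionary.
analyze_response = {"tokens": [{"token": "elk"}, {"token": "rocks"}]}
keywords = {"elk": "logging", "lucene": "search"}

category = match_category(tokens_from_analyze(analyze_response), keywords)
print(top_by_popularity_body(category))
```

Using a filter rather than a scoring query keeps relevance out of the picture, so the ordering is decided by the popularity sort alone.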


 




# Check how the sample sentences are tokenized
POST _analyze
{
  "text": [
    "Lucene is cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "Elastic is the company behind ELK stack",
    "elk rocks",
    "elasticsearch is rock solid"
  ]
}

# "body" backs the completion suggester; "body1" backs the term and phrase suggesters
PUT /blogs_completion/
{
  "mappings": {
    "tech": {
      "properties": {
        "body": { "type": "completion" },
        "body1": { "type": "text", "analyzer": "ik_smart" }
      }
    }
  }
}

# Run this only to drop the index and start over
DELETE /blogs_completion/

POST _bulk/?refresh=true
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "Lucene is cool", "body1": "Lucene is cool" }
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "Elasticsearch builds on top of lucene", "body1": "Elasticsearch builds on top of lucene" }
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "Elasticsearch rocks", "body1": "Elasticsearch rocks" }
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "Elastic is the company behind ELK stack", "body1": "Elastic is the company behind ELK stack" }
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "the elk stack rocks", "body1": "the elk stack rocks" }
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "elasticsearch is rock solid", "body1": "elasticsearch is rock solid" }

# Completion suggester: prefix match against "body"
POST blogs_completion/_search?pretty
{
  "size": 0,
  "suggest": {
    "blog-suggest": {
      "prefix": "Elasticsearc b",
      "completion": { "field": "body" }
    }
  }
}

# Term suggester: per-word correction against "body1"
POST /blogs_completion/_search?pretty
{
  "suggest": {
    "blog-suggest": {
      "text": "biuds",
      "term": { "suggest_mode": "missing", "field": "body1" }
    }
  }
}

# Phrase suggester: whole-phrase correction against "body1"
POST blogs_completion/_search
{
  "size": 0,
  "suggest": {
    "blog-suggest": {
      "text": "Elastcserch rock",
      "phrase": { "field": "body1" }
    }
  }
}

 

 

 

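To implement autocomplete with Elasticsearch, use the suggesters it provides. Three suggester modes are relevant here:
When setting up the index mapping, keep the ordering by retrieval precision in mind: completion > phrase > term

completion mode requires the field to be mapped with type: completion
phrase mode and term mode require the field to be mapped with type: text

completion returns an options array whose entries are matched by left prefix;
phrase matches whole phrases against the indexed documents;
term corrects and matches individual words (it is based on Levenshtein edit distance: a candidate is returned if it can be reached from the input within a bounded number of character edits)
In terms of recall: completion < phrase < term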
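To make the edit-distance idea behind the term mode concrete, here is a minimal sketch of plain Levenshtein distance and a bounded candidate filter. It only illustrates the concept and is not Elasticsearch's actual implementation; the helper `candidates_within` and the small vocabulary are made up for this example, while the misspelling `biuds` is the one used in the term request in this post.

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes, or substitutions
    needed to turn string `a` into string `b`."""
    # previous[j] holds the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def candidates_within(word, vocabulary, max_edits=2):
    """Keep vocabulary entries no more than `max_edits` edits from `word`."""
    return [v for v in vocabulary if levenshtein(word, v) <= max_edits]


vocabulary = ["builds", "lucene", "rocks", "elk"]
print(levenshtein("biuds", "builds"))
print(candidates_within("biuds", vocabulary))
```

The bound on edits is what trades precision for recall: a larger bound accepts more distant candidates, which is why the term mode recalls the most and is the least precise of the three.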

 

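An autocomplete feature therefore needs a complete plan: when the completion suggester finds no match, fall back to the term suggester's correction matching on the analyzed tokens to increase recall.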
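The two-stage plan can be sketched like this. The `search` argument is a hypothetical callable that sends a body to `blogs_completion/_search` and returns the parsed JSON response; it stands in for whatever client is in use. The field names `body` and `body1` and the suggestion name `blog-suggest` come from the example requests in this post, and `fake_search` is a stub with hand-written responses so the flow can be run without a cluster.

```python
def extract_options(response, name):
    """Collect the suggested texts for suggestion `name` from a response."""
    texts = []
    for entry in response.get("suggest", {}).get(name, []):
        for option in entry.get("options", []):
            texts.append(option["text"])
    return texts


def autocomplete(search, text, name="blog-suggest"):
    """Try the completion suggester first, then fall back to term."""
    completion_body = {
        "size": 0,
        "suggest": {name: {"prefix": text, "completion": {"field": "body"}}},
    }
    hits = extract_options(search(completion_body), name)
    if hits:
        return hits

    # No prefix match: trade precision for recall with per-word correction.
    term_body = {
        "size": 0,
        "suggest": {
            name: {
                "text": text,
                "term": {"suggest_mode": "missing", "field": "body1"},
            }
        },
    }
    return extract_options(search(term_body), name)


def fake_search(body):
    """Stub standing in for a real client; responses are hand-written."""
    spec = body["suggest"]["blog-suggest"]
    if "completion" in spec:
        return {"suggest": {"blog-suggest": [{"text": spec["prefix"], "options": []}]}}
    return {"suggest": {"blog-suggest": [
        {"text": spec["text"], "options": [{"text": "builds"}]}
    ]}}


print(autocomplete(fake_search, "biuds"))  # prints ['builds']
```

Passing the transport in as a callable keeps the fallback logic independent of any particular client library, and makes it easy to exercise with a stub.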

 

posted @ 2020-10-13 17:54  soft.push("zzq")