ElasticSearch文档分值_score计算底层原理

1）boolean model：根据用户的query条件，先过滤出包含指定term的doc

query "hello world" -->  hello / world / hello & world

bool --> must/must not/should --> 过滤 --> 包含 / 不包含 / 可能包含

doc --> 不打分数 --> 正或反 true or false --> 为了减少后续要计算的doc的数量，提升性能

2) relevance score算法，简单来说，就是计算出，一个索引中的文本，与搜索文本，他们之间的关联匹配程度

Elasticsearch使用的是 term frequency/inverse document frequency算法，简称为TF/IDF算法

Term frequency：搜索文本中的各个词条在field文本中出现了多少次，出现次数越多，就越相关

搜索请求：hello world

doc1：hello you, and world is very good

doc2：hello, how are you

Inverse document frequency：搜索文本中的各个词条在整个索引的所有文档中出现了多少次，出现的次数越多，就越不相关

搜索请求：hello world

doc1：hello, tuling is very good

doc2：hi world, how are you

比如说，在index中有1万条document，hello这个单词在所有的document中，一共出现了1000次；world这个单词在所有的document中，一共出现了100次

Field-length norm：field长度，field越长，相关度越弱

搜索请求：hello world

doc1：{ "title": "hello article", "content": "...... N个单词" }

doc2：{ "title": "my article", "content": "...... N个单词，hi world" }

hello world在整个index中出现的次数是一样多的

doc1更相关，title field更短

2、分析一个document上的_score是如何被计算出来的

GET /es_db/_doc/1/_explain
{
  "query": {
    "match": {
      "remark": "java developer"
    }
  }
}

posted @ 2022-04-10 19:06 VNone 阅读(231) 评论(0) 收藏举报

刷新页面返回顶部

VNone

ElasticSearch文档分值_score计算底层原理

公告