26 function_score: a custom relevance scoring algorithm
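With function_score you define your own scoring function: the value of a field you choose is combined with the relevance score that Elasticsearch computes on its own, so that this field boosts the final score.
Requirement: the more people follow a post, the higher the post should score.
First add a follower count to every post. Then combine the score returned by the post search with follower_num, so that follower_num boosts the score to a certain degree.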
POST /forum/_bulk
{"update":{"_id":"1"}}
{"doc":{"follower_num":5}}
{"update":{"_id":"2"}}
{"doc":{"follower_num":10}}
{"update":{"_id":"3"}}
{"doc":{"follower_num":25}}
{"update":{"_id":"4"}}
{"doc":{"follower_num":3}}
{"update":{"_id":"5"}}
{"doc":{"follower_num":60}}
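The _bulk body is newline-delimited JSON: an action line, then a partial-document line, each on its own line, with a newline at the very end. A minimal Python sketch that builds such a body from a dictionary of follower counts follows; the helper name build_bulk_updates is hypothetical, and actually sending the request is left out.

```python
import json

# Follower counts from the bulk request, keyed by document _id
FOLLOWERS = {"1": 5, "2": 10, "3": 25, "4": 3, "5": 60}


def build_bulk_updates(followers):
    """Return an NDJSON body of partial updates (hypothetical helper)."""
    lines = []
    for doc_id, count in followers.items():
        lines.append(json.dumps({"update": {"_id": doc_id}}))
        lines.append(json.dumps({"doc": {"follower_num": count}}))
    # The _bulk endpoint expects the body to end with a newline
    return "\n".join(lines) + "\n"


body = build_bulk_updates(FOLLOWERS)
print(body, end="")
```

Building the body from data keeps the action line and the document line paired, which is easy to get wrong when the lines are typed by hand.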
Query
GET /forum/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "java spark",
          "fields": ["title", "content"]
        }
      },
      "field_value_factor": {
        "field": "follower_num",
        "modifier": "log1p",
        "factor": 0.5
      },
      "boost_mode": "sum",
      "max_boost": 5
    }
  }
}
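- If only field is set, each document's score is multiplied by follower_num (boost_mode defaults to multiply). A document whose follower_num is 0 then gets a score of 0, which works poorly.
- For that reason a log1p modifier is usually added. The formula becomes
  new_score = old_score * log(1 + follower_num)
  where log is the base-10 logarithm. The score now grows much more slowly with follower_num, which gives more reasonable results. Note that log(1 + 0) is still 0, so a document with no followers only keeps its query score when the function result is added rather than multiplied, as with boost_mode sum in the query above.
- Adding factor adjusts the influence further:
  new_score = old_score * log(1 + factor * follower_num)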
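The effect of the modifier and the factor can be tried out locally. The following is a rough Python sketch of the arithmetic only, not of Elasticsearch's implementation; the function name field_value_factor mirrors the query parameter but is otherwise an illustration, and only the none and log1p modifiers are covered.

```python
import math


def field_value_factor(value, factor=1.0, modifier="none"):
    """Mimic the function result: the factor is applied first, then the modifier."""
    scaled = factor * value
    if modifier == "none":
        return scaled
    if modifier == "log1p":
        return math.log10(1 + scaled)  # common (base-10) logarithm
    raise ValueError(f"modifier not covered by this sketch: {modifier}")


# Follower counts from the sample data, plus 0 to show the edge case
for followers in (0, 3, 5, 10, 25, 60):
    raw = field_value_factor(followers)
    damped = field_value_factor(followers, factor=0.5, modifier="log1p")
    print(f"follower_num={followers:>2}  none={raw:>4.1f}  log1p(factor=0.5)={damped:.4f}")
```

Without a modifier the result spans 0 to 60 for the sample data, while log1p with factor 0.5 keeps it below 1.5, so a popular post is boosted without drowning out the text relevance.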
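- boost_mode decides how the function result is combined with the query score: multiply (the default), replace, sum, min, max, or avg. With sum, as in the query above, new_score = old_score + log(1 + factor * follower_num).
- max_boost caps the result of the function at the given value. The cap applies to the function result, not to the final _score.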
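How the six modes differ is easiest to see with numbers. The sketch below is a simplified Python model of the combination step, assuming the function result is capped by max_boost before it is combined; the function name combine and the sample scores 1.0 and 8.0 are made up for illustration.

```python
def combine(query_score, function_score, boost_mode="multiply", max_boost=float("inf")):
    """Combine the query score with the function result, capped by max_boost."""
    capped = min(function_score, max_boost)  # max_boost limits the function result only
    modes = {
        "multiply": query_score * capped,
        "replace": capped,
        "sum": query_score + capped,
        "avg": (query_score + capped) / 2,
        "max": max(query_score, capped),
        "min": min(query_score, capped),
    }
    return modes[boost_mode]


# Hypothetical scores: query score 1.0, function result 8.0, max_boost 5
for mode in ("multiply", "replace", "sum", "avg", "max", "min"):
    print(mode, combine(1.0, 8.0, boost_mode=mode, max_boost=5))
```

In this example the function result 8.0 is first cut down to 5, so sum yields 6.0: the final score can exceed max_boost even though the function's contribution cannot.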
{
  "took" : 243,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 2.1746066,
    "hits" : [
      {
        "_index" : "forum",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 2.1746066,
        "_source" : {
          "content" : "spark is best big data solution based on scala ,an programming language similar to java spark",
          "hidden" : false,
          "articleID" : "XHDK-A-1293-#fJ3",
          "postDate" : "2017-01-01",
          "userID" : 1,
          "follower_num" : 60
        }
      }
    ]
  }
}
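The returned _score can be split back into its two parts as a sanity check. This is a back-of-the-envelope Python sketch, assuming boost_mode sum, factor 0.5 and the log1p modifier as in the query; the implied query score is an inference from the response, not a value Elasticsearch reports here.

```python
import math

returned_score = 2.1746066  # _score of document 5 in the response
follower_num = 60
factor = 0.5

# Function part: log10(1 + 0.5 * 60) = log10(31)
function_part = math.log10(1 + factor * follower_num)

# With boost_mode "sum", the rest of the score comes from the multi_match query
implied_query_score = returned_score - function_part

print(f"function part       = {function_part:.4f}")
print(f"implied query score = {implied_query_score:.4f}")
```

The function part comes to about 1.49, well below the max_boost of 5, so the cap has no effect on this document, and roughly 0.68 of the score is left for the text match.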