26 function_score: customizing the relevance score algorithm

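  Define a function_score function that combines the value of a field with the score Elasticsearch computes on its own, so that a field you choose boosts the score.

  Requirement: the more people follow a post, the higher its score.

  First add a follower count to every post. Then combine the search score of each post with follower_num, so that follower_num boosts the score to a degree: the more followers a post has, the higher its score.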

POST /forum/_bulk
{"update":{"_id":"1"}}
{"doc":{"follower_num":5}}
{"update":{"_id":"2"}}
{"doc":{"follower_num":10}}
{"update":{"_id":"3"}}
{"doc":{"follower_num":25}}
{"update":{"_id":"4"}}
{"doc":{"follower_num":3}}
{"update":{"_id":"5"}}
{"doc":{"follower_num":60}}

  Query

GET /forum/_search
{
  "query":{
    "function_score": {
      "query": {
        "multi_match": {
          "query": "java spark",
          "fields": ["title","content"]
        }
      },
      "field_value_factor": {
        "field": "follower_num",
        "modifier": "log1p",
        "factor": 0.5
      },
      "boost_mode": "sum",
      "max_boost": 5
    }
  }
}
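  • With only field set, each doc's score is multiplied by follower_num (the default boost_mode is multiply). If a doc has 0 followers, its score becomes 0, which is a poor result.

  • So a log1p modifier is usually added, which changes the formula to new_score = old_score * log(1 + follower_num), where log is the base-10 logarithm. The resulting scores are more reasonable.

  Adding factor adjusts the score further: new_score = old_score * log(1 + factor * follower_num)

  • boost_mode decides how the query score and the function value are combined: multiply, replace, sum, min, max, avg. The default is multiply; the query above uses sum, so the function value is added to the query score instead of multiplied.

  max_boost caps the value produced by the function so that it does not exceed the specified limit.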
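
  The parameters described above can be modelled in a few lines of Python. This is only a rough sketch of the arithmetic, not Elasticsearch's actual implementation: the helper names field_value_factor and combine are made up for illustration, only the "none" and "log1p" modifiers are covered, and the query score of 1.0 is an assumed value.

```python
import math

def field_value_factor(value, factor=1.0, modifier="none"):
    """Hypothetical helper: scale the field value by factor, then apply the modifier."""
    scaled = factor * value
    if modifier == "none":
        return scaled
    if modifier == "log1p":
        # log1p in this context means the base-10 logarithm of (1 + value)
        return math.log10(1 + scaled)
    raise ValueError("modifier not covered by this sketch: " + modifier)

def combine(query_score, func_score, boost_mode="multiply", max_boost=float("inf")):
    """Hypothetical helper: cap the function value, then combine it with the query score."""
    capped = min(func_score, max_boost)
    results = {
        "multiply": query_score * capped,
        "replace": capped,
        "sum": query_score + capped,
        "avg": (query_score + capped) / 2,
        "min": min(query_score, capped),
        "max": max(query_score, capped),
    }
    return results[boost_mode]

assumed_query_score = 1.0  # assumption: not taken from any real response

# Only "field" set: a post with 0 followers is multiplied down to 0.
print(combine(assumed_query_score, field_value_factor(0)))

# Settings from the query above: log1p, factor 0.5, boost_mode sum, max_boost 5.
boost = field_value_factor(60, factor=0.5, modifier="log1p")
print(combine(assumed_query_score, boost, boost_mode="sum", max_boost=5))
```

  With log1p the boost grows slowly: 60 followers contribute about 1.49 instead of 60, so the field nudges the ranking without overwhelming the text relevance.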

 

{
  "took" : 243,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 2.1746066,
    "hits" : [
      {
        "_index" : "forum",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 2.1746066,
        "_source" : {
          "content" : "spark is best big data solution based on scala ,an programming language similar to java spark",
          "hidden" : false,
          "articleID" : "XHDK-A-1293-#fJ3",
          "postDate" : "2017-01-01",
          "userID" : 1,
          "follower_num" : 60
        }
      }
    ]
  }
}
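
  As a sanity check, the returned _score can be split back into its two parts. The sketch below assumes the score is combined the way the parameters are described in this post (query score plus the capped log1p value, because boost_mode is sum); the variable names are only for illustration.

```python
import math

returned_score = 2.1746066  # _score of doc 5 in the response above
follower_num = 60           # follower_num of doc 5
factor = 0.5                # from the query
max_boost = 5               # from the query

# Function value: base-10 log of (1 + factor * follower_num), capped by max_boost
function_part = min(math.log10(1 + factor * follower_num), max_boost)

# boost_mode is sum, so the remainder is the multi_match score on its own
query_part = returned_score - function_part

print(round(function_part, 4), round(query_part, 4))
```

  Under these assumptions about 1.49 of the final score comes from follower_num and about 0.68 from the text match on "java spark".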

 

posted on 2020-12-16 22:58  溪水静幽