15 Using copy_to to build a custom combined field and avoid the drawbacks of cross-fields search
Add the new fields to the mapping. Both name fields copy their values into new_author_full_name:

```
PUT /forum/_mapping
{
  "properties": {
    "new_author_first_name": {
      "type": "text",
      "copy_to": "new_author_full_name"
    },
    "new_author_last_name": {
      "type": "text",
      "copy_to": "new_author_full_name"
    },
    "new_author_full_name": {
      "type": "text"
    }
  }
}
```
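To picture what copy_to does at index time, here is a small Python model (not Elasticsearch code). It assumes a simplified analyzer that lowercases and splits on non-alphanumeric characters, which is roughly what the standard analyzer does for plain names; analyze, COPY_TO and index_terms are hypothetical names. The behavior it mirrors is the documented one: the field value is copied, and the target field then analyzes that value itself.

```python
import re


def analyze(text):
    """Simplified analyzer (assumption): lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


# Source fields that declare copy_to in the mapping, mapped to their target.
COPY_TO = {
    "new_author_first_name": "new_author_full_name",
    "new_author_last_name": "new_author_full_name",
}


def index_terms(source):
    """Return the terms each field would index, including copy_to targets."""
    terms = {}
    for field, value in source.items():
        terms.setdefault(field, []).extend(analyze(value))
        target = COPY_TO.get(field)
        if target is not None:
            # The raw value is copied; the target analyzes it on its own
            # (with the same simplified analyzer in this model).
            terms.setdefault(target, []).extend(analyze(value))
    return terms


doc1 = {"new_author_first_name": "Peter", "new_author_last_name": "Smith"}
print(index_terms(doc1)["new_author_full_name"])  # ['peter', 'smith']
```

Because both source fields feed the same target, a single field now holds the whole name, which is what lets one match query see "Peter Smith" as a unit.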
Populate the new fields on the existing documents with a bulk update. copy_to runs at index time, and each partial update reindexes the whole document, so new_author_full_name is filled in as well:

```
POST /forum/article/_bulk
{ "update": { "_id": "1" } }
{ "doc": { "new_author_first_name": "Peter", "new_author_last_name": "Smith" } }
{ "update": { "_id": "2" } }
{ "doc": { "new_author_first_name": "Smith", "new_author_last_name": "Williams" } }
{ "update": { "_id": "3" } }
{ "doc": { "new_author_first_name": "Jack", "new_author_last_name": "Ma" } }
{ "update": { "_id": "4" } }
{ "doc": { "new_author_first_name": "Robbin", "new_author_last_name": "Li" } }
{ "update": { "_id": "5" } }
{ "doc": { "new_author_first_name": "Tonny", "new_author_last_name": "Peter Smith" } }
```
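The _bulk API expects newline-delimited JSON: every action line and every document line on its own line, and the body must end with a newline. If the body is generated from code instead of typed into the console, a minimal standard-library sketch could look like this; bulk_update_body and UPDATES are hypothetical names, and sending the request (with Content-Type: application/x-ndjson) is left out.

```python
import json

# The (id, first name, last name) updates from the bulk request above.
UPDATES = [
    ("1", "Peter", "Smith"),
    ("2", "Smith", "Williams"),
    ("3", "Jack", "Ma"),
    ("4", "Robbin", "Li"),
    ("5", "Tonny", "Peter Smith"),
]


def bulk_update_body(updates):
    """Build an NDJSON _bulk body: one action line plus one doc line per update."""
    lines = []
    for doc_id, first, last in updates:
        lines.append(json.dumps({"update": {"_id": doc_id}}))
        lines.append(json.dumps({"doc": {
            "new_author_first_name": first,
            "new_author_last_name": last,
        }}))
    # The bulk API requires the final line to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_update_body(UPDATES)
print(body)
```

json.dumps never emits raw newlines for these values, so each JSON object stays on exactly one line, which is the property the bulk format depends on.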
Query the combined field:

```
GET /forum/_search
{
  "query": {
    "match": {
      "new_author_full_name": "Peter Smith"
    }
  }
}
```

Result:

```
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 2, "relation" : "eq" },
    "max_score" : 1.89712,
    "hits" : [
      {
        "_index" : "forum",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.89712,
        "_source" : {
          "articleID" : "XHDK-A-1293-#fJ3",
          "userID" : 1,
          "hidden" : false,
          "postDate" : "2017-01-01",
          "sub_title" : "learning more courses",
          "author_first_name" : "Peter",
          "author_last_name" : "Smith",
          "new_author_last_name" : "Smith",
          "new_author_first_name" : "Peter"
        }
      },
      {
        "_index" : "forum",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.6931472,
        "_source" : {
          "articleID" : "KDKE-B-9947-#kL5",
          "userID" : 1,
          "hidden" : false,
          "postDate" : "2017-01-02",
          "sub_title" : "learned a lot of course",
          "author_first_name" : "Smith",
          "author_last_name" : "Williams",
          "new_author_last_name" : "Williams",
          "new_author_first_name" : "Smith"
        }
      }
    ]
  }
}
```

Document 1 (Peter Smith) contains both query terms in new_author_full_name and ranks first; document 2 (Smith Williams) matches only smith. new_author_full_name itself does not appear in _source, because copy_to only indexes the copied values and leaves _source unchanged.
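By default a match query analyzes "Peter Smith" into the terms peter and smith and combines them with OR, so a document needs only one of them to be a hit. The toy model below (scoring ignored, simplified analyzer assumed; analyze and match are hypothetical helpers, not Elasticsearch APIs) uses the two documents returned above to show why document 2 appears at all, and how requiring both terms drops it. In a real query that requirement would be expressed on the combined field as "new_author_full_name": { "query": "Peter Smith", "minimum_should_match": "100%" }.

```python
import re


def analyze(text):
    """Simplified analyzer (assumption): lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


# new_author_full_name terms of the two hits above, as copy_to builds them
# from the first and last name values.
FULL_NAME_TERMS = {
    "1": analyze("Peter") + analyze("Smith"),
    "2": analyze("Smith") + analyze("Williams"),
}


def match(query, docs, minimum_should_match=1):
    """Toy match query: OR over the query terms, keeping docs that contain
    at least minimum_should_match of them. Scoring is ignored."""
    query_terms = set(analyze(query))
    hits = {}
    for doc_id, terms in docs.items():
        matched = query_terms & set(terms)
        if len(matched) >= minimum_should_match:
            hits[doc_id] = sorted(matched)
    return hits


print(match("Peter Smith", FULL_NAME_TERMS))
# {'1': ['peter', 'smith'], '2': ['smith']}
print(match("Peter Smith", FULL_NAME_TERMS, minimum_should_match=2))
# {'1': ['peter', 'smith']}
```

Since both terms live in one field, the count of matched terms is taken per document, not per field, which is exactly what minimum_should_match needs to cut off weak matches.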
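How the combined field addresses the three problems of cross-field search with most_fields:

Problem 1: most_fields only finds the documents in which as many fields as possible match, not the document in which a single field matches the query completely.
Answer: Solved. Both names now sit in one field, so the best-matching document is returned first.

Problem 2: With most_fields there is no way to use minimum_should_match to cut off the long tail, i.e. results that match only a few of the query terms, because most_fields applies minimum_should_match to each field separately.
Answer: Solved. On the single combined field, minimum_should_match counts terms across the whole name and can drop the long tail.

Problem 3: TF/IDF scoring. Take Peter Smith and Smith Williams and search for "Peter Smith". Because Smith rarely appears in first_name, the term's document frequency in that field is very low and the match there scores very high, so Smith Williams may end up ranked above Peter Smith.
Answer: Solved. Smith and Peter are now in the same field, so their frequencies across all documents are evened out and no longer extremely skewed.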
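The skew in problem 3 can be made concrete through the IDF part of the score. Elasticsearch 5.0 and later score with BM25 by default, whose IDF as implemented in Lucene is ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents that have the field and n the number that contain the term; classic TF/IDF, used by older versions, moves in the same direction. The sketch computes it for smith per field, using the five names from the bulk update as if all five documents were indexed and assuming a simplified lowercase/split analyzer; bm25_idf and AUTHORS are hypothetical names.

```python
import math
import re


def analyze(text):
    """Simplified analyzer (assumption): lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


# (first name, last name) pairs from the bulk update, treated as if all
# five documents had been indexed.
AUTHORS = [
    ("Peter", "Smith"),
    ("Smith", "Williams"),
    ("Jack", "Ma"),
    ("Robbin", "Li"),
    ("Tonny", "Peter Smith"),
]


def bm25_idf(term, field_values):
    """Lucene BM25 IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    n_docs = len(field_values)
    doc_freq = sum(1 for value in field_values if term in analyze(value))
    return math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


first_names = [first for first, _ in AUTHORS]
last_names = [last for _, last in AUTHORS]
# copy_to puts both values into one field; joining them is equivalent here.
full_names = [f"{first} {last}" for first, last in AUTHORS]

for label, values in [("first_name", first_names),
                      ("last_name", last_names),
                      ("full_name", full_names)]:
    print(label, round(bm25_idf("smith", values), 3))
# first_name 1.386
# last_name 0.875
# full_name 0.539
```

With separate fields, document 2's smith hits first_name, where the term is rare (1.386), while document 1's smith hits last_name (0.875), so Smith Williams gets the larger boost for the same word. In new_author_full_name every smith shares one IDF (0.539), so neither document gains from which field the name happened to land in.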