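Explanation: the Lucene class that describes scoring details
The detailed score breakdown for a single document is usually obtained by calling IndexSearcher.explain(query, docId), which in turn calls weight.explain(reader, doc).
An Explanation prints information like the following: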
4.803122 = (MATCH) fieldWeight(keywords:奶粉 in 457), product of:
  2.0 = tf(termFreq(keywords:奶粉)=4)
  4.803122 = idf(docFreq=414, maxDocs=18609)
  0.5 = fieldNorm(field=keywords, doc=457)
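The first line is the total score: the document with docId=457 scores 4.803122, which is the product of the three values listed below it.
The second line is the term frequency: the term "奶粉" occurs 4 times in the keywords field of document 457, and 2.0 is the square root of 4.
The third line is the inverse document frequency: 414 documents contain the term "奶粉", out of 18609 documents in total, and 4.803122 comes from
ln(18609/(414 + 1)) + 1.0 = ln(18609) - ln(415) + 1 = 9.831400 - 6.028278 + 1 = 4.803122
The fourth line is the field length normalization factor: fieldNorm = fieldboost / sqrt(fieldlength), where fieldlength is the number of tokens (terms produced by the analyzer) in the keywords field.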
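The arithmetic above can be checked with a short Java sketch. It is not Lucene code: the class and method names (ScoreSketch, tf, idf, fieldNorm) are made up for illustration, and the field boost of 1.0 and field length of 4 tokens are assumed values chosen so that fieldNorm comes out at 0.5. Lucene stores norms in a single lossy byte, which this sketch ignores.

```java
// Illustrative recomputation of the explain() output above; not Lucene source.
class ScoreSketch {
    // tf: square root of the raw term frequency within the field.
    static double tf(int termFreq) {
        return Math.sqrt(termFreq);
    }

    // idf: ln(maxDocs / (docFreq + 1)) + 1.
    static double idf(int docFreq, int maxDocs) {
        return Math.log((double) maxDocs / (docFreq + 1)) + 1.0;
    }

    // fieldNorm before byte encoding: fieldBoost / sqrt(number of tokens).
    static double fieldNorm(double fieldBoost, int fieldLength) {
        return fieldBoost / Math.sqrt(fieldLength);
    }

    public static void main(String[] args) {
        double tf = tf(4);
        double idf = idf(414, 18609);
        double norm = fieldNorm(1.0, 4); // assumed: boost 1.0, 4 tokens
        double score = tf * idf * norm;  // fieldWeight = tf * idf * fieldNorm
        System.out.printf("tf=%.1f idf=%.6f fieldNorm=%.1f fieldWeight=%.6f%n",
                tf, idf, norm, score);
    }
}
```

Because tf is 2.0 and fieldNorm is 0.5, the two factors cancel and the field weight equals the idf value, which matches the first line of the sample output.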
TermWeight, one implementation of Weight, implements weight.explain(reader, doc) as follows:
public Explanation explain(IndexReader reader, int doc)
    throws IOException {
  ComplexExplanation result = new ComplexExplanation();
  result.setDescription("weight("+getQuery()+" in "+doc+"), product of:");
  // idf is shared by the query weight and the field weight
  Explanation idfExpl = new Explanation(idf, "idf(docFreq=" + reader.docFreq(term) + ")");

  // explain query weight: boost * idf * queryNorm
  Explanation queryExpl = new Explanation();
  queryExpl.setDescription("queryWeight(" + getQuery() + "), product of:");
  Explanation boostExpl = new Explanation(getBoost(), "boost");
  if (getBoost() != 1.0f)
    queryExpl.addDetail(boostExpl);
  queryExpl.addDetail(idfExpl);
  Explanation queryNormExpl = new Explanation(queryNorm,"queryNorm");
  queryExpl.addDetail(queryNormExpl);
  queryExpl.setValue(boostExpl.getValue() *idfExpl.getValue() *queryNormExpl.getValue());
  result.addDetail(queryExpl);

  // explain field weight: tf * idf * fieldNorm
  String field = term.field();
  ComplexExplanation fieldExpl = new ComplexExplanation();
  fieldExpl.setDescription("fieldWeight("+term+" in "+doc+"), product of:");
  Explanation tfExpl = scorer(reader).explain(doc);
  fieldExpl.addDetail(tfExpl);
  fieldExpl.addDetail(idfExpl);
  Explanation fieldNormExpl = new Explanation();
  byte[] fieldNorms = reader.norms(field);
  float fieldNorm =
    fieldNorms!=null ? Similarity.decodeNorm(fieldNorms[doc]) : 0.0f;
  fieldNormExpl.setValue(fieldNorm);
  fieldNormExpl.setDescription("fieldNorm(field="+field+", doc="+doc+")");
  fieldExpl.addDetail(fieldNormExpl);
  fieldExpl.setMatch(Boolean.valueOf(tfExpl.isMatch()));
  fieldExpl.setValue(tfExpl.getValue() *idfExpl.getValue() *fieldNormExpl.getValue());
  result.addDetail(fieldExpl);
  result.setMatch(fieldExpl.getMatch());

  // combine them
  result.setValue(queryExpl.getValue() * fieldExpl.getValue());
  // when the query weight is exactly 1.0, report only the field weight
  if (queryExpl.getValue() == 1.0f)
    return fieldExpl;
  return result;
}
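The control flow of this method can be imitated without Lucene on the classpath. The sketch below is a simplified stand-in, not Lucene's API: ExplainNode and ExplainSketch are hypothetical names, descriptions are shortened, and the match flag is left out. It shows how the two products are nested and how the method returns the fieldWeight node alone when queryWeight is exactly 1.0, which is presumably why the sample output earlier starts directly at fieldWeight.

```java
// Hypothetical stand-in for Lucene's Explanation; all names are illustrative.
class ExplainNode {
    final float value;
    final String description;
    final ExplainNode[] details;

    ExplainNode(float value, String description, ExplainNode... details) {
        this.value = value;
        this.description = description;
        this.details = details;
    }

    // Builds a parent node whose value is the product of its children.
    static ExplainNode product(String description, ExplainNode... details) {
        float v = 1.0f;
        for (ExplainNode d : details) {
            v *= d.value;
        }
        return new ExplainNode(v, description + ", product of:", details);
    }

    // Renders the tree with two spaces of indentation per level.
    String render(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(value).append(" = ").append(description).append('\n');
        for (ExplainNode d : details) {
            sb.append(d.render(depth + 1));
        }
        return sb.toString();
    }
}

class ExplainSketch {
    static ExplainNode explain(float boost, float idf, float queryNorm,
                               float tf, float fieldNorm) {
        ExplainNode idfNode = new ExplainNode(idf, "idf");
        ExplainNode normNode = new ExplainNode(queryNorm, "queryNorm");
        // As in the original, boost is listed as a detail only when it is not 1.0.
        ExplainNode query = (boost != 1.0f)
                ? ExplainNode.product("queryWeight",
                        new ExplainNode(boost, "boost"), idfNode, normNode)
                : ExplainNode.product("queryWeight", idfNode, normNode);
        ExplainNode field = ExplainNode.product("fieldWeight",
                new ExplainNode(tf, "tf"), idfNode,
                new ExplainNode(fieldNorm, "fieldNorm"));
        // Shortcut: a query weight of exactly 1.0 adds nothing, so skip it.
        if (query.value == 1.0f) {
            return field;
        }
        return ExplainNode.product("weight", query, field);
    }

    public static void main(String[] args) {
        // queryWeight = 2.0 * 0.5 = 1.0, so only fieldWeight is printed.
        System.out.print(explain(1.0f, 2.0f, 0.5f, 2.0f, 0.5f).render(0));
        // queryWeight = 2.0 * 0.25 = 0.5, so the full two-part tree is printed.
        System.out.print(explain(1.0f, 2.0f, 0.25f, 2.0f, 0.5f).render(0));
    }
}
```

The input values are arbitrary powers of two chosen so that the float products are exact; with real index statistics the comparison against 1.0f depends on floating-point rounding.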
References:
http://www.cnblogs.com/lvpei/articles/1732475.html
http://archive.cnblogs.com/a/1906353/
http://lucene.apache.org/java/2_9_0/api/core/org/apache/lucene/search/Weight.html