Lucene Search Process Analysis (4)
2012-12-13 15:49
Reposted from: http://forfuture1978.iteye.com/blog/632829
2.4. Searching with the Query Object
2.4.1.2. Creating the Weight Object Tree
BooleanQuery.createWeight(Searcher) ultimately returns new BooleanWeight(searcher). The BooleanWeight constructor is implemented as follows:
    public BooleanWeight(Searcher searcher) {
      this.similarity = getSimilarity(searcher);
      weights = new ArrayList<Weight>(clauses.size());
      // Also a recursive process: walk the new Query object tree down to the leaf nodes
      for (int i = 0 ; i < clauses.size(); i++) {
        weights.add(clauses.get(i).getQuery().createWeight(searcher));
      }
    }
For a TermQuery leaf node, TermQuery.createWeight(Searcher) returns a new TermWeight(searcher) object. The TermWeight constructor is as follows:
    public TermWeight(Searcher searcher) {
      this.similarity = getSimilarity(searcher);
      // idf is computed here
      idfExp = similarity.idfExplain(term, searcher);
      idf = idfExp.getIdf();
    }
The idf computation follows the documented formula exactly, idf(t) = 1 + log(numDocs / (docFreq + 1)):
    public IDFExplanation idfExplain(final Term term, final Searcher searcher) {
      final int df = searcher.docFreq(term);
      final int max = searcher.maxDoc();
      final float idf = idf(df, max);
      return new IDFExplanation() {
        public float getIdf() {
          return idf;
        }};
    }
    public float idf(int docFreq, int numDocs) {
      return (float)(Math.log(numDocs/(double)(docFreq+1)) + 1.0);
    }
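As a quick check of the formula, the following is a minimal standalone sketch, not the Lucene source. The class name IdfSketch, the index size of 12 documents, and the document frequencies 3, 6 and 9 are all hypothetical; they were picked because they happen to reproduce idf values that appear in the Weight tree dump in this section.

```java
// Illustrative sketch only; IdfSketch is a hypothetical name, not a Lucene class.
final class IdfSketch {

    // Same expression as the idf(int, int) method quoted in the article.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        int maxDoc = 12;            // hypothetical index size
        int[] docFreqs = {3, 6, 9}; // hypothetical document frequencies
        for (int df : docFreqs) {
            // Rarer terms (smaller df) get a larger idf.
            System.out.println("df=" + df + " idf=" + idf(df, maxDoc));
        }
    }
}
```

The +1 inside the logarithm keeps the division defined when docFreq is 0, and the trailing +1.0 keeps idf positive even for a term that occurs in nearly every document.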
ConstantScoreQuery.createWeight(Searcher), by contrast, only creates a ConstantScoreQuery.ConstantWeight(searcher) object and does not compute idf.
The resulting Weight object tree is as follows:
    weight    BooleanQuery$BooleanWeight  (id=169)
       |   similarity    DefaultSimilarity  (id=177)
       |   this$0    BooleanQuery  (id=89)
       |   weights    ArrayList<E>  (id=188)
       |      elementData    Object[3]  (id=190)
       |------[0]    BooleanQuery$BooleanWeight  (id=171)
       |          |   similarity    DefaultSimilarity  (id=177)
       |          |   this$0    BooleanQuery  (id=105)
       |          |   weights    ArrayList<E>  (id=193)
       |          |      elementData    Object[2]  (id=199)
       |          |------[0]    ConstantScoreQuery$ConstantWeight  (id=183)
       |          |               queryNorm    0.0
       |          |               queryWeight    0.0
       |          |               similarity    DefaultSimilarity  (id=177)
       |          |               //ConstantScore(contents:apple*)
       |          |               this$0    ConstantScoreQuery  (id=123)
       |          |------[1]    TermQuery$TermWeight  (id=175)
       |                         idf    2.0986123
       |                         idfExp    Similarity$1  (id=241)
       |                         queryNorm    0.0
       |                         queryWeight    0.0
       |                         similarity    DefaultSimilarity  (id=177)
       |                         //contents:boy
       |                         this$0    TermQuery  (id=124)
       |                         value    0.0
       |                 modCount    2
       |                 size    2
       |------[1]    BooleanQuery$BooleanWeight  (id=179)
       |          |   similarity    DefaultSimilarity  (id=177)
       |          |   this$0    BooleanQuery  (id=110)
       |          |   weights    ArrayList<E>  (id=195)
       |          |      elementData    Object[2]  (id=204)
       |          |------[0]    ConstantScoreQuery$ConstantWeight  (id=206)
       |          |               queryNorm    0.0
       |          |               queryWeight    0.0
       |          |               similarity    DefaultSimilarity  (id=177)
       |          |               //ConstantScore(contents:cat*)
       |          |               this$0    ConstantScoreQuery  (id=135)
       |          |------[1]    TermQuery$TermWeight  (id=207)
       |                         idf    1.5389965
       |                         idfExp    Similarity$1  (id=210)
       |                         queryNorm    0.0
       |                         queryWeight    0.0
       |                         similarity    DefaultSimilarity  (id=177)
       |                         //contents:dog
       |                         this$0    TermQuery  (id=136)
       |                         value    0.0
       |                 modCount    2
       |                 size    2
       |------[2]    BooleanQuery$BooleanWeight  (id=182)
                  |   similarity    DefaultSimilarity  (id=177)
                  |   this$0    BooleanQuery  (id=113)
                  |   weights    ArrayList<E>  (id=197)
                  |      elementData    Object[2]  (id=216)
                  |------[0]    BooleanQuery$BooleanWeight  (id=181)
                  |          |   similarity    BooleanQuery$1  (id=220)
                  |          |   this$0    BooleanQuery  (id=145)
                  |          |   weights    ArrayList<E>  (id=221)
                  |          |      elementData    Object[2]  (id=224)
                  |          |------[0]    TermQuery$TermWeight  (id=226)
                  |          |               idf    2.0986123
                  |          |               idfExp    Similarity$1  (id=229)
                  |          |               queryNorm    0.0
                  |          |               queryWeight    0.0
                  |          |               similarity    DefaultSimilarity  (id=177)
                  |          |               //contents:eat
                  |          |               this$0    TermQuery  (id=150)
                  |          |               value    0.0
                  |          |------[1]    TermQuery$TermWeight  (id=227)
                  |                         idf    1.1823215
                  |                         idfExp    Similarity$1  (id=231)
                  |                         queryNorm    0.0
                  |                         queryWeight    0.0
                  |                         similarity    DefaultSimilarity  (id=177)
                  |                         //contents:cat^0.33333325
                  |                         this$0    TermQuery  (id=151)
                  |                         value    0.0
                  |                 modCount    2
                  |                 size    2
                  |------[1]    TermQuery$TermWeight  (id=218)
                                 idf    2.0986123
                                 idfExp    Similarity$1  (id=233)
                                 queryNorm    0.0
                                 queryWeight    0.0
                                 similarity    DefaultSimilarity  (id=177)
                                 //contents:foods
                                 this$0    TermQuery  (id=154)
                                 value    0.0
                         modCount    2
                         size    2
              modCount    3
              size    3
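The way this tree comes into being can be mimicked with a toy model. The sketch below is a simplified illustration, not Lucene code: the types Query, Weight, TermQ, BoolQ, TermW and BoolW are hypothetical stand-ins, and the ConstantScoreQuery leaves are modelled as plain term leaves. It only shows that each composite weight builds its children by calling createWeight on every clause, so the Weight tree mirrors the Query tree.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; every type here is a hypothetical stand-in for its Lucene counterpart.
final class WeightTreeSketch {

    interface Query { Weight createWeight(); }

    interface Weight { int countLeaves(); }

    // Leaf query: produces a leaf weight.
    static final class TermQ implements Query {
        final String term;
        TermQ(String term) { this.term = term; }
        public Weight createWeight() { return new TermW(this); }
    }

    // Composite query: holds child clauses.
    static final class BoolQ implements Query {
        final List<Query> clauses = new ArrayList<Query>();
        BoolQ add(Query q) { clauses.add(q); return this; }
        public Weight createWeight() { return new BoolW(this); }
    }

    static final class TermW implements Weight {
        final TermQ query;
        TermW(TermQ query) { this.query = query; }
        public int countLeaves() { return 1; }
    }

    static final class BoolW implements Weight {
        final List<Weight> weights = new ArrayList<Weight>();
        BoolW(BoolQ query) {
            // The recursion: one child weight per clause, all the way down to the leaves.
            for (Query clause : query.clauses) {
                weights.add(clause.createWeight());
            }
        }
        public int countLeaves() {
            int n = 0;
            for (Weight w : weights) n += w.countLeaves();
            return n;
        }
    }

    // Same shape as the dump: three sub-queries, the third one nested one level deeper.
    static Query exampleQuery() {
        return new BoolQ()
            .add(new BoolQ().add(new TermQ("apple*")).add(new TermQ("boy")))
            .add(new BoolQ().add(new TermQ("cat*")).add(new TermQ("dog")))
            .add(new BoolQ()
                .add(new BoolQ().add(new TermQ("eat")).add(new TermQ("cat")))
                .add(new TermQ("foods")));
    }

    public static void main(String[] args) {
        System.out.println("leaf weights: " + exampleQuery().createWeight().countLeaves());
    }
}
```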
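2.4.1.3. Computing the Term Weight Score
(1) First, compute sumOfSquaredWeights
According to the formula:
$$\text{sumOfSquaredWeights} = q.\text{getBoost}()^2 \cdot \sum_{t \in q} \bigl(\text{idf}(t) \cdot t.\text{getBoost}()\bigr)^2$$
The code is as follows:
    float sum = weight.sumOfSquaredWeights();
As BooleanWeight.sumOfSquaredWeights() shows, this is again a recursive process:
    public float sumOfSquaredWeights() throws IOException {
      float sum = 0.0f;
      for (int i = 0 ; i < weights.size(); i++) {
        float s = weights.get(i).sumOfSquaredWeights();
        if (!clauses.get(i).isProhibited())
          sum += s;
      }
      sum *= getBoost() * getBoost(); // multiply by the query boost
      return sum ;
    }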
For a TermWeight leaf node, TermQuery$TermWeight.sumOfSquaredWeights() is implemented as follows:
    public float sumOfSquaredWeights() {
      // Compute part of the score, idf*t.getBoost(); it is reused later.
      queryWeight = idf * getBoost();
      // Compute (idf*t.getBoost())^2
      return queryWeight * queryWeight;
    }
For a ConstantWeight leaf node, ConstantScoreQuery$ConstantWeight.sumOfSquaredWeights() is as follows:
    public float sumOfSquaredWeights() {
      // Apart from the user-specified boost, nothing else contributes to the score.
      queryWeight = getBoost();
      return queryWeight * queryWeight;
    }
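To make the recursion concrete, here is a minimal sketch with made-up numbers. It is an assumption-laden model rather than the Lucene classes: Node, Leaf and Bool are hypothetical names, and the idf and boost values are invented so that the arithmetic is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; Node, Leaf and Bool are hypothetical stand-ins for the Lucene Weight classes.
final class SumOfSquaresSketch {

    interface Node { float sumOfSquaredWeights(); }

    // Models a TermWeight: contributes (idf * boost)^2.
    static final class Leaf implements Node {
        final float idf;
        final float boost;
        Leaf(float idf, float boost) { this.idf = idf; this.boost = boost; }
        public float sumOfSquaredWeights() {
            float queryWeight = idf * boost;
            return queryWeight * queryWeight;
        }
    }

    // Models a BooleanWeight: sums its non-prohibited children, then applies its own boost squared.
    static final class Bool implements Node {
        final float boost;
        final List<Node> children = new ArrayList<Node>();
        final List<Boolean> prohibited = new ArrayList<Boolean>();
        Bool(float boost) { this.boost = boost; }
        Bool add(Node child, boolean isProhibited) {
            children.add(child);
            prohibited.add(isProhibited);
            return this;
        }
        public float sumOfSquaredWeights() {
            float sum = 0.0f;
            for (int i = 0; i < children.size(); i++) {
                // The child is always visited, as in the quoted code; only the addition is skipped.
                float s = children.get(i).sumOfSquaredWeights();
                if (!prohibited.get(i)) sum += s;
            }
            return sum * boost * boost;
        }
    }

    public static void main(String[] args) {
        Node root = new Bool(2f)
            .add(new Leaf(2f, 1f), false)  // contributes 4
            .add(new Leaf(3f, 1f), false)  // contributes 9
            .add(new Leaf(5f, 1f), true);  // prohibited clause, excluded from the sum
        System.out.println(root.sumOfSquaredWeights()); // (4 + 9) * 2^2 = 52.0
    }
}
```

In the quoted Lucene code the call on a prohibited child still matters, because TermWeight.sumOfSquaredWeights() also initialises queryWeight as a side effect.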
(2) Compute queryNorm
The formula is:
$$\text{queryNorm}(q) = \frac{1}{\sqrt{\text{sumOfSquaredWeights}}}$$
The code is:
    public float queryNorm(float sumOfSquaredWeights) {
      return (float)(1.0 / Math.sqrt(sumOfSquaredWeights));
    }
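A small worked example may help show what this factor does. The sketch below uses hypothetical query weights of 3 and 4 (sum of squares 25) and the hypothetical class name QueryNormSketch; after scaling by queryNorm the squared weights add up to roughly 1, which is why the factor makes scores from different queries more comparable without changing the ranking within one query.

```java
// Illustrative sketch only; QueryNormSketch is a hypothetical name, the weights are made up.
final class QueryNormSketch {

    // Same expression as the queryNorm method quoted in the article.
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    public static void main(String[] args) {
        float w1 = 3f, w2 = 4f;                     // hypothetical query weights
        float norm = queryNorm(w1 * w1 + w2 * w2);  // 1 / sqrt(25)
        float n1 = w1 * norm, n2 = w2 * norm;       // weights scaled to unit length
        System.out.println("norm=" + norm + " normalizedSumOfSquares=" + (n1 * n1 + n2 * n2));
        // Note: a sum of 0 yields Infinity in Java floating point, so callers have to expect that case.
        System.out.println("queryNorm(0)=" + queryNorm(0f));
    }
}
```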
(3) Fold queryNorm into the score
The code is:
    weight.normalize(norm);
For BooleanWeight this is, once again, a recursive process:
    public void normalize(float norm) {
      norm *= getBoost();
      for (Weight w : weights) {
        w.normalize(norm);
      }
    }
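For a TermWeight leaf node, TermQuery$TermWeight.normalize(float) is as follows:
    public void normalize(float queryNorm) {
      this.queryNorm = queryNorm;
      // queryWeight was idf*t.getBoost(); it is now queryNorm*idf*t.getBoost().
      queryWeight *= queryNorm;
      // At this point the score covers queryNorm*idf*t.getBoost()*idf = queryNorm*idf^2*t.getBoost().
      value = queryWeight * idf;
    }
Recall that Lucene's overall scoring formula is:
$$\text{score}(q,d) = \text{coord}(q,d) \cdot \text{queryNorm}(q) \cdot \sum_{t \in q} \bigl(\text{tf}(t \text{ in } d) \cdot \text{idf}(t)^2 \cdot t.\text{getBoost}() \cdot \text{norm}(t,d)\bigr)$$
The steps so far have computed the part highlighted in red in the original figure: queryNorm(q) · idf(t)^2 · t.getBoost().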
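The three steps can be strung together in a short sketch. This is a simplified model under stated assumptions: TermW and weigh are hypothetical names, the tree is flattened to a single parent with a list of term leaves, and the idf values 3 and 4 are invented. It follows the same order as the article: sum the squared weights, derive the norm, then push the norm (multiplied by the parent's boost) down into each leaf.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; a flattened, hypothetical model of the weighting pass.
final class NormalizeSketch {

    // Models a TermWeight with the same three fields the article tracks.
    static final class TermW {
        final float idf;
        final float boost;
        float queryWeight;
        float queryNorm;
        float value;

        TermW(float idf, float boost) { this.idf = idf; this.boost = boost; }

        float sumOfSquaredWeights() {
            queryWeight = idf * boost;           // idf * t.getBoost()
            return queryWeight * queryWeight;
        }

        void normalize(float norm) {
            this.queryNorm = norm;
            queryWeight *= norm;                 // queryNorm * idf * t.getBoost()
            value = queryWeight * idf;           // queryNorm * idf^2 * t.getBoost()
        }
    }

    // Runs steps (1) to (3) for term leaves that sit under one parent with the given boost.
    static void weigh(List<TermW> terms, float parentBoost) {
        float sum = 0.0f;
        for (TermW t : terms) sum += t.sumOfSquaredWeights();
        sum *= parentBoost * parentBoost;                  // step (1)
        float norm = (float) (1.0 / Math.sqrt(sum));       // step (2)
        norm *= parentBoost;                               // the parent folds its boost into the norm
        for (TermW t : terms) t.normalize(norm);           // step (3)
    }

    public static void main(String[] args) {
        List<TermW> terms = Arrays.asList(new TermW(3f, 1f), new TermW(4f, 1f));
        weigh(terms, 1f);
        for (TermW t : terms) {
            System.out.println("queryWeight=" + t.queryWeight + " value=" + t.value);
        }
    }
}
```

One consequence visible in this model: the boost of the outermost parent appears squared under the square root and once more in normalize, so it cancels out of the leaf values. Only boosts that differ between sibling clauses change the relative weights.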