
Lucene Search Process Analysis (4)

2012-12-13 15:49
Reposted from: http://forfuture1978.iteye.com/blog/632829

2.4. Searching with the Query object

2.4.1.2. Creating the Weight object tree
BooleanQuery.createWeight(Searcher) ultimately returns new BooleanWeight(searcher). The BooleanWeight constructor is implemented as follows:
public BooleanWeight(Searcher searcher) {
    this.similarity = getSimilarity(searcher);
    weights = new ArrayList<Weight>(clauses.size());
    // This is also a recursive process: it walks down the new Query object tree to the leaf nodes.
    for (int i = 0; i < clauses.size(); i++) {
        weights.add(clauses.get(i).getQuery().createWeight(searcher));
    }
}

For a TermQuery leaf node, TermQuery.createWeight(Searcher) returns a new TermWeight(searcher) object. The TermWeight constructor is as follows:
public TermWeight(Searcher searcher) {
    this.similarity = getSimilarity(searcher);
    // idf is computed here
    idfExp = similarity.idfExplain(term, searcher);
    idf = idfExp.getIdf();
}

The idf computation matches the formula in the documentation exactly:

$$idf(t) = 1 + \log\left(\frac{numDocs}{docFreq + 1}\right)$$

public IDFExplanation idfExplain(final Term term, final Searcher searcher) {
    // df: number of documents that contain the term
    final int df = searcher.docFreq(term);
    // max: total number of documents in the index
    final int max = searcher.maxDoc();
    final float idf = idf(df, max);
    return new IDFExplanation() {
        public float getIdf() {
            return idf;
        }
    };
}
public float idf(int docFreq, int numDocs) {
    return (float)(Math.log(numDocs/(double)(docFreq+1)) + 1.0);
}
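
To make the formula concrete, the following minimal sketch evaluates the same expression outside of Lucene. The class name IdfSketch and the index size of 12 documents are assumptions chosen for illustration; the source does not state the real docFreq or maxDoc values.

```java
// Illustrative sketch only: IdfSketch is a hypothetical class, not part of Lucene.
final class IdfSketch {
    // Same expression as the idf(int, int) method quoted above.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        int numDocs = 12; // assumed index size, for illustration only
        for (int docFreq : new int[] {1, 3, 6, 9}) {
            // The rarer the term (smaller docFreq), the larger its idf.
            System.out.println("docFreq=" + docFreq + " idf=" + idf(docFreq, numDocs));
        }
    }
}
```

With these assumed inputs, idf(3, 12) equals 1 + ln 3, which is about 2.0986. Because of the +1 in the denominator, the expression stays finite even when docFreq is 0.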

ConstantScoreQuery.createWeight(Searcher), by contrast, only creates a ConstantScoreQuery.ConstantWeight(searcher) object and does not compute idf.
The resulting Weight object tree is as follows:

weight BooleanQuery$BooleanWeight (id=169)
   |   similarity DefaultSimilarity (id=177)
   |   this$0 BooleanQuery (id=89)
   |   weights ArrayList<E> (id=188)
   |      elementData Object[3] (id=190)
   |------[0] BooleanQuery$BooleanWeight (id=171)
   |          |   similarity DefaultSimilarity (id=177)
   |          |   this$0 BooleanQuery (id=105)
   |          |   weights ArrayList<E> (id=193)
   |          |      elementData Object[2] (id=199)
   |          |------[0] ConstantScoreQuery$ConstantWeight (id=183)
   |          |               queryNorm 0.0
   |          |               queryWeight 0.0
   |          |               similarity DefaultSimilarity (id=177)
   |          |               //ConstantScore(contents:apple*)
   |          |               this$0 ConstantScoreQuery (id=123)
   |          |------[1] TermQuery$TermWeight (id=175)
   |                          idf 2.0986123
   |                          idfExp Similarity$1 (id=241)
   |                          queryNorm 0.0
   |                          queryWeight 0.0
   |                          similarity DefaultSimilarity (id=177)
   |                          //contents:boy
   |                          this$0 TermQuery (id=124)
   |                          value 0.0
   |                 modCount 2
   |                 size 2
   |------[1] BooleanQuery$BooleanWeight (id=179)
   |          |   similarity DefaultSimilarity (id=177)
   |          |   this$0 BooleanQuery (id=110)
   |          |   weights ArrayList<E> (id=195)
   |          |      elementData Object[2] (id=204)
   |          |------[0] ConstantScoreQuery$ConstantWeight (id=206)
   |          |               queryNorm 0.0
   |          |               queryWeight 0.0
   |          |               similarity DefaultSimilarity (id=177)
   |          |               //ConstantScore(contents:cat*)
   |          |               this$0 ConstantScoreQuery (id=135)
   |          |------[1] TermQuery$TermWeight (id=207)
   |                          idf 1.5389965
   |                          idfExp Similarity$1 (id=210)
   |                          queryNorm 0.0
   |                          queryWeight 0.0
   |                          similarity DefaultSimilarity (id=177)
   |                          //contents:dog
   |                          this$0 TermQuery (id=136)
   |                          value 0.0
   |                 modCount 2
   |                 size 2
   |------[2] BooleanQuery$BooleanWeight (id=182)
              |   similarity DefaultSimilarity (id=177)
              |   this$0 BooleanQuery (id=113)
              |   weights ArrayList<E> (id=197)
              |      elementData Object[2] (id=216)
              |------[0] BooleanQuery$BooleanWeight (id=181)
              |          |   similarity BooleanQuery$1 (id=220)
              |          |   this$0 BooleanQuery (id=145)
              |          |   weights ArrayList<E> (id=221)
              |          |      elementData Object[2] (id=224)
              |          |------[0] TermQuery$TermWeight (id=226)
              |          |               idf 2.0986123
              |          |               idfExp Similarity$1 (id=229)
              |          |               queryNorm 0.0
              |          |               queryWeight 0.0
              |          |               similarity DefaultSimilarity (id=177)
              |          |               //contents:eat
              |          |               this$0 TermQuery (id=150)
              |          |               value 0.0
              |          |------[1] TermQuery$TermWeight (id=227)
              |                          idf 1.1823215
              |                          idfExp Similarity$1 (id=231)
              |                          queryNorm 0.0
              |                          queryWeight 0.0
              |                          similarity DefaultSimilarity (id=177)
              |                          //contents:cat^0.33333325
              |                          this$0 TermQuery (id=151)
              |                          value 0.0
              |                 modCount 2
              |                 size 2
              |------[1] TermQuery$TermWeight (id=218)
                              idf 2.0986123
                              idfExp Similarity$1 (id=233)
                              queryNorm 0.0
                              queryWeight 0.0
                              similarity DefaultSimilarity (id=177)
                              //contents:foods
                              this$0 TermQuery (id=154)
                              value 0.0
                      modCount 2
                      size 2
           modCount 3
           size 3
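
The recursion that produces a tree of this shape can be sketched with a toy model. Everything here (WeightTreeSketch, ToyQuery, ToyWeight, ToyTermQuery, ToyBooleanQuery) is a hypothetical stand-in for illustration, not a Lucene class, and the toy ignores Searcher, Similarity and idf entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the recursion; these types are hypothetical, not Lucene classes.
final class WeightTreeSketch {
    interface ToyWeight { String describe(); }

    interface ToyQuery { ToyWeight createWeight(); }

    // Leaf query: creates a leaf weight directly.
    static final class ToyTermQuery implements ToyQuery {
        final String term;
        ToyTermQuery(String term) { this.term = term; }
        public ToyWeight createWeight() {
            return () -> "TermWeight(" + term + ")";
        }
    }

    // Inner node: creates one child weight per clause, recursing down to the leaves.
    static final class ToyBooleanQuery implements ToyQuery {
        final List<ToyQuery> clauses = new ArrayList<>();
        ToyBooleanQuery add(ToyQuery q) { clauses.add(q); return this; }
        public ToyWeight createWeight() {
            List<ToyWeight> weights = new ArrayList<>(clauses.size());
            for (ToyQuery clause : clauses) {
                weights.add(clause.createWeight());
            }
            return () -> {
                StringBuilder sb = new StringBuilder("BooleanWeight[");
                for (int i = 0; i < weights.size(); i++) {
                    if (i > 0) sb.append(", ");
                    sb.append(weights.get(i).describe());
                }
                return sb.append("]").toString();
            };
        }
    }

    public static void main(String[] args) {
        ToyQuery query = new ToyBooleanQuery()
            .add(new ToyBooleanQuery().add(new ToyTermQuery("boy")))
            .add(new ToyTermQuery("foods"));
        System.out.println(query.createWeight().describe());
    }
}
```

The weight tree mirrors the query tree node for node, which is why each weight in the dump above holds a this$0 reference back to the query that created it.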


2.4.1.3. Computing the Term Weight score
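(1) First, compute sumOfSquaredWeights
According to the formula:

$$sumOfSquaredWeights = q.getBoost()^2 \cdot \sum_{t \in q} \left( idf(t) \cdot t.getBoost() \right)^2$$

The code is as follows: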
float sum = weight.sumOfSquaredWeights();
// As can be seen, this is also a recursive process.
public float sumOfSquaredWeights() throws IOException {
    float sum = 0.0f;
    for (int i = 0; i < weights.size(); i++) {
        float s = weights.get(i).sumOfSquaredWeights();
        // Prohibited clauses do not contribute to the sum.
        if (!clauses.get(i).isProhibited())
            sum += s;
    }
    sum *= getBoost() * getBoost();  // multiply by the square of the query boost
    return sum;
}

For a TermWeight leaf node, TermQuery$TermWeight.sumOfSquaredWeights() is implemented as follows:

public float sumOfSquaredWeights() {
    // Compute part of the score, idf * t.getBoost(); it is used again later.
    queryWeight = idf * getBoost();
    // Compute (idf * t.getBoost())^2
    return queryWeight * queryWeight;
}

For a ConstantWeight leaf node, ConstantScoreQuery$ConstantWeight.sumOfSquaredWeights() is as follows:
public float sumOfSquaredWeights() {
    // Apart from the user-specified boost, nothing else contributes to the score.
    queryWeight = getBoost();
    return queryWeight * queryWeight;
}
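
The three sumOfSquaredWeights() variants above can be restated as plain arithmetic. This is a minimal sketch under the assumption that child sums are already available as an array; SumOfSquaredWeightsSketch and its method names are hypothetical and exist only for illustration.

```java
// Hypothetical stand-alone arithmetic; these are not Lucene's classes.
final class SumOfSquaredWeightsSketch {
    // Term leaf: (idf * boost)^2, as in TermWeight.
    static float termSquaredWeight(float idf, float boost) {
        float queryWeight = idf * boost;
        return queryWeight * queryWeight;
    }

    // Constant-score leaf: boost^2, idf is not involved.
    static float constantSquaredWeight(float boost) {
        return boost * boost;
    }

    // Inner node: sum of the non-prohibited children, scaled by the node's own boost squared.
    static float booleanSquaredWeight(float[] childSums, boolean[] prohibited, float boost) {
        float sum = 0.0f;
        for (int i = 0; i < childSums.length; i++) {
            if (!prohibited[i]) {
                sum += childSums[i];
            }
        }
        return sum * boost * boost;
    }

    public static void main(String[] args) {
        float[] children = {
            termSquaredWeight(2.0f, 1.0f),   // made-up idf and boost
            constantSquaredWeight(1.0f),
            termSquaredWeight(3.0f, 1.0f)    // this clause is marked prohibited below
        };
        boolean[] prohibited = {false, false, true};
        System.out.println(booleanSquaredWeight(children, prohibited, 2.0f));
    }
}
```

In the real code the recursive call is still made for a prohibited clause, since it sets that clause's queryWeight as a side effect; only its return value is left out of the sum.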

(2) Compute queryNorm
The formula is:

$$queryNorm(q) = \frac{1}{\sqrt{sumOfSquaredWeights}}$$

The code is:
public float queryNorm(float sumOfSquaredWeights) {
    return (float)(1.0 / Math.sqrt(sumOfSquaredWeights));
}
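
A quick numeric check of this expression, as a sketch with made-up inputs. QueryNormSketch is a hypothetical class name; the method body is copied from the code above.

```java
// Hypothetical wrapper around the queryNorm expression quoted above.
final class QueryNormSketch {
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    public static void main(String[] args) {
        // A larger sum of squared weights yields a smaller normalization factor.
        System.out.println(queryNorm(4.0f));
        System.out.println(queryNorm(16.0f));
        // A zero sum gives a floating-point infinity rather than an exception.
        System.out.println(Float.isInfinite(queryNorm(0.0f)));
    }
}
```

Because Java floating-point division by zero yields infinity instead of throwing, a caller has to deal with the zero-sum case explicitly if it can occur.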

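(3) Fold queryNorm into the score
The code is:
weight.normalize(norm);
// Again a recursive process
public void normalize(float norm) {
    // Each level multiplies in its own boost before passing norm down.
    norm *= getBoost();
    for (Weight w : weights) {
        w.normalize(norm);
    }
}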

For a TermWeight leaf node, TermQuery$TermWeight.normalize(float) is as follows:
public void normalize(float queryNorm) {
    this.queryNorm = queryNorm;
    // queryWeight was idf * t.getBoost(); it is now queryNorm * idf * t.getBoost().
    queryWeight *= queryNorm;
    // At this point the score contains queryNorm * idf * t.getBoost() * idf = queryNorm * idf^2 * t.getBoost().
    value = queryWeight * idf;
}
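
Putting the three steps together for a flat list of term clauses gives the sketch below. NormalizeSketch, ToyTermWeight and weigh are hypothetical names, the idf and boost values are made up, and the parent query's boost is assumed to be 1 so that norm is passed down unchanged.

```java
// Hypothetical end-to-end sketch of steps (1) to (3); not Lucene's classes.
final class NormalizeSketch {
    // Mirrors the fields of TermWeight that matter here.
    static final class ToyTermWeight {
        final float idf;
        final float boost;
        float queryWeight;
        float value;

        ToyTermWeight(float idf, float boost) {
            this.idf = idf;
            this.boost = boost;
        }

        float sumOfSquaredWeights() {
            queryWeight = idf * boost;
            return queryWeight * queryWeight;
        }

        void normalize(float queryNorm) {
            queryWeight *= queryNorm;   // queryNorm * idf * boost
            value = queryWeight * idf;  // queryNorm * idf^2 * boost
        }
    }

    // Runs the three steps over term clauses (parent boost assumed to be 1).
    static void weigh(ToyTermWeight... terms) {
        float sum = 0.0f;
        for (ToyTermWeight t : terms) {
            sum += t.sumOfSquaredWeights();
        }
        float norm = (float) (1.0 / Math.sqrt(sum));
        for (ToyTermWeight t : terms) {
            t.normalize(norm);
        }
    }

    public static void main(String[] args) {
        ToyTermWeight a = new ToyTermWeight(2.0f, 1.0f);
        ToyTermWeight b = new ToyTermWeight(1.0f, 2.0f);
        weigh(a, b);
        System.out.println("a.value=" + a.value + " b.value=" + b.value);
    }
}
```

Both terms start with the same queryWeight (2 * 1 and 1 * 2), but the term with the larger idf ends up with the larger value because idf is multiplied in a second time during normalize.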

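As we know, Lucene's overall scoring formula is shown below. The steps so far have computed the queryNorm(q), idf(t)^2 and t.getBoost() factors (the part highlighted in red in the original figure):

$$score(q,d) = coord(q,d) \cdot queryNorm(q) \cdot \sum_{t \in q} \left( tf(t \in d) \cdot idf(t)^2 \cdot t.getBoost() \cdot norm(t,d) \right)$$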
