
Lucene In Depth (7): The Lucene Indexing Process

2017-11-29 22:52

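Summary: Indexing is the most important process in Lucene. Documents of any shape are added through IndexWriter's addDocument() method. This section uses addDocument as the entry point and follows the indexing call chain downward. The code shown here is based on Lucene 6.2.1.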

The indexing call chain

IndexWriter's addDocument

public long addDocument(Iterable<? extends IndexableField> doc) throws IOException {
    // An add is simply an update with no delete term
    return updateDocument(null, doc);
}

This method contains no real logic of its own. The point to note is that it returns a sequence number.

IndexWriter's updateDocument

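public long updateDocument(Term term, Iterable<? extends IndexableField> doc) throws IOException {
    // Abridged: open checks, event processing and error handling are omitted
    long seqNo = docWriter.updateDocument(doc, analyzer, term);
    return seqNo;
}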

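For an update, this method first deletes the documents containing term and then adds the new document. The delete and the add are atomic as seen by a reader on the same index: a reader never observes the delete without the add. When term is null, as in the call from addDocument, nothing is deleted.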

Here doc is the document passed in, and analyzer is the one set on IndexWriterConfig. Setting it is optional: the no-argument IndexWriterConfig constructor defaults to StandardAnalyzer.
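To make the delete-then-add semantics concrete, here is a minimal sketch in plain Java. It does not use Lucene: `ToyIndexWriter`, its fields, and the string-based terms are hypothetical stand-ins. The only thing it is meant to show is the shape of the operation, assuming one lock guards both the delete and the add and each operation receives an increasing sequence number.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not Lucene code: a document is a list of "field:value" strings.
class ToyIndexWriter {
    private final List<List<String>> docs = new ArrayList<>();
    private long nextSeqNo = 1;

    // Same delegation as in the article: an add is an update with no delete term.
    synchronized long addDocument(List<String> doc) {
        return updateDocument(null, doc);
    }

    // Delete every document containing the term, then add the new one, under one lock.
    synchronized long updateDocument(String term, List<String> doc) {
        if (term != null) {
            docs.removeIf(d -> d.contains(term));
        }
        docs.add(new ArrayList<>(doc));
        return nextSeqNo++;
    }

    synchronized int numDocs() {
        return docs.size();
    }
}
```

Because both steps run inside the same synchronized method, any caller of `numDocs()` sees the index either before the update or after it, never in between.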

DocumentsWriter's updateDocument

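long updateDocument(final Iterable<? extends IndexableField> doc, final Analyzer analyzer, final Term delTerm) throws IOException, AbortingException {
    // Abridged: perThread is the ThreadState obtained and locked earlier in this method
    final DocumentsWriterPerThread dwpt = perThread.dwpt;
    long seqNo = dwpt.updateDocument(doc, analyzer, delTerm);
    // ... (unlocking and flush handling omitted)
    return seqNo;
}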

This method takes care of locking. The actual work of adding the document is delegated one level further down.
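The locking step can be pictured with a small sketch. This is a simplified toy model in plain Java, only loosely inspired by the per-thread state that the method above locks. The class names `ToyThreadState`, `ToyThreadPool` and `ToyDocumentsWriter`, and the reuse policy, are assumptions of this sketch rather than Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical toy model, not Lucene code.
class ToyThreadState extends ReentrantLock {
    int docsBuffered = 0; // stands in for a per-thread in-memory buffer
}

class ToyThreadPool {
    private final List<ToyThreadState> states = new ArrayList<>();

    // Returns a state locked by the calling thread, creating a new one if all are busy.
    synchronized ToyThreadState obtainAndLock() {
        for (ToyThreadState state : states) {
            if (state.tryLock()) {
                return state;
            }
        }
        ToyThreadState fresh = new ToyThreadState();
        fresh.lock();
        states.add(fresh);
        return fresh;
    }

    synchronized int size() {
        return states.size();
    }
}

class ToyDocumentsWriter {
    final ToyThreadPool pool = new ToyThreadPool();

    void updateDocument(String doc) {
        ToyThreadState perThread = pool.obtainAndLock();
        try {
            perThread.docsBuffered++; // the real work would be delegated here
        } finally {
            perThread.unlock();       // always release, even if indexing fails
        }
    }
}
```

The design point is that each indexing thread works on its own locked state, so threads only contend briefly when picking a state, not while processing a document.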

DocumentsWriterPerThread's updateDocument

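public long updateDocument(Iterable<? extends IndexableField> doc, Analyzer analyzer, Term delTerm) throws IOException, AbortingException {
    // Hand the document and analyzer to the consumer through the shared docState
    docState.doc = doc;
    docState.analyzer = analyzer;
    consumer.processDocument();
    // Abridged: the real method finishes the document and returns its sequence number
    return finishDocument(delTerm);
}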

The static nested class DocState is used here to pass values along, and the job of processing the document is handed to DocConsumer.

DocConsumer's processDocument

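public void processDocument() throws IOException, AbortingException {
    // Number of distinct indexed field names seen in this document
    int fieldCount = 0;
    // One new generation per document
    long fieldGen = nextFieldGen++;
    // ... (abridged)
    for (IndexableField field : docState.doc) {
        fieldCount = processField(field, fieldGen, fieldCount);
    }
    // ... (abridged)
}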

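DocConsumer is an abstract class. The implementation used by default is DefaultIndexingChain, which is where the processDocument shown above lives. fieldCount counts the distinct indexed field names seen so far in the current document, and fieldGen is a generation counter that increases by one on every call to the method, that is, once per document.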

DefaultIndexingChain's processField

private int processField(IndexableField field, long fieldGen, int fieldCount) {
    String fieldName = field.name();
    IndexableFieldType fieldType = field.fieldType();
    PerField fp = null;
    if (fieldType.indexOptions() == null) {
        throw new NullPointerException("IndexOptions must not be null (field: \"" + field.name() + "\")");
    }

    // Invert indexed fields:
    if (fieldType.indexOptions() != IndexOptions.NONE) {
        // if the field omits norms, the boost cannot be indexed.
        if (fieldType.omitNorms() && field.boost() != 1.0f) {
            throw new UnsupportedOperationException("You cannot set an index-time boost: norms are omitted for field '" + field.name() + "'");
        }
        fp = getOrAddField(fieldName, fieldType, true);
        boolean first = fp.fieldGen != fieldGen;
        fp.invert(field, first);
        if (first) {
            fields[fieldCount++] = fp;
            fp.fieldGen = fieldGen;
        }
    } else {
        verifyUnIndexedFieldType(fieldName, fieldType);
    }

    // Add stored fields:
    if (fieldType.stored()) {
        if (fp == null) {
            fp = getOrAddField(fieldName, fieldType, false);
        }
        if (fieldType.stored()) {
            try {
                storedFieldsWriter.writeField(fp.fieldInfo, field);
            } catch (Throwable th) {
                throw AbortingException.wrap(th);
            }
        }
    }

    DocValuesType dvType = fieldType.docValuesType();
    if (dvType == null) {
        throw new NullPointerException("docValuesType must not be null (field: \"" + fieldName + "\")");
    }
    if (dvType != DocValuesType.NONE) {
        if (fp == null) {
            fp = getOrAddField(fieldName, fieldType, false);
        }
        indexDocValue(fp, dvType, field);
    }

    if (fieldType.pointDimensionCount() != 0) {
        if (fp == null) {
            fp = getOrAddField(fieldName, fieldType, false);
        }
        indexPoint(fp, field);
    }

    return fieldCount;
}

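Here IndexableField represents a single field at index time. IndexWriter consumes an Iterable of IndexableField as one document, so you can think of that collection as the document's internal representation. IndexableField is an interface that exposes a few important properties: the field name, the field type, and the field value.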

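processField is not long and it contains the core indexing logic, so I have not trimmed it. It shows how the two key parameters, fieldGen and fieldCount, are used: a field is counted only the first time its name appears in the current document, which is detected by comparing the field's stored generation with the current one.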
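The generation trick is easier to see in isolation. The following is a minimal sketch in plain Java, assuming a document is just a list of field names. `ToyIndexingChain`, `PerField.valuesSeen` and the map-based lookup are hypothetical simplifications, not Lucene code; only the `first = fp.fieldGen != fieldGen` comparison is taken from the method above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of the fieldGen/fieldCount bookkeeping, not Lucene code.
class ToyIndexingChain {
    static final class PerField {
        long fieldGen = -1;  // generation of the last document that touched this field
        int valuesSeen = 0;  // total values processed for this field across all documents
    }

    private final Map<String, PerField> perFields = new HashMap<>();
    private long nextFieldGen = 0;

    // Returns the number of distinct field names in the document.
    int processDocument(List<String> fieldNames) {
        int fieldCount = 0;
        long fieldGen = nextFieldGen++; // one new generation per document
        for (String name : fieldNames) {
            PerField fp = perFields.computeIfAbsent(name, n -> new PerField());
            // True only the first time this name shows up in the current document
            boolean first = fp.fieldGen != fieldGen;
            fp.valuesSeen++;
            if (first) {
                fieldCount++;
                fp.fieldGen = fieldGen;
            }
        }
        return fieldCount;
    }

    int valuesSeen(String name) {
        PerField fp = perFields.get(name);
        return fp == null ? 0 : fp.valuesSeen;
    }
}
```

The benefit of the counter is that the per-field objects can be reused across documents without resetting a "seen" flag on each of them: bumping the generation invalidates all of them at once.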

For stored fields, the final write goes through writeField.

StoredFieldsWriter's writeField

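public void writeField(FieldInfo info, IndexableField field) throws IOException {
    // Abridged pseudocode: a header combining the field number and the value type comes first
    bufferedDocs.writeVLong(infoAndBits);
    if (bytes != null) {            // binary value
        bufferedDocs.writeVInt(bytes.length);
        bufferedDocs.writeBytes(bytes.bytes, bytes.offset, bytes.length);
    } else if (string != null) {    // string value
        bufferedDocs.writeString(string);
    } else {                        // numeric value
        ....
    }
}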

The write logic mainly checks the type of the field's value and then hands the data to the concrete output, GrowableByteArrayDataOutput.
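The idea of "check the value type, then write a tagged record" can be sketched as follows. This is a toy model in plain Java built on `java.nio.ByteBuffer`. The class `ToyStoredFieldsWriter`, the tag values, the 3-bit tag width and the fixed-width encodings are all assumptions made for illustration and do not reproduce Lucene's on-disk format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical toy model of a stored-field writer, not Lucene code.
class ToyStoredFieldsWriter {
    // Illustrative type tags; values and width are assumptions of this sketch.
    static final int STRING = 0, BYTE_ARR = 1, NUMERIC_INT = 2, NUMERIC_LONG = 3;
    static final int TYPE_BITS = 3;

    // Fixed capacity keeps the sketch short; a real buffer would grow on demand.
    final ByteBuffer bufferedDocs = ByteBuffer.allocate(256);

    void writeField(int fieldNumber, Object value) {
        if (value instanceof String) {
            byte[] utf8 = ((String) value).getBytes(StandardCharsets.UTF_8);
            writeHeader(fieldNumber, STRING);
            bufferedDocs.putInt(utf8.length).put(utf8);
        } else if (value instanceof byte[]) {
            byte[] bytes = (byte[]) value;
            writeHeader(fieldNumber, BYTE_ARR);
            bufferedDocs.putInt(bytes.length).put(bytes);
        } else if (value instanceof Integer) {
            writeHeader(fieldNumber, NUMERIC_INT);
            bufferedDocs.putInt((Integer) value);
        } else if (value instanceof Long) {
            writeHeader(fieldNumber, NUMERIC_LONG);
            bufferedDocs.putLong((Long) value);
        } else {
            throw new IllegalArgumentException("unsupported value type: " + value);
        }
    }

    // Header: field number in the high bits, type tag in the low TYPE_BITS bits.
    private void writeHeader(int fieldNumber, int typeBits) {
        bufferedDocs.putLong(((long) fieldNumber << TYPE_BITS) | typeBits);
    }

    static int typeOf(long header) {
        return (int) (header & ((1L << TYPE_BITS) - 1));
    }

    static int fieldNumberOf(long header) {
        return (int) (header >>> TYPE_BITS);
    }
}
```

Packing the field number and the type into one header value means a reader can decide how to decode the payload after reading a single number.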

DataOutput's writeXXX

public void writeByte(byte b) {
    if (length >= bytes.length) {
        bytes = ArrayUtil.grow(bytes);
    }
    bytes[length++] = b;
}

Shown here is the simplest method, writeByte(). The other write methods are all built on top of it.

public void writeLong(long i) throws IOException {
    writeInt((int) (i >> 32));
    writeInt((int) i);
}
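To show how the larger writes reduce to writeByte, here is a self-contained sketch in plain Java. `ToyByteOutput` is a hypothetical class, and its doubling growth policy is an assumption standing in for `ArrayUtil.grow`. The big-endian split in `writeLong` follows the code above, and `writeVInt` uses the usual 7-bits-per-byte variable-length scheme.

```java
import java.util.Arrays;

// Hypothetical toy model of a growable byte sink, not Lucene code.
class ToyByteOutput {
    private byte[] bytes = new byte[4];
    private int length = 0;

    // The primitive every other write is built on: grow if full, then append.
    void writeByte(byte b) {
        if (length >= bytes.length) {
            bytes = Arrays.copyOf(bytes, bytes.length * 2); // assumed growth policy
        }
        bytes[length++] = b;
    }

    // Big-endian int: most significant byte first.
    void writeInt(int i) {
        writeByte((byte) (i >> 24));
        writeByte((byte) (i >> 16));
        writeByte((byte) (i >> 8));
        writeByte((byte) i);
    }

    // Same split as the writeLong above: high 32 bits, then low 32 bits.
    void writeLong(long i) {
        writeInt((int) (i >> 32));
        writeInt((int) i);
    }

    // Variable-length int: 7 payload bits per byte, high bit set when more bytes follow.
    void writeVInt(int i) {
        while ((i & ~0x7F) != 0) {
            writeByte((byte) ((i & 0x7F) | 0x80));
            i >>>= 7;
        }
        writeByte((byte) i);
    }

    byte[] toByteArray() {
        return Arrays.copyOf(bytes, length);
    }
}
```

Small values are the common case for lengths and counts, which is why a variable-length encoding pays off: any value below 128 costs one byte instead of four.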

That completes the walk through the indexing process.


Tags: lucene, indexing process
