Learn Lucene with Me Step by Step (13) --- Lucene Search: How Custom Sorting Works and How to Write Your Own Custom Sort Tool
2015-05-24 23:02
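About custom sorting
When searching with Lucene we often need sorting. Lucene ships with several built-in sort types, but they fall short when a value has to be computed first and the results then sorted by that computed value. To write a custom sort we first need to study how Lucene implements its own sorting: every sort in Lucene extends FieldComparator and overrides its internal methods. Here we use IntComparator as the example.
IntComparator implementation
The class is declared as public static class IntComparator extends NumericComparator<Integer>, which shows that IntComparator works with Integer values, i.e. it only handles sorting on an IntField. IntComparator declares the following fields: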
private final int[] values;
private int bottom; // Value of bottom of queue
private int topValue;
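Looking at the constructor and the copy method shows that:
- values is allocated with its length (numHits) when the class is constructed
- values stores the content read from NumericDocValues
The implementation is as follows:
Initialization of values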
/**
 * Creates a new comparator based on {@link Integer#compare} for {@code numHits}.
 * When a document has no value for the field, {@code missingValue} is substituted.
 */
public IntComparator(int numHits, String field, Integer missingValue) {
    super(field, missingValue);
    values = new int[numHits];
}
Filling values (this is how IntComparator does it)
@Override
public void copy(int slot, int doc) {
    int v2 = (int) currentReaderValues.get(doc);
    // Test for v2 == 0 to save Bits.get method call for
    // the common case (doc has value and value is non-zero):
    if (docsWithField != null && v2 == 0 && !docsWithField.get(doc)) {
        v2 = missingValue;
    }
    values[slot] = v2;
}
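These implementations all look alike. For a custom sort in our own application, what we need to do is compute a value from the BinaryDocValues or NumericDocValues and then implement the internal methods of FieldComparator; for IntComparator that is the value copy operation shown above.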
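The copy step can be modeled without Lucene. The following is a minimal plain-Java sketch, not Lucene code: SlotCopyModel, perDocValues and the BitSet are hypothetical stand-ins for the comparator, NumericDocValues and the docsWithField bits, chosen only to show how a missing value is substituted before the value lands in its slot.

```java
import java.util.BitSet;

// Plain-Java model of the copy step; none of these names are Lucene classes.
class SlotCopyModel {
    private final int[] values;          // one slot per competitive hit, sized by numHits
    private final long[] perDocValues;   // stand-in for NumericDocValues (index = doc id)
    private final BitSet docsWithField;  // stand-in for the docsWithField bits
    private final int missingValue;      // used when a doc has no value for the field

    SlotCopyModel(int numHits, long[] perDocValues, BitSet docsWithField, int missingValue) {
        this.values = new int[numHits];
        this.perDocValues = perDocValues;
        this.docsWithField = docsWithField;
        this.missingValue = missingValue;
    }

    // Read the doc's value and substitute missingValue when the doc has no value.
    void copy(int slot, int doc) {
        int v = (int) perDocValues[doc];
        if (v == 0 && !docsWithField.get(doc)) {
            v = missingValue;
        }
        values[slot] = v;
    }

    // Compare two already-copied slots.
    int compare(int slot1, int slot2) {
        return Integer.compare(values[slot1], values[slot2]);
    }

    int value(int slot) {
        return values[slot];
    }

    public static void main(String[] args) {
        BitSet hasValue = new BitSet();
        hasValue.set(0); // doc 0 has the value 7
        hasValue.set(2); // doc 2 really has the value 0
        SlotCopyModel model = new SlotCopyModel(3, new long[] {7, 0, 0}, hasValue, -1);
        for (int doc = 0; doc < 3; doc++) {
            model.copy(doc, doc);
        }
        // doc 1 has no value, so its slot holds the missing value -1
        System.out.println(model.value(0) + " " + model.value(1) + " " + model.value(2));
    }
}
```

A custom comparator replaces only the "read the value" part of copy with its own computation; the slot bookkeeping stays the same.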
Next we need to implement compareTop, compareBottom and compare. IntComparator implements them as follows:
@Override
public int compare(int slot1, int slot2) {
    return Integer.compare(values[slot1], values[slot2]);
}

@Override
public int compareBottom(int doc) {
    int v2 = (int) currentReaderValues.get(doc);
    // Test for v2 == 0 to save Bits.get method call for
    // the common case (doc has value and value is non-zero):
    if (docsWithField != null && v2 == 0 && !docsWithField.get(doc)) {
        v2 = missingValue;
    }
    return Integer.compare(bottom, v2);
}

@Override
public int compareTop(int doc) {
    int docValue = (int) currentReaderValues.get(doc);
    // Test for docValue == 0 to save Bits.get method call for
    // the common case (doc has value and value is non-zero):
    if (docsWithField != null && docValue == 0 && !docsWithField.get(doc)) {
        docValue = missingValue;
    }
    return Integer.compare(topValue, docValue);
}
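To see why compare, compareBottom, copy and setBottom exist as separate methods, here is a simplified plain-Java model of how a collector might drive them to keep the best numHits documents for an ascending sort. It is a sketch under assumptions: TopNModel and its linear scan for the weakest slot are illustrative only (Lucene itself uses a priority queue), and docValues is a hypothetical array standing in for the per-document values.

```java
import java.util.Arrays;

// Simplified model of the top-N protocol; not Lucene's collector.
class TopNModel {
    private final int[] values;     // slot -> value of the doc stored there
    private final int[] docValues;  // hypothetical per-doc values (index = doc id)
    private int filled;             // number of slots in use
    private int bottomSlot;         // slot holding the weakest competitive hit
    private int bottom;             // cached value of that slot

    TopNModel(int numHits, int[] docValues) {
        this.values = new int[numHits];
        this.docValues = docValues;
    }

    void copy(int slot, int doc) { values[slot] = docValues[doc]; }

    int compare(int slot1, int slot2) { return Integer.compare(values[slot1], values[slot2]); }

    void setBottom(int slot) { bottomSlot = slot; bottom = values[slot]; }

    int compareBottom(int doc) { return Integer.compare(bottom, docValues[doc]); }

    // Linear scan for the weakest slot (largest value in an ascending sort).
    private int weakestSlot() {
        int weakest = 0;
        for (int slot = 1; slot < filled; slot++) {
            if (compare(slot, weakest) > 0) {
                weakest = slot;
            }
        }
        return weakest;
    }

    void collect(int doc) {
        if (filled < values.length) {
            copy(filled++, doc);             // queue not full yet: just store the doc
            if (filled == values.length) {
                setBottom(weakestSlot());
            }
        } else if (compareBottom(doc) > 0) { // doc beats the current bottom
            copy(bottomSlot, doc);           // overwrite the weakest hit
            setBottom(weakestSlot());
        }
    }

    int[] sortedTop() {
        int[] top = Arrays.copyOf(values, filled);
        Arrays.sort(top);
        return top;
    }

    public static void main(String[] args) {
        TopNModel model = new TopNModel(3, new int[] {5, 1, 9, 3, 7});
        for (int doc = 0; doc < 5; doc++) {
            model.collect(doc);
        }
        System.out.println(Arrays.toString(model.sortedTop())); // prints [1, 3, 5]
    }
}
```

compareBottom lets the collector reject most documents with a single comparison, without copying them into a slot first.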
Implementing your own FieldComparator
To implement a FieldComparator we need to process the incoming parameters, define the array that holds the processed values, and declare the BinaryDocValues and the received parameters. I wrote a general-purpose comparator; the code is as follows:
package com.lucene.search;

import java.io.IOException;

import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.SimpleFieldComparator;

import com.lucene.util.ObjectUtil;

/**
 * Custom comparator
 * @author lenovo
 */
public class SelfDefineComparator extends SimpleFieldComparator<String> {
    private Object[] values; // slot values, same role as int[] values in IntComparator
    private Object bottom;
    private Object top;
    private String field;
    private BinaryDocValues binaryDocValues; // same role as NumericDocValues in IntComparator
    private ObjectUtil objectUtil; // an interface rather than an abstract class, for easier extension
    private Object[] params; // parameters supplied by the caller

    public SelfDefineComparator(String field, int numHits, Object[] params, ObjectUtil objectUtil) {
        values = new Object[numHits];
        this.objectUtil = objectUtil;
        this.field = field;
        this.params = params;
    }

    @Override
    public void setBottom(int slot) {
        this.bottom = values[slot];
    }

    @Override
    public int compareBottom(int doc) throws IOException {
        Object docValue = getValues(doc);
        // Use the same comparison as compare() and compareTop() so the ordering stays consistent
        return objectUtil.compareTo(bottom, docValue);
    }

    @Override
    public int compareTop(int doc) throws IOException {
        Object docValue = getValues(doc);
        return objectUtil.compareTo(top, docValue);
    }

    @Override
    public void copy(int slot, int doc) throws IOException {
        values[slot] = getValues(doc);
    }

    /**
     * Gets the value for the given docID
     * @param doc
     * @return
     */
    private Object getValues(int doc) {
        return objectUtil.getValues(doc, params, binaryDocValues);
    }

    @Override
    protected void doSetNextReader(LeafReaderContext context) throws IOException {
        // equivalent to context.reader().getBinaryDocValues(field), but never returns null
        binaryDocValues = DocValues.getBinary(context.reader(), field);
    }

    @Override
    public int compare(int slot1, int slot2) {
        return objectUtil.compareTo(values[slot1], values[slot2]);
    }

    @Override
    public void setTopValue(String value) {
        this.top = value;
    }

    @Override
    public String value(int slot) {
        return values[slot].toString();
    }
}
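ObjectUtil is an interface that defines how values are processed. It ultimately serves the comparator's compare methods, and it also defines the comparison logic the comparator uses internally.
The ObjectUtil interface is defined as follows: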
package com.lucene.util;

import org.apache.lucene.index.BinaryDocValues;

public interface ObjectUtil {

    /**
     * Custom method that reads and processes the value for a document
     * @param doc
     * @param params
     * @param binaryDocValues
     * @return
     */
    public abstract Object getValues(int doc, Object[] params, BinaryDocValues binaryDocValues);

    /**
     * Comparison used by the comparator
     * @param object
     * @param object2
     * @return
     */
    public abstract int compareTo(Object object, Object object2);
}
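As an illustration of the "compute first, then sort" case this interface is meant for, here is a hedged plain-Java sketch. ValueUtil is a hypothetical stand-in for ObjectUtil in which a byte[][] replaces BinaryDocValues so the example runs without Lucene, and DistanceUtil is an invented implementation that sorts by the distance between the stored number and a target passed in params[0].

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for ObjectUtil: byte[][] replaces BinaryDocValues (index = doc id).
interface ValueUtil {
    Object getValues(int doc, Object[] params, byte[][] storedBytes);

    int compareTo(Object a, Object b);
}

// Invented example: the sort key is |stored number - params[0]|.
class DistanceUtil implements ValueUtil {
    @Override
    public Object getValues(int doc, Object[] params, byte[][] storedBytes) {
        long stored = Long.parseLong(new String(storedBytes[doc], StandardCharsets.UTF_8));
        long target = ((Number) params[0]).longValue();
        return Math.abs(stored - target); // boxed to Long
    }

    @Override
    public int compareTo(Object a, Object b) {
        // Compare numerically; comparing toString() results would order 10 before 5.
        return Long.compare((Long) a, (Long) b);
    }

    public static void main(String[] args) {
        byte[][] stored = {
            "10".getBytes(StandardCharsets.UTF_8),
            "25".getBytes(StandardCharsets.UTF_8),
            "18".getBytes(StandardCharsets.UTF_8)
        };
        Object[] params = {20L};
        ValueUtil util = new DistanceUtil();
        for (int doc = 0; doc < stored.length; doc++) {
            System.out.println("doc " + doc + " distance " + util.getValues(doc, params, stored));
        }
    }
}
```

Because the computed keys are numbers here, every comparison the comparator makes should go through compareTo of the helper rather than through a string comparison, otherwise the order of 10 and 5 would flip.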
Besides the comparison helper and the comparator, we also need to provide a FieldComparatorSource that receives the user's input
package com.lucene.search;

import java.io.IOException;

import org.apache.lucene.search.FieldComparator;
import org.apache.lucene.search.FieldComparatorSource;

import com.lucene.util.ObjectUtil;

/**
 * Receives the user's raw input; extends FieldComparatorSource to build the custom comparator
 * @author lenovo
 */
public class SelfDefineComparatorSource extends FieldComparatorSource {
    private Object[] params; // parameters supplied by the caller
    private ObjectUtil objectUtil; // an interface rather than an abstract class, for easier extension

    public Object[] getParams() {
        return params;
    }

    public void setParams(Object[] params) {
        this.params = params;
    }

    public ObjectUtil getObjectUtil() {
        return objectUtil;
    }

    public void setObjectUtil(ObjectUtil objectUtil) {
        this.objectUtil = objectUtil;
    }

    public SelfDefineComparatorSource(Object[] params, ObjectUtil objectUtil) {
        super();
        this.params = params;
        this.objectUtil = objectUtil;
    }

    @Override
    public FieldComparator<?> newComparator(String fieldname, int numHits,
            int sortPos, boolean reversed) throws IOException {
        // the actual comparison is done by SelfDefineComparator
        return new SelfDefineComparator(fieldname, numHits, params, objectUtil);
    }
}
The test program: here we simulate a StringComparator and sort by String values
package com.lucene.search;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;

import com.lucene.util.ObjectUtil;
import com.lucene.util.StringComparaUtil;

/**
 * @author 吴莹桂
 */
public class SortTest {
    public static void main(String[] args) throws Exception {
        RAMDirectory directory = new RAMDirectory();
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
        indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
        IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
        addDocument(indexWriter, "B");
        addDocument(indexWriter, "D");
        addDocument(indexWriter, "A");
        addDocument(indexWriter, "E");
        indexWriter.commit();
        indexWriter.close();

        IndexReader reader = DirectoryReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);
        Query query = new MatchAllDocsQuery();
        ObjectUtil util = new StringComparaUtil();
        // the last argument (true) requests reverse order
        Sort sort = new Sort(new SortField("name",
                new SelfDefineComparatorSource(new Object[]{}, util), true));
        TopDocs topDocs = searcher.search(query, Integer.MAX_VALUE, sort);
        ScoreDoc[] docs = topDocs.scoreDocs;
        for (ScoreDoc doc : docs) {
            Document document = searcher.doc(doc.doc);
            System.out.println(document.get("name"));
        }
        reader.close();
    }

    private static void addDocument(IndexWriter writer, String name) throws Exception {
        Document document = new Document();
        // stored field for display
        document.add(new StringField("name", name, Field.Store.YES));
        // doc values field read by the comparator; BytesRef(CharSequence) encodes as UTF-8
        document.add(new BinaryDocValuesField("name", new BytesRef(name)));
        writer.addDocument(document);
    }
}
The corresponding ObjectUtil implementation is as follows:
package com.lucene.util;

import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.util.BytesRef;

public class StringComparaUtil implements ObjectUtil {

    @Override
    public Object getValues(int doc, Object[] params, BinaryDocValues binaryDocValues) {
        // read the stored bytes for this doc and decode them as a UTF-8 string
        BytesRef bytesRef = binaryDocValues.get(doc);
        String value = bytesRef.utf8ToString();
        return value;
    }

    @Override
    public int compareTo(Object object, Object object2) {
        // plain String ordering
        return object.toString().compareTo(object2.toString());
    }
}
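As a sanity check on what the test program should print, the ordering can be reproduced in plain Java. This sketch assumes that the reverse flag passed to SortField simply inverts the comparator's result; ReverseOrderCheck is a hypothetical helper, not part of the downloadable source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper reproducing the expected order of the test data without Lucene.
class ReverseOrderCheck {
    static List<String> sortDescending(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        // same comparison as StringComparaUtil.compareTo, then reversed
        Comparator<String> byCompareTo = (a, b) -> a.compareTo(b);
        sorted.sort(byCompareTo.reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // same values the test program indexes, in the same insertion order
        System.out.println(sortDescending(Arrays.asList("B", "D", "A", "E"))); // prints [E, D, B, A]
    }
}
```

Under that assumption the Lucene test program would list the names as E, D, B, A, one per line.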
It is getting late, so I will stop here for today. Source code download:
http://download.csdn.net/detail/wuyinggui10000/8734907
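"Learn Lucene with Me Step by Step" is a summary of my recent work on Lucene indexing. If you have questions, contact me on QQ: 891922381. I have also created a QQ group, 106570134 (lucene, solr, netty, hadoop); you are very welcome to join so we can discuss together. I aim to post one article a day and hope you keep following along.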