Lucene 4.8.0 + IKAnalyzer 5.0.1: index creation and search demo

2016-11-12 14:37
Main code:

Creating the index:

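// Imports this method needs (Lucene 4.8 and IK Analyzer)
import java.io.File;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer;

public void createIndex() {
    // IK Analyzer tokenizer; false selects finest-grained segmentation instead of smart mode
    Analyzer analyzer = new IKAnalyzer(false);
    // Configure the IndexWriter
    IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_48, analyzer);
    indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);

    // An index can be stored on the file system or in memory; this demo uses the file system.
    // try-with-resources closes the writer and the directory even if indexing fails.
    try (Directory directory = new SimpleFSDirectory(new File("C:\\myindex"));
         IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig)) {
        // Delete all existing documents so that rerunning the demo does not add duplicates
        indexWriter.deleteAll();

        // Build the document to index
        Document doc = new Document();
        doc.add(new StringField("id", "1", Store.YES));
        doc.add(new TextField("title", "IKAnalyzer的介绍", Store.YES));
        doc.add(new TextField("content", "IK Analyzer是一个结合词典分词和文法分词的中文分词开源工具包。它使用了全新的正向迭代最细粒度切分算法。", Store.YES));

        // Add the document to the IndexWriter
        indexWriter.addDocument(doc);
        // Commit the changes
        indexWriter.commit();
    } catch (Exception e) {
        e.printStackTrace();
    }
}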
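
The document above mixes two field types: StringField indexes its whole value as a single term, while TextField runs the value through the analyzer and indexes each resulting token. The following is a toy sketch of that difference in plain Java. It is not Lucene's implementation: the class ToyIndex and its methods are hypothetical, and whitespace splitting plus lowercasing stands in for a real analyzer such as IKAnalyzer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model of an inverted index; hypothetical, NOT Lucene's implementation.
class ToyIndex {
    // "field:term" -> ids of the documents that contain that term
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Like StringField: the whole value becomes one term, untokenized.
    public void addKeyword(int docId, String field, String value) {
        post(docId, field, value);
    }

    // Like TextField: the value is split into terms before indexing.
    // Assumption: whitespace splitting and lowercasing replace a real analyzer.
    public void addText(int docId, String field, String value) {
        for (String token : value.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                post(docId, field, token);
            }
        }
    }

    // Exact term lookup, the toy counterpart of a TermQuery.
    public List<Integer> lookup(String field, String term) {
        return postings.getOrDefault(field + ":" + term, new ArrayList<>());
    }

    private void post(int docId, String field, String term) {
        List<Integer> ids = postings.computeIfAbsent(field + ":" + term, k -> new ArrayList<>());
        if (!ids.contains(docId)) {
            ids.add(docId);
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addKeyword(0, "id", "1");
        index.addText(0, "title", "Introduction to IK Analyzer");
        System.out.println(index.lookup("id", "1"));                              // [0]
        System.out.println(index.lookup("title", "analyzer"));                    // [0]
        System.out.println(index.lookup("title", "Introduction to IK Analyzer")); // []
    }
}
```

The last lookup finds nothing because a tokenized field never stores the full sentence as a term. The same reasoning explains why a term query only matches text that the analyzer actually emitted as a token.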


Search with highlighting:

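// Imports this method needs; note that Formatter and Scorer come from the highlight package
import java.io.File;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.highlight.Formatter;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.Scorer;
import org.apache.lucene.search.highlight.SimpleFragmenter;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.wltea.analyzer.lucene.IKAnalyzer;

public void search() {
    // Open the file-system index written by createIndex();
    // try-with-resources closes the reader and the directory when done
    try (Directory directory = new SimpleFSDirectory(new File("C:\\myindex"));
         IndexReader reader = DirectoryReader.open(directory)) {
        IndexSearcher searcher = new IndexSearcher(reader);

        // A TermQuery is not analyzed: it matches only if "算法" was indexed as a term
        Query query = new TermQuery(new Term("content", "算法"));

        String preTag = "<font color='red'>";
        String postTag = "</font>";
        Formatter formatter = new SimpleHTMLFormatter(preTag, postTag);

        Scorer fragmentScorer = new QueryScorer(query);
        Highlighter highlighter = new Highlighter(formatter, fragmentScorer);
        // The fragment size normally equals the length of highlighted text you want back;
        // Integer.MAX_VALUE keeps the whole field value as a single fragment
        highlighter.setTextFragmenter(new SimpleFragmenter(Integer.MAX_VALUE));

        TopDocs topDocs = searcher.search(query, 10);
        System.out.println("Total hits: " + topDocs.totalHits);

        // IK Analyzer tokenizer, configured the same way as at index time
        Analyzer analyzer = new IKAnalyzer(false);
        ScoreDoc[] scoreDoc = topDocs.scoreDocs;
        for (int i = 0; i < scoreDoc.length; i++) {
            // Lucene's internal document number
            int docId = scoreDoc[i].doc;
            System.out.println("Internal doc id: " + docId);
            // Load the stored document by its internal number
            Document doc = searcher.doc(docId);

            //String id = highlighter.getBestFragment(analyzer, "id", doc.get("id"));
            //String title = highlighter.getBestFragment(analyzer, "title", doc.get("title"));
            String content = highlighter.getBestFragment(analyzer, "content", doc.get("content"));

            //System.out.println("id:" + id + " title:" + title);
            System.out.println("content:" + content);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}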
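
The SimpleFragmenter(Integer.MAX_VALUE) call controls how the field text is cut into candidate fragments before the best one is returned. The following is a rough sketch of the idea in plain Java, using a hypothetical FragmentSketch class. It cuts on fixed character counts, whereas Lucene's fragmenter works on token offsets, so the real boundaries will differ.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-in for fragmenting: cuts on character counts,
// not on token offsets as Lucene's SimpleFragmenter does.
class FragmentSketch {
    public static List<String> fragments(String text, int fragmentSize) {
        if (fragmentSize <= 0) {
            throw new IllegalArgumentException("fragmentSize must be positive");
        }
        List<String> result = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            // long arithmetic avoids int overflow when fragmentSize is Integer.MAX_VALUE
            int end = (int) Math.min((long) start + fragmentSize, text.length());
            result.add(text.substring(start, end));
            start = end;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fragments("abcdefghij", 4));                 // [abcd, efgh, ij]
        System.out.println(fragments("abcdefghij", Integer.MAX_VALUE)); // [abcdefghij]
    }
}
```

With a small size the text splits into several pieces and only the best-scoring one comes back, so Integer.MAX_VALUE is a simple way to get the complete field value with highlights.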


Query result:

IK Analyzer是一个结合词典分词和文法分词的中文分词开源工具包。它使用了全新的正向迭代最细粒度切分<font color='red'>算法</font>。
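
The output above is the stored field value with the matched term wrapped in the configured tags. As a minimal sketch of that wrapping step only, the hypothetical HighlightSketch class below uses literal substring search in plain Java. Lucene's Highlighter instead works on the analyzer's token stream and the query's scoring, so this is an illustration of the result, not of the library's method.

```java
// Hypothetical illustration of tag wrapping; NOT how Lucene's Highlighter finds matches.
class HighlightSketch {
    // Wraps every literal occurrence of term in preTag and postTag.
    public static String highlight(String text, String term, String preTag, String postTag) {
        if (term.isEmpty()) {
            return text;
        }
        StringBuilder out = new StringBuilder();
        int from = 0;
        int hit;
        while ((hit = text.indexOf(term, from)) >= 0) {
            // Copy the text before the match, then the tagged match
            out.append(text, from, hit).append(preTag).append(term).append(postTag);
            from = hit + term.length();
        }
        // Copy whatever follows the last match
        return out.append(text.substring(from)).toString();
    }

    public static void main(String[] args) {
        String content = "它使用了全新的正向迭代最细粒度切分算法。";
        System.out.println(highlight(content, "算法", "<font color='red'>", "</font>"));
    }
}
```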


The index can be inspected with Luke:

Open cmd, change to the directory that contains Luke, and run java -jar lukeall-4.10.2.jar.

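In pom.xml (${lucene} is a Maven property that must be defined in the POM's <properties> section; for this demo it would be 4.8.0):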

<!-- Lucene -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>${lucene}</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-highlighter</artifactId>
    <version>${lucene}</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-memory</artifactId>
    <version>${lucene}</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queries</artifactId>
    <version>${lucene}</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>${lucene}</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-common</artifactId>
    <version>${lucene}</version>
</dependency>


IKAnalyzer.cfg.xml (under src/main/resources):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Users can configure their own extension dictionaries here
    <entry key="ext_dict">/mydict.dic;</entry>
    -->
    <entry key="ext_dict">mydict.dic</entry>
    <!-- Users can configure their own extension stop-word dictionaries here
    <entry key="ext_stopwords">ext_stopword.dic</entry> -->

</properties>
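
Because this file uses the Java properties DTD, it can be read with the standard java.util.Properties.loadFromXML method, which is a quick way to check which entries are active and which are commented out. The sketch below is only a verification aid with a hypothetical IkConfigCheck class; it makes no claim about how IK Analyzer loads the file internally, and it parses an inline copy of the XML rather than the real resource.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper for checking the configuration; not part of IK Analyzer.
class IkConfigCheck {
    // Parses XML in the Java properties format and returns its entries.
    public static Properties parse(String xml) {
        Properties props = new Properties();
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            props.loadFromXML(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Inline copy of the configuration above, with the comments translated
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
                + "<properties>\n"
                + "<comment>IK Analyzer extension configuration</comment>\n"
                + "<entry key=\"ext_dict\">mydict.dic</entry>\n"
                + "<!-- <entry key=\"ext_stopwords\">ext_stopword.dic</entry> -->\n"
                + "</properties>\n";
        Properties props = parse(xml);
        System.out.println(props.getProperty("ext_dict"));      // mydict.dic
        System.out.println(props.getProperty("ext_stopwords")); // null
    }
}
```

In a real project the same check could read the file from the classpath, for example with getResourceAsStream("/IKAnalyzer.cfg.xml"), assuming the file sits at the classpath root as it does under src/main/resources.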