1. Learning Lucene 3.5: Creating an Index

2018-02-06 20:34
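Creating the index:
// Imports this method needs (Lucene 3.5)
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

/**
 * Build the index.
 */
public void index(){

    IndexWriter indexWriter = null;
    try {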
        // 1. Create the Directory (where the index files physically live: in memory or on disk)
        // Directory directory = new RAMDirectory(); // Option 1: keep the index in memory
        // FSDirectory.open() picks the best implementation for the current environment
        Directory directory = FSDirectory.open(new File("e:/lucene/index01")); // Option 2: store the index on disk
        // 2. Create the IndexWriter (the tool that writes the index)
        /**
         * IndexWriterConfig(Version matchVersion, Analyzer analyzer):
         * matchVersion: the Lucene version to stay compatible with
         * analyzer: the analyzer (StandardAnalyzer is the standard one)
         */
        IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35));
        indexWriter = new IndexWriter(directory, conf);
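        // 3. Create the Document object (the unit that is written to the index)
        Document document = null;
        // 4. Add Fields to the Document (Fields are the child elements of a Document)
        // Build the index over the files
        File file = new File("e:/lucene"); // Directory that holds the files to index
        File[] files = file.listFiles();
        if (files!=null && files.length>0){
            for (File tempFile : files){
                document = new Document();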
                /**
                 * Why is this check needed?
                 * The index is stored in "e:/lucene/index01", so an "index01" folder sits inside
                 * "e:/lucene". When we iterate over "e:/lucene" we also reach that folder. We only
                 * want to index the documents, not "index01", and passing a directory to
                 * FileReader throws an exception.
                 */
                if (tempFile.isFile()){
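                    /**
                     * Field(String name, Reader reader):
                     * name: the key (field name)
                     * reader: the input stream to read the value from
                     * A field built this way is tokenized and indexed, but not stored.
                     */
                    document.add(new Field("content", new FileReader(tempFile))); // Add the file content to the index
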
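                    /**
                     * Field(String name, String value, Field.Store store, Field.Index index):
                     * name: the key
                     * value: the value
                     * store: whether the value is stored in the index files (store option)
                     *     1. Field.Store.YES: the full value is stored, so the original text can be restored
                     *     2. Field.Store.NO: the value is not stored but can still be indexed. The text
                     *        cannot be restored (document.get() will not return it)
                     *     Typically we index an id, use that id to look up the url in a database, and fetch
                     *     the article from there. So the id is usually indexed and stored, while the article
                     *     body is indexed but not stored.
                     * index: how the value is indexed (index option)
                     *     1. Field.Index.ANALYZED: tokenize and index; suits titles, body text, etc.
                     *     2. Field.Index.NOT_ANALYZED: index without tokenizing; suits exact-match values
                     *        such as ID card numbers, names, and ids
                     *     3. Field.Index.ANALYZED_NO_NORMS: tokenize but do not store norms. Norms hold
                     *        index-time boost and field-length normalization data
                     *     4. Field.Index.NOT_ANALYZED_NO_NORMS: neither tokenize nor store norms
                     *     5. Field.Index.NO: do not index
                     *
                     * Best practices for combining index and store options:
                     *   Index option             Store   Use case
                     *   NOT_ANALYZED_NO_NORMS    YES     Identifiers (primary key, file name), phone numbers, ID card numbers, names, dates
                     *   ANALYZED                 YES     Document title and abstract
                     *   ANALYZED                 NO      Document body
                     *   NO                       YES     Document type, database primary key (not analyzed)
                     *   NOT_ANALYZED             NO      Hidden keywords
                     */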
                    document.add(new Field("fileName", tempFile.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED)); // Add the file name to the index
                    // Add the file path to the index
                    document.add(new Field("filePath", tempFile.getAbsolutePath(), Field.Store.YES, Field.Index.NOT_ANALYZED));
                    // 5. Add the Document to the index through the IndexWriter
                    indexWriter.addDocument(document);
                }
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (indexWriter != null){
            try {
                indexWriter.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
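The store and index options described in the comments above can be easier to reason about with a small model. The sketch below is not Lucene code. `FieldOptionsSketch` and its `Store` and `Index` enums are hypothetical stand-ins, and a plain lowercase-and-split step is assumed in place of a real analyzer. It only shows which combinations make a value searchable and which make it retrievable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the store/index field options. Hypothetical names; this is NOT Lucene code.
final class FieldOptionsSketch {
    enum Store { YES, NO }
    enum Index { ANALYZED, NOT_ANALYZED, NO }

    // "field:term" -> ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<String, Set<Integer>>();
    // One map of stored field values per document, addressed by document id
    private final List<Map<String, String>> stored = new ArrayList<Map<String, String>>();

    public int newDocument() {
        stored.add(new HashMap<String, String>());
        return stored.size() - 1;
    }

    public void addField(int docId, String name, String value, Store store, Index index) {
        if (store == Store.YES) {
            stored.get(docId).put(name, value); // retrievable later through get()
        }
        if (index == Index.ANALYZED) {
            // Assumption: lowercase + split on non-word characters stands in for an analyzer.
            for (String token : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    post(name, token, docId);
                }
            }
        } else if (index == Index.NOT_ANALYZED) {
            post(name, value, docId); // the whole value is a single term
        }
        // Index.NO: nothing is added to the postings, so the field is not searchable
    }

    private void post(String field, String term, int docId) {
        postings.computeIfAbsent(field + ":" + term, k -> new TreeSet<Integer>()).add(docId);
    }

    public Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term, Collections.<Integer>emptySet());
    }

    public String get(int docId, String field) {
        return stored.get(docId).get(field);
    }

    public static void main(String[] args) {
        FieldOptionsSketch idx = new FieldOptionsSketch();
        int doc = idx.newDocument();
        idx.addField(doc, "fileName", "Report 2018.txt", Store.YES, Index.NOT_ANALYZED);
        idx.addField(doc, "content", "Lucene builds an inverted index", Store.NO, Index.ANALYZED);

        System.out.println(idx.search("content", "lucene"));           // analyzed: matches a single word
        System.out.println(idx.search("fileName", "Report"));          // not analyzed: a partial value does not match
        System.out.println(idx.search("fileName", "Report 2018.txt")); // not analyzed: the exact value matches
        System.out.println(idx.get(doc, "fileName"));                  // stored: the value comes back
        System.out.println(idx.get(doc, "content"));                   // not stored: null
    }
}
```

In this model the `content` field behaves like the ANALYZED / NO row of the table (searchable by word, not retrievable), and `fileName` behaves like a NOT_ANALYZED / YES field (exact match only, retrievable). Norms and scoring are left out on purpose.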