Installing the IK Chinese Analyzer in Elasticsearch
2017-09-24 00:00
Environment: Elasticsearch 5.5.2, installed under /usr/local/elasticsearch-5.5.2.
Download

Fetch the plugin release that matches your Elasticsearch version exactly (the IK plugin version must equal the Elasticsearch version):

```
curl -L -O https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.5.2/elasticsearch-analysis-ik-5.5.2.zip
```

Unzip it into /usr/local/elasticsearch-5.5.2/plugins/. The resulting directory structure looks like this:
```
├── plugins
│   └── elasticsearch-analysis-ik
│       ├── commons-codec-1.9.jar
│       ├── commons-logging-1.2.jar
│       ├── config
│       │   ├── extra_main.dic
│       │   ├── extra_single_word.dic
│       │   ├── extra_single_word_full.dic
│       │   ├── extra_single_word_low_freq.dic
│       │   ├── extra_stopword.dic
│       │   ├── IKAnalyzer.cfg.xml
│       │   ├── main.dic
│       │   ├── preposition.dic
│       │   ├── quantifier.dic
│       │   ├── stopword.dic
│       │   ├── suffix.dic
│       │   └── surname.dic
│       ├── elasticsearch-analysis-ik-5.5.2.jar
│       ├── httpclient-4.5.2.jar
│       ├── httpcore-4.4.4.jar
│       └── plugin-descriptor.properties
```
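The config directory above contains IKAnalyzer.cfg.xml, which is where custom dictionaries are registered. A minimal sketch of that file, assuming a hypothetical dictionary file custom/mydict.dic placed under the same config directory (one word per line, UTF-8 encoded):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- extra dictionary, path relative to the config directory (hypothetical file) -->
    <entry key="ext_dict">custom/mydict.dic</entry>
    <!-- extra stopword dictionary -->
    <entry key="ext_stopwords">custom/ext_stopword.dic</entry>
</properties>
```

Changes to these dictionaries require an Elasticsearch restart to take effect.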
Restart Elasticsearch.
Test

Check the tokenization results with each of the two analyzers below.

The ik_max_word analyzer:
```
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "中华人民共和国国歌"
}
```
Result:
```json
{
  "tokens": [
    { "token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0 },
    { "token": "中华人民", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 1 },
    { "token": "中华", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 2 },
    { "token": "华人", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3 },
    { "token": "人民共和国", "start_offset": 2, "end_offset": 7, "type": "CN_WORD", "position": 4 },
    { "token": "人民", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 5 },
    { "token": "共和国", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 6 },
    { "token": "共和", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 7 },
    { "token": "国", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 8 },
    { "token": "国歌", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 9 }
  ]
}
```
The ik_smart analyzer:
```
GET _analyze
{
  "analyzer": "ik_smart",
  "text": "中华人民共和国国歌"
}
```
Result:
```json
{
  "tokens": [
    { "token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0 },
    { "token": "国歌", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 1 }
  ]
}
```

As the output shows, ik_max_word emits every possible word combination, while ik_smart produces the coarsest-grained (non-overlapping) segmentation.
Update the text-type field definitions in your mapping:
```
...
"title": {
  "type": "text",
  "analyzer": "ik_max_word",
  "search_analyzer": "ik_max_word",
  "include_in_all": true
},
...
```
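Put together, a new index with an IK-analyzed title field can be created like this. This is a sketch for Elasticsearch 5.x, where mappings still require a type name; my_index and my_type are placeholder names:

```
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        }
      }
    }
  }
}
```

Analyzers on existing fields cannot be changed in place, so this only works when creating the index.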
If the index already holds a large amount of data, it has to be reindexed before the new analyzer takes effect on existing documents.
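One way to do this is the _reindex API (available in Elasticsearch 5.x): first create a new index with the IK mapping, then copy the documents over. The index names old_index and new_index below are placeholders:

```
POST _reindex
{
  "source": { "index": "old_index" },
  "dest":   { "index": "new_index" }
}
```

After the copy completes, point your application (or an index alias) at new_index.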
Reference: https://github.com/medcl/elasticsearch-analysis-ik