ElasticSearch42: Getting Started with Search Engines_How Indexing One Field Twice Solves the String Sorting Problem
2018-01-05 16:45
1. What is wrong with sorting on a string field?
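If you sort on an analyzed string field, the results are often not what you expect. Analysis splits the value into several terms, and the sort then works on one of those terms instead of on the whole string.
The usual fix is to index the string field twice: once analyzed, for searching, and once not analyzed, for sorting.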
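To make the difference concrete, here is a toy approximation of what analysis does to a title. It assumes a simplified analyzer (lowercase, then split into word tokens), which only roughly mimics Elasticsearch's standard analyzer; the function names are hypothetical.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    """Simplified stand-in for an analyzer: lowercase, then split into word tokens."""
    return re.findall(r"\w+", text.lower())


def not_analyzed(text: str) -> List[str]:
    """A not-analyzed (keyword) field keeps the whole value as one single term."""
    return [text]


title = "second article"
print(analyze(title))       # ['second', 'article']
print(not_analyzed(title))  # ['second article']
```

The analyzed copy holds two terms, so there is no single obvious value to sort this document by; the not-analyzed copy holds exactly one.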
Example:
GET /website/article/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"title": "asc"
}
]
}
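This request fails because the text field has no forward index (fielddata) to sort on. The forward index is not covered in detail here; it will be explained in a later post.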
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "website",
"node": "5JcZFTo8TMGAcBR5psWKmg",
"reason": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
},
"status": 400
}
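The error message above talks about "uninverting the inverted index". An inverted index maps each term to the documents that contain it, which is what search needs; sorting needs the opposite direction, from each document to its values. The sketch below models both directions as plain Python dicts, using the three titles from this post and the same simplified-analyzer assumption. It is an illustration of the idea, not how Lucene actually stores either structure.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

docs = {"1": "second article", "2": "first article", "3": "third article"}


def analyze(text: str) -> List[str]:
    # Toy analyzer: lowercase and split into word tokens.
    return re.findall(r"\w+", text.lower())


# Inverted index: term -> set of doc ids (what search uses).
inverted: Dict[str, Set[str]] = defaultdict(set)
for doc_id, title in docs.items():
    for term in analyze(title):
        inverted[term].add(doc_id)

# "Uninverting": walk every term and rebuild doc id -> list of terms.
# This is roughly what fielddata=true has to build in memory for a text field.
uninverted: Dict[str, List[str]] = defaultdict(list)
for term in sorted(inverted):
    for doc_id in inverted[term]:
        uninverted[doc_id].append(term)

print(dict(inverted))
print(dict(uninverted))
```

Each document ends up with several terms, and building this structure for a whole text field is why the error warns that it "can however use significant memory". A keyword field, by contrast, gets doc values (an on-disk forward index) at index time by default.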
Here is how to fix the sorting problem:
1) First, delete the index:
DELETE /website
2) Recreate the index.
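Note: "fielddata": true is needed on title so that a forward index (fielddata) is built for it; without it, sorting on title itself fails with the error shown above. Sorting on the raw sub-field below does not depend on this setting.
Add a sub-field that is not analyzed, to be used for sorting ("type": "string" with "index": "not_analyzed" is the pre-5.0 syntax; Elasticsearch 5.x still accepts it with a deprecation warning and maps it to a keyword field, while indices created on 6.0 and later require "type": "keyword"):
"fields": {
  "raw": {
    "type": "string",
    "index": "not_analyzed"
  }
}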
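Conceptually, the fields mapping makes Elasticsearch index each title value twice from the same source: once through the analyzer into title, and once verbatim into title.raw. The toy model below (plain dicts, the same simplified-analyzer assumption, hypothetical helper names) shows why one copy suits full-text search and the other suits sorting.

```python
import re
from typing import Dict, List


def analyze(text: str) -> List[str]:
    # Toy analyzer: lowercase and split into word tokens.
    return re.findall(r"\w+", text.lower())


def index_document(title: str) -> Dict[str, List[str]]:
    """Index one source value into both the analyzed field and its raw sub-field."""
    return {
        "title": analyze(title),  # analyzed: several terms, good for search
        "title.raw": [title],     # not analyzed: one term, good for sorting
    }


def term_matches(field_terms: List[str], term: str) -> bool:
    # An exact-term lookup matches only if that exact term was indexed.
    return term in field_terms


doc = index_document("second article")
print(doc)
print(term_matches(doc["title"], "article"))      # True: a single word is a term of title
print(term_matches(doc["title.raw"], "article"))  # False: title.raw only holds the whole string
```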
The complete command:
PUT /website
{
  "mappings": {
    "article": {
      "properties": {
        "title": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          },
          "fielddata": true
        },
        "content": {
          "type": "text"
        },
        "post_date": {
          "type": "date"
        },
        "author_id": {
          "type": "long"
        }
      }
    }
  }
}
Response:
{
"acknowledged": true,
"shards_acknowledged": true
}
Prepare some data:
PUT /website/article/1
{
  "title": "second article",
  "content": "this is my second article",
  "post_date": "2017-01-01",
  "author_id": 100
}

PUT /website/article/2
{
  "title": "first article",
  "content": "this is my first article",
  "post_date": "2017-02-01",
  "author_id": 100
}

PUT /website/article/3
{
  "title": "third article",
  "content": "this is my third article",
  "post_date": "2017-03-01",
  "author_id": 100
}
Querying the index returns all three documents:
{
  "took": 120,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": 1,
        "_source": {
          "title": "first article",
          "content": "this is my first article",
          "post_date": "2017-02-01",
          "author_id": 100
        }
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": 1,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-01-01",
          "author_id": 100
        }
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": 1,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 100
        }
      }
    ]
  }
}
Run an ordinary sort on title:
GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title": {
        "order": "desc"
      }
    }
  ]
}
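In each hit, the sort value shows the term that was actually used for sorting. Because title is analyzed, the string has been split into terms and only one of them serves as the sort key (by default the smallest term for asc and the largest for desc), for example: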
"sort": [
"third"
]
Response:
{
  "took": 1304,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": null,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 100
        },
        "sort": [
          "third"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": null,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-01-01",
          "author_id": 100
        },
        "sort": [
          "second"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": null,
        "_source": {
          "title": "first article",
          "content": "this is my first article",
          "post_date": "2017-02-01",
          "author_id": 100
        },
        "sort": [
          "first"
        ]
      }
    ]
  }
}
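Why does each hit show a single word? For a field with several values per document, Elasticsearch picks one value per document according to the sort mode, which defaults to min for asc and max for desc. The sketch below replays the desc sort from this post in plain Python under the same simplified-analyzer assumption; it models the behavior and is not Elasticsearch code.

```python
import re
from typing import List, Tuple

titles = {"1": "second article", "2": "first article", "3": "third article"}


def analyze(text: str) -> List[str]:
    # Toy analyzer: lowercase and split into word tokens.
    return re.findall(r"\w+", text.lower())


def sort_analyzed(order: str) -> List[Tuple[str, str]]:
    """Sort docs on an analyzed field: pick one term per doc, then sort by it."""
    # Default sort mode for multi-valued fields: min for asc, max for desc.
    pick = min if order == "asc" else max
    keyed = [(doc_id, pick(analyze(title))) for doc_id, title in titles.items()]
    return sorted(keyed, key=lambda pair: pair[1], reverse=(order == "desc"))


print(sort_analyzed("desc"))  # [('3', 'third'), ('1', 'second'), ('2', 'first')]
```

The picked terms and the resulting order match the sort values in the response above.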
To sort on the whole title, sort on title.raw instead; raw is the non-analyzed sub-field indexed from title:
GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title.raw": {
        "order": "desc"
      }
    }
  ]
}
Now the sort value of each hit is the entire title rather than a single term:
{
  "took": 22,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": null,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 100
        },
        "sort": [
          "third article"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": null,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-01-01",
          "author_id": 100
        },
        "sort": [
          "second article"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": null,
        "_source": {
          "title": "first article",
          "content": "this is my first article",
          "post_date": "2017-02-01",
          "author_id": 100
        },
        "sort": [
          "first article"
        ]
      }
    ]
  }
}
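In this data set the desc order happens to come out the same either way, so the difference is easier to see with asc. The sketch below (same toy model, hypothetical helper names) contrasts an ascending sort on the analyzed title, where the default min mode reduces every title to "article", with an ascending sort on the raw value.

```python
import re
from typing import List

titles = {"1": "second article", "2": "first article", "3": "third article"}


def analyze(text: str) -> List[str]:
    # Toy analyzer: lowercase and split into word tokens.
    return re.findall(r"\w+", text.lower())


def sort_key_title(title: str) -> str:
    # Ascending sort on the analyzed field: default mode "min" picks the smallest term.
    return min(analyze(title))


def sort_key_title_raw(title: str) -> str:
    # Sort on title.raw: the whole, unanalyzed value is the single sort key.
    return title


by_title = sorted(titles, key=lambda d: sort_key_title(titles[d]))
by_raw = sorted(titles, key=lambda d: sort_key_title_raw(titles[d]))

print([sort_key_title(titles[d]) for d in by_title])  # ['article', 'article', 'article']
print(by_raw)                                          # ['2', '1', '3']
```

All three documents tie on "article" when sorting the analyzed field, so their order says nothing about the titles; the raw sub-field yields first article, second article, third article.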