ElasticSearch42: Getting to Know the Search Engine_How indexing one field twice solves the string sorting problem

2018-01-05 16:45

1. What is the problem with sorting strings?

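Sorting on an analyzed string field often gives the wrong result: analysis breaks the string into several terms, so the sort compares individual terms instead of the whole string.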

The usual solution is to index the string field twice: once analyzed, for searching, and once not analyzed, for sorting.
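
To make the idea concrete, here is a minimal Python sketch of indexing one title in two forms. It is only an in-memory illustration, not how Elasticsearch stores data: the analyze function is a rough stand-in for an analyzer, and the names index_twice and docs are hypothetical. The three titles are the sample documents indexed later in this post.

```python
import re

def analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def index_twice(title):
    """Keep two forms of one title: analyzed terms for search, raw string for sorting."""
    return {"title": analyze(title), "title.raw": title}

sample_titles = {1: "second article", 2: "first article", 3: "third article"}
docs = {doc_id: index_twice(title) for doc_id, title in sample_titles.items()}

# Search uses the analyzed terms: one word matches every title that contains it.
matches = sorted(d for d, fields in docs.items() if "article" in fields["title"])

# Sorting uses the raw value: the whole title is compared as a single string.
by_raw = sorted(docs, key=lambda d: docs[d]["title.raw"])

print(matches)  # [1, 2, 3]
print(by_raw)   # [2, 1, 3]
```

The analyzed form answers "which documents contain this word", while the raw form gives each document exactly one value to compare.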

Example:

GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title": "asc"
    }
  ]
}

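The request fails because the analyzed title field has no forward index (a per-document view of its values) to sort on. The details are not covered here; they are explained later.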
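
The error message below mentions "uninverting the inverted index". As a rough illustration of what that means, the following Python sketch builds a toy inverted index (term to document ids) and then derives the forward view (document id to terms) that sorting needs. The function names and the whitespace analyzer are assumptions made for this sketch, and the titles are the sample documents indexed later in this post.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """term -> sorted list of doc ids containing it (the structure search uses)."""
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            inverted[term].add(doc_id)
    return {term: sorted(ids) for term, ids in inverted.items()}

def uninvert(inverted):
    """doc id -> sorted list of its terms (the structure sorting needs)."""
    forward = defaultdict(list)
    for term in sorted(inverted):
        for doc_id in inverted[term]:
            forward[doc_id].append(term)
    return dict(forward)

titles = {1: "second article", 2: "first article", 3: "third article"}
inverted = build_inverted_index(titles)
forward = uninvert(inverted)

print(inverted["article"])  # [1, 2, 3]
print(forward[1])           # ['article', 'second']
```

Building the forward view means walking every term of the field, which is in line with the error message's warning that loading fielddata can use significant memory.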

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "website",
        "node": "5JcZFTo8TMGAcBR5psWKmg",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
    }
  },
  "status": 400
}

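Here is how to solve the sorting problem:

1) First, delete the index

DELETE /website

2) Recreate the index

Note: "fielddata": true is required on title. It builds the forward index for the analyzed text field in memory; without it, sorting on title itself fails with the error shown above. Sorting on the un-analyzed title.raw sub-field does not depend on it.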

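Add a sort sub-field that is not analyzed. In Elasticsearch 5.x and later (the version implied by the text type and the fielddata error above), an un-analyzed string is mapped as keyword, which replaces the older "type": "string" with "index": "not_analyzed":

"fields": {
  "raw": {
    "type": "keyword"
  }
}

Full command:
PUT /website
{
  "mappings": {
    "article": {
      "properties": {
        "title": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          },
          "fielddata": true
        },
        "content": {
          "type": "text"
        },
        "post_date": {
          "type": "date"
        },
        "author_id": {
          "type": "long"
        }
      }
    }
  }
}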
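
When several string fields need the same treatment, the mapping body can be generated instead of written by hand. The Python sketch below only builds the JSON body; it does not talk to a cluster. It assumes Elasticsearch 5.x or later, where keyword is the type for un-analyzed strings, and the helper names text_with_raw and website_mapping are hypothetical.

```python
import json

def text_with_raw(fielddata=False):
    """Mapping for an analyzed text field plus an un-analyzed `raw` sub-field."""
    mapping = {
        "type": "text",
        "fields": {"raw": {"type": "keyword"}},
    }
    if fielddata:
        # Only needed if the analyzed field itself will be sorted on.
        mapping["fielddata"] = True
    return mapping

def website_mapping():
    """Request body for PUT /website, mirroring the mapping in this post."""
    return {"mappings": {"article": {"properties": {
        "title": text_with_raw(fielddata=True),
        "content": {"type": "text"},
        "post_date": {"type": "date"},
        "author_id": {"type": "long"},
    }}}}

body = website_mapping()
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of the PUT /website request.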


Result:

{
  "acknowledged": true,
  "shards_acknowledged": true
}

Prepare the data:
PUT /website/article/1
{
  "title": "second article",
  "content": "this is my second article",
  "post_date": "2017-01-01",
  "author_id": 100
}
PUT /website/article/2
{
  "title": "first article",
  "content": "this is my first article",
  "post_date": "2017-02-01",
  "author_id": 100
}
PUT /website/article/3
{
  "title": "third article",
  "content": "this is my third article",
  "post_date": "2017-03-01",
  "author_id": 100
}


Run a search to check the documents. The response:
{
  "took": 120,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": 1,
        "_source": {
          "title": "first article",
          "content": "this is my first article",
          "post_date": "2017-02-01",
          "author_id": 100
        }
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": 1,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-01-01",
          "author_id": 100
        }
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": 1,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 100
        }
      }
    ]
  }
}


Run an ordinary sort on the analyzed field:
GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title": {
        "order": "desc"
      }
    }
  ]
}


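Result:

The sort array in each hit shows the actual value used for sorting. On an analyzed field this is a single term picked from the analyzed string, not the whole title. For example:

"sort": [
  "third"
]

Full response: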
{
  "took": 1304,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": null,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 100
        },
        "sort": [
          "third"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": null,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-01-01",
          "author_id": 100
        },
        "sort": [
          "second"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": null,
        "_source": {
          "title": "first article",
          "content": "this is my first article",
          "post_date": "2017-02-01",
          "author_id": 100
        },
        "sort": [
          "first"
        ]
      }
    ]
  }
}
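
The single-term sort values in the response above can be reproduced with a small simulation. This Python sketch assumes a simplified analyzer (lowercase, split on whitespace) and assumes Elasticsearch's default sort mode for a field with several values: the smallest term for asc, the largest for desc. The function names are illustrative only.

```python
def analyze(text):
    """Simplified analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def sort_value(title, order):
    """Pick the one term that represents the document in the sort."""
    terms = analyze(title)
    return max(terms) if order == "desc" else min(terms)

titles = {1: "second article", 2: "first article", 3: "third article"}

# Descending sort: each document is represented by its largest term.
desc_keys = {doc_id: sort_value(t, "desc") for doc_id, t in titles.items()}
desc_order = sorted(titles, key=lambda d: desc_keys[d], reverse=True)

# Ascending sort: each document is represented by its smallest term.
asc_keys = {doc_id: sort_value(t, "asc") for doc_id, t in titles.items()}

print(desc_keys)   # {1: 'second', 2: 'first', 3: 'third'}
print(desc_order)  # [3, 1, 2]
print(asc_keys)    # {1: 'article', 2: 'article', 3: 'article'}
```

Under these assumptions, an ascending sort would give every title the same key, "article", so the three documents could not be told apart. That is the unreliability described at the start of this post.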


Instead, we can explicitly sort on title.raw, the sub-field of title that is indexed without analysis:
GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title.raw": {
        "order": "desc"
      }
    }
  ]
}


As the sort values show, the sort is now based on the entire content of title:
{
  "took": 22,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": null,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 100
        },
        "sort": [
          "third article"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": null,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-01-01",
          "author_id": 100
        },
        "sort": [
          "second article"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": null,
        "_source": {
          "title": "first article",
          "content": "this is my first article",
          "post_date": "2017-02-01",
          "author_id": 100
        },
        "sort": [
          "first article"
        ]
      }
    ]
  }
}
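
With the sample data, sorting on title and sorting on title.raw happen to return the documents in the same order. A short Python sketch with two hypothetical titles (not from this post) shows a case where the two differ. It assumes a whitespace analyzer and assumes that a descending sort on the analyzed field uses the largest term of each document.

```python
def term_sort_key(title):
    """Descending sort on the analyzed field: largest term (assumed max mode)."""
    return max(title.lower().split())

titles = ["alpha zulu", "bravo"]  # hypothetical titles, for illustration only

# Analyzed field: "alpha zulu" is represented by "zulu", which beats "bravo".
by_terms = sorted(titles, key=term_sort_key, reverse=True)

# Raw field: whole strings are compared, and "bravo" > "alpha zulu".
by_raw = sorted(titles, reverse=True)

print(by_terms)  # ['alpha zulu', 'bravo']
print(by_raw)    # ['bravo', 'alpha zulu']
```

Only the raw ordering matches what a reader would expect from sorting titles alphabetically.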