Elasticsearch 42: Getting Started with the Search Engine: How to Index a Field Twice to Solve the String Sorting Problem

2018-01-05 16:45

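1. What is the problem with sorting on a string?

Sorting on an analyzed string field often gives the wrong order: the value is split into several terms at index time, and the sort then works on those individual terms rather than on the whole string.

The usual solution is to index the string field twice: once analyzed, for searching, and once not analyzed, for sorting.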
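
To see what the analyzed version of a title looks like, the _analyze API can be asked to tokenize a sample value with the field's analyzer. This is a minimal sketch that assumes a website index already exists with title mapped as an analyzed text field; the sample text is only an illustration.

```json
GET /website/_analyze
{
  "field": "title",
  "text": "second article"
}
```

With the default standard analyzer the response should list two separate tokens, second and article. Those terms, not the full string, are what a sort on the analyzed field has to work with.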

Example:

GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title": "asc"
    }
  ]
}

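The request fails because the title field has no forward index (fielddata) to sort on. The mechanism is not explained here; it is covered later.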

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "website",
        "node": "5JcZFTo8TMGAcBR5psWKmg",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
    }
  },
  "status": 400
}

Here is how to fix the sorting problem:

1) First, delete the index

DELETE /website

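2) Recreate the index

Note: "fielddata": true is required on title if you want to sort on the analyzed title field itself, because a text field cannot be sorted on until its forward index (fielddata) is built. The raw sub-field does not need it, since a not-analyzed field can be sorted on by default.

Add a sub-field that is not analyzed, to be used for sorting. Because this mapping already uses the text type (Elasticsearch 5.0 or later), the matching not-analyzed type is keyword; "type": "string" with "index": "not_analyzed" is the deprecated pre-5.0 way of writing the same thing:

"fields": {
  "raw": {
    "type": "keyword"
  }
}

Full command:

PUT /website
{
  "mappings": {
    "article": {
      "properties": {
        "title": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          },
          "fielddata": true
        },
        "content": {
          "type": "text"
        },
        "post_date": {
          "type": "date"
        },
        "author_id": {
          "type": "long"
        }
      }
    }
  }
}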

Result:

{
  "acknowledged": true,
  "shards_acknowledged": true
}

Prepare the data:

PUT /website/article/1
{
  "title": "second article",
  "content": "this is my second article",
  "post_date": "2017-01-01",
  "author_id": 100
}

PUT /website/article/2
{
  "title": "first article",
  "content": "this is my first article",
  "post_date": "2017-02-01",
  "author_id": 100
}

PUT /website/article/3
{
  "title": "third article",
  "content": "this is my third article",
  "post_date": "2017-03-01",
  "author_id": 100
}

Run a query to check the data. The response:

{
  "took": 120,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": 1,
        "_source": {
          "title": "first article",
          "content": "this is my first article",
          "post_date": "2017-02-01",
          "author_id": 100
        }
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": 1,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-01-01",
          "author_id": 100
        }
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": 1,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 100
        }
      }
    ]
  }
}

Run an ordinary sort on the analyzed title field:

GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title": {
        "order": "desc"
      }
    }
  ]
}

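Result:

Each hit carries a sort array showing the actual value it was sorted on. When sorting on the analyzed field, that value is a single term taken from the tokenized string, not the whole title. With "order": "desc" the largest term is used by default, which is why "third article" sorts on "third" rather than on "article":

"sort": [
  "third"
]

Full response: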

{
  "took": 1304,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": null,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 100
        },
        "sort": [
          "third"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": null,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-01-01",
          "author_id": 100
        },
        "sort": [
          "second"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": null,
        "_source": {
          "title": "first article",
          "content": "this is my first article",
          "post_date": "2017-02-01",
          "author_id": 100
        },
        "sort": [
          "first"
        ]
      }
    ]
  }
}
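
The descending sort above looks right only because the largest term of each title happens to be its first word. Repeating the ascending sort from the start of the article makes the problem visible. This sketch assumes the mapping and the three documents above, and the default sort mode for multi-valued fields (the smallest term is used for ascending order):

```json
GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title": {
        "order": "asc"
      }
    }
  ]
}
```

Under those assumptions every hit should report "article" as its sort value, so the three documents tie and their relative order says nothing about the titles.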

We can sort on raw instead by naming it explicitly. title.raw is a sub-field indexed from title without analysis:

GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title.raw": {
        "order": "desc"
      }
    }
  ]
}

Now the sort value is the entire content of title, kept as a single unanalyzed term:

{
  "took": 22,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": null,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 100
        },
        "sort": [
          "third article"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": null,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-01-01",
          "author_id": 100
        },
        "sort": [
          "second article"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": null,
        "_source": {
          "title": "first article",
          "content": "this is my first article",
          "post_date": "2017-02-01",
          "author_id": 100
        },
        "sort": [
          "first article"
        ]
      }
    ]
  }
}
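
Putting both halves together: the analyzed title field serves full-text search while title.raw supplies the sort key. A minimal sketch, assuming the mapping and the three documents above; the match text is just an example:

```json
GET /website/article/_search
{
  "query": {
    "match": {
      "title": "article"
    }
  },
  "sort": [
    {
      "title.raw": {
        "order": "asc"
      }
    }
  ]
}
```

All three titles contain the term article, so all three documents match, and sorting on title.raw in ascending order should return them as first article, second article, third article.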
