您的位置:首页 > 其它

ElasticSearch48:初识搜索引擎_上机动手实战基于scroll技术滚动搜索大量数据

2018-01-08 09:19 816 查看
1.为什么使用srcoll滚动搜索

问题:如果一次性查询出100000条数据,那么性能会很差,此时一般会采用scroll滚动查询,一批一批的查,知道所有数据都查询完。

使用scroll滚动搜索,可以先搜索一批数据,然后下次再搜索一批数据,以此类推,知道搜索出全部的数据。

scroll搜索会在第一次搜索的时候,保存一个当时的视图快照,之后只会基于该旧的视图快照提供数据搜索,如果这个期间数据变更,是不会让用户看到的

采用基于_doc进行排序的方式,性能较高

每次发送scroll请求,我们还需要指定一个scroll参数,指定一个时间窗口,每次搜索请求只要在这个时间窗口内能完成就可以

2.scroll技术,看起来和分页很像,但是其实使用场景不一样,分页主要是一页一页搜索,展示给用户看,而scroll是主要是用来一批一批检索数据,让系统进行处理的。

3.例子:
第一次查询:查询出3条数据,scroll指定为1分钟

GET /test_index/test_type/_search?scroll=1m
{
"query": {
"match_all": {}
},
"sort": [
{
"_doc": {
"order": "desc"
}
}
],
"size":3
}


执行结果:
{
"_scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAxqbFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManBY1SmNaRlRvOFRNR0FjQlI1cHNXS21nAAAAAAADGp4WNUpjWkZUbzhUTUdBY0JSNXBzV0ttZwAAAAAAAxqdFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManxY1SmNaRlRvOFRNR0FjQlI1cHNXS21n",
"took": 40,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 11,
"max_score": null,
"hits": [
{
"_index": "test_index",
"_type": "test_type",
"_id": "8",
"_score": null,
"_source": {
"test_field": "jiaxing"
},
"sort": [
0
]
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "2",
"_score": null,
"_source": {
"test_field": "world xxxxxxxxxxxxxxxxxxxxxxxxxxxx"
},
"sort": [
0
]
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "11",
"_score": null,
"_source": {
"test_field": "shaoxing"
},
"sort": [
0
]
}
]
}
}

第二次查询:在第一次查询的结果中,返回了执行的_scroll_id,复制下来,在第二次查询中带上

这个_scroll_id 是快照的id,这里面记录了上一次查询的信息,如查询排序,size等
GET /_search/scroll { "scroll":"1m", "scroll_id":"DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAxqbFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManBY1SmNaRlRvOFRNR0FjQlI1cHNXS21nAAAAAAADGp4WNUpjWkZUbzhUTUdBY0JSNXBzV0ttZwAAAAAAAxqdFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManxY1SmNaRlRvOFRNR0FjQlI1cHNXS21n" }
执行结果:

{
"_scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAxqbFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManBY1SmNaRlRvOFRNR0FjQlI1cHNXS21nAAAAAAADGp4WNUpjWkZUbzhUTUdBY0JSNXBzV0ttZwAAAAAAAxqdFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManxY1SmNaRlRvOFRNR0FjQlI1cHNXS21n",
"took": 15,
"timed_out": false,
"terminated_early": true,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 11,
"max_score": null,
"hits": [
{
"_index": "test_index",
"_type": "test_type",
"_id": "9",
"_score": null,
"_source": {
"test_field": "zhoushan"
},
"sort": [
1
]
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "4",
"_score": null,
"_source": {
"test_field": "hello hello"
},
"sort": [
1
]
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "7",
"_score": null,
"_source": {
"test_field": "huzhou"
},
"sort": [
1
]
}
]
}
}
第三次搜索:重复第二次,下面的N次搜索只需要重复第二次搜索的操作即可

GET /_search/scroll { "scroll":"1m", "scroll_id":"DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAxqbFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManBY1SmNaRlRvOFRNR0FjQlI1cHNXS21nAAAAAAADGp4WNUpjWkZUbzhUTUdBY0JSNXBzV0ttZwAAAAAAAxqdFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManxY1SmNaRlRvOFRNR0FjQlI1cHNXS21n" }

执行结果:
{
"_scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAxqbFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManBY1SmNaRlRvOFRNR0FjQlI1cHNXS21nAAAAAAADGp4WNUpjWkZUbzhUTUdBY0JSNXBzV0ttZwAAAAAAAxqdFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManxY1SmNaRlR
be00
vOFRNR0FjQlI1cHNXS21n",
"took": 4,
"timed_out": false,
"terminated_early": true,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 11,
"max_score": null,
"hits": [
{
"_index": "test_index",
"_type": "test_type",
"_id": "3",
"_score": null,
"_source": {
"test_field": "test world"
},
"sort": [
1
]
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "10",
"_score": null,
"_source": {
"test_field": "lishui"
},
"sort": [
2
]
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "1",
"_score": null,
"_source": {
"test_field": "test hello001"
},
"sort": [
2
]
}
]
}
}


第4次查询:查询到了剩下的最后两条数据,查询完毕
GET /_search/scroll
{
"scroll":"1m",
"scroll_id":"DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAxqbFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManBY1SmNaRlRvOFRNR0FjQlI1cHNXS21nAAAAAAADGp4WNUpjWkZUbzhUTUdBY0JSNXBzV0ttZwAAAAAAAxqdFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManxY1SmNaRlRvOFRNR0FjQlI1cHNXS21n"
}


执行结果:
{
"_scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAxqbFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManBY1SmNaRlRvOFRNR0FjQlI1cHNXS21nAAAAAAADGp4WNUpjWkZUbzhUTUdBY0JSNXBzV0ttZwAAAAAAAxqdFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManxY1SmNaRlRvOFRNR0FjQlI1cHNXS21n",
"took": 10,
"timed_out": false,
"terminated_early": true,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 11,
"max_score": null,
"hits": [
{
"_index": "test_index",
"_type": "test_type",
"_id": "12",
"_score": null,
"_source": {
"test_field": "jinhua"
},
"sort": [
3
]
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "5",
"_score": null,
"_source": {
"test_field": "test 001"
},
"sort": [
4
]
}
]
}
}
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签:  elasticsearch
相关文章推荐