
Elasticsearch core knowledge, part 30: paginated search, a closer look at the deep paging performance problem, and why grouped results from aggregations can be inaccurate

2018-03-26 16:04, 1001 views

Syntax for paginated search in Elasticsearch [size, from]
GET /_search?size=10
GET /_search?size=10&from=0
GET /_search?size=10&from=20
GET /test_index/test_type/_search
"hits": {
"total": 9,
"max_score": 1,
Suppose we split these 9 documents into 3 pages of 3 documents each and try out paginated search.

GET /test_index/test_type/_search?from=0&size=3
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 1,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 1,
        "_source": {
          "test_field": "tes test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "4",
        "_score": 1,
        "_source": {
          "test_field": "test4"
        }
      }
    ]
  }
}
Page 1: id=8, 6, 4
GET /test_index/test_type/_search?from=3&size=3
Page 2: id=2, an auto-generated id, 7
GET /test_index/test_type/_search?from=6&size=3
Page 3: id=1, 11, 3
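The three requests above follow a simple rule: for 1-based page numbers, `from = (page - 1) * size`. A minimal Python sketch of that arithmetic follows. The helper names `page_to_from` and `search_url` are hypothetical, introduced only for illustration, and the code only builds the request path without contacting a cluster.

```python
def page_to_from(page, size):
    """Translate a 1-based page number into the `from` offset."""
    if page < 1 or size < 1:
        raise ValueError("page and size must both be >= 1")
    return (page - 1) * size


def search_url(index, doc_type, page, size):
    """Build the search path for one page (hypothetical helper, no HTTP call)."""
    offset = page_to_from(page, size)
    return f"/{index}/{doc_type}/_search?from={offset}&size={size}"


# The three pages of the 9-document experiment.
for page in (1, 2, 3):
    print(search_url("test_index", "test_type", page, 3))
```

With `size=3` this yields the offsets 0, 3, and 6 used in the experiment.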
########## Important: what is the deep paging problem? Why does it occur, and how does it work under the hood?

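The deep paging performance problem and the mechanics behind it.
1. Assume 10 documents per page. To serve page 1000 (from=9990, size=10), every shard the request touches (one copy of each shard, which may be either a primary or a replica) has to produce its own top from + size = 10000 matching entries.
With three shards, the coordinating node therefore receives 10000 * 3 = 30000 entries, sorts them, and only then picks the 10 that belong to page 1000. Avoid deep paging where possible: it consumes a lot of IO, memory, and CPU, and it triggers frequent GC.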
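2. The same reasoning applies to grouped (bucket) aggregations in Elasticsearch. Suppose the raw data contains 10 groups and we only ask for the top 3. The result can be inaccurate, because each shard groups and counts its own data and sends only its local top N (here top 3) groups to the coordinating node, which merges them, sorts again, and returns the top 3. (In practice the number of groups requested from each shard is governed by the terms aggregation's shard_size setting, which is at least N; the principle is the same.) A group that ranks below third place on some shards has its counts from those shards left out, so its total is undercounted and it may even drop out of the final top 3. For completely accurate grouped results, either use a single-shard index, or retrieve all groups and take the top N afterwards.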
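The scatter-gather cost described in point 1 can be illustrated with a small simulation. This is only a sketch under stated assumptions, not how Elasticsearch is implemented: the three shards, their random scores, and the function names `shard_top_k` and `search_page` are all hypothetical, and each shard is modeled as a plain list of `(score, doc_id)` tuples.

```python
import heapq
import random


def shard_top_k(shard_docs, k):
    """One shard's share of the work: its k best (score, doc_id) entries, best first."""
    return heapq.nlargest(k, shard_docs)


def search_page(shards, from_, size):
    """Coordinating node: gather from_ + size entries per shard, merge, slice one page."""
    k = from_ + size
    gathered = []
    for shard_docs in shards:
        gathered.extend(shard_top_k(shard_docs, k))
    gathered.sort(reverse=True)
    return gathered[from_:from_ + size], len(gathered)


random.seed(0)
# Three hypothetical shards holding 20,000 scored documents each.
shards = [
    [(random.random(), f"s{s}-d{i}") for i in range(20_000)]
    for s in range(3)
]

# Page 1000 at 10 documents per page: from = 9990, size = 10.
page, merged_count = search_page(shards, from_=9990, size=10)
print(merged_count)  # 30000 entries merged to return just 10
```

The deeper the page, the larger `from_ + size` becomes, and the amount of data merged on the coordinating node grows with it on every shard.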
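The aggregation inaccuracy described in point 2 can be reproduced with made-up numbers. The sketch below assumes the simplified model from the text, in which each shard reports exactly its local top N groups. The per-shard counts and the function names `top_n_via_shards` and `top_n_exact` are hypothetical.

```python
from collections import Counter


def top_n_via_shards(shard_counts, n):
    """Merge only each shard's local top n groups, then take the top n of the merge."""
    merged = Counter()
    for counts in shard_counts:
        for key, count in Counter(counts).most_common(n):
            merged[key] += count
    return merged.most_common(n)


def top_n_exact(shard_counts, n):
    """Sum every group over every shard first, then take the top n."""
    total = Counter()
    for counts in shard_counts:
        total.update(counts)
    return total.most_common(n)


# Hypothetical per-shard group counts; group D is always ranked 4th locally.
shard_counts = [
    {"A": 10, "B": 9, "C": 8, "D": 7},
    {"A": 10, "E": 9, "F": 8, "D": 7},
    {"B": 10, "E": 9, "G": 8, "D": 7},
]

approx = top_n_via_shards(shard_counts, 3)
exact = top_n_exact(shard_counts, 3)
print(approx)  # [('A', 20), ('B', 19), ('E', 18)]
print(exact)   # [('D', 21), ('A', 20), ('B', 19)]
```

Group D has the largest true total (21) but never makes any shard's local top 3, so it is missing from the merged result entirely. Summing all groups first, as `top_n_exact` does, corresponds to the "retrieve all groups, then take the top N" remedy.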
