MySQL Random Sampling: Optimizing ORDER BY RAND()
2016-12-21 11:49
The palest ink is better than the best memory (好记性不如烂笔头). Note added in 2013.
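1. Keywords:
Random sampling, ORDER BY RAND()
2. Business scenario:
Right after a new product launches, it has few users and little activity. Operations still wants the product to look lively. For example, a social product's home feed can surface random old posts, with the system rewriting the displayed publish time, and an e-commerce product can show a random selection of items in its product list so that users do not see the same items on every visit. It is an "empty fort strategy" (空城计): an illusion of activity. That is how the engineering team ends up with the job of random sampling.
3. Example: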
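As every MySQL user knows, the most direct and convenient way to take a random sample is:
SELECT * FROM product_info ORDER BY RAND() LIMIT 10;
However, ORDER BY RAND() is extremely slow: on a table with 2 million rows, a single execution takes more than 17 s. If this went live, the database would not cope with concurrent user traffic.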
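The MySQL reference manual does not forbid RAND() in ORDER BY, but it notes that RAND() is evaluated for every row, is not a constant for the optimizer, and cannot be used for index optimizations. It also warns that a RAND() expression in an ORDER BY or GROUP BY clause can be evaluated more than once for the same row. See:
http://dev.mysql.com/doc/refman/5.7/en/mathematical-functions.html#function_rand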
RAND() in a WHERE clause is evaluated for every row (when selecting from one table) or combination of rows (when selecting from a multiple-table join). Thus, for optimizer purposes, RAND() is not a constant value and cannot be used for index optimizations. For more information, see Section 9.2.1.3.5, “Function Call Optimization”.
Use of a column with RAND() values in an ORDER BY or GROUP BY clause may yield unexpected results because for either clause a RAND() expression can be evaluated multiple times for the same row, each time returning a different result.
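The cost of the ORDER BY RAND() query can be pictured with a small Python sketch. This is only a conceptual model, not MySQL's actual implementation: it assumes that every row receives a random sort key and that the keyed rows must then be ordered before the first 10 are returned. The function name `order_by_rand_limit` and the 200,000-row stand-in table are hypothetical.

```python
import random

def order_by_rand_limit(rows, limit, rng):
    """Conceptual model of ORDER BY RAND() LIMIT n (not MySQL's real code)."""
    # One random sort key is generated per row, so every row is touched.
    keyed = [(rng.random(), row) for row in rows]
    # The whole keyed set is ordered; an index on id cannot help here.
    keyed.sort(key=lambda pair: pair[0])
    return [row for _, row in keyed[:limit]]

# Hypothetical stand-in for product_info: 200,000 ids instead of 2 million.
table = list(range(1, 200_001))
sample = order_by_rand_limit(table, 10, random.Random(42))
print(sample)
```

In this model the work grows with the size of the whole table even though only 10 rows are returned, which matches the slowdown observed above.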
A Google search shows that most solutions online combine MAX()/MIN(), RAND(), and FLOOR()/ROUND() to get the effect, as in SQL1:
SELECT * FROM `product_info` AS t1 JOIN (SELECT ROUND(RAND() * ((SELECT MAX(id) FROM `product_info`)-(SELECT MIN(id) FROM `product_info`))+(SELECT MIN(id) FROM `product_info`)) AS id) AS t2 WHERE t1.id >= t2.id ORDER BY t1.id LIMIT 10;
Or, a little simpler, SQL2:
SELECT * FROM `product_info` p join (SELECT FLOOR(RAND() * (SELECT MAX(id) FROM `product_info`)) AS id ) as p1 WHERE p.id >= p1.id ORDER BY p.id LIMIT 10;
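This is fast, but the result set is not random enough: the rows tend to fall in one contiguous range instead of being spread out, so it does not meet the requirement.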
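The clustering can be reproduced with a short Python model of SQL2. It is a sketch under assumptions: the table is represented by a sorted list of ids, the ids are dense from 1 to 2,000,000, and the helper name `sample_from_offset` is hypothetical.

```python
import bisect
import math
import random

def sample_from_offset(sorted_ids, limit, rng):
    """Model of SQL2: id >= FLOOR(RAND() * MAX(id)) ORDER BY id LIMIT n."""
    # A single random threshold is drawn for the whole query.
    threshold = math.floor(rng.random() * sorted_ids[-1])
    # Position of the first id that is >= threshold.
    start = bisect.bisect_left(sorted_ids, threshold)
    # The next `limit` ids in order are returned, so they are neighbours.
    return sorted_ids[start:start + limit]

ids = list(range(1, 2_000_001))  # hypothetical dense ids 1..2,000,000
picked = sample_from_offset(ids, 10, random.Random(7))
print(picked)
```

Only the starting point is random. The rows after it are adjacent ids, which is why the output looks like one block. A threshold close to MAX(id) would also return fewer than 10 rows.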
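SQL2 can be rewritten as SQL3 below (the difference is that SQL2 joins against a derived-table subquery, while SQL3 puts the subquery directly in the WHERE clause):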
SELECT * FROM `product_info` WHERE id >= (SELECT FLOOR(RAND() * (SELECT MAX(id) FROM `product_info`))) ORDER BY id LIMIT 10;
A single query takes about 0.09 s.
Changing SQL3 further as follows gives somewhat more dispersed results (with about the same execution efficiency):
SELECT * FROM `product_info` WHERE id >= (SELECT FLOOR( RAND()*((SELECT MAX(id) FROM product_info)-(SELECT MIN(id) FROM product_info))+(SELECT MIN(id) FROM product_info))) ORDER BY id LIMIT 10;
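The post does not say why the MIN/MAX version looks more dispersed. One possible contributing factor, offered here only as an assumption, is the range of the random threshold: FLOOR(RAND() * MAX(id)) can fall below MIN(id), and every such threshold selects the same first rows. The Python sketch below compares the two formulas on a hypothetical table whose ids run from 1,000,000 to 2,000,000; the function names are illustrative.

```python
import math
import random

def threshold_sql3(min_id, max_id, r):
    # SQL3: FLOOR(RAND() * MAX(id)); min_id is unused, so the range starts at 0.
    return math.floor(r * max_id)

def threshold_min_max(min_id, max_id, r):
    # Final query: FLOOR(RAND() * (MAX(id) - MIN(id)) + MIN(id))
    return math.floor(r * (max_id - min_id) + min_id)

min_id, max_id = 1_000_000, 2_000_000  # hypothetical id range
rng = random.Random(1)
draws = [rng.random() for _ in range(10_000)]

# Thresholds below MIN(id) all map to the first rows of the table.
wasted = sum(threshold_sql3(min_id, max_id, r) < min_id for r in draws)
# The MIN-adjusted formula keeps every threshold inside the id range.
in_range = all(
    min_id <= threshold_min_max(min_id, max_id, r) < max_id for r in draws
)
print(wasted, in_range)
```

If the ids start near 0, the two formulas behave almost identically, so this effect depends on the actual id distribution of the table.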