您的位置:首页 > 数据库 > MySQL

MySQL千万级别表数据中提高RAND随机查询的实验

2015-09-30 23:01 603 查看

1,准备测试数据

 

1.1 建库建表

mysql> create database hwdb;
Query OK, 1 row affected (0.34 sec)
 
mysql>
mysql> show create table t_huawei;
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table   | Create Table                                                                                                                                                                                                                                       |
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t_huawei | CREATE TABLE `t_huawei` (
 `hwid` int(11) DEFAULT NULL,
 `content` varchar(4000) DEFAULT NULL,
 `TYPE` char(2) DEFAULT NULL,
 `creator_id` varchar(30) DEFAULT NULL,
 `create_time` datetime DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
 
mysql>
 

1.2 准备存储过程

DELIMITER $$
 
USE `hwdb`$$
 
DROP PROCEDURE IF EXISTS `pro_insert_hw`$$
 
CREATE DEFINER=`root`@`localhost` PROCEDURE`pro_insert_hw`(num INT)
BEGIN
    DECLARE var1 INT DEFAULT 0;   
   WHILE var1<num DO   
   SET var1=var1+1;   
  INSERT INTO hwdb.t_huawei(hwid,content,TYPE,creator_id,create_time)  
   VALUES (var1,CONCAT(var1,' - ','Players supposedly said they''d avoidteam facility this summer if Thibodeau remained the '),
0,10013,NOW());   
   END WHILE;    
 
 
   END$$
 
DELIMITER ;
 
 
mysql> show create table t_huawei;
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table   | Create Table                                                                                                                                                                                                                                       |
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t_huawei | CREATE TABLE `t_huawei` (
 `hwid` int(11) DEFAULT NULL,
 `content` varchar(4000) DEFAULT NULL,
 `TYPE` char(2) DEFAULT NULL,
  `creator_id`varchar(30) DEFAULT NULL,
 `create_time` datetime DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
 
mysql>
 

1.3 开始录入数据

因为要录入千万级别数据,所以用后台mysql进程来录入,如下所示:
nohup /usr/local/mysql/bin/mysql -uroot--password="" -S /usr/local/mysql/mysql.sock -e "select now();truncate table hwdb.t_huawei; call hwdb.pro_insert_hw(10000000); select now();" > /data/call.log &
 
这里因为测试环境机器性能有限,在录入到7023850条的时候,我终止了进程,开始准备测试工作:
mysql> SELECT COUNT(1) FROM `t_huawei` ;
+----------+
| COUNT(1) |
+----------+
| 7023850 |
+----------+
1 row in set (5.52 sec)
 
mysql>
 
 

2,开始测试

 
 

2.1 普通的ORDER BY RAND()测试很慢很慢,如下所示,需要块1分钟的时间

mysql> SELECT  SQL_NO_CACHE * FROM `t_huawei` AS t1  ORDER BY RAND() LIMIT 5;
+---------+-----------------------------------------------------------------------------------------------------+------+------------+---------------------+
| hwid   | content                                                                                            | TYPE |creator_id | create_time         |
+---------+-----------------------------------------------------------------------------------------------------+------+------------+---------------------+
| 3743323 | 3743323 - Players supposedlysaid they'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 20:43:01|
| 2418491 | 2418491 - Players supposedlysaid they'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 20:19:31|
| 1224667 | 1224667 - Players supposedlysaid they'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:58:35|
| 1639270 | 1639270 - Players supposedlysaid they'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 20:05:50|
| 2756470 | 2756470 - Players supposedlysaid they'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 20:25:28|
+---------+-----------------------------------------------------------------------------------------------------+------+------------+---------------------+
5 rows in set (54.92 sec)
 
mysql>
 
 
原因是:MYSQL的官方手册,里面针对RAND()的提示大概意思就是,在ORDER BY从句里面不能使用RAND()函数,因为这样会导致数据列被多次扫描。但是在MYSQL 3.23版本中,仍然可以通过ORDER BY RAND()来实现随机。
 
 

2.2 在JOIN中使用RAND()* (SELECTMAX(hwid) 的方式来实现

mysql> SELECT SQL_NO_CACHE t1.* FROM`t_huawei` AS t1 LEFT JOIN(SELECT ROUND(RAND()* (SELECT MAX(hwid) FROM  `t_huawei` )) AS id ) AS t2 ONt1.`hwid`>=t2.id ORDER BY t1.`hwid` ASC LIMIT 5;
+------+-----------------------------------------------------------------------------------------------+------+------------+---------------------+
| hwid | content                                                                                      | TYPE | creator_id | create_time        |
+------+-----------------------------------------------------------------------------------------------+------+------------+---------------------+
|   1 | 1 - Players supposedly said they'd avoid team facility this summerif Thibodeau remained the  | 0    | 10013     | 2015-06-08 19:37:02 |
|   2 | 2 - Players supposedly said they'd avoid team facility this summerif Thibodeau remained the  | 0    | 10013     | 2015-06-08 19:37:02 |
|   3 | 3 - Players supposedly said they'd avoid team facility this summerif Thibodeau remained the  | 0    | 10013     | 2015-06-08 19:37:02 |
|   4 | 4 - Players supposedly said they'd avoid team facility this summerif Thibodeau remained the  | 0    | 10013     | 2015-06-08 19:37:02 |
|   5 | 5 - Players supposedly said they'd avoid team facility this summerif Thibodeau remained the  | 0    | 10013     | 2015-06-08 19:37:02 |
+------+-----------------------------------------------------------------------------------------------+------+------------+---------------------+
5 rows in set (50.04 sec)
 
mysql>
需要50面,比起直接的order by来说少了4秒,但是还是比较耗时的。
 
 

2.3 WHERE子句的方式来优化

mysql> SELECT SQL_NO_CACHE t1.* FROMhwdb.`t_huawei` t1 WHERE t1.`hwid` >=(SELECT FLOOR(RAND() * (SELECTMAX(t2.hwid) FROM hwdb.`t_huawei` t2 ) )) ORDER BY t1.hwid LIMIT 5;
+------+--------------------------------------------------------------------------------------------------+------+------------+---------------------+
| hwid | content                                                                                         | TYPE | creator_id | create_time        |
+------+--------------------------------------------------------------------------------------------------+------+------------+---------------------+
| 1624 | 1624 - Players supposedly saidthey'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:37:05|
| 6076 | 6076 - Players supposedly saidthey'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:37:11|
| 7988 | 7988 - Players supposedly saidthey'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:37:14|
| 8016 | 8016 - Players supposedly saidthey'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:37:14|
| 8106 | 8106 - Players supposedly saidthey'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:37:14|
+------+--------------------------------------------------------------------------------------------------+------+------------+---------------------+
5 rows in set (7.25 sec)
 
mysql>
 
花了7.25秒,比上一种要快的多很多了。
 

2.4 MAX(hwid)- MIN(hwid)的方式来RAND来操作

mysql> SELECT SQL_NO_CACHE * FROMhwdb.`t_huawei` WHERE hwid >= (SELECT FLOOR(RAND() * ((SELECT MAX(hwid) FROMhwdb.`t_huawei`) - (SELECT MIN(hwid) FROM hwdb.`t_huawei`)) + (SELECT MIN(hwid)FROM hwdb.`t_huawei`))) ORDER BY hwid LIMIT 5;
+------+--------------------------------------------------------------------------------------------------+------+------------+---------------------+
| hwid | content                                                                                          |TYPE | creator_id | create_time         |
+------+--------------------------------------------------------------------------------------------------+------+------------+---------------------+
| 5438 | 5438 - Players supposedly saidthey'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:37:10|
| 5883 | 5883 - Players supposedly saidthey'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:37:10|
| 6782 | 6782 - Players supposedly saidthey'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:37:12|
| 7797 | 7797 - Players supposedly saidthey'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:37:13|
| 8625 | 8625 - Players supposedly saidthey'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:37:15|
+------+--------------------------------------------------------------------------------------------------+------+------------+---------------------+
5 rows in set (13.50 sec)
 
mysql>
MAX(hwid)- MIN(hwid)的方式来RAND来操作需要13.5秒,比where子句花时间要多一些。

3,总结

WHERE子句的方式效率最高,样例sql:SELECTSQL_NO_CACHE t1.* FROM hwdb.`t_huawei` t1 WHERE t1.`hwid` >=(SELECTFLOOR(RAND() * (SELECT MAX(t2.hwid) FROM hwdb.`t_huawei` t2 ) )) ORDER BYt1.hwid LIMIT 5;
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: