您的位置：首页 > 数据库 > MySQL

MySQL多表连接优化实例分析

2017-03-13 19:04 274 查看

一位朋友的学习心得,蓝色字体是对这位朋友问题的分析和回答:

MySQL查询计划学习心得:
例子：
mysql> explain select * from (t1 left join t2 on true),(t3 full join t4 on true),t5
where id1=id2 and id2=id3 and id3=id4 and id4=id5;
+----+-------------+-------+--------+---------------+---------+---------+------------+------+----------------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref        | rows | Extra                                              |
+----+-------------+-------+--------+---------------+---------+---------+------------+------+----------------------------------------------------+
| 1 | SIMPLE      | t4    | ALL    | NULL          | NULL    | NULL    | NULL       |    7 | Using where                                        |
| 1 | SIMPLE      | t5    | ref    | id5           | id5     | 5       | zsm.t4.id4 |    1 | NULL                                               |
| 1 | SIMPLE      | full | ref    | id3           | id3     | 5       | zsm.t4.id4 |    1 | NULL                                               |
| 1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | zsm.t4.id4 |    1 | NULL                                               |
| 1 | SIMPLE      | t2    | ALL    | NULL          | NULL    | NULL    | NULL       | 100 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+--------+---------------+---------+---------+------------+------+----------------------------------------------------+
5 rows in set

一：MySQL查询计划输出列的含义：
1.id：每个被独立执行的操作的标识，表示对象被操作的顺序；id值越大，先被执行；如果相同，执行顺序从上到下。
2.select_type：查询中每个select子句的类型。
3.table：名字，被操作的对象的名称，通常是表名，但有其他格式。
4.partitions:匹配的分区信息（对于非分区表值为NULL）。
5.type：连接操作的类型。
6.possible_keys：备选的索引（列出可能被使用到的索引）。
7.key：经优化器选定的索引；常使用ANALYZE TABLE命令，可以使优化器正确地选择索引。
8.key_len：被优化器选定的索引键的长度，单位是字节。
9.ref：表示本行被操作的对象的参照对象（被参照的对象可能是一个常量用const表示，也可能是其他表的key指向的对象）。
10.rows：查询执行所扫描的元组的个数（对于InnoDB，此值是估计值）。
11.filtered：按照条件表上数据被过滤的元组个数的百分比，rows x filtered/100可以求出过滤后的元组数即实际的元组数。
12.Extra：MySQL查询优化器执行查询的过程中对查询计划的重要补充信息。

二：实验

1.创建表
create table t1(id1 int,a1 int,b1 int,primary key(id1));
create table t2(id2 int,a2 int,b2 int);
create table t3(id3 int UNIQUE,a3 int,b3 int);
create table t4(id4 int,a4 int,b4 int);
create table t5(id5 int UNIQUE,a5 int,b5 int);

2.构造数据
insert into t1(id1,a1,b1)
select stu_test_quest_id,student_id,test_question_id from english_stu_test_quest order by stu_test_quest_id limit 10000;

insert into t2(id2,a2,b2)
select stu_test_quest_id,student_id,test_question_id from english_stu_test_quest order by stu_test_quest_id limit 100;

insert into t3(id3,a3,b3)
select stu_test_quest_id,student_id,test_question_id from english_stu_test_quest order by stu_test_quest_id limit 100;

insert into t4(id4,a4,b4)
select stu_test_quest_id,student_id,test_question_id from english_stu_test_quest order by stu_test_quest_id limit 7;

insert into t5(id5,a5,b5)
select stu_test_quest_id,student_id,test_question_id from english_stu_test_quest order by stu_test_quest_id limit 10;

3.实验1
mysql> explain select * from (t1 left join t2 on true),(t3 full join t4 on true),t5
where id1=id2 and id2=id3 and id3=id4 and id4=id5;
+----+-------------+-------+--------+---------------+---------+---------+------------+------+----------------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref        | rows | Extra                                              |
+----+-------------+-------+--------+---------------+---------+---------+------------+------+----------------------------------------------------+
| 1 | SIMPLE      | t4    | ALL    | NULL          | NULL    | NULL    | NULL       |    7 | Using where                                        |
| 1 | SIMPLE      | t5    | ref    | id5           | id5     | 5       | zsm.t4.id4 |    1 | NULL                                               |
| 1 | SIMPLE      | full | ref    | id3           | id3     | 5       | zsm.t4.id4 |    1 | NULL                                               |
| 1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | zsm.t4.id4 |    1 | NULL                                               |
| 1 | SIMPLE      | t2    | ALL    | NULL          | NULL    | NULL    | NULL       | 100 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+--------+---------------+---------+---------+------------+------+----------------------------------------------------+
5 rows in set

3.1. 结论：
1)id值都是1，执行顺序从上到下。
2)t4表的记录数最少但没有索引，所以t4作为驱动表，第一个被连接；
t5表记录数少且有索引，第二个被连接；
t3表记录数较多且有索引，第三个被连接；
t1表记录数多且有索引，第四个被连接；
t2表记录数较多但没索引，第五个被连接。
所以表的连接次序是:t4、t5、t3、t1、t2。
3)select_type=SIMPLE,说明是简单的select语句，不包括union或子查询。
4)type的分析：
第一行type=ALL,说明对t4表进行了全表扫描，t4表本身没有索引，所以必然是全表扫描。
第二行type=ref,说明对t5表进行了索引扫描，t5表是内表，即被驱动的表。
第三行type=ref,说明对t3表进行了索引扫描，t3表是内表，即被驱动的表。
第四行type=eq_ref,说明对t1表进行了主键索引扫描，t1表是内表，即被驱动的表。
第五行type=ALL,说明对t2表进行了全表扫描。
5)key的分析：
第一行key=NULL,说明t4表没有可用的索引。
第二行key=id5,说明对t5表使用了索引键id5。
第三行key=id3,说明对t3表使用了索引键id3。
第四行key=PRIMARY,说明对t1表使用了主键索引。
第五行key=NULL,说明t2表没有可用的索引。
6)Extra的分析：
第一行Extra=Using where,说明对t4表使用了where子句进行过滤元组。
第五行Extra=Using where; Using join buffer (Block Nested Loop),说明使用了连接缓存,Block Nested Loop表明连接算法是块嵌套循环连接。

4.实验2
给t4表的id4列创建索引。
ALTER TABLE t4 ADD INDEX ix_t4_id4 (id4) USING BTREE;
analyze table t4;

mysql> explain select * from (t1 left join t2 on true),(t3 full join t4 on true),t5
where id1=id2 and id2=id3 and id3=id4 and id4=id5;
+----+-------------+-------+--------+---------------+---------+---------+------------+------+----------------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref        | rows | Extra                                              |
+----+-------------+-------+--------+---------------+---------+---------+------------+------+----------------------------------------------------+
| 1 | SIMPLE      | t4    | ALL    | ix_t4_id4     | NULL    | NULL    | NULL       |    7 | Using where                                        |
| 1 | SIMPLE      | t5    | ref    | id5           | id5     | 5       | zsm.t4.id4 |    1 | NULL                                               |
| 1 | SIMPLE      | full | ref    | id3           | id3     | 5       | zsm.t4.id4 |    1 | NULL                                               |
| 1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | zsm.t4.id4 |    1 | NULL                                               |
| 1 | SIMPLE      | t2    | ALL    | NULL          | NULL    | NULL    | NULL       | 100 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+--------+---------------+---------+---------+------------+------+----------------------------------------------------+
5 rows in set

4.1. 结论：只分析和实验1不同的。
1)表的连接顺序没有改变t4、t5、t3、t1、t2。t4仍然是驱动表。
2)select_type=SIMPLE没有改变。
3)第一行我们分析的重点：
type=ALL，说明对t4表进行了全表扫描；
possible_keys=ix_t4_id4，说明可能会用到t4表的ix_t4_id4索引（这个索引就是我们刚才创建的）。
key=NULL，说明t4表没有可用的索引，就是说没有使用ix_t4_id4这个索引。
---key=NULL，说明t4表没有使用索引,不是没有可用的索引(注意观察possible_keys列, 说明t4存在索引ix_t4_id4,只是没有被使用)
问题来了，既然t4表的id4列上有索引，为什么优化器不选择索引扫描而全表扫描呢？
---问题问得很好. 这涉及了MySQL的多表连接算法. MySQL先把常量表放到连接的最前面,然后把所有非常量表按元组数从小达大排序,这样,个数少的被排在了前面.
---然后,又运用贪婪+穷举+剪枝优化(这些第10课会讲)算法,对多表如何连接进行代价估算,选出代价小的连接方式,最后才确定了连接次序.
---所以,本例中,t4表上尽管有索引,但也不是必定要使用到,元组个数决定了它的位置靠前.
---另外,可以用一个简单事例思考两表连接:如下语句（表t4上已经在id列上创建了索引）--索引使用的条件是什么？
---select * from t1,t4 where id1=id4;
---A：索引要想被使用，索引值必定在已知的情况下，才可能被使用
我觉得t4表作为驱动表，都要load到内存里，使用索引还不如全表扫描快，所以查询优化器选择了全表扫描算法。这个我也是自己想的，希望老师给出更合理的解释，非常感谢。

5.实验3
将t4表的id4列设为主键。
ALTER TABLE t4 DROP INDEX ix_t4_id4;
ALTER TABLE t4
MODIFY COLUMN id4 int(11) NOT NULL FIRST ,
ADD PRIMARY KEY (id4);
analyze table t4;

mysql> explain select * from (t1 left join t2 on true),(t3 full join t4 on true),t5
where id1=id2 and id2=id3 and id3=id4 and id4=id5;
+----+-------------+-------+--------+---------------+---------+---------+------------+------+----------------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref        | rows | Extra                                              |
+----+-------------+-------+--------+---------------+---------+---------+------------+------+----------------------------------------------------+
| 1 | SIMPLE      | t4    | ALL    | PRIMARY       | NULL    | NULL    | NULL       |    7 | Using where                                        |
| 1 | SIMPLE      | t5    | ref    | id5           | id5     | 5       | zsm.t4.id4 |    1 | NULL                                               |
| 1 | SIMPLE      | full | ref    | id3           | id3     | 5       | zsm.t4.id4 |    1 | NULL                                               |
| 1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | zsm.t4.id4 |    1 | NULL                                               |
| 1 | SIMPLE      | t2    | ALL    | NULL          | NULL    | NULL    | NULL       | 100 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+--------+---------------+---------+---------+------------+------+----------------------------------------------------+
5 rows in set

5.1. 结论：只分析和实验1不同的。
1)表的连接顺序没有改变t4、t5、t3、t1、t2。t4仍然是驱动表。
2)select_type=SIMPLE没有改变。
3)第一行我们分析的重点：
type=ALL，说明对t4表进行了全表扫描；
possible_keys=PRIMARY，说明可能会用到t4表的主键索引。
key=NULL，说明t4表没有可用的索引，就是说没有使用主键索引。
问题来了，既然t4表的id4列上有主键索引，为什么优化器不选择主键索引扫描而全表扫描呢？
---问题的回答，类似如上蓝色字体
我觉得t4表作为驱动表，都要load到内存里，使用主键索引还不如全表扫描快，所以查询优化器选择了全表扫描算法。这个我也是自己想的，希望老师给出更合理的解释，非常感谢。

6.实验4
给t4表的a4列创建索引。
ALTER TABLE t4 ADD INDEX ix_t4_a4 (a4) USING BTREE ;
analyze table t4;

show index from t4;

mysql> show index from t4;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t4    |          0 | PRIMARY |            1 | id4         | A         |           7 | NULL     | NULL   |      | BTREE      |         |               |
| t4    |          1 | ix_t4_a4 |            1 | a4          | A         |           2 | NULL     | NULL   | YES | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set

mysql> explain select a4 from (t1 left join t2 on true),(t3 full join t4 on true),t5
where id1=id2 and id2=id3 and id3=id4 and id4=id5;
+----+-------------+-------+--------+---------------+----------+---------+------------+------+----------------------------------------------------+
| id | select_type | table | type   | possible_keys | key      | key_len | ref        | rows | Extra                                              |
+----+-------------+-------+--------+---------------+----------+---------+------------+------+----------------------------------------------------+
| 1 | SIMPLE      | t4    | index | PRIMARY       | ix_t4_a4 | 5       | NULL       |    7 | Using where; Using index                           |
| 1 | SIMPLE      | t5    | ref    | id5           | id5      | 5       | zsm.t4.id4 |    1 | Using index                                        |
| 1 | SIMPLE      | full | ref    | id3           | id3      | 5       | zsm.t4.id4 |    1 | Using index                                        |
| 1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | zsm.t4.id4 |    1 | Using index                                        |
| 1 | SIMPLE      | t2    | ALL    | NULL          | NULL     | NULL    | NULL       | 100 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+--------+---------------+----------+---------+------------+------+----------------------------------------------------+
5 rows in set

6.1. 结论：只分析和实验1不同的。
1)表的连接顺序没有改变t4、t5、t3、t1、t2。t4仍然是驱动表。
2)select_type=SIMPLE没有改变。
3)第一行我们分析的重点：
type=index，说明对t4表进行了索引扫描；
possible_keys=PRIMARY，说明可能会用到t4表的主键索引。
key=ix_t4_a4，说明使用了t4表的ix_t4_a4索引。
Extra=Using where; Using index 说明对t4表使用了索引进行索引键值的查找。

问题来了，t4仍然是驱动表，为什么这次没有走全表扫描算法，而走索引扫描的算法呢？
我觉得是这样的，因为这次查询的结果集不同了，这次只查询t4表的a4列。而a4列上又有索引，优化器直接选择索引全扫描算法，不需要回表再查询了，这样效率更高。
这个我也是自己想的，希望老师给出更合理的解释，非常感谢。
---非常正确。
---这个是“只读索引扫描技术”：即只从索引树上读取数据，即可获得结果集，这样就避免了读取数据文件引发的IO。

7.实验5
给t2表的id2列创建索引。
ALTER TABLE t2 ADD INDEX ix_t2_id2 (id2) USING BTREE ;
analyze table t2;

mysql> explain select * from (t1 left join t2 on true),(t3 full join t4 on true),t5
where id1=id2 and id2=id3 and id3=id4 and id4=id5;
+----+-------------+-------+--------+---------------+-----------+---------+------------+------+-------------+
| id | select_type | table | type   | possible_keys | key       | key_len | ref        | rows | Extra       |
+----+-------------+-------+--------+---------------+-----------+---------+------------+------+-------------+
| 1 | SIMPLE      | t4    | ALL    | PRIMARY       | NULL      | NULL    | NULL       |    7 | Using where |
| 1 | SIMPLE      | t5    | ref    | id5           | id5       | 5       | zsm.t4.id4 |    1 | NULL        |
| 1 | SIMPLE      | t2    | ref    | ix_t2_id2     | ix_t2_id2 | 5       | zsm.t4.id4 |    1 | NULL        |
| 1 | SIMPLE      | full | ref    | id3           | id3       | 5       | zsm.t4.id4 |    1 | NULL        |
| 1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY   | 4       | zsm.t4.id4 |    1 | NULL        |
+----+-------------+-------+--------+---------------+-----------+---------+------------+------+-------------+
5 rows in set

7.1. 结论：只分析和实验1不同的。
1)表的连接顺序改变了t4、t5、t2、t3、t1。t4仍然是驱动表。
为什么表的连接顺序改变了？我觉得是因为t2表的id2列上有索引了，并且t2表的数量要远远小于t1表，所以把t2提前了。
2)possible_keys列里面都显示走索引，因为目前这5张表在连接列上都有索引了。
3)key列里除了第一个之外，优化器都选择走索引算法。这也在情理之中，因为目前这5张表在连接列上都有索引了，并且t4是驱动表。
对于这个sql此时的性能应该是最好的。

内容来自用户分享和网络整理，不保证内容的准确性，如有侵权内容，可联系管理员处理

标签：

相关文章推荐

新的分享

章节导航

添加评论
分享网址
分享文章
返回顶部