MySQL Range Optimization

2016-02-17 14:56

8.2.1.3 Range Optimization

The goal of MySQL's range optimization is, as always, to use an index wherever possible.

The range access method uses a single index to retrieve a subset of table rows that are contained within one or several index value intervals. It can be used for a single-part or multiple-part index. The following sections give descriptions of conditions under which the optimizer uses range access.

8.2.1.3.1 The Range Access Method for Single-Part Indexes
(For single-column indexes.)

For a single-part index, index value intervals can be conveniently represented by corresponding conditions in the WHERE clause, so we speak of range conditions rather than “intervals.”

The definition of a range condition for a single-part index is as follows:

For both BTREE and HASH indexes, comparison of a key part with a constant value is a range condition when using the =, <=>, IN(), IS NULL, or IS NOT NULL operators.

Additionally, for BTREE indexes, comparison of a key part with a constant value is a range condition when using the >, <, >=, <=, BETWEEN, !=, or <> operators, or LIKE comparisons if the argument to LIKE is a constant string that does not start with a wildcard character.

For all types of indexes, multiple range conditions combined with OR or AND form a range condition.

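Both BTREE and HASH indexes support range optimization for equality-style comparisons, that is, the =, <=>, IN(), IS NULL, and IS NOT NULL operators.

Unlike HASH indexes, BTREE indexes also support non-equality comparisons such as >, <, >=, <=, BETWEEN, !=, <>, and LIKE (note that the constant given to LIKE must not start with a wildcard).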

“Constant value” in the preceding descriptions means one of the following:

A constant from the query string

A column of a const or system table from the same join

The result of an uncorrelated subquery

Any expression composed entirely from subexpressions of the preceding types

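In short, a constant value usually comes from one of three sources: a constant in the query itself, a column of a const or system table, or the result of an uncorrelated subquery.

A const table is a table with at most one matching row, for example a lookup on the primary key: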

SELECT * FROM tbl_name WHERE primary_key=1;

SELECT * FROM tbl_name
WHERE primary_key_part1=1 AND primary_key_part2=2;
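
One way to observe a const table is to run EXPLAIN on a primary-key lookup. The following is a minimal sketch, assuming a hypothetical table named orders (the table and its columns are illustrative and do not come from the manual). When the lookup matches a row, EXPLAIN would be expected to report the access type const for it.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE orders (
  id INT NOT NULL PRIMARY KEY,
  customer VARCHAR(32) NOT NULL
);
INSERT INTO orders VALUES (1, 'alice'), (2, 'bob');

-- The whole primary key is compared with a constant,
-- so at most one row can match.
EXPLAIN SELECT * FROM orders WHERE id = 1;
```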

Here are some examples of queries with range conditions in the WHERE clause:

SELECT * FROM t1
WHERE key_col > 1
AND key_col < 10;

SELECT * FROM t1
WHERE key_col = 1
OR key_col IN (15,18,20);

SELECT * FROM t1
WHERE key_col LIKE 'ab%'
OR key_col BETWEEN 'bar' AND 'foo';

Some nonconstant values may be converted to constants during the constant propagation phase.

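What follows is how MySQL extracts range conditions. The transformation is not an equivalence: its purpose is to make an index range scan possible wherever it can. As the manual itself notes further below, the extracted condition is less restrictive than the original one. MySQL uses the index to discard a large share of the rows, then applies an additional filter to the rows that remain.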

MySQL tries to extract range conditions from the WHERE clause for each of the possible indexes. During the extraction process, conditions that cannot be used for constructing the range condition are dropped, conditions that produce overlapping ranges are combined, and conditions that produce empty ranges are removed.

Consider the following statement, where key1 is an indexed column and nonkey is not indexed:

SELECT * FROM t1 WHERE
(key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR
(key1 < 'bar' AND nonkey = 4) OR
(key1 < 'uux' AND key1 > 'z');

The extraction process for key key1 is as follows:

Start with the original WHERE clause:

(key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR
(key1 < 'bar' AND nonkey = 4) OR
(key1 < 'uux' AND key1 > 'z')

Remove nonkey = 4 and key1 LIKE '%b' because they cannot be used for a range scan. The correct way to remove them is to replace them with TRUE, so that we do not miss any matching rows when doing the range scan. Having replaced them with TRUE, we get:

(key1 < 'abc' AND (key1 LIKE 'abcde%' OR TRUE)) OR
(key1 < 'bar' AND TRUE) OR
(key1 < 'uux' AND key1 > 'z')

Collapse conditions that are always true or false:

(key1 LIKE 'abcde%' OR TRUE) is always true

(key1 < 'uux' AND key1 > 'z') is always false, because no value can sort both before 'uux' and after 'z'

Replacing these conditions with constants, we get:

(key1 < 'abc' AND TRUE) OR (key1 < 'bar' AND TRUE) OR (FALSE)

Removing unnecessary TRUE and FALSE constants, we obtain:

(key1 < 'abc') OR (key1 < 'bar')

Combining overlapping intervals into one yields the final condition to be used for the range scan:

(key1 < 'bar')

In general (and as demonstrated by the preceding example), the condition used for a range scan is less restrictive than the WHERE clause. MySQL performs an additional check to filter out rows that satisfy the range condition but not the full WHERE clause.
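
To see the effect of the extraction on a concrete case, one option is to run EXPLAIN on the example statement. The sketch below assumes a hypothetical definition of t1 (the manual does not give one). If the optimizer picks the range scan, EXPLAIN would report type range with the index name in the key column, and an entry such as Using where or Using index condition in Extra would reflect the additional check. Whether the range scan is actually chosen depends on the optimizer's cost estimates and on the table contents.

```sql
-- Hypothetical schema: key1 is indexed, nonkey is not.
CREATE TABLE t1 (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  key1 VARCHAR(10),
  nonkey INT,
  INDEX idx_key1 (key1)
);

-- The full WHERE clause from the example; the range scan itself
-- would only use the extracted condition (key1 < 'bar').
EXPLAIN SELECT * FROM t1 WHERE
  (key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR
  (key1 < 'bar' AND nonkey = 4) OR
  (key1 < 'uux' AND key1 > 'z');
```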

The range condition extraction algorithm can handle nested AND/OR constructs of arbitrary depth, and its output does not depend on the order in which conditions appear in the WHERE clause.

MySQL does not support merging multiple ranges for the range access method for spatial indexes. To work around this limitation, you can use a UNION with identical SELECT statements, except that you put each spatial predicate in a different SELECT.
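
The UNION workaround could look roughly like the following. This is a sketch under assumptions: the table places, its columns, and the two polygons are hypothetical, and the example presumes a server version and storage engine that support SPATIAL indexes on the table.

```sql
-- Hypothetical table with a spatial index
-- (a spatially indexed column must be NOT NULL).
CREATE TABLE places (
  id INT NOT NULL PRIMARY KEY,
  loc GEOMETRY NOT NULL,
  SPATIAL INDEX (loc)
);

-- Instead of combining both spatial predicates with OR in one SELECT,
-- put each predicate in its own SELECT and combine the results.
SELECT id FROM places
 WHERE MBRContains(ST_GeomFromText('POLYGON((0 0,0 10,10 10,10 0,0 0))'), loc)
UNION
SELECT id FROM places
 WHERE MBRContains(ST_GeomFromText('POLYGON((20 20,20 30,30 30,30 20,20 20))'), loc);
```

UNION (rather than UNION ALL) removes rows that happen to match both predicates, which keeps the result equivalent to the OR form.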

8.2.1.3.2 The Range Access Method for Multiple-Part Indexes
(For composite indexes.)

Range conditions on a multiple-part index are an extension of range conditions for a single-part index. A range condition on a multiple-part index restricts index rows to lie within one or several key tuple intervals. Key tuple intervals are defined over a set of key tuples, using ordering from the index.

For example, consider a multiple-part index defined as key1(key_part1, key_part2, key_part3), and the following set of key tuples listed in key order:

key_part1  key_part2  key_part3
  NULL       1          'abc'
  NULL       1          'xyz'
  NULL       2          'foo'
   1         1          'abc'
   1         1          'xyz'
   1         2          'abc'
   2         1          'aaa'

The condition key_part1 = 1 defines this interval:

(1,-inf,-inf) <= (key_part1,key_part2,key_part3) < (1,+inf,+inf)

The interval covers the 4th, 5th, and 6th tuples in the preceding data set and can be used by the range access method.

By contrast, the condition key_part3 = 'abc' does not define a single interval and cannot be used by the range access method.

In other words, a condition on the leading column key_part1 can use the index, whereas a condition on key_part3 alone cannot.
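
The difference can be explored with a small table that holds the key tuples from the data set above. This is a sketch with a hypothetical table name (t2). Note that EXPLAIN usually reports an equality on a non-unique index as ref rather than range, although the rows read are the same single interval. For the second query no interval can be built, so a full scan of the table or of the index is the likely outcome.

```sql
-- Hypothetical table holding the seven key tuples from the example.
CREATE TABLE t2 (
  key_part1 INT,
  key_part2 INT,
  key_part3 VARCHAR(10),
  INDEX key1 (key_part1, key_part2, key_part3)
);
INSERT INTO t2 VALUES
  (NULL, 1, 'abc'), (NULL, 1, 'xyz'), (NULL, 2, 'foo'),
  (1, 1, 'abc'), (1, 1, 'xyz'), (1, 2, 'abc'), (2, 1, 'aaa');

-- Condition on the leading key part: one contiguous interval in index order.
EXPLAIN SELECT * FROM t2 WHERE key_part1 = 1;

-- Condition on the third key part only: the matching tuples are
-- scattered across the index, so no single interval exists.
EXPLAIN SELECT * FROM t2 WHERE key_part3 = 'abc';
```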

The following descriptions indicate how range conditions work for multiple-part indexes in greater detail.

For HASH indexes, each interval containing identical values can be used. This means that the interval can be produced only for conditions in the following form:

    key_part1 cmp const1
AND key_part2 cmp const2
AND ...
AND key_partN cmp constN;

Here, const1, const2, … are constants, cmp is one of the =, <=>, or IS NULL comparison operators, and the conditions cover all index parts. (That is, there are N conditions, one for each part of an N-part index.) For example, the following is a range condition for a three-part HASH index:

key_part1 = 1 AND key_part2 IS NULL AND key_part3 = 'foo'

For the definition of what is considered to be a constant, see Section 8.2.1.3.1, “The Range Access Method for Single-Part Indexes”.
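
A HASH index can be tried out with the MEMORY storage engine. The sketch below uses a hypothetical table named lookup; it contrasts a condition that covers every index part with one that covers only a prefix, which a hash index is not expected to be able to use.

```sql
-- Hypothetical MEMORY table with a three-part HASH index.
CREATE TABLE lookup (
  key_part1 INT,
  key_part2 INT,
  key_part3 VARCHAR(10),
  INDEX h (key_part1, key_part2, key_part3) USING HASH
) ENGINE = MEMORY;

-- Every index part is compared with a constant using = or IS NULL.
EXPLAIN SELECT * FROM lookup
 WHERE key_part1 = 1 AND key_part2 IS NULL AND key_part3 = 'foo';

-- Only the first two index parts are constrained.
EXPLAIN SELECT * FROM lookup
 WHERE key_part1 = 1 AND key_part2 IS NULL;
```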

For a BTREE index, an interval might be usable for conditions combined with AND, where each condition compares a key part with a constant value using =, <=>, IS NULL, >, <, >=, <=, !=, <>, BETWEEN, or LIKE 'pattern' (where 'pattern' does not start with a wildcard). An interval can be used as long as it is possible to determine a single key tuple containing all rows that match the condition (or two intervals if <> or != is used).

The optimizer attempts to use additional key parts to determine the interval as long as the comparison operator is =, <=>, or IS NULL. If the operator is >, <, >=, <=, !=, <>, BETWEEN, or LIKE, the optimizer uses it but considers no more key parts. For the following expression, the optimizer uses = from the first comparison. It also uses >= from the second comparison but considers no further key parts and does not use the third comparison for interval construction:

key_part1 = 'foo' AND key_part2 >= 10 AND key_part3 > 10

The single interval is:

('foo',10,-inf) < (key_part1,key_part2,key_part3) < ('foo',+inf,+inf)

It is possible that the created interval contains more rows than the initial condition. For example, the preceding interval includes the value ('foo', 11, 0), which does not satisfy the original condition.

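For a composite BTREE index, once one index column is compared using a non-equality operator, the index columns after it can no longer be used to build the interval.

For example: key_part1 = 'foo' AND key_part2 >= 10 AND key_part3 > 10

Because the second index column, key_part2, uses a non-equality comparison, the third index column, key_part3, cannot use the index.

So the resulting interval is ('foo',10,-inf) < (key_part1,key_part2,key_part3) < ('foo',+inf,+inf),

rather than ('foo',10,10) < (key_part1,key_part2,key_part3) < ('foo',+inf,+inf).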
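
One way to check this is to compare the key_len column of EXPLAIN for the condition with and without the third comparison. The sketch below assumes a hypothetical table t3. If the behavior described above holds, both statements should report the same key_len on a populated table, because key_part3 > 10 does not contribute to the interval and is applied afterwards as a filter.

```sql
-- Hypothetical table with a three-part BTREE index.
CREATE TABLE t3 (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  key_part1 VARCHAR(10) NOT NULL,
  key_part2 INT NOT NULL,
  key_part3 INT NOT NULL,
  payload VARCHAR(50),
  INDEX key1 (key_part1, key_part2, key_part3)
);

-- Interval built from key_part1 and key_part2.
EXPLAIN SELECT * FROM t3
 WHERE key_part1 = 'foo' AND key_part2 >= 10;

-- key_part3 > 10 is added, but it should not extend the interval.
EXPLAIN SELECT * FROM t3
 WHERE key_part1 = 'foo' AND key_part2 >= 10 AND key_part3 > 10;
```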

If conditions that cover sets of rows contained within intervals are combined with OR, they form a condition that covers a set of rows contained within the union of their intervals. If the conditions are combined with AND, they form a condition that covers a set of rows contained within the intersection of their intervals. For example, for this condition on a two-part index:

(key_part1 = 1 AND key_part2 < 2) OR (key_part1 > 5)

The intervals are:

(1,-inf) < (key_part1,key_part2) < (1,2)
(5,-inf) < (key_part1,key_part2)

In this example, the interval on the first line uses one key part for the left bound and two key parts for the right bound. The interval on the second line uses only one key part. The key_len column in the EXPLAIN output indicates the maximum length of the key prefix used.

In some cases, key_len may indicate that a key part was used, but that might not be what you would expect. Suppose that key_part1 and key_part2 can be NULL. Then the key_len column displays two key part lengths for the following condition:

key_part1 >= 1 AND key_part2 < 2

But, in fact, the condition is converted to this:

key_part1 >= 1 AND key_part2 IS NOT NULL
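
To inspect key_len for this case, a sketch along the following lines could be used; the table t4 is hypothetical. As a rough guide to reading the value, an INT key part occupies 4 bytes plus 1 byte when the column is nullable, so a two-part index on nullable INT columns would show key_len 10 when both parts are counted and 5 when only the first is.

```sql
-- Hypothetical table: both key parts are nullable INT columns.
CREATE TABLE t4 (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  key_part1 INT NULL,
  key_part2 INT NULL,
  INDEX key1 (key_part1, key_part2)
);

-- Check the key_len column of the output.
EXPLAIN SELECT * FROM t4 WHERE key_part1 >= 1 AND key_part2 < 2;

-- The JSON format gives more detail about the chosen access path.
EXPLAIN FORMAT=JSON SELECT * FROM t4 WHERE key_part1 >= 1 AND key_part2 < 2;
```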

Section 8.2.1.3.1, “The Range Access Method for Single-Part Indexes”, describes how optimizations are performed to combine or eliminate intervals for range conditions on a single-part index. Analogous steps are performed for range conditions on multiple-part indexes.

8.2.1.3.3 Equality Range Optimization of Many-Valued Comparisons

Consider these expressions, where col_name is an indexed column:

col_name IN(val1, ..., valN)
col_name = val1 OR ... OR col_name = valN

Each expression is true if col_name is equal to any of several values. These comparisons are equality range comparisons (where the “range” is a single value). The optimizer estimates the cost of reading qualifying rows for equality range comparisons as follows:

If there is a unique index on col_name, the row estimate for each range is 1 because at most one row can have the given value.

Otherwise, the optimizer can estimate the row count for each range using dives into the index or index statistics.

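With a unique index, the row estimate for each range is 1. Otherwise, the optimizer has two ways to estimate the number of rows for each range: index dives and index statistics. Index dives give more accurate estimates but cost more; index statistics are faster but less accurate. Which one is used is controlled by eq_range_index_dive_limit. In MySQL versions through 5.7.3 its default is 10, which means that MySQL uses index dives when there are 9 or fewer ranges and switches to index statistics when there are more than 9.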

With index dives, the optimizer makes a dive at each end of a range and uses the number of rows in the range as the estimate. For example, the expression col_name IN (10, 20, 30) has three equality ranges and the optimizer makes two dives per range to generate a row estimate. Each pair of dives yields an estimate of the number of rows that have the given value.

Index dives provide accurate row estimates, but as the number of comparison values in the expression increases, the optimizer takes longer to generate a row estimate. Use of index statistics is less accurate than index dives but permits faster row estimation for large value lists.

The eq_range_index_dive_limit system variable enables you to configure the number of values at which the optimizer switches from one row estimation strategy to the other. To disable use of statistics and always use index dives, set eq_range_index_dive_limit to 0. To permit use of index dives for comparisons of up to N equality ranges, set eq_range_index_dive_limit to N + 1.

To update table index statistics for best estimates, use ANALYZE TABLE.
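
The settings described above might be applied as follows. This is a short sketch; the value 21 and the table name t1 are only examples.

```sql
-- Inspect the current setting.
SHOW VARIABLES LIKE 'eq_range_index_dive_limit';

-- Always use index dives in this session.
SET SESSION eq_range_index_dive_limit = 0;

-- Use index dives for comparisons of up to 20 equality ranges
-- (N = 20, so the variable is set to N + 1 = 21).
SET SESSION eq_range_index_dive_limit = 21;

-- Refresh index statistics for an existing table (t1 is a placeholder name).
ANALYZE TABLE t1;
```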

8.2.1.3.4 Range Optimization of Row Constructor Expressions

As of MySQL 5.7.3, the optimizer is able to apply the range scan access method to queries of this form:

SELECT ... FROM t1 WHERE ( col_1, col_2 ) IN (( 'a', 'b' ), ( 'c', 'd' ));

Previously, for range scans to be used it was necessary for the query to be written as:

SELECT ... FROM t1 WHERE ( col_1 = 'a' AND col_2 = 'b' )
OR ( col_1 = 'c' AND col_2 = 'd' );

For the optimizer to use a range scan, queries must satisfy these conditions:

Only IN predicates can be used, not NOT IN.

There may only be column references in the row constructor on the IN predicate's left hand side.

There must be more than one row constructor on the IN predicate's right hand side.

Row constructors on the IN predicate's right hand side must contain only runtime constants, which are either literals or local column references that are bound to constants during execution.
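
The conditions can be illustrated by comparing a query that meets them with two that do not. The sketch assumes a hypothetical table t5 with a two-part index; only the first statement would be a candidate for the row constructor range optimization on MySQL 5.7.3 or later.

```sql
-- Hypothetical table with a two-part index.
CREATE TABLE t5 (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  col_1 VARCHAR(10),
  col_2 VARCHAR(10),
  INDEX idx_cols (col_1, col_2)
);

-- Meets the conditions: IN, plain column references on the left,
-- more than one row constructor of literals on the right.
EXPLAIN SELECT * FROM t5
 WHERE (col_1, col_2) IN (('a', 'b'), ('c', 'd'));

-- Does not meet them: NOT IN.
EXPLAIN SELECT * FROM t5
 WHERE (col_1, col_2) NOT IN (('a', 'b'), ('c', 'd'));

-- Does not meet them: an expression instead of a column reference on the left.
EXPLAIN SELECT * FROM t5
 WHERE (UPPER(col_1), col_2) IN (('A', 'b'), ('C', 'd'));
```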

Compared to similar queries executed before MySQL 5.7.3, EXPLAIN output for applicable queries changes from full table or index scan to range scan. Changes are also visible by checking the values of the Handler_read_first, Handler_read_key, and Handler_read_next status variables.
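
The handler counters can be checked around a single query roughly as follows. This is a sketch with a hypothetical table t6; FLUSH STATUS requires the RELOAD privilege. As a general rule of thumb, a range scan mostly increments Handler_read_key and Handler_read_next, while a full index scan starts with Handler_read_first.

```sql
-- Hypothetical table with a two-part index.
CREATE TABLE t6 (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  col_1 VARCHAR(10),
  col_2 VARCHAR(10),
  INDEX idx_cols (col_1, col_2)
);

-- Reset the status counters for the current session.
FLUSH STATUS;

-- Run the query of interest.
SELECT * FROM t6 WHERE (col_1, col_2) IN (('a', 'b'), ('c', 'd'));

-- Inspect the handler counters accumulated by that query.
SHOW SESSION STATUS LIKE 'Handler_read%';
```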
