How the Query Optimizer Uses (Index) Statistics
2009-09-09 13:15
from: http://sqlblog.com/blogs/ben_nevarez/archive/2009/09/03/how-the-query-optimizer-uses-statistics.aspx
This post shows how the Query Optimizer uses statistics to estimate the selectivity of expressions during query optimization.
You can also read this as a second part of my last post, The Query Optimizer and Parameter Sniffing. Here I will show the advantage of using statistics when the Query Optimizer can “sniff” the parameter values, compared to just guessing the selectivity of expressions when local variables are used. So I will be using the same query as in that previous post.
Note that there are many other details that cannot be covered here, so this post will focus on a very simple example to show how the Query Optimizer creates and uses statistics.
To start, open your AdventureWorks database and run this to display the current statistics on the ProductID column of the Sales.SalesOrderDetail table:
dbcc show_statistics('Sales.SalesOrderDetail', IX_SalesOrderDetail_ProductID)
This will display the header, density vector and histogram of the statistics object.
1) Understanding the Histogram
First I will explain the meaning of the values of a histogram’s steps. Let us take a look at step 86 of the histogram.
RANGE_HI_KEY is the upper boundary of a histogram step. The value 826 is the upper boundary for step 85 and 831 is the upper boundary for step 86. This means that step 86 may contain only values from 827 to 831.
Run the following query to obtain the real number of records for ProductIDs 827 to 831 and compare them against the histogram:
select ProductID, COUNT(*) as Total from Sales.SalesOrderDetail where ProductID between 827 and 831 group by ProductID
This produces the following result:
ProductID  Total
827        31
828        46
830        33
831        198
Going back to the histogram, EQ_ROWS is the estimated number of rows whose column value equals RANGE_HI_KEY. In our example RANGE_HI_KEY is 831 and the number of records with ProductID 831 is 198. The same value is shown on the histogram.
RANGE_ROWS is the estimated number of rows whose column value falls inside the range of the step, excluding the upper boundary. In our example, this is the number of records with values from 827 to 830 (831, the upper boundary, is excluded). The histogram shows 110 records, and we can obtain the same value by adding the 31 records for 827, 46 records for 828, 0 records for 829 and 33 records for 830.
DISTINCT_RANGE_ROWS is the estimated number of distinct column values inside this range, excluding the upper bound. In our example we have records for three distinct values: 827, 828, and 830, so DISTINCT_RANGE_ROWS is 3. There are no records for ProductID 829, and 831 is again excluded.
Finally, AVG_RANGE_ROWS is the average number of rows per distinct value, calculated as RANGE_ROWS / DISTINCT_RANGE_ROWS. In our example, we have a total of 110 records for 3 DISTINCT_RANGE_ROWS, which gives 110 / 3 = 36.6667, the value also shown on the histogram for step 86.
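As a rough sketch of the arithmetic above, the following Python snippet derives the four histogram columns for step 86 from the per-ProductID counts quoted in the text. The function name summarize_step and the dictionary layout are made up for illustration; this is not how SQL Server builds a histogram internally, only a check of the numbers.

```python
# Actual row counts per ProductID, as quoted in the text (829 has no rows).
counts = {827: 31, 828: 46, 829: 0, 830: 33, 831: 198}


def summarize_step(counts, low, range_hi_key):
    """Derive the histogram columns for one step.

    low is the first value after the previous step's RANGE_HI_KEY;
    range_hi_key is the upper boundary of this step.
    """
    eq_rows = counts.get(range_hi_key, 0)
    # Rows strictly below the upper boundary belong to the "range" part.
    in_range = [counts.get(v, 0) for v in range(low, range_hi_key)]
    range_rows = sum(in_range)
    distinct_range_rows = sum(1 for n in in_range if n > 0)
    # Assumption: with no distinct values in the range, report 1,
    # which is what DBCC SHOW_STATISTICS displays in that case.
    avg_range_rows = (
        range_rows / distinct_range_rows if distinct_range_rows else 1.0
    )
    return {
        "RANGE_HI_KEY": range_hi_key,
        "RANGE_ROWS": range_rows,
        "EQ_ROWS": eq_rows,
        "DISTINCT_RANGE_ROWS": distinct_range_rows,
        "AVG_RANGE_ROWS": round(avg_range_rows, 4),
    }


step_86 = summarize_step(counts, 827, 831)
print(step_86)
```

The printed values should match the ones discussed for step 86: 110 range rows, 198 rows equal to the boundary, 3 distinct values and an average of 36.6667.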
Now let us see how the statistics are used to estimate the selectivity of the queries.
2) When the Query Optimizer knows the value
Let us see the first query
select * from Sales.SalesOrderDetail where ProductID = 831
Since 831 is a RANGE_HI_KEY on step 86, the Query Optimizer will use the EQ_ROWS value and the estimated number of rows will be 198.
Now run the same query with the value 828.
This time the value is inside the range of step 86 but it is not a RANGE_HI_KEY, so the Query Optimizer uses the value calculated before as AVG_RANGE_ROWS, 36.6667. In fact, we get the same estimated number of rows for any of the other values in the range (except RANGE_HI_KEY). This also includes 829, even though there are no records for this ProductID.
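The rule described in this section can be sketched in a few lines of Python. The Step class and estimate_equality function are hypothetical names used only for illustration, and the numbers are the step 85 and step 86 values quoted in this post; the real optimizer logic is of course more involved.

```python
from dataclasses import dataclass


@dataclass
class Step:
    range_hi_key: int
    range_rows: float
    eq_rows: float
    distinct_range_rows: int
    avg_range_rows: float


# Values for step 86 as described in the text; 826 is step 85's boundary.
PREVIOUS_HI_KEY = 826
STEP_86 = Step(831, 110, 198, 3, 36.6667)


def estimate_equality(value, previous_hi_key, step):
    """Estimated rows for `column = value` when the value is known."""
    if value == step.range_hi_key:
        # The value is the step's upper boundary: use EQ_ROWS.
        return step.eq_rows
    if previous_hi_key < value < step.range_hi_key:
        # The value is inside the step: every such value gets the average.
        return step.avg_range_rows
    raise ValueError("value is not covered by this step")


for product_id in (831, 828, 829):
    print(product_id, estimate_equality(product_id, PREVIOUS_HI_KEY, STEP_86))
```

Note that 828 and 829 receive the same estimate, because the histogram keeps no per-value detail inside a step.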
Now let us try an inequality operator and find the number of records with a ProductID less than 714. For this we need to calculate the sum of the values of both RANGE_ROWS and EQ_ROWS for steps 1 through 7, which gives us a total of 13,223 rows.
This is the query; the estimated number of rows is shown on its execution plan:
select * from Sales.SalesOrderDetail where ProductID < 714
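The summation can be sketched as follows. The histogram here uses made-up numbers, not the AdventureWorks values, and estimate_less_than is a hypothetical helper. It assumes the compared value is itself a RANGE_HI_KEY, and, as an assumption that goes slightly beyond the sum described above, it also adds the RANGE_ROWS of the step ending at that value, since those rows are below the boundary as well.

```python
def estimate_less_than(value, steps):
    """Estimated rows for `column < value`, where value is a RANGE_HI_KEY.

    steps is a list of (range_hi_key, range_rows, eq_rows) tuples,
    sorted by range_hi_key.
    """
    total = 0.0
    for range_hi_key, range_rows, eq_rows in steps:
        if range_hi_key < value:
            # The whole step lies below the value.
            total += range_rows + eq_rows
        elif range_hi_key == value:
            # Only the rows strictly below the boundary qualify.
            total += range_rows
            break
        else:
            raise ValueError("value falls inside a step; not handled here")
    return total


# Illustrative histogram with invented numbers (NOT AdventureWorks data).
toy_steps = [(10, 0, 5), (20, 12, 7), (30, 4, 9)]

print(estimate_less_than(20, toy_steps))  # 5 + 12
print(estimate_less_than(30, toy_steps))  # 5 + (12 + 7) + 4
```

A value that falls inside a step would need some interpolation within that step, which this sketch deliberately leaves out.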
3) When the Query Optimizer does not know the value
When the Query Optimizer does not know the value used in the expression, for example when local variables are used, it cannot use the histogram. Instead it will use other information, including the density information of the statistics, or it will try to guess the selectivity using some fixed percentages. First, let us try using the equality operator.
declare @pid int = 897
select * from Sales.SalesOrderDetail where ProductID = @pid
The Query Optimizer is not able to see the value 897 in this query. As explained in my previous post, the Query Optimizer does not know the value of the @pid local variable at optimization time. So it will use the density value of the ProductID column, 0.003759399, as listed in the second section (the density vector) of the DBCC SHOW_STATISTICS output. If we multiply this value by the total number of records, 121,317, we get 456.079, which is the estimate shown in the execution plan.
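The multiplication is easy to verify with a short Python snippet. The variable names are illustrative; the two input numbers are the ones quoted above. Density is 1 divided by the number of distinct values in the column, so its reciprocal should come out close to a whole number.

```python
# Numbers quoted in the text.
density = 0.003759399   # from the density vector for ProductID
total_rows = 121317     # rows in Sales.SalesOrderDetail

# Estimated rows for `ProductID = @pid` when the value is unknown.
estimated_rows = density * total_rows
print(round(estimated_rows, 3))

# Density is 1 / (number of distinct values), so this recovers
# the approximate number of distinct ProductIDs.
distinct_values = round(1 / density)
print(distinct_values)
```

In other words, the estimate is simply the average number of rows per distinct ProductID, which is why it does not change with the value assigned to @pid.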
Finally, run this query with an inequality operator:
declare @pid int = 897
select * from Sales.SalesOrderDetail where ProductID < @pid
Same as before, the value 897 does not matter; any other value will give you the same estimated number of rows and execution plan. The estimated number of rows is always 30% of the total number of records for an inequality operator. In this case, 30% of 121,317 is 36,395.1, which is the estimate shown in the execution plan.
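A minimal Python sketch of this fixed guess is shown below. The function name guess_inequality_rows and the constant name are hypothetical; the 30% figure and the row count are the ones given in this post. The parameter for the variable's value is accepted but deliberately ignored, to mirror the fact that the optimizer cannot see it.

```python
INEQUALITY_GUESS = 0.30  # fixed selectivity described in the text
TOTAL_ROWS = 121317      # rows in Sales.SalesOrderDetail


def guess_inequality_rows(unknown_value, total_rows=TOTAL_ROWS):
    """Estimated rows for `column < @variable` with an unknown value.

    unknown_value is ignored on purpose: the estimate cannot depend on it.
    """
    return INEQUALITY_GUESS * total_rows


# Any value of @pid yields the same estimate.
for pid in (897, 714, 1):
    print(pid, round(guess_inequality_rows(pid), 1))
```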