Query performance troubleshooting in SQL Server 2008: query_hash and query_plan_hash
2008-09-25 15:37
Recently I noticed two new columns added to the sys.dm_exec_query_stats and sys.dm_exec_requests DMVs in SQL Server 2008: query_hash and query_plan_hash. These columns can greatly improve the performance monitoring process. In SQL Server 2005, the main query I use for query performance monitoring is:
SELECT TOP 10
    qs.execution_count,
    (qs.total_physical_reads + qs.total_logical_reads + qs.total_logical_writes) AS [Total IO],
    (qs.total_physical_reads + qs.total_logical_reads + qs.total_logical_writes) / qs.execution_count AS [Avg IO],
    -- statement offsets are 0-based byte offsets into Unicode text; SUBSTRING is 1-based
    SUBSTRING(qt.[text], qs.statement_start_offset / 2 + 1,
        (CASE WHEN qs.statement_end_offset = -1
              THEN DATALENGTH(qt.[text])
              ELSE qs.statement_end_offset
         END - qs.statement_start_offset) / 2 + 1) AS query_text,
    qt.[dbid],
    qt.objectid,
    tp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.[sql_handle]) AS qt
OUTER APPLY sys.dm_exec_query_plan(qs.plan_handle) AS tp
ORDER BY [Avg IO] DESC;
--ORDER BY [Total IO] DESC;
It returns the top 10 heaviest queries by average IO among those currently in the plan cache. I won't go deeper into the plan cache memory pressure conditions that can force a query plan out of the cache and thus prevent its detection. Let's just say that it works correctly in 99% of cases, which is good enough. Memory pressure can be detected by other queries.
Which queries are candidates for tuning? First, those with the highest total IO. Second, those with the highest average IO that also pass a minimum execution count (we usually won't tune a query, even the heaviest one, that runs once a month in some offline batch). As a side note, you're probably asking why I ignore CPU counters such as total_worker_time. The reason is simple: in sys.dm_exec_query_stats this counter is unreliable, since it shows incorrect numbers for parallel execution.
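The two criteria above can be expressed as a variant of the same query. This is a minimal sketch: the threshold of 50 held in the hypothetical variable @MinExecutions is an arbitrary assumption to adjust per workload. The variable is declared and assigned separately (no DECLARE-with-initializer) so the batch also runs on SQL Server 2005.

```sql
-- Tuning candidates by average IO, ignoring rarely executed statements.
-- @MinExecutions is an illustrative threshold, not a recommended value.
DECLARE @MinExecutions INT;
SET @MinExecutions = 50;

SELECT TOP 10
    qs.execution_count,
    qs.total_physical_reads + qs.total_logical_reads + qs.total_logical_writes AS [Total IO],
    (qs.total_physical_reads + qs.total_logical_reads + qs.total_logical_writes)
        / qs.execution_count AS [Avg IO],
    -- offsets are 0-based byte positions in Unicode text; SUBSTRING is 1-based
    SUBSTRING(qt.[text], qs.statement_start_offset / 2 + 1,
        (CASE WHEN qs.statement_end_offset = -1
              THEN DATALENGTH(qt.[text])
              ELSE qs.statement_end_offset
         END - qs.statement_start_offset) / 2 + 1) AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.[sql_handle]) AS qt
WHERE qs.execution_count >= @MinExecutions   -- skip the once-a-month batch
ORDER BY [Avg IO] DESC;
```

For the first criterion (highest total IO) the execution-count filter is usually unnecessary, since total IO already reflects how often a statement runs.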
So, Houston, do we have a problem here? Unfortunately, we do: when the application that works with the database doesn't use parameterization properly (and we don't want to force parameterization with the database-level PARAMETERIZATION FORCED option). In that case we'll see lots of similar queries with an execution_count of 1 or 2 and different values of should-be-parameters in query_text. We can miss such queries, either because no single one is heavy enough to be of interest, or because many of them have already been pushed out of the cache by newer queries, even queries of the same type. This scenario is especially realistic on 32-bit systems, where the entire non-data cache is limited to 1GB of space.
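One way to confirm this symptom, even on SQL Server 2005 where no hash columns exist, is to measure how much of the plan cache is taken by plans used only once. This is a sketch that assumes a large number of single-use plans with objtype 'Adhoc' is a reasonable sign of missing parameterization; it is a heuristic, not proof.

```sql
-- Plan cache usage by object type, with the share of single-use plans.
-- Works on SQL Server 2005 and later.
SELECT
    cp.objtype,
    COUNT(*) AS plan_count,
    SUM(CASE WHEN cp.usecounts = 1 THEN 1 ELSE 0 END) AS single_use_plans,
    SUM(CAST(cp.size_in_bytes AS BIGINT)) / 1048576 AS total_mb,
    SUM(CASE WHEN cp.usecounts = 1
             THEN CAST(cp.size_in_bytes AS BIGINT)
             ELSE 0 END) / 1048576 AS single_use_mb
FROM sys.dm_exec_cached_plans AS cp
GROUP BY cp.objtype
ORDER BY total_mb DESC;
```

size_in_bytes is cast to BIGINT before summing so that a large cache cannot overflow the INT aggregate.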
What are our options with poorly parameterized queries? We can use the CLR user-defined function that Itzik Ben-Gan provides in his book "Inside Microsoft SQL Server 2005: T-SQL Querying", which uses regular expressions to parameterize query text (this function is widely used for Profiler trace analysis). The query text can be passed through the function and the result used as a grouping column. But even setting aside the performance and CPU cost of grouping by a text column, I know several organizations that simply won't let you create your own objects in their databases.
Taking all the above into account, I was delighted to find that in SQL Server 2008 Microsoft added the query_hash and query_plan_hash columns to the sys.dm_exec_query_stats DMV. query_hash is the same for queries with similar logic; query_plan_hash is the same for queries with similar execution plans. So what's the difference? For a column with uneven data distribution, the execution plan can differ depending on the parameter value. If one value appears in 90% of a table's rows, the optimizer will almost certainly choose a scan. For another value that accounts for only 0.1% of the rows, the optimizer will prefer an index seek with a key lookup or a RID lookup, depending on the table's structure (clustered index or heap). For those two queries we'll see the same query_hash but different query_plan_hash values.
So the new SQL Server 2008 version of the query is:
;WITH CTE (TotalExecutions, [Total IO], [Avg IO], StatementTextForExample, plan_handle, QueryHash, QueryPlanHash) AS
(
    SELECT TOP 10
        SUM(execution_count) AS TotalExecutions,
        SUM(total_physical_reads + total_logical_reads + total_logical_writes) AS [Total IO],
        SUM(total_physical_reads + total_logical_reads + total_logical_writes) / SUM(execution_count) AS [Avg IO],
        MIN(query_text) AS StatementTextForExample,
        MIN(plan_handle) AS plan_handle,
        query_hash AS QueryHash,
        query_plan_hash AS QueryPlanHash
    FROM
    (
        SELECT qs.*,
            SUBSTRING(qt.[text], qs.statement_start_offset / 2 + 1,
                (CASE WHEN qs.statement_end_offset = -1
                      THEN DATALENGTH(qt.[text])
                      ELSE qs.statement_end_offset
                 END - qs.statement_start_offset) / 2 + 1) AS query_text
        FROM sys.dm_exec_query_stats AS qs
        CROSS APPLY sys.dm_exec_sql_text(qs.[sql_handle]) AS qt
        WHERE qt.[text] NOT LIKE '%sys.dm_exec_query_stats%'  -- exclude this query itself
    ) AS query_stats
    GROUP BY query_hash, query_plan_hash
    ORDER BY [Avg IO] DESC  -- to rank by total IO, use [Total IO] here and in the final ORDER BY
)
SELECT TotalExecutions, [Total IO], [Avg IO], StatementTextForExample,
    tp.query_plan AS StatementPlan, QueryHash, QueryPlanHash
FROM CTE
OUTER APPLY sys.dm_exec_query_plan(plan_handle) AS tp
ORDER BY [Avg IO] DESC;
--ORDER BY [Total IO] DESC;
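The query_hash/query_plan_hash distinction also gives a quick way to spot the parameter-sensitive case described earlier: group by query_hash alone and keep the statements that currently have more than one distinct plan in cache. This is a sketch for SQL Server 2008 and later; like the queries above, it only sees what is still cached.

```sql
-- Statements (same query_hash) with more than one cached execution plan
-- (distinct query_plan_hash), e.g. plans compiled for skewed parameter values.
SELECT
    qs.query_hash,
    COUNT(DISTINCT qs.query_plan_hash) AS distinct_plans,
    SUM(qs.execution_count) AS total_executions,
    SUM(qs.total_physical_reads + qs.total_logical_reads + qs.total_logical_writes) AS [Total IO]
FROM sys.dm_exec_query_stats AS qs
GROUP BY qs.query_hash
HAVING COUNT(DISTINCT qs.query_plan_hash) > 1
ORDER BY distinct_plans DESC, [Total IO] DESC;
```

A query_hash returned here can be fed back into the previous query (filtering on QueryHash) to compare the plans side by side.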
Another great use of the new columns is to build a repository and monitor execution plan changes over time, i.e. changes in query_plan_hash for queries with the same query_hash value. In SQL Server 2005 such monitoring was quite complicated and required on-the-fly parameterization. In SQL Server 2008 it is pretty straightforward.
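A minimal repository along these lines might look like the sketch below. The table name dbo.QueryPlanHistory, its columns and the MERGE-based snapshot are hypothetical choices for illustration, not a built-in feature; the snapshot step is assumed to be scheduled, for example as a SQL Server Agent job. GO is the SSMS/sqlcmd batch separator rather than a T-SQL statement.

```sql
-- Hypothetical repository: one row per (query_hash, query_plan_hash) pair ever seen.
IF OBJECT_ID(N'dbo.QueryPlanHistory', N'U') IS NULL
    CREATE TABLE dbo.QueryPlanHistory
    (
        query_hash      BINARY(8) NOT NULL,
        query_plan_hash BINARY(8) NOT NULL,
        first_seen      DATETIME  NOT NULL,
        last_seen       DATETIME  NOT NULL,
        CONSTRAINT PK_QueryPlanHistory PRIMARY KEY (query_hash, query_plan_hash)
    );
GO

-- Snapshot step (MERGE requires SQL Server 2008). DISTINCT guarantees that at
-- most one source row matches each target row, which MERGE requires.
MERGE dbo.QueryPlanHistory AS h
USING (
    SELECT DISTINCT query_hash, query_plan_hash
    FROM sys.dm_exec_query_stats
    WHERE query_hash IS NOT NULL AND query_plan_hash IS NOT NULL
) AS s
ON h.query_hash = s.query_hash AND h.query_plan_hash = s.query_plan_hash
WHEN MATCHED THEN
    UPDATE SET last_seen = GETDATE()
WHEN NOT MATCHED BY TARGET THEN
    INSERT (query_hash, query_plan_hash, first_seen, last_seen)
    VALUES (s.query_hash, s.query_plan_hash, GETDATE(), GETDATE());
GO

-- Report: statements that have been seen with more than one plan.
SELECT query_hash,
       COUNT(*)        AS plans_seen,
       MIN(first_seen) AS first_plan_seen,
       MAX(first_seen) AS latest_new_plan
FROM dbo.QueryPlanHistory
GROUP BY query_hash
HAVING COUNT(*) > 1
ORDER BY latest_new_plan DESC;
```

Because first_seen is recorded per (query_hash, query_plan_hash) pair, a statement whose plan changed shows up with a recent latest_new_plan value. Several rows for one query_hash may also simply mean two plans that coexist for different parameter values, as in the 90% / 0.1% example above.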