您的位置:首页 > 数据库

SQL优化【基础06】 - 统计信息(直方图)

2014-08-11 16:51 267 查看
前言:从10开始默认优化器就是CBO模式,RBO已经靠边站了,CBO是根据什么来计算成本的,当然就是信息了,这个信息就是所谓统计信息,可见统计信息的重要性;

版本:11.2

先来看张表T1,未做统计前可以看到统计信息都是空的(空时CBO要获取信息采用动态采样),信息是通过脚本SOSI.SQL(群里上载文件中有)显示;



exec dbms_stats.gather_table_stats('AIKI','T1',cascade=>true);

统计完后的信息显示



--执行一条语句看下计划,是范围扫描,与实际相符;
SQL> @allstat

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  ccm5vanq8y709, child number 0
-------------------------------------
select status from t1 where object_id=88

Plan hash value: 190799060

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |      1 |00:00:00.01 |       3 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  2 |   INDEX RANGE SCAN          | IDX_T1_ID |      1 |      1 |      1 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=88)

19 rows selected.

--更改数据造成数据倾斜;100条,88的99条,99的值1条;
SQL> update t1 set object_id=88 where rownum<100;

99 rows updated.

SQL> commit;

Commit complete.

SQL> update t1 set object_id=99 where object_id<>88;

1 row updated.

SQL> commit;

Commit complete.

--再次收集必变后的表信息:exec dbms_stats.gather_table_stats('AIKI','T1',cascade=>true);
--这个计划是对的,因为88的值的行数占了99%;
select /*1*/status from t1 where object_id=88;
SQL> @allstat

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  4natz4s7h1uda, child number 0
-------------------------------------
select /*1*/status from t1 where object_id=88

Plan hash value: 3617692013

------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |     99 |00:00:00.01 |      10 |
|*  1 |  TABLE ACCESS FULL| T1   |      1 |     99 |     99 |00:00:00.01 |      10 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_ID"=88)

18 rows selected.

--重新做次收集,目的关闭直方图(for..size 1);
exec dbms_stats.gather_table_stats('AIKI','T1',cascade=>true,method_opt=>'for all columns size 1');

--看下执行计划,可以看到计划并非最优,它选择了范围扫描,而本来是全表扫描会略好;
select /*3*/status from t1 where object_id=88;

SQL> @allstat

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  cn417ch2unkp7, child number 0
-------------------------------------
select /*3*/status from t1 where object_id=88

Plan hash value: 190799060

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      1 |        |     99 |00:00:00.01 |      17 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |     50 |     99 |00:00:00.01 |      17 |
|*  2 |   INDEX RANGE SCAN          | IDX_T1_ID |      1 |     50 |     99 |00:00:00.01 |       8 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=88)

19 rows selected.


--看下此时列的直方图情况,可以看到直方图确实都关闭了;



--重新收集列OBJECT_ID上的直方图信息

exec dbms_stats.gather_table_stats('AIKI','T1',cascade=>true,method_opt=>'for columns object_id size auto',estimate_percent=>100);



select /*5*/status from t1 where object_id=88;

--计划已经回归到最优了;
SQL> @allstat

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  gts3wuc1sy8dn, child number 0
-------------------------------------
select /*5*/status from t1 where object_id=88

Plan hash value: 3617692013

------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |     99 |00:00:00.01 |      10 |
|*  1 |  TABLE ACCESS FULL| T1   |      1 |     99 |     99 |00:00:00.01 |      10 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_ID"=88)

18 rows selected.


--直方图是什么 ?直方图就是列的统计信息,看下面的图你就明白了,88的值有99个,99的值有100-99个,100是累加值;

select * from dba_tab_histograms where owner='AIKI' and table_name='T1'



--直方图分为:等频和等高;怎么区别,很简单,列的唯一值>桶数:等高(HEIGHT BALANCED);等频:唯一值<=桶数,如下图所示,(其中11G之前的最大桶数不能超过254,软件局限),12C可超过;

select owner,table_name,column_name, histogram from dba_tab_col_statistics

where owner='AIKI'

and table_name='T1'



分区表的统计收集命令,操练下就明白了:

exec dbms_stats.gather_table_stats('AIKI','PT',partname=>'T2',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1',granularity=>'ALL'); --全局表+分区表的信息都做统计

exec dbms_stats.gather_table_stats('AIKI','PT',partname=>'T2',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1',granularity=>'PARTITION'); --只分区表做统计

统计信息与应用关系:

1.通过上面测试可以看到,当数据出现倾斜时,则直方图就有存在的意义,否则就无意义了;并且此列在做收集前必须存在于运行SQL的WHERE条件后,这个很好理解(都没用,做直方图也无意义了);

2.比如这条语句:select status from t1 where object_id=66 and status='VALID';当出现执行慢时你会怎么思考?

很简单 :先SOSI下看下列object_id,status的列的唯一值数(选择率),表T1的统计信息,这样心中便可大致确定跟此列相关的索引情况,看下统计时间,是否分区表,索引分布,数据采样比例等;
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: 
相关文章推荐