The new Summary Collect Statistics feature starting with Teradata Release 14
2015-10-28 16:55
DIAGNOSTIC HELPSTATS ON FOR SESSION; (run this statement first; a subsequent EXPLAIN will then list the statistics it recommends collecting)
EXPLAIN
SELECT * FROM <TABLE>;
Summary statistics are a very useful feature introduced in Teradata Release 14. In contrast to traditional statistics, which are collected on columns or indexes, summary statistics are table-level statistics:
COLLECT SUMMARY STATISTICS ON <TABLE>;
Table-level statistics record the number of rows in the table, the average data block size, and the average row size.
The number of rows per table is a very important measure for the Teradata Optimizer to estimate the cost of a full table scan. So, how was this information made available in previous releases of Teradata?
In previous releases of Teradata, this was achieved by collecting column statistics on the dummy “PARTITION” column and the Primary Index columns.
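As a rough sketch of that older approach: the statements below use `<PI_COLUMN>` as a hypothetical placeholder for the table's primary index column, which the original text does not name.

```sql
-- Pre-Release-14 workaround (sketch): row count information came from these two collections.
-- <PI_COLUMN> is an assumed placeholder for the primary index column of <TABLE>.
COLLECT STATISTICS ON <TABLE> COLUMN PARTITION;
COLLECT STATISTICS ON <TABLE> COLUMN (<PI_COLUMN>);
```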
Summary statistics defined on a table are refreshed each time the statistics on any column of that table are refreshed. To avoid wasting resources, the best practice is to refresh all column statistics at once instead of column by column.
The preferable solution is this one:
COLLECT STATISTICS ON <TABLE>; -- all column statistics are refreshed, summary statistics are only refreshed once.
Although refreshing summary statistics is less resource-intensive than collecting traditional column/index statistics, this should still be avoided:
COLLECT STATISTICS ON <TABLE> COLUMN (<COLUMN_1>); -- first collection of summary statistics
COLLECT STATISTICS ON <TABLE> COLUMN (<COLUMN_2>); -- needless 2nd collection of summary statistics
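From Teradata 14 on, several columns of the same table can also be named in one statement. The sketch below reuses the placeholders from the statements above. It assumes the summary statistics are then refreshed once for the whole statement rather than once per column.

```sql
-- Teradata 14+ syntax (sketch): one statement, several columns of the same table.
-- Assumption: summary statistics are refreshed once for the statement as a whole.
COLLECT STATISTICS
    COLUMN (<COLUMN_1>),
    COLUMN (<COLUMN_2>)
ON <TABLE>;
```

Only the columns that actually need statistics have to be listed; the statement does not require naming every column of the table.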
Conclusion: Collecting summary statistics is a quick way to refresh the row count information of a table.
Dimitrios says
May 7, 2014 at 11:07 pm
Hello,
This new feature has only advantages, except for one thing: we have to change the procedure that collects the stats in order to use it (plus the max value length / max intervals and sample options).
It is also recommended for big tables, where collecting stats consumes a lot of I/O and CPU, to just run
Collect Summary Stats on DB.TB;
after every DML transaction on the table, in order to keep its summary stats up to date. This way we help the parser make better extrapolations.
Johne490 says
August 20, 2014 at 6:23 am
Very informative post. Thank you very much! Awesome.
Sreeraj says
March 30, 2015 at 7:23 pm
Thank you. Very informative.
Question: for TD 14 onwards, the best practice is to collect all column statistics at once instead of column by column. Is this true for big tables as well? I was under the impression that we need to collect stats only for certain columns, based on the PI, joins and WHERE conditions. Could you please clarify?
Roland Wenzlofsky says
March 30, 2015 at 7:31 pm
Hi Sreeraj. I think this is a misunderstanding.
“All columns at once” means collecting statistics with the new syntax, which allows the stats on several columns of the same table to be collected in a single statement.
Before Teradata 14 we had to collect stats column by column, i.e. issue one statement per column. The only performance advantage previously possible was synchronized scanning on the spool (this was achieved by starting the collect statistics statements for one table at the same time).
“All columns at once” does not mean collecting statistics on each and every column of a table…
Roland