DB2 RUNSTATS Sampled Statistics
2016-01-25 14:00
CALL SYSPROC.ADMIN_CMD ('RUNSTATS ON TABLE wh.employee WITH DISTRIBUTION AND DETAILED INDEXES ALL TABLESAMPLE BERNOULLI(15)')
runstats on table wh.employee with distribution on all columns
and detailed indexes all ;
runstats on table wh.employee with distribution on all columns
and sampled detailed indexes all ;
Two sampling methods are available: SYSTEM (page-level sampling, the default) and BERNOULLI (row-level sampling).
runstats on table wh.employee with distribution on all columns
and detailed indexes all tablesample system (10); -- sample 10 percent of the data pages
runstats on table wh.employee with distribution on all columns
and sampled detailed indexes all tablesample bernoulli (15); -- sample 15 percent of the rows
RUNSTATS with row-level Bernoulli sampling
Row-level Bernoulli sampling obtains a sample of P percent of the table rows by means of a sargable predicate (a search-argument-able predicate, that is, one that can be evaluated by the Data Manager) that includes each row in the sample with probability P/100 and excludes it with probability 1 - P/100.
For example, with 10% Bernoulli sampling, each row has a 10/100 chance of being selected and a 1 - 10/100 = 90/100 chance of being rejected, so roughly 10% of the rows end up in the sample. Because the table is scanned, Bernoulli sampling incurs an I/O for every page. For each row, a random number is generated to decide whether the row is selected (similar to flipping a coin that comes up heads with probability P/100). Even though every page is read, CPU time is still saved because only the selected rows are processed.
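The row-level mechanism described above can be illustrated with a small Python model. This is a toy sketch, not DB2's implementation: the table is assumed to be a list of pages, each a list of row values, and `bernoulli_sample` is a hypothetical helper that flips one coin per row while counting every page as read.

```python
import random


def bernoulli_sample(pages, percent, seed=None):
    """Toy row-level Bernoulli sampling (hypothetical helper, not a DB2 API).

    Every page is visited (one I/O per page, since the table is scanned),
    and each row is kept independently with probability percent/100.
    """
    rng = random.Random(seed)
    p = percent / 100.0
    pages_read = 0
    sample = []
    for page in pages:
        pages_read += 1              # full scan: every page is read
        for row in page:
            if rng.random() < p:     # coin flip with probability P/100
                sample.append(row)   # only selected rows are processed further
    return sample, pages_read


# Hypothetical table: 1,000 pages x 100 rows = 100,000 rows.
table = [[page_no * 100 + i for i in range(100)] for page_no in range(1000)]
rows, io = bernoulli_sample(table, 10, seed=42)
print(io)         # 1000: every page was read
print(len(rows))  # about 10,000 rows, but not exactly
```

The I/O count equals the number of pages no matter what P is; only the number of rows handed on for processing shrinks, which is where the CPU saving comes from.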
Although row-level Bernoulli sampling reads every data page, it can still improve performance significantly because RUNSTATS is CPU intensive. If indexes are available, the sampling is improved. Bernoulli sampling may also provide more accurate statistics than page-level sampling when the data is clustered, because it obtains a sample that better represents the table data as a whole.
RUNSTATS with system page-level sampling
System page-level sampling is similar to row-level sampling, except that pages, not rows, are sampled. Each page is selected with probability P/100 and rejected with probability 1 - P/100, and for each selected page all of its rows are included in the sample.
The benefit of system page-level sampling over a full table scan or Bernoulli sampling is the savings in I/O, since only the selected pages are read. The sampled pages are also prefetched, so this method is faster than row-level Bernoulli sampling. Compared with no sampling, page-level sampling significantly improves performance.
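For comparison, here is a toy model of page-level sampling under the same assumptions (the table is a list of pages; `system_page_sample` is a hypothetical helper, not a DB2 API). The coin is flipped once per page, and only the chosen pages are counted as read.

```python
import random


def system_page_sample(pages, percent, seed=None):
    """Toy page-level (SYSTEM) sampling (hypothetical helper, not a DB2 API).

    Each page is chosen with probability percent/100; only chosen pages
    are read, and every row on a chosen page goes into the sample.
    """
    rng = random.Random(seed)
    p = percent / 100.0
    pages_read = 0
    sample = []
    for page_no in range(len(pages)):
        if rng.random() < p:               # one coin flip per page
            pages_read += 1                # I/O only for selected pages
            sample.extend(pages[page_no])  # all rows of the page are kept
    return sample, pages_read


table = [[page_no * 100 + i for i in range(100)] for page_no in range(1000)]
rows, io = system_page_sample(table, 10, seed=42)
print(io)                     # roughly 100 pages read instead of 1,000
print(len(rows) == io * 100)  # True: whole pages are sampled
```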
The RUNSTATS REPEATABLE clause allows the same sample to be generated across RUNSTATS invocations as long as the table data has not changed. To specify this option, the user must also supply an integer that serves as the seed for sample generation; using the same seed produces the same sample.
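The effect of a fixed seed can be sketched with Python's `random.Random`. This only mimics the idea behind REPEATABLE: DB2's own random number generator is internal, the seed 1024 simply mirrors Example 30 below, and `bernoulli_row_ids` is a hypothetical helper.

```python
import random


def bernoulli_row_ids(num_rows, percent, seed):
    """Hypothetical helper: row ids chosen by seeded Bernoulli sampling."""
    rng = random.Random(seed)  # fixed seed -> fixed sequence of coin flips
    p = percent / 100.0
    return [row_id for row_id in range(num_rows) if rng.random() < p]


first = bernoulli_row_ids(10_000, 10, seed=1024)
second = bernoulli_row_ids(10_000, 10, seed=1024)
other = bernoulli_row_ids(10_000, 10, seed=7)

print(first == second)  # True: same seed and unchanged data give the same sample
print(first == other)   # a different seed (almost surely) gives a different sample
```

In this model, if rows are added or removed, the same sequence of coin flips lands on different rows, which illustrates why the sample is only guaranteed to repeat while the data is unchanged.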
In summary, when sampling is used, the accuracy of the statistics depends on the sampling rate, the data skew, and the degree of data clustering.
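To see why clustering matters, the following toy simulation assumes a hypothetical, perfectly clustered table in which each of 10 values fills 10 consecutive pages, and counts how many distinct values a 10% sample finds on average. It is an illustration under these assumptions, not a measurement of DB2.

```python
import random

PAGES, ROWS_PER_PAGE, PERCENT, TRIALS = 100, 50, 10, 200

# Hypothetical, perfectly clustered table: value v fills pages 10*v .. 10*v + 9.
table = [[page_no // 10] * ROWS_PER_PAGE for page_no in range(PAGES)]


def distinct_bernoulli(rng, p):
    """Distinct values found when each row is sampled independently."""
    found = set()
    for page in table:
        for value in page:
            if rng.random() < p:
                found.add(value)
    return len(found)


def distinct_system(rng, p):
    """Distinct values found when whole pages are sampled."""
    found = set()
    for page in table:
        if rng.random() < p:
            found.update(page)
    return len(found)


rng = random.Random(2016)
p = PERCENT / 100.0
avg_bernoulli = sum(distinct_bernoulli(rng, p) for _ in range(TRIALS)) / TRIALS
avg_system = sum(distinct_system(rng, p) for _ in range(TRIALS)) / TRIALS
print(avg_bernoulli)  # essentially always all 10 values
print(avg_system)     # noticeably lower: whole clusters of pages are missed
```

With 10 pages per value, a 10% page sample misses a given value with probability 0.9^10 ≈ 0.35, so on average only about 6.5 of the 10 values appear; a row-level sample almost never misses a value that has 500 rows.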
Some examples of RUNSTATS using Bernoulli row-level and system page-level sampling are as follows:
Example 29. Collect statistics, including distribution statistics, on 10 percent of the rows:
RUNSTATS ON TABLE db2admin.department WITH DISTRIBUTION TABLESAMPLE BERNOULLI (10)
Example 30. To control the sample set on which the statistics will be collected and to be able to repeatedly use the same sample set, use the following:
RUNSTATS ON TABLE db2admin.department WITH DISTRIBUTION TABLESAMPLE BERNOULLI (10) REPEATABLE (1024)
Example 31. Collect index statistics as well as table statistics on 10 percent of the data pages. Note that only the table data pages are sampled, not the index pages: in this example, 10 percent of the table data pages are used to collect table statistics, while all of the index pages are used to collect index statistics.
RUNSTATS ON TABLE db2admin.department AND INDEXES ALL TABLESAMPLE SYSTEM (10)
http://www.ibm.com/developerworks/cn/data/library/techarticle/dm-1211wangxy/