
DB2 RUNSTATS sampled statistics

2016-01-25 14:00

CALL SYSPROC.ADMIN_CMD ('RUNSTATS ON TABLE wh.employee WITH DISTRIBUTION AND DETAILED INDEXES ALL TABLESAMPLE BERNOULLI(15)')

runstats on table wh.employee with distribution on all columns and detailed indexes all ;

runstats on table wh.employee with distribution on all columns and sampled detailed indexes all ;

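The TABLESAMPLE clause supports two sampling methods: SYSTEM (system page-level sampling, the default) and BERNOULLI (row-level sampling).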

runstats on table wh.employee with distribution on all columns and detailed indexes all tablesample system (10); ----- samples 10 percent of the data pages

runstats on table wh.employee with distribution on all columns and sampled detailed indexes all tablesample bernoulli (15); ------ samples 15 percent of the rows

RUNSTATS with row-level Bernoulli sampling

Row-level Bernoulli sampling obtains a sample of P percent of the table rows by means of a sargable predicate (a search-argument-able predicate, i.e. one that the Data Manager can evaluate). The predicate includes each row in the sample with probability P/100 and excludes it with probability 1 - P/100.
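To make the selection rule concrete, here is a minimal Python simulation of row-level Bernoulli sampling over an in-memory list of rows. It is not DB2's implementation. The function name `bernoulli_sample` is hypothetical, and `random.Random` stands in for whatever generator DB2 uses internally.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bernoulli_sample(rows: Sequence[T], percent: float, rng: random.Random) -> List[T]:
    """Row-level Bernoulli sampling: keep each row independently with probability P/100."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    p = percent / 100.0
    # One independent "coin flip" per row: rng.random() is uniform on [0, 1),
    # so the row is kept with probability p and rejected with probability 1 - p.
    # Every row is visited, just as every page is still read in DB2.
    return [row for row in rows if rng.random() < p]


# Usage: a 10% sample of a 100,000-row "table". Its size is close to,
# but usually not exactly, 10,000 rows.
sample = bernoulli_sample(list(range(100_000)), 10, random.Random(42))
```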

For example, with 10% Bernoulli sampling each row is selected with probability 10/100 and rejected with probability 1 - 10/100. Roughly 10% of the rows are therefore selected and 90% are rejected. Because the table is scanned, Bernoulli sampling still incurs an I/O for every page. For each row, a random number is generated to decide whether the row is selected, like flipping a coin that lands heads with probability P/100. Even though every page is read, CPU time is still saved because only the selected rows are processed.
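Because each row is kept independently, the sample size itself is random. It follows a binomial distribution with mean N*P/100 and standard deviation sqrt(N*(P/100)*(1 - P/100)), where N is the number of rows. Here is a short sketch of that arithmetic (the helper name is hypothetical):

```python
import math
from typing import Tuple


def bernoulli_sample_size(n_rows: int, percent: float) -> Tuple[float, float]:
    """Mean and standard deviation of the number of rows kept by Bernoulli sampling.

    The sample size is Binomial(n_rows, P/100), so
    mean = n * p and std = sqrt(n * p * (1 - p)).
    """
    p = percent / 100.0
    return n_rows * p, math.sqrt(n_rows * p * (1 - p))


# Usage: a 100,000-row table sampled at 10%.
mean, std = bernoulli_sample_size(100_000, 10)
```

For this case the sketch gives about 10,000 rows with a standard deviation of about 95 rows, while every page of the table is still read.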

With row-level Bernoulli sampling, every data page is read. It can still improve performance significantly, however, because RUNSTATS is CPU intensive. If indexes are available, the sampling is improved. Bernoulli sampling may also produce more accurate statistics than page-level sampling when the data is clustered, because the sample better represents the table as a whole.

RUNSTATS with system page-level sampling

System page-level sampling is similar to row-level sampling, except that pages, not rows, are sampled. Each page is selected with probability P/100 and rejected with probability 1 - P/100, and every row on a selected page is included in the sample.
The benefit of system page-level sampling over a full table scan or Bernoulli sampling is the saving in I/O, because pages that are not selected are not read.
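The page-level variant can be simulated the same way, with one coin flip per page instead of per row. This is an illustrative sketch, not DB2 internals. `paginate` and `system_page_sample` are hypothetical helpers, and fixed-size lists stand in for data pages.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def paginate(rows: Sequence[T], rows_per_page: int) -> List[List[T]]:
    """Split rows into fixed-size 'pages' (a stand-in for DB2 data pages)."""
    return [list(rows[i:i + rows_per_page]) for i in range(0, len(rows), rows_per_page)]


def system_page_sample(pages: Sequence[Sequence[T]], percent: float,
                       rng: random.Random) -> Tuple[List[T], int]:
    """Page-level sampling: keep each page with probability P/100, with all rows on it.

    Returns the sampled rows and the number of pages that had to be read.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    p = percent / 100.0
    rows: List[T] = []
    pages_read = 0
    for page in pages:
        # One coin flip per page instead of per row.
        if rng.random() < p:
            pages_read += 1      # only selected pages cost I/O
            rows.extend(page)    # every row on a selected page is included
    return rows, pages_read
```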

The sampled pages are also prefetched, so this method is generally faster than row-level Bernoulli sampling. Compared with no sampling, page-level sampling improves performance significantly.
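A rough way to see the I/O difference is to count the table data pages each approach is expected to read. This simplified model is an assumption. It ignores prefetching, buffer-pool hits and index pages, and the method label `NONE` (no sampling) is made up for the example.

```python
def expected_pages_read(total_pages: int, method: str, percent: float = 100.0) -> float:
    """Expected number of table data pages read under a simplified I/O model.

    Assumption: prefetching, buffer-pool hits and index pages are ignored.
    """
    method = method.upper()
    if method in ("NONE", "BERNOULLI"):
        # A full scan and row-level sampling both touch every page.
        return float(total_pages)
    if method == "SYSTEM":
        # A page is read only if it is selected, with probability P/100.
        return total_pages * percent / 100.0
    raise ValueError(f"unknown sampling method: {method}")


# Usage: a 50,000-page table sampled at 10%.
for m in ("NONE", "BERNOULLI", "SYSTEM"):
    print(m, expected_pages_read(50_000, m, 10))
# NONE 50000.0
# BERNOULLI 50000.0
# SYSTEM 5000.0
```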

The RUNSTATS REPEATABLE clause allows the same sample to be generated across RUNSTATS invocations as long as the table data has not changed. To use this option, you must also supply an integer that serves as the seed for sample generation.
Using the same seed produces the same sample.
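The effect of REPEATABLE can be mimicked by seeding a fresh generator for each run. This sketch only illustrates the idea that a fixed seed fixes the sample when the data is unchanged. DB2's generator is different, so the same seed will not select the same rows as this Python code. `repeatable_bernoulli_sample` is a hypothetical name.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def repeatable_bernoulli_sample(rows: Sequence[T], percent: float, seed: int) -> List[T]:
    """Bernoulli sample whose selection is fully determined by `seed`.

    A fresh generator is seeded on every call. The same rows, in the same
    order, with the same seed therefore always give the same sample.
    """
    rng = random.Random(seed)
    p = percent / 100.0
    return [row for row in rows if rng.random() < p]


table = list(range(10_000))
first = repeatable_bernoulli_sample(table, 10, seed=1024)
second = repeatable_bernoulli_sample(table, 10, seed=1024)  # identical to `first`
```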

In summary, the accuracy of sampled statistics depends on the sampling rate, the data skew and how the data is clustered.
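The clustering point can be checked with a small simulation. Assume (hypothetically) a column that is perfectly clustered, so every row on page k holds the value k, and estimate the column mean from repeated 10% samples. Row-level sampling spreads its picks across all pages. Page-level sampling picks roughly ten whole pages, so its estimates vary far more from run to run. All names here are made up for the illustration.

```python
import random
import statistics
from typing import Callable, List, Optional

ROWS_PER_PAGE = 100
N_PAGES = 100

# Hypothetical, perfectly clustered column: every row on page k has the value k.
CLUSTERED_PAGES: List[List[int]] = [[k] * ROWS_PER_PAGE for k in range(N_PAGES)]


def row_level_mean(pages: List[List[int]], percent: float, rng: random.Random) -> Optional[float]:
    p = percent / 100.0
    sample = [v for page in pages for v in page if rng.random() < p]  # one flip per row
    return sum(sample) / len(sample) if sample else None


def page_level_mean(pages: List[List[int]], percent: float, rng: random.Random) -> Optional[float]:
    p = percent / 100.0
    sample = [v for page in pages if rng.random() < p for v in page]  # one flip per page
    return sum(sample) / len(sample) if sample else None


Estimator = Callable[[List[List[int]], float, random.Random], Optional[float]]


def estimate_spread(estimator: Estimator, percent: float, trials: int = 200, seed: int = 1) -> float:
    """Standard deviation of the estimated column mean across repeated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        est = estimator(CLUSTERED_PAGES, percent, rng)
        if est is not None:  # skip the (very unlikely) empty sample
            estimates.append(est)
    return statistics.pstdev(estimates)


row_spread = estimate_spread(row_level_mean, 10)
page_spread = estimate_spread(page_level_mean, 10)
```

With these settings the page-level estimates spread several times more widely than the row-level ones. On data that is not clustered, the two methods behave much more alike.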

Some examples of RUNSTATS using Bernoulli row-level and system page-level sampling are as follows:

Example 29. Collect statistics, including distribution statistics, on 10 percent of the rows:

RUNSTATS ON TABLE db2admin.department WITH DISTRIBUTION TABLESAMPLE BERNOULLI (10)

Example 30. To control the sample set on which the statistics will be collected and to be able to repeatedly use the same sample set, use the following:

RUNSTATS ON TABLE db2admin.department WITH DISTRIBUTION TABLESAMPLE BERNOULLI (10) REPEATABLE (1024)

Example 31. Collect index statistics as well as table statistics on 10 percent of the data pages. Only the table data pages are sampled, not the index pages. In this example, 10 percent of the table data pages are used to collect table statistics, while all of the index pages are used to collect index statistics.

RUNSTATS ON TABLE db2admin.department AND INDEXES ALL TABLESAMPLE SYSTEM (10)
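When RUNSTATS commands like the examples above are generated from a script, a small builder keeps the clause order consistent. This is a hypothetical Python helper. It covers only the clauses used in this article: `WITH DISTRIBUTION [ON ALL COLUMNS]`, `AND [SAMPLED] DETAILED INDEXES`, `TABLESAMPLE` and `REPEATABLE`. `admin_cmd_call` wraps a command in `CALL SYSPROC.ADMIN_CMD (...)`, as in the first command of this article, and doubles any single quotes.

```python
from typing import Optional


def build_runstats(table: str, distribution: bool = False, on_all_columns: bool = False,
                   indexes: Optional[str] = None, detailed: bool = False,
                   sampled_indexes: bool = False, method: Optional[str] = None,
                   percent: Optional[float] = None,
                   repeatable_seed: Optional[int] = None) -> str:
    """Assemble a RUNSTATS command string (hypothetical helper, subset of the syntax)."""
    parts = ["RUNSTATS ON TABLE", table]
    if distribution:
        parts.append("WITH DISTRIBUTION")
        if on_all_columns:
            parts.append("ON ALL COLUMNS")
    if indexes:
        if sampled_indexes and not detailed:
            raise ValueError("SAMPLED applies only to DETAILED index statistics")
        parts.append("AND")
        if sampled_indexes:
            parts.append("SAMPLED")
        if detailed:
            parts.append("DETAILED")
        parts.append(f"INDEXES {indexes}")
    if method:
        if method.upper() not in ("BERNOULLI", "SYSTEM"):
            raise ValueError("method must be BERNOULLI or SYSTEM")
        if percent is None or not 0 < percent <= 100:
            raise ValueError("percent must be in (0, 100]")
        parts.append(f"TABLESAMPLE {method.upper()} ({percent:g})")
        if repeatable_seed is not None:
            parts.append(f"REPEATABLE ({repeatable_seed})")
    elif repeatable_seed is not None:
        raise ValueError("REPEATABLE requires TABLESAMPLE")
    return " ".join(parts)


def admin_cmd_call(command: str) -> str:
    """Wrap a command for SYSPROC.ADMIN_CMD, doubling quotes inside the SQL literal."""
    escaped = command.replace("'", "''")
    return f"CALL SYSPROC.ADMIN_CMD ('{escaped}')"
```

The builder only produces text. Running the result still requires a DB2 connection, for example from the CLP or any SQL client that can issue CALL statements.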

http://www.ibm.com/developerworks/cn/data/library/techarticle/dm-1211wangxy/
