07.02.2013 Views

Best Practices for SAP BI using DB2 9 for z/OS - IBM Redbooks


Single-value frequency statistics often give DB2 little help in estimating predicate selectivity: for values that were not collected, DB2 can do little more than interpolate uniformly over the rest of the (uncollected) value range. Such an estimate is essentially a wild guess and may lead to an undesirable access path.
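
To make the problem concrete, here is a minimal Python sketch of a range estimate that uses frequency statistics for the collected values and falls back to uniform interpolation for everything else. The function name, its parameters, and the data set are hypothetical, and the formula is a simplified model, not DB2's actual cost formula.

```python
def estimate_range_selectivity(lo, hi, freq, uncollected_min, uncollected_max):
    """Estimate the fraction of rows with lo <= value <= hi.

    freq maps each collected (frequent) value to its fraction of rows.
    Rows not covered by freq are assumed (a simplifying assumption) to be
    spread uniformly between uncollected_min and uncollected_max.
    """
    # Exact contribution of collected values that fall inside the range.
    collected = sum(f for v, f in freq.items() if lo <= v <= hi)
    # Fraction of rows that the frequency statistics say nothing about.
    remaining = 1.0 - sum(freq.values())
    # Clip the predicate range to the uncollected value range.
    lo_c, hi_c = max(lo, uncollected_min), min(hi, uncollected_max)
    if lo_c > hi_c:
        interpolated = 0.0
    elif uncollected_max == uncollected_min:
        interpolated = remaining
    else:
        # Uniform interpolation over the whole uncollected range.
        interpolated = remaining * (hi_c - lo_c) / (uncollected_max - uncollected_min)
    return collected + interpolated


# Hypothetical skewed column of 1,000 rows: value 0 is collected as a
# frequent value (50% of rows); 450 rows sit on 1..90, 50 rows on 100..982.
values = [0] * 500 + [v for v in range(1, 91) for _ in range(5)] + [100 + 18 * k for k in range(50)]
freq = {0: 0.5}
uncollected = [v for v in values if v not in freq]

est = estimate_range_selectivity(1, 90, freq, min(uncollected), max(uncollected))
actual = sum(1 <= v <= 90 for v in values) / len(values)
print(f"estimated {est:.3f}, actual {actual:.3f}")
```

On this column the uniform estimate for `1 <= v <= 90` is about 0.045, while the true selectivity is 0.45, an underestimate by roughly a factor of ten.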

A histogram describes the data distribution over the entire value range. Predicate selectivity can be calculated more accurately when the search range matches the boundaries of a single quantile or of a group of consecutive quantiles. Even when there is no exact match, interpolation is now confined to one or two particular quantiles. Because the interpolation is done at a much finer granularity, the selectivity estimate is expected to be more accurate.
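
The following Python sketch shows how an equal-depth histogram narrows that interpolation. Intervals that lie entirely inside the predicate range contribute their full frequency; only the (at most two) intervals cut by the range bounds are interpolated. The `Quantile` class, its `frequency` field, and the sample histogram are hypothetical, and the sketch treats the column as continuous.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantile:
    quantileno: int
    lowvalue: float
    highvalue: float
    frequency: float  # fraction of all rows in the interval (hypothetical name)


def histogram_range_selectivity(hist: list[Quantile], lo: float, hi: float) -> float:
    """Estimate the fraction of rows with lo <= value <= hi from a histogram."""
    total = 0.0
    for q in hist:
        if hi < q.lowvalue or lo > q.highvalue:
            continue  # no overlap with this interval
        if lo <= q.lowvalue and q.highvalue <= hi:
            total += q.frequency  # interval fully inside the range: no guessing
        else:
            # Partial overlap (only possible when highvalue > lowvalue):
            # interpolate uniformly inside this one interval only.
            overlap = min(hi, q.highvalue) - max(lo, q.lowvalue)
            total += q.frequency * overlap / (q.highvalue - q.lowvalue)
    return total


# Hypothetical four-quantile histogram, 25% of the rows per interval.
hist = [
    Quantile(1, 0, 9, 0.25),
    Quantile(2, 10, 19, 0.25),
    Quantile(3, 20, 99, 0.25),
    Quantile(4, 100, 999, 0.25),
]

print(histogram_range_selectivity(hist, 10, 99))  # 0.5
print(histogram_range_selectivity(hist, 15, 59))
```

For `10 <= v <= 99` the range matches the boundaries of quantiles 2 and 3, so the estimate is exactly 0.5. For `15 <= v <= 59` interpolation happens only inside quantiles 2 and 3, rather than across the whole 0 to 999 range.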

Histograms in detail

Histogram statistics were introduced in DB2 9 to improve predicate selectivity estimation and, more generally, DB2 access path selection.

Histogram statistics are a way of summarizing data distribution on an interval scale (either discrete or continuous). They divide the range of possible values in a data set into quantiles and collect a set of statistics for each quantile.

Several types of histogram statistics have been studied; DB2 for z/OS uses only equal-depth histogram statistics.
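
The difference between equal-width and equal-depth histograms is easiest to see on skewed data. In this Python sketch (function names and data are hypothetical), equal-width buckets split the value range evenly, while equal-depth buckets split the rows evenly.

```python
def equal_width_counts(values: list[int], n: int) -> list[int]:
    """Row counts per bucket when the value range is cut into n equal-width pieces."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [len(values)] + [0] * (n - 1)  # a single distinct value
    width = (hi - lo) / n
    counts = [0] * n
    for v in values:
        # The maximum value would map to index n, so clamp it into the last bucket.
        counts[min(int((v - lo) / width), n - 1)] += 1
    return counts


def equal_depth_bounds(values: list[int], n: int) -> list[tuple[int, int]]:
    """(low, high) values of n buckets that each hold about len(values)/n rows."""
    s = sorted(values)
    size = len(s)
    bounds = []
    for i in range(n):
        start, end = i * size // n, (i + 1) * size // n
        bounds.append((s[start], s[end - 1]))
    return bounds


# Hypothetical skewed column: 900 rows on values 0..99, 100 rows on 100..10000.
values = [v for v in range(100) for _ in range(9)] + [100 * k for k in range(1, 101)]
print(equal_width_counts(values, 4))  # [924, 25, 25, 26]
print(equal_depth_bounds(values, 4))  # [(0, 27), (27, 55), (55, 83), (83, 10000)]
```

With four buckets, the equal-width approach puts 924 of the 1,000 rows into the first bucket, whereas each equal-depth bucket holds 250 rows and is narrow where the data is dense (0 to 27) and wide where it is sparse (83 to 10000). The shared boundary values (27, 55, 83) show that this naive split cuts through runs of a repeated value, which a real implementation has to handle.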

Histogram statistics enable DB2 to improve access path selection by estimating predicate selectivity from value-distribution statistics that are collected over the entire range of values in a data set.

DB2 chooses the best access path for a query based on predicate selectivity estimation, which in turn relies heavily on data distribution statistics. Histogram statistics summarize data distribution on an interval scale by dividing the entire range of possible values within a data set into a number of intervals.

DB2 creates equal-depth histogram statistics, meaning that it divides the whole range of values into intervals that each contain about the same percentage of the total number of rows. The following columns in a histogram statistics table define an interval:

QUANTILENO: A sequence number that identifies the interval

LOWVALUE: A value that serves as the lower bound for the interval

HIGHVALUE: A value that serves as the upper bound for the interval

Note the following characteristics of histogram statistics intervals:

► Each interval includes approximately the same number, or percentage, of the rows. A highly frequent single value might occupy an interval by itself.
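
A minimal Python sketch of building equal-depth intervals with the QUANTILENO, LOWVALUE, and HIGHVALUE columns described above. It walks the distinct values in order and closes an interval once it holds about 1/numquantiles of the rows, never splitting one value across intervals, so a highly frequent value ends up in an interval by itself. This greedy approach is an assumption for illustration, not the algorithm RUNSTATS uses; it can yield a different number of intervals than requested, and the `frequency` field and function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    quantileno: int
    lowvalue: int
    highvalue: int
    frequency: float  # fraction of all rows in the interval (hypothetical name)


def build_equal_depth(values: list[int], numquantiles: int) -> list[Interval]:
    """Group distinct values, in order, into intervals of about 1/numquantiles of the rows.

    A value's rows are never split across two intervals, so a value whose own
    count reaches the target gets an interval to itself.
    """
    total = len(values)
    target = total / numquantiles
    raw = []  # (lowvalue, highvalue, row count) of each closed interval
    low, prev, rows = None, None, 0
    for value, cnt in sorted(Counter(values).items()):
        # Close the open interval before a highly frequent value so that
        # the frequent value can occupy an interval by itself.
        if cnt >= target and rows > 0:
            raw.append((low, prev, rows))
            low, rows = None, 0
        if low is None:
            low = value
        rows += cnt
        prev = value
        # Close the interval once it holds about 1/numquantiles of the rows.
        if rows >= target:
            raw.append((low, value, rows))
            low, rows = None, 0
    if rows > 0:
        raw.append((low, prev, rows))  # leftover rows form the last interval
    return [Interval(no, lo, hi, n / total) for no, (lo, hi, n) in enumerate(raw, start=1)]


# Hypothetical column: value 20 accounts for 40 of the 100 rows.
values = list(range(20)) + [20] * 40 + list(range(21, 61))
for q in build_equal_depth(values, 4):
    print(q.quantileno, q.lowvalue, q.highvalue, q.frequency)
```

Here value 20 (40% of the rows) forms quantile 2 on its own, and the printed intervals are `1 0 19 0.2`, `2 20 20 0.4`, `3 21 45 0.25`, and `4 46 60 0.15`.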

