19.06.2013 Views

DB2 UDB for z/OS Version 8 Performance Topics - IBM Redbooks

DB2 UDB for z/OS Version 8 Performance Topics - IBM Redbooks

DB2 UDB for z/OS Version 8 Performance Topics - IBM Redbooks

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Star join technical issues<br />

The technical issues surrounding the optimization of decision support and data warehousing<br />

queries against a star schema model can be broken down into bind-time and run-time issues.<br />

The first consideration generally is the sheer size of the fact table, and also the large number<br />

of tables that can be represented in the star schema.<br />

► Bind-time issues<br />

For non-star join queries, the optimizer uses the pair-wise join method to determine the<br />

join permutation order. However, the number of join permutations grows exponentially with<br />

the number of tables being joined. This results in an increase in bind time, because of the<br />

time required to evaluate the possible join permutations. Star join does not do pair-wise<br />

join. It joins several unrelated tables via a (simulated) cartesian join to the fact table.<br />

► Run-time issues<br />

Star join queries are also challenging at run-time. <strong>DB2</strong> uses either cartesian joins of<br />

dimension tables (not based on join predicates, the result is all possible combinations of<br />

values in both tables) or pair-wise joins (based on join predicates).<br />

The result is all possible combinations of values in both tables. The pair-wise joins have<br />

worked quite effectively in the OLTP world. However, in the case of star join, since the only<br />

table directly related to other tables is the fact table, the fact table is most likely chosen in<br />

the pair-wise join, with subsequent dimension tables joined based on cost.<br />

Even though the intersection of all dimensions with the fact table can produce a small<br />

result, the predicates supplied to a single dimension table are typically insufficient to<br />

reduce the enormous number of fact table rows. For example, a single dimension join to<br />

the fact table may find:<br />

– 100+ million sales transactions in the month of December<br />

– 10+ million sales in San Jose stores<br />

– 10+ million sales of Jeans, but<br />

– Only thousands of rows that match all three criteria<br />

Hence there is no one single dimension table which could be paired with the fact table as<br />

the first join pair to produce a manageable result set.<br />

With these difficulties in mind, the criteria <strong>for</strong> star join can be outlined as follows:<br />

► “Selectively” join from the outside-in:<br />

Purely doing a cartesian join of all dimension tables be<strong>for</strong>e accessing the fact table may<br />

not be efficient if the dimension tables do not have filtering predicates applied, or there is<br />

no available index on the fact table to support all dimensions. The optimizer should be able<br />

to determine which dimension tables should be accessed be<strong>for</strong>e the fact table to provide<br />

the greatest level of filtering of fact table rows.<br />

For this reason, outside-in processing is also called the filtering phase. <strong>DB2</strong> V8 enhances<br />

the algorithms to do a better job at selecting which tables to join during outside-in<br />

processing.<br />

► Efficient “Cartesian join” from the outside-in:<br />

A physical cartesian join generates a large number of resultant rows based on the cross<br />

product of the unrelated dimensions. A more efficient cartesian type process is required<br />

as the number and size of the dimension tables increase, to avoid an exponential growth in<br />

storage requirements. A key feedback technique is useful <strong>for</strong> making cartesian joins<br />

efficient.<br />

► Efficient “join back”, inside-out:<br />

The join back to dimension tables that are accessed after the fact table must also be<br />

efficient. Non-indexed or materialized dimensions present a challenge <strong>for</strong> excessive sort<br />

Chapter 3. SQL per<strong>for</strong>mance 55

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!