25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


6 On-Demand Re-Optimization

$f_{\gamma((\gamma R) \bowtie S)} = f_{\gamma(R \bowtie S)} / f_{\gamma(R)}$. If we assume independence for this group-by selectivity, we have $f_{\gamma((\gamma R) \bowtie S)} = 1$ and can set $f_{\gamma(R \bowtie S)} = f_{\gamma(R)}$. As a result, we can derive all variables of the optimality conditions from statistics of the optimal plan.

[Figure 6.10: Example Eager Group-By. Panels: (a) Example POT, with statistic nodes |R|, |S|, |R ⋈ S|, |γ(R ⋈ S)| and optimality conditions oc1 to oc4 (* |R|, |S| as additional input); (b) Optimality Conditions; (c) Complexity Analysis.]

Figure 6.10(a) shows the resulting PlanOptTree, where we omitted some connections (*) to atomic statistic nodes for simplicity of presentation. Note that for eager group-by, no transitivity is used. Furthermore, only the plan optimality is modeled rather than the whole plan search space. Hence, only four optimality conditions are required per join operator, as shown in Figure 6.10(b). Accordingly, Figure 6.10(c) compares the number of alternative plans of the full search space with the number of required optimality conditions. The reduction follows from the fact that, for each join input, we model only whether pre-aggregation is advantageous or not.

Union Distinct Example<br />

In contrast to join enumeration or eager group-by, there are many control-flow- and data-flow-oriented optimization techniques with fairly simple optimality conditions and thus rather small PlanOptTrees. An example is the optimization technique WD11: Setoperation-Type Selection (set operations with distinctness).

[Figure 6.11: Example Union Distinct. Panels: (a) Union Distinct Alternatives, showing the plain union R ∪ S and the merge variant sort(R) ∪M sort(S); (b) Example POT, with statistic nodes |R|, |S|, |R ∪ S| and optimality conditions oc1 and oc'2.]

There are three alternative subplans for a union distinct $R \cup S$. First, there is the normal union distinct operator with costs that are given by $C(R \cup S) = |R| + |S| \cdot |R \cup S| / 2$ (two plans due to asymmetric costs), where $|R| \le |R \cup S| \le |R| + |S|$ holds. Second, we can sort both inputs and apply a merge algorithm with costs of

$$C(\mathrm{sort}(R) \cup_M \mathrm{sort}(S)) = |R| + |S| + |R| \cdot \log_2 |R| + |S| \cdot \log_2 |S|. \qquad (6.7)$$
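To make the comparison of the three alternatives concrete, the following minimal Python sketch evaluates both cost formulas for given cardinalities and picks the cheapest subplan. It is an illustration only, not the optimizer's implementation: the function names, the plan labels, and the example cardinalities are hypothetical, and the first formula is read literally as $|R| + (|S| \cdot |R \cup S| / 2)$.

```python
import math


def union_distinct_cost(left: int, right: int, result: int) -> float:
    """Cost of the plain union distinct operator, C = |left| + |right| * |result| / 2.

    The formula is asymmetric in its inputs, so swapping left and right
    yields the second of the two plain plans.
    """
    return left + right * result / 2


def sort_merge_union_cost(r: int, s: int) -> float:
    """Cost of sorting both inputs and merging them, as in Equation (6.7)."""
    return r + s + r * math.log2(r) + s * math.log2(s)


def cheapest_union_plan(r: int, s: int, result: int) -> tuple[str, float]:
    """Return the (label, cost) of the cheapest of the three alternatives.

    The labels are hypothetical names chosen for this sketch.
    """
    if r < 1 or s < 1:
        raise ValueError("input cardinalities must be at least 1")
    if not (r <= result <= r + s):
        # Bound stated in the text: |R| <= |R u S| <= |R| + |S|.
        raise ValueError("result cardinality violates |R| <= |R u S| <= |R| + |S|")
    plans = {
        "R union S": union_distinct_cost(r, s, result),
        "S union R": union_distinct_cost(s, r, result),
        "sort-merge": sort_merge_union_cost(r, s),
    }
    label = min(plans, key=plans.get)
    return label, plans[label]


if __name__ == "__main__":
    # Hypothetical cardinalities: one skewed case and one balanced case.
    print(cheapest_union_plan(1000, 10, 1005))
    print(cheapest_union_plan(1000, 1000, 1500))
```

With a small second input, the plain operator with the large relation on the left tends to win, because the quadratic-looking term is scaled by the small input. With two inputs of similar size, the sort-merge variant becomes cheaper, since its cost grows only as $n \log_2 n$.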
