25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


3 Fundamentals of Optimizing Integration Flows

purpose, the path probabilities $P(path_i)$ (as relative frequencies over the sliding window) and the absolute costs for evaluation of a path expression $W(expr_{path_i})$ are needed in order to compute the relative costs for accessing a Switch path with $W(expr_{path_i})/P(path_i)$.

As the core concept of WD1, we reorder Switch path expressions according to their relative costs for expression evaluation. When applying this technique, we need to ensure semantic correctness, where the structure of an expression is assumed to be a set of predicates of the form attribute θ value. We define that only independent expressions (e.g., annotated within the flow specification) can be reordered, while conditional expressions prevent any reordering. This reordering of Switch paths is optimal (in the average case) if Switch paths are sorted in ascending order of their relative costs, such that the following optimality condition holds:

$$\frac{W(expr_{path_i})}{P(path_i)} \leq \frac{W(expr_{path_{i+1}})}{P(path_{i+1})}. \tag{3.19}$$

With such a reordering, an execution time reduction of

$$\Delta W(path_i, path_{i+1}) = P(path_i) \cdot W(expr_{path_{i+1}}) - P(path_{i+1}) \cdot W(expr_{path_i}) \tag{3.20}$$

is possible when reordering two paths $path_{i+1}$ and $path_i$.
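The form of (3.20) can be checked with a short derivation. The following is only a sketch, assuming ordered if-elseif semantics with mutually exclusive paths, so that an expression is evaluated only if no earlier path was taken. The shorthand $W_i = W(expr_{path_i})$, $P_i = P(path_i)$, the expected cost $C(\cdot,\cdot)$ of an ordering, and the probability $R$ that none of the paths preceding the pair was taken are introduced here for illustration and do not appear in the original text.

```latex
\begin{align*}
C(path_{i+1}, path_i) &= R \cdot W_{i+1} + (R - P_{i+1}) \cdot W_i \\
C(path_i, path_{i+1}) &= R \cdot W_i + (R - P_i) \cdot W_{i+1} \\
\Delta W(path_i, path_{i+1})
  &= C(path_{i+1}, path_i) - C(path_i, path_{i+1}) \\
  &= P_i \cdot W_{i+1} - P_{i+1} \cdot W_i
\end{align*}
```

Under these assumptions $R$ cancels, and for positive probabilities the reduction is non-negative exactly when $W_i / P_i \leq W_{i+1} / P_{i+1}$, which is the optimality condition (3.19).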

In contrast to the reordering of independent expressions, the technique WD2 can be applied to any expressions that refer to the same attribute. There, the concept is to merge expressions over the same attribute into a compound Switch path in order to extract the single value only once and to evaluate it multiple times. With such a merged path evaluation, an execution time reduction of

$$\Delta W(path_i, path_{i+1}) = P(path_{i+1}) \cdot W(expr_{path_{i+1}}) \tag{3.21}$$

can be achieved. The compound path can be reordered in the same way as normal Switch paths. In consequence, the technique WD2 should be applied before WD1.
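The idea of extracting a value once and evaluating it multiple times can be illustrated with a minimal Python sketch. It is not the implementation described in this work: `CountingExtractor`, `route_separately`, and `route_compound` are hypothetical names, and a dictionary lookup stands in for the (presumably more expensive) attribute extraction.

```python
from typing import Any, Callable, List, Optional, Tuple

# (path name, predicate over the already extracted attribute value)
SubPath = Tuple[str, Callable[[Any], bool]]


class CountingExtractor:
    """Hypothetical attribute extractor that counts how often it runs."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.calls = 0

    def __call__(self, message: dict) -> Any:
        self.calls += 1  # stands in for a costly expression evaluation
        return message[self.key]


def route_separately(message: dict, extract: Callable[[dict], Any],
                     subpaths: List[SubPath]) -> Optional[str]:
    # Unmerged Switch: every path expression extracts the attribute again.
    for name, predicate in subpaths:
        if predicate(extract(message)):
            return name
    return None


def route_compound(message: dict, extract: Callable[[dict], Any],
                   subpaths: List[SubPath]) -> Optional[str]:
    # Compound path: extract once, then evaluate all predicates on the value.
    value = extract(message)
    for name, predicate in subpaths:
        if predicate(value):
            return name
    return None


subpaths = [("A", lambda v: v == "x"), ("B", lambda v: v == "y")]
msg = {"type": "y"}
separate, compound = CountingExtractor("type"), CountingExtractor("type")
print(route_separately(msg, separate, subpaths), separate.calls)  # B 2
print(route_compound(msg, compound, subpaths), compound.calls)    # B 1
```

Both variants select the same path; only the number of extractions differs, which is where the saving of the merged evaluation comes from.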

The following rewriting algorithm applies the reordering and merging of Switch paths. First, we partition the expressions according to their attributes (e.g., represented by XPath expressions). If a partition contains multiple paths, we apply the merging by replacing these paths with one compound path that writes the extracted attribute value to an operator-local cache and evaluates it multiple times. To this end, all subpaths of the compound path are annotated as compound. Second, we compute the relative costs $W(expr_{path_i})/P(path_i)$ for each path and reorder the paths according to the given optimality condition. In total, this rewriting algorithm exhibits a complexity of $O(m^2) = O(m^2 + m \cdot \log m)$ due to partitioning and sorting of Switch paths. We use an example to illustrate the resulting execution time when using these techniques.
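The two steps of the rewriting can be sketched in Python as follows. This is a hedged illustration rather than the algorithm as implemented: the `SwitchPath` class and all function names are hypothetical, all expressions are assumed to be independent, and the way a compound path aggregates cost and probability (most expensive subpath expression, summed probabilities) is an assumption made here because the text does not specify it. The sketch groups paths with a dictionary, so it does not model whatever pairwise partitioning the stated $O(m^2)$ bound covers.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SwitchPath:
    name: str
    attribute: str   # e.g., the XPath expression the predicate reads
    cost: float      # monitored W(expr_path) in ms
    prob: float      # monitored P(path) as relative frequency
    subpaths: List["SwitchPath"] = field(default_factory=list)

    @property
    def compound(self) -> bool:
        return bool(self.subpaths)


def relative_cost(path: SwitchPath) -> float:
    # W(expr_path) / P(path); paths that were never taken go last.
    return path.cost / path.prob if path.prob > 0 else float("inf")


def rewrite_switch(paths: List[SwitchPath]) -> List[SwitchPath]:
    # Step 1: partition the paths by the attribute they refer to.
    partitions: Dict[str, List[SwitchPath]] = defaultdict(list)
    for path in paths:
        partitions[path.attribute].append(path)

    merged: List[SwitchPath] = []
    for attribute, group in partitions.items():
        if len(group) == 1:
            merged.append(group[0])
            continue
        # Merge into one compound path; subpaths are ordered by relative cost.
        subpaths = sorted(group, key=relative_cost)
        merged.append(SwitchPath(
            name="+".join(s.name for s in subpaths),
            attribute=attribute,
            cost=max(s.cost for s in subpaths),   # ASSUMPTION: aggregation rule
            prob=sum(s.prob for s in subpaths),   # ASSUMPTION: aggregation rule
            subpaths=subpaths,
        ))

    # Step 2: sort ascending by relative cost (optimality condition 3.19).
    return sorted(merged, key=relative_cost)


plan = rewrite_switch([
    SwitchPath("A", "/order/type", 30.0, 0.1),
    SwitchPath("B", "/order/region", 30.0, 0.6),
    SwitchPath("C", "/order/type", 30.0, 0.3),
])
print([p.name for p in plan])
```

In this usage, paths A and C refer to the same (made-up) attribute and end up in one compound path, which is then ordered against B by relative cost.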

Example 3.12 (Reordering and Merging Switch Paths). Recall our example plan $P_1$ that is shown in Figure 3.15(a). Further, assume that the costs for accessing each of the two Switch paths have been monitored as $W(expr) = 30$ ms. We analytically investigate the influence of varying path probabilities $P(A) \in [0, 1]$ with $P(A) + P(B) = 1$. Figure 3.15(b) illustrates the influence of reordering the Switch paths (assuming independent expressions, e.g., $A: var1 = x$ and $B: var2 = y$), where the costs are computed by $P(A) \cdot W(expr) + P(B) \cdot (W(expr) + W(expr))$ due to the ordered if-elseif semantics: path B is reached only after the expression of path A has been evaluated. Based on the equivalence of costs for evaluating the expressions, we benefit from reordering if $P(A) < P(B)$. This means that for $P(A) < 0.5$, we reorder $(A, B)$ to $(B, A)$ and thus achieve the shown benefits. Furthermore, Figure 3.15(c) illustrates the influence of merging switch
