25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


3.2 Prerequisites for Cost-Based Optimization

In [LRD06], the user models semantic constraints locally for each individual plan, using application knowledge to preserve semantic correctness. There, constraints between operators are specified explicitly in order to exclude these operators from any plan rewriting. In contrast to this approach, we define semantic correctness of plans with global constraints, which are independent of specific plans and thus reduce the modeling and configuration effort required of a user, as follows:

Definition 3.1 (Semantic Correctness). Let P denote an original plan and let P′ denote a plan that was created by rewriting P. Then, semantic correctness of P′ refers to the semantic equivalence P ≡ P′. This equivalence property is given if the following constraints hold:

1. There are no dependencies δ between operators of concurrent subflows (parallel subflows of a Fork operator).
2. If there is a dependency δ between an interaction-oriented operator and another operator, their temporal order must be equivalent in P and P′.
3. If there are two interaction-oriented operators, where at least one performs a write operation and both refer to the same external system, their temporal order must be equivalent in P and P′.
4. If there exists an anti-dependency between two operators, the temporal order of these operators must be equivalent in P and P′.
5. If there is a data dependency between two operators, the sequential order of these operators must be equivalent in P and P′, or the applied optimization technique must guarantee semantic correctness (equivalent results) of the changed sequential order.
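The five constraints can be read as a checklist applied to a pair of plans. The following is a minimal sketch of such a check, not the thesis's implementation. It assumes a plan is encoded as a list of stages, where each stage is a list of parallel subflows (lists of operator names). The names `Op`, `violated_rules`, and the `guaranteed` parameter (operator pairs whose reordering a technique vouches for under Rule 5) are hypothetical. The sketch also treats sequential and temporal order identically.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Op:
    name: str
    interaction: bool = False        # interaction-oriented operator (assumed flag)
    writes: bool = False             # performs a write on the external system
    system: Optional[str] = None     # external system the operator refers to


def positions(plan):
    """Map operator name -> (stage, subflow, index within subflow)."""
    return {name: (s, f, i)
            for s, stage in enumerate(plan)
            for f, flow in enumerate(stage)
            for i, name in enumerate(flow)}


def relation(pos, a, b):
    """Return '<', '>' or '||' (concurrent subflows) for operators a and b."""
    (sa, fa, ia), (sb, fb, ib) = pos[a], pos[b]
    if sa != sb:
        return '<' if sa < sb else '>'
    if fa != fb:
        return '||'
    return '<' if ia < ib else '>'


def violated_rules(ops, deps, original, rewritten, guaranteed=frozenset()):
    """Return the sorted numbers of the constraints that the rewrite violates.

    deps is a set of (kind, a, b) tuples with kind in {'data', 'anti', 'output'}.
    """
    by_name = {o.name: o for o in ops}
    p, q = positions(original), positions(rewritten)

    def same(a, b):
        return relation(p, a, b) == relation(q, a, b)

    bad = set()
    for kind, a, b in deps:
        if relation(q, a, b) == '||':                       # Rule 1
            bad.add(1)
        if (by_name[a].interaction or by_name[b].interaction) and not same(a, b):
            bad.add(2)                                      # Rule 2
        if kind == 'anti' and not same(a, b):               # Rule 4
            bad.add(4)
        if kind == 'data' and not same(a, b) and frozenset((a, b)) not in guaranteed:
            bad.add(5)                                      # Rule 5
    for x, y in combinations(ops, 2):                       # Rule 3
        if (x.interaction and y.interaction and x.system == y.system
                and (x.writes or y.writes) and not same(x.name, y.name)):
            bad.add(3)
    return sorted(bad)
```

An empty result means that none of the five constraints is violated under this encoding. A non-empty result names the constraints that rule out the rewrite.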

According to Rule 5, the specific optimization technique decides whether operators that have dependencies between them can be reordered. This is necessary because the reordering decision depends on the concrete operators involved and their parameterizations. For example, two Selection operators can be reordered, while a sequence of Selection and Projection operators cannot be reordered if the projection removes the selection attribute. If there is a dependency δ between two operators and their sequential order (and thus also their temporal order) is changed when rewriting P to P′, the parameters of the two operators (and hence the new data flow) must be changed accordingly. Thus, when rewriting a plan, the dependency graph is incrementally maintained (transformed) as well. Due to the importance of this dependency graph, we use Example 3.1 to illustrate its core concepts.
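The Selection and Projection case from the paragraph above can be made concrete with a small sketch. It is illustrative only: the classes `Selection` and `Projection` and the function `can_reorder` are hypothetical names, a selection is assumed to test a single attribute, and any operator pair the sketch does not know is conservatively rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    attribute: str        # attribute tested by the selection predicate


@dataclass(frozen=True)
class Projection:
    kept: frozenset       # attributes that survive the projection


def can_reorder(first, second):
    """True if the sequence 'first; second' may be rewritten to 'second; first'."""
    # Two selections only filter rows, so their order does not matter.
    if isinstance(first, Selection) and isinstance(second, Selection):
        return True
    # Selection and Projection commute only if the projection keeps
    # the attribute that the selection needs.
    if {type(first), type(second)} == {Selection, Projection}:
        sel = first if isinstance(first, Selection) else second
        proj = second if sel is first else first
        return sel.attribute in proj.kept
    return False          # conservative default for unknown combinations
```

The decision depends on the parameterization (which attributes the projection keeps), not only on the operator types, which is why Rule 5 leaves it to the individual technique.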

Example 3.1 (Dependency Graph). We use the plan P_3 from Example 2.6. Figure 3.2(a) shows the related dependency graph. Consider the dependency δ^D_{msg_3}. It is a data dependency (D) over the message msg_3 from o_4 to o_3; i.e., operator o_4 reads the result of o_3 as one of its join operands. Hence, o_4 depends on o_3. The dependency graph is used to determine rewriting possibilities. For example, since there are no dependencies between operators o_2 and o_3 and neither of them is a writing interaction, we can insert a Fork operator and execute them as parallel subflows (Figure 3.2(b)). If there are data dependencies between local operators (no interaction-oriented operators), we can still exchange their sequential order (e.g., o_4 and o_5 by the optimization technique eager group-by). However, we are not allowed to exchange o_6 and o_7 because this would change the external behavior. Further, the output dependencies and anti-dependencies determine that we are not allowed to exchange the order of the involved operators (e.g., o_3 and o_6).
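The Fork decision in this example can be sketched as a lookup in a labeled dependency graph. Only the edge from o4 to o3 is taken directly from the example. The remaining edges, their kinds and directions, and the set of writing interactions are assumptions made for illustration, since Figure 3.2 is not reproduced here. The names `build_graph`, `depends_on`, and `can_fork` are hypothetical.

```python
from collections import defaultdict

# (dependent, depends_on, kind) with kind "D" = data, "A" = anti, "O" = output
EDGES = [
    ("o4", "o3", "D"),    # from the example: o4 reads msg3 produced by o3
    ("o5", "o4", "D"),    # assumed data dependency between local operators
    ("o6", "o3", "A"),    # assumed kind and direction
]
WRITERS = {"o6", "o7"}    # assumed writing interactions


def build_graph(edges):
    """Adjacency map: dependent operator -> {operator it depends on: kind}."""
    graph = defaultdict(dict)
    for dependent, source, kind in edges:
        graph[dependent][source] = kind
    return graph


def depends_on(graph, a, b):
    """True if a depends on b, directly or transitively."""
    stack, seen = [a], set()
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, {}):
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def can_fork(graph, writers, a, b):
    """Two operators may run as parallel subflows if neither depends on the
    other and neither is a writing interaction."""
    independent = not depends_on(graph, a, b) and not depends_on(graph, b, a)
    return independent and a not in writers and b not in writers
```

Under these assumptions, `can_fork(build_graph(EDGES), WRITERS, "o2", "o3")` holds, while the pair o3 and o4 is rejected because of the data dependency over msg3.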
