25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...

3 Fundamentals of Optimizing Integration Flows

In this chapter, we introduce the fundamentals of a novel optimization framework for integration flows [BHW+07, BHLW08f, BHLW08g, BHLW09a] that enables arbitrary cost-based optimization techniques. This framework is tailor-made for integration flows with control-flow execution semantics because it exploits the major integration-flow-specific characteristic of being deployed once and executed many times. Furthermore, it tackles the specific problems of missing statistics, changing workload characteristics, and imperative flow specifications, while also ensuring the required transactional properties.

The core idea of the overall cost-based optimization framework for integration flows is incremental statistics maintenance in combination with asynchronous, inter-instance plan re-optimization. In order to take into account both data-flow- and control-flow-oriented operators, we specify the necessary dependency analysis as well as a novel hybrid cost model. Furthermore, we introduce the periodical re-optimization, which includes the core transformation-based optimization algorithm as well as specific approaches for search space reduction (such as a join reordering heuristic), for influencing the workload adaptation sensibility, and for handling correlated data. Subsequently, we present selected concrete optimization techniques (such as the reordering/merging of switch paths, early selection application, or the rewriting of sequences and iterations to parallel flows) to illustrate the rewriting of plans. Finally, the evaluation shows that significant performance improvements are possible with fairly low optimization overhead.
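To make the notion of incremental statistics maintenance more concrete, the following is a minimal sketch and not the framework's actual implementation. It assumes that each executed plan instance reports an elapsed time per operator and that a running mean is a sufficient statistic; the class name OperatorStats and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OperatorStats:
    """Running execution-time statistics for one operator (hypothetical helper)."""
    count: int = 0        # number of plan instances observed so far
    mean_ms: float = 0.0  # running mean of the observed execution times

    def update(self, elapsed_ms: float) -> None:
        # Incremental mean: the statistic is refreshed per executed instance
        # without storing the history of earlier instances.
        self.count += 1
        self.mean_ms += (elapsed_ms - self.mean_ms) / self.count


stats = OperatorStats()
for elapsed in (10.0, 20.0, 30.0):  # three executed plan instances
    stats.update(elapsed)
```

Because each update only touches two numbers, such statistics can be maintained during normal execution, while a separate, asynchronous optimizer reads them when it re-optimizes the plan for subsequent instances.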

3.1 Motivation and Problem Description<br />

The motivation for designing a tailor-made optimization approach for integration flows is their specific characteristic of being deployed once and executed many times, which can be exploited for efficient re-optimization. Moreover, integration flows are specified with control-flow (imperative) semantics in order to enable the execution of complex procedural integration tasks.

Problem 3.1 (Imperative Integration Flows). When rewriting imperative flow specifications, the data flow and the control flow (in the sense of restrictive temporal dependencies) must be taken into account in order to ensure semantic correctness. Here, semantic correctness means that the external behavior (data aspects and temporal order) must not be changed.
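As an illustration of Problem 3.1, the following sketch checks whether two adjacent operators may be reordered. It is an assumption-laden simplification, not the dependency analysis of this thesis: operators are described only by the message variables they read and write plus a flag for interactions with external systems, and the names Op and can_swap are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Simplified operator description (hypothetical, for illustration only)."""
    name: str
    reads: frozenset = frozenset()   # message variables consumed
    writes: frozenset = frozenset()  # message variables produced
    external: bool = False           # True if it interacts with an external system


def can_swap(a: Op, b: Op) -> bool:
    """Return True if adjacent operators a and b may be reordered."""
    # Data dependency: one operator writes what the other reads or writes.
    data_dep = bool(a.writes & b.reads or a.reads & b.writes or a.writes & b.writes)
    # Temporal dependency (assumption): the order of two external interactions
    # is observable from outside and must therefore be preserved.
    temporal_dep = a.external and b.external
    return not (data_dep or temporal_dep)


receive = Op("receive", writes=frozenset({"msg1"}), external=True)
selection = Op("selection", reads=frozenset({"msg1"}), writes=frozenset({"msg2"}))
translation = Op("translation", reads=frozenset({"msg3"}), writes=frozenset({"msg4"}))
```

Under these assumptions, selection and translation can be swapped because they touch disjoint variables, whereas receive and selection cannot because selection reads what receive writes.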

The majority of existing flow optimization approaches [LZ05, VSS+07, BJ10, BABO+09] apply rule-based optimizations only (optimize-once) using, for example, algebraic equivalences. There, rewriting decisions are made statically, only once, during the initial deployment of an integration flow. However, this is inadequate due to the following problem:
