Cost-Based Optimization of Integration Flows - Datenbanken ...
3 Fundamentals of Optimizing Integration Flows
In this chapter, we introduce the fundamentals of a novel optimization framework for
integration flows [BHW+07, BHLW08f, BHLW08g, BHLW09a] that enables arbitrary
cost-based optimization techniques. This framework is tailor-made for integration
flows with control-flow execution semantics because it exploits the major integration-flow-specific
characteristic of being deployed once and executed many times. Furthermore, it
tackles the specific problems of missing statistics, changing workload characteristics, and
imperative flow specifications, while also ensuring the required transactional properties.
The core idea of the overall cost-based optimization framework for integration flows is
incremental statistics maintenance in combination with asynchronous, inter-instance plan
re-optimization. In order to take into account both data-flow- and control-flow-oriented
operators, we specify the necessary dependency analysis as well as a novel hybrid cost
model. Furthermore, we introduce periodical re-optimization, which includes the core
transformation-based optimization algorithm as well as specific approaches for reducing the
search space (such as a join reordering heuristic), for influencing the sensitivity of
workload adaptation, and for handling correlated data. Subsequently, we present selected
concrete optimization techniques (such as the reordering/merging of switch paths, early
selection application, or the rewriting of sequences and iterations to parallel flows) to
illustrate the rewriting of plans. Finally, the evaluation shows that significant performance
improvements are possible with fairly low optimization overhead.
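To make the core idea more concrete, the following is a minimal sketch of incremental statistics maintenance combined with re-optimization between plan instances, using the reordering of switch paths as the example technique. It is not the framework's actual implementation: the names `SwitchPlan`, `execute`, and `reoptimize` are hypothetical, the re-optimization is triggered synchronously here rather than asynchronously, and the path conditions are assumed to be mutually exclusive so that reordering them does not change which path is taken.

```python
from dataclasses import dataclass, field


@dataclass
class SwitchPlan:
    """Hypothetical deployed plan with a single switch operator.

    Path conditions are evaluated in `order` until one matches.
    Assumption: conditions are mutually exclusive, so reordering
    them preserves the external behavior.
    """
    order: list[str]
    hits: dict[str, int] = field(default_factory=dict)

    def execute(self, matching_path: str) -> int:
        """Run one plan instance; return the number of conditions evaluated."""
        # Incremental statistics maintenance: O(1) work per instance.
        self.hits[matching_path] = self.hits.get(matching_path, 0) + 1
        return self.order.index(matching_path) + 1

    def reoptimize(self) -> None:
        """Inter-instance re-optimization: test the most frequent path first."""
        self.order.sort(key=lambda path: -self.hits.get(path, 0))


# Deployed once, executed many times: path "C" dominates the workload.
plan = SwitchPlan(order=["A", "B", "C"])
workload = ["C"] * 8 + ["A"] * 2

evaluations_before = sum(plan.execute(p) for p in workload)
plan.reoptimize()  # happens between instances, never during one
evaluations_after = sum(plan.execute(p) for p in workload)

print(plan.order, evaluations_before, evaluations_after)
```

In this toy workload, the reordered plan evaluates fewer conditions per instance because the statistics gathered during earlier instances reveal which path is taken most often. An optimize-once approach has no such statistics at deployment time.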
3.1 Motivation and Problem Description
The motivation for designing a tailor-made optimization approach for integration flows
is their specific characteristic of being deployed once and executed many times, which can
be exploited for efficient re-optimization. Moreover, integration flows are specified with
control-flow semantics (imperative) in order to enable the execution of complex procedural
integration tasks.
Problem 3.1 (Imperative Integration Flows). When rewriting imperative flow specifications,
both the data flow and the control flow (in the sense of restrictive temporal dependencies)
must be taken into account in order to ensure semantic correctness. Here, semantic correctness
means that the external behavior (data aspects and temporal order) must remain
unchanged.
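Problem 3.1 can be illustrated with a small, conservative check of whether two adjacent operators may be swapped. This is only a sketch under stated assumptions, not the dependency analysis specified later in this chapter: the `Operator` type and the `can_swap` function are hypothetical, data dependencies are approximated by read/write sets of message variables, and a temporal dependency is assumed whenever both operators interact with external systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """Hypothetical operator description used only for this illustration."""
    name: str
    reads: frozenset[str]    # message variables consumed
    writes: frozenset[str]   # message variables produced
    external: bool = False   # True if an external system is invoked


def can_swap(a: Operator, b: Operator) -> bool:
    """Conservatively decide whether adjacent operators a and b may be reordered."""
    # Data-flow dependencies: read-after-write, write-after-read, write-after-write.
    data_dependency = bool(
        (a.writes & b.reads) or (a.reads & b.writes) or (a.writes & b.writes)
    )
    # Assumed temporal dependency: the order of external interactions is
    # part of the externally visible behavior and must be preserved.
    temporal_dependency = a.external and b.external
    return not data_dependency and not temporal_dependency


selection = Operator("selection", frozenset({"msg1"}), frozenset({"msg2"}))
projection = Operator("projection", frozenset({"msg3"}), frozenset({"msg4"}))
invoke_s1 = Operator("invoke_s1", frozenset({"msg2"}), frozenset(), external=True)
invoke_s2 = Operator("invoke_s2", frozenset({"msg4"}), frozenset(), external=True)

print(can_swap(selection, projection))  # independent operators
print(can_swap(selection, invoke_s1))   # invoke_s1 reads what selection writes
print(can_swap(invoke_s1, invoke_s2))   # both touch external systems
```

Only the first pair is reorderable under these assumptions. The third pair shares no data, yet swapping it would change the temporal order observed by the external systems, which is exactly the aspect that a purely data-flow-oriented rewrite would miss.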
The majority of existing flow optimization approaches [LZ05, VSS+07, BJ10, BABO+09]
apply only rule-based optimizations (optimize-once) using, for example, algebraic equivalences.
There, rewriting decisions are made statically and only once, during the initial deployment
of an integration flow. However, this is inadequate due to the following problem: