25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...

3.1 Motivation and Problem Description

they do not consider any data-intensive operators (such as Join or Groupby) and the related optimization techniques, and they are typically not aware of interactions with external systems. Second, existing cost-based optimizers for DBMS or DSMS optimize data-flow graphs rather than imperative control-flow graphs. Similar to existing approaches to integration flow optimization, both categories use the optimize-once or optimize-always models and thus do not take into account the major characteristic of integration flows: they are deployed once and executed many times.

We use the system architecture of typical integration platforms introduced in Subsection 2.1.2 to sketch the architectural extensions required for the cost-based optimization of integration flows. Figure 3.1 illustrates this extended reference system architecture, including the novel cost-based optimization component.

[Figure 3.1: Extended Reference System Architecture. Labels recoverable from the figure: Modeling (Flow Designer); Optimization (Cost-Based Optimizer, Rule-Based Optimizer, Plan Optimizer, Flow Execution Statistics Monitor); Execution (Inbound Adapters 1 to n, sync/async, Scheduler, Process Engine, Outbound Adapters 1 to k, Temporary Datastore); External Systems; loop labels Analyze, Plan, Execute.]

In order to address the problem of imperative integration flows, the deployment process is modified such that a dependency analysis of operators is executed during the initial deployment. Furthermore, the rule-based optimization, which does not require any execution statistics, is executed once during this deployment as well. We described several rule-based optimization techniques for integration flows [BHW+07], but in this thesis we omit the details for the sake of clarity of presentation. From this point on, the plan is executed many times and the optimizer is used to continuously adapt the current plan to changing workload characteristics. The goal of plan optimization is to rewrite (transform) a given plan into a semantically equivalent plan that is optimal in the average case with regard to the estimated costs. To this end, we use a feedback loop according to the general MAPE (Monitor, Analyze, Plan, Execute) concept [IBM05a]. During plan execution, statistics are gathered (Monitor). Then, the optimizer periodically analyzes the given workload (Analyze) in order to optimize the current plan. Subsequently, the rewritten plan is deployed (Plan) and used for execution (Execute).
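The feedback loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the names `OperatorStats`, `estimate_cost`, `respects_dependencies`, and `reoptimize` are hypothetical, the cost model (time per item multiplied by the current cardinality, reduced by each operator's selectivity) is a simple stand-in for the self-adjusting cost model, and the exhaustive search over operator orders is a stand-in for the actual rewriting techniques. The dependency pairs play the role of the result of the dependency analysis performed at deployment.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Dict, Iterable, Tuple

# Hypothetical plan representation: an ordered tuple of operator names.
Plan = Tuple[str, ...]


@dataclass
class OperatorStats:
    """Running averages gathered during plan execution (Monitor)."""
    executions: int = 0
    avg_time_per_item: float = 0.0
    avg_selectivity: float = 1.0

    def update(self, time_per_item: float, selectivity: float) -> None:
        # Incremental mean: no need to store every observation.
        self.executions += 1
        n = self.executions
        self.avg_time_per_item += (time_per_item - self.avg_time_per_item) / n
        self.avg_selectivity += (selectivity - self.avg_selectivity) / n


def estimate_cost(plan: Plan, stats: Dict[str, OperatorStats],
                  input_size: float = 1.0) -> float:
    """Estimate the average-case cost of a plan (Analyze).

    Assumed cost model: each operator costs its average time per item
    times the number of items that reach it.
    """
    cardinality = input_size
    cost = 0.0
    for op in plan:
        cost += cardinality * stats[op].avg_time_per_item
        cardinality *= stats[op].avg_selectivity
    return cost


def respects_dependencies(order: Plan,
                          dependencies: Iterable[Tuple[str, str]]) -> bool:
    """True if every (before, after) pair keeps its relative order."""
    position = {op: i for i, op in enumerate(order)}
    return all(position[a] < position[b] for a, b in dependencies)


def reoptimize(plan: Plan, stats: Dict[str, OperatorStats],
               dependencies: Iterable[Tuple[str, str]]) -> Plan:
    """Return the cheapest reordering that respects all dependencies (Plan).

    Exhaustive enumeration is only feasible for very small plans; it is
    used here purely to keep the sketch short.
    """
    deps = list(dependencies)
    candidates = [p for p in permutations(plan)
                  if respects_dependencies(p, deps)]
    return min(candidates, key=lambda p: estimate_cost(p, stats))
```

In a periodic loop, `update` would be called after each plan instance, and `reoptimize` would be invoked at fixed intervals, with its result replacing the deployed plan for subsequent executions. With two independent filters, for example, the sketch moves the more selective and cheaper filter first once the gathered statistics show that this lowers the estimated cost.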

We use this general feedback loop as the overall structure of this chapter. First, we present a self-adjusting cost model for integration flows and explain the cost estimation of plans using this cost model (Monitor → Analyze; Subsection 3.2.2). Second, we illustrate the optimization problem as well as the overall periodic re-optimization algorithm (Analyze → Plan; Section 3.3). Third, we demonstrate the rewriting of plans using selected
