25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


tegration flows. In detail, we make the following more concrete contributions, which also reflect the structure of this thesis.

• As a prerequisite, Chapter 2 analyzes existing techniques and introduces the notation and examples used throughout this thesis. The literature survey is twofold. On the one hand, we review common integration approaches and, more specifically, discuss the modeling, execution, and optimization of integration flows. On the other hand, we discuss adaptive query processing techniques to illustrate the state of the art of cost-based optimization in other system categories such as DBMS (Database Management Systems) or DSMS (Data Stream Management Systems).

• In Chapter 3, we explain the novel fundamentals for the cost-based optimization of integration flows that enable arbitrary optimization techniques. The specific characteristics of integration flows, namely missing statistics, changing workload characteristics, imperative (prescriptive) flow specification, and transactional properties, have led to the need for fundamentally new concepts. In order to take into account both data-flow- and control-flow-oriented operators, we explain the necessary dependency analysis as well as a hybrid cost model. We introduce the inter-instance, periodical re-optimization, which includes the core transformation-based optimization algorithm as well as specific approaches for search space reduction, workload adaptation sensibility, and the handling of correlated data. Finally, we present selected concrete optimization techniques to illustrate the rewriting of flows.
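To make the idea of inter-instance, periodical re-optimization more tangible, the following minimal sketch may help. It is not the algorithm of this thesis: as a stand-in for the transformation-based optimizer, it uses the classical reordering of independent filter operators by the rank cost / (1 - selectivity), and it assumes that cost and selectivity statistics are monitored across flow instances. All names (`FilterStats`, `reoptimize`, `PeriodicReoptimizer`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FilterStats:
    """Monitored statistics of one filter operator (hypothetical structure)."""
    name: str
    cost: float = 1.0   # average processing cost per message
    seen: int = 0       # messages that entered the operator
    passed: int = 0     # messages that passed the operator

    @property
    def selectivity(self) -> float:
        # Without statistics, assume the operator filters nothing.
        return self.passed / self.seen if self.seen else 1.0


def expected_cost(plan):
    """Expected cost per message of a sequence of independent filters."""
    total, fraction = 0.0, 1.0
    for op in plan:
        total += fraction * op.cost
        fraction *= op.selectivity
    return total


def reoptimize(plan):
    """Stand-in rewrite: order filters by ascending cost / (1 - selectivity)."""
    def rank(op):
        dropped = 1.0 - op.selectivity
        return op.cost / dropped if dropped > 0 else float("inf")
    return sorted(plan, key=rank)


class PeriodicReoptimizer:
    """Re-optimizes the plan after every `period` executed flow instances."""

    def __init__(self, plan, period):
        self.plan = list(plan)
        self.period = period
        self.instances = 0

    def instance_finished(self):
        self.instances += 1
        if self.instances % self.period == 0:
            self.plan = reoptimize(self.plan)


# Usage: filter "a" passes 90% of the messages, filter "b" only 10%.
stats_a = FilterStats("a", cost=1.0, seen=100, passed=90)
stats_b = FilterStats("b", cost=1.0, seen=100, passed=10)
optimizer = PeriodicReoptimizer([stats_a, stats_b], period=2)
optimizer.instance_finished()   # first instance: plan unchanged
optimizer.instance_finished()   # second instance: plan is re-optimized
print([op.name for op in optimizer.plan])
```

The sketch only covers data-flow-oriented filters under an independence assumption. Control-flow-oriented operators, dependency analysis, and correlated data, which the chapter addresses, are deliberately left out.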

• Subsequently, in Chapter 4, we present the cost-based vectorization of integration flows, a tailor-made, control-flow-oriented optimization technique. Motivated by the problems of low resource utilization and specific transactional requirements, this technique computes the optimal grouping of operators into multi-threaded execution buckets in order to achieve the optimal degree of pipeline parallelism with a minimal number of buckets and hence maximizes message throughput. We present context-specific rewriting techniques to ensure transactional properties and discuss exhaustive and heuristic computation approaches for certain settings and constraints.
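The grouping of operators into execution buckets can be illustrated with a small sketch. It rests on simplifying assumptions that are not taken from the thesis: the flow is a linear operator sequence, each bucket is executed by one thread, and communication costs between buckets are ignored. Under these assumptions, pipeline throughput is bounded by the most expensive bucket, and no grouping can be cheaper than the most expensive single operator. The hypothetical function `group_into_buckets` therefore uses that operator's cost as the bucket limit and greedily packs contiguous operators, which yields the minimal number of contiguous buckets for this limit.

```python
def group_into_buckets(costs):
    """Group a linear operator sequence into contiguous execution buckets.

    `costs[i]` is the (assumed known) cost of operator i. No bucket may
    exceed the cost of the most expensive single operator, so the degree
    of pipeline parallelism is kept while the bucket count is minimized.
    Returns a list of buckets, each a list of operator indices.
    """
    if not costs:
        return []
    limit = max(costs)
    buckets, current, load = [], [], 0
    for index, cost in enumerate(costs):
        # Close the current bucket if adding this operator would exceed the limit.
        if current and load + cost > limit:
            buckets.append(current)
            current, load = [], 0
        current.append(index)
        load += cost
    buckets.append(current)
    return buckets


# Usage: six operators with their costs; the operator with cost 4 is the bottleneck.
print(group_into_buckets([1, 4, 2, 2, 3, 1]))
```

This is a heuristic-style illustration only. The transactional rewriting techniques and the exhaustive computation approaches mentioned above are outside its scope.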

• As a tailor-made, data-flow-oriented optimization technique, we introduce the concept of multi-flow optimization in Chapter 5. It addresses the problem of expensive access to external systems. The core idea is to horizontally partition the inbound message queues and to execute flows for partitions of multiple messages. Due to the decreased number of queries to external systems as well as cost reductions for local operators, this results in throughput improvements. In detail, we introduce the partition tree, which is used as a physical message queue representation, an approach for creating and incrementally maintaining such partition trees, and related flow rewriting techniques. Furthermore, we extend the defined cost model and present an approach to periodically compute the optimal waiting time for collecting messages in order to achieve the highest throughput while ensuring maximum latency constraints of individual messages.
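The core idea of multi-flow optimization, executing a flow once per partition of messages rather than once per message, can be sketched as follows. The sketch is a simplification under stated assumptions: it uses a flat dictionary with a single partitioning attribute instead of the partition tree, and the names `partition_messages`, `execute_batched`, and `lookup` (standing in for a query to an external system) are hypothetical.

```python
from collections import defaultdict


def partition_messages(queue, key):
    """Horizontally partition queued messages by the value of one attribute."""
    partitions = defaultdict(list)
    for message in queue:
        partitions[key(message)].append(message)
    return dict(partitions)


def execute_batched(queue, key, lookup):
    """Run the flow once per partition: one external query serves all
    messages that share the same partitioning attribute value."""
    results = []
    for value, messages in partition_messages(queue, key).items():
        enrichment = lookup(value)  # single external query per partition
        results.extend({**m, "enriched": enrichment} for m in messages)
    return results


# Usage: three messages, two distinct customers, hence two external queries.
calls = []

def lookup(customer):
    calls.append(customer)          # record each simulated external query
    return f"master data of {customer}"

queue = [
    {"id": 1, "customer": "A"},
    {"id": 2, "customer": "B"},
    {"id": 3, "customer": "A"},
]
results = execute_batched(queue, key=lambda m: m["customer"], lookup=lookup)
print(len(calls), [r["id"] for r in results])
```

Note that the results come out grouped by partition, so the original message order is not preserved in this sketch. How long to collect messages before executing a partition is the waiting-time question raised above and is not modeled here.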

• In order to decrease the overhead for statistics monitoring and re-optimization while adapting to changing workloads as fast as possible, we introduce the novel concept of on-demand re-optimization in Chapter 6. This includes (1) modeling the optimality of a plan by its optimality conditions using a so-called Plan Optimality Tree rather than considering the complete search space, (2) monitoring only statistics that are included in these conditions, and (3) using directed re-optimization if conditions are
