25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


2 Preliminaries and Existing Techniques

As preliminaries, this chapter surveys existing techniques in order to give a comprehensive overview of the state of the art in optimizing integration flows. We start with a classification of integration approaches. Subsequently, we generalize the system architecture of typical integration platforms and review the modeling, execution, and optimization of integration flows. Moreover, we classify approaches to cost-based optimization and adaptive query processing across different system categories. Furthermore, we introduce the notation used for integration flows, including their transactional requirements, and we define the running-example integration flows of this thesis.

2.1 Integration Flows

Integration (from Latin integer = complete) refers to the assembly of many parts into a single composite. In computer science, integration denotes combining local integration objects (systems, applications, data, or functions) by means of a certain integration technology. For several reasons, which we discuss in this section, the integration and interoperability of heterogeneous and distributed components, applications, and systems is one of the broadest and most important research areas in computer science.

Historically, database technology itself was introduced with the goal of integrated enterprise data management in the sense of so-called enterprise databases [Bit05]. Unfortunately, a comprehensive and future-oriented database design spanning multiple application programs has not been achieved. As a result, enterprise data is today inherently distributed across many different systems and applications. In this context, we often observe the problems of (1) non-disjointly distributed data across these systems, (2) heterogeneous data representations, and (3) data propagation workflows that are implicitly defined by the applications. Despite the permanent goal of homogeneity and the trend towards homogeneous services and exchange formats, these problems will persist in the future due to the diversity of application requirements, the performance overhead of ensuring homogeneity (e.g., XML exchange formats), the continuous development of new technologies that inevitably introduce heterogeneity with regard to legacy technologies, and organizational aspects such as autonomy and privacy. Consequently, data must be synchronized across these distributed and heterogeneous applications and systems.

This historically grown situation of enterprise data management leads directly to the major goals of integration. First, distributed and heterogeneous data creates the requirement of data consolidation and homogeneity in order to provide a consistent global view over all operational systems. Second, non-disjointly distributed data and implicit data propagation workflows give rise to the need for interoperability in terms of interactions and data synchronization between autonomous systems and applications. Third, in order to achieve availability (high availability and disaster recovery) as well as performance and scalability (location-based access, load balancing, and virtualization), technically distributed subsystems must be integrated and synchronized as well.
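To make problems (1) and (2) and the goal of data consolidation more concrete, the following minimal Python sketch assumes two hypothetical systems, a CRM and an ERP, that both store customer 42 (non-disjoint distribution) but with different field names, name formats, and date formats (heterogeneous representations). The record layouts, the `Customer` type, and the functions `from_crm`, `from_erp`, and `consolidate` are illustrative assumptions, not part of this thesis; the conflict rule (the ERP record wins) is deliberately naive.

```python
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical source records: both systems hold customer 42 (non-disjoint
# distribution), but with different field names and formats (heterogeneity).
crm_records = [
    {"cust_id": 42, "name": "Doe, Jane", "since": "2009-03-01"},
    {"cust_id": 7, "name": "Roe, Rick", "since": "2011-11-15"},
]
erp_records = [
    {"CUSTNO": "0042", "FIRST": "Jane", "LAST": "Doe", "SINCE": "01.03.2009"},
    {"CUSTNO": "0099", "FIRST": "Max", "LAST": "Mustermann", "SINCE": "05.06.2012"},
]


@dataclass(frozen=True)
class Customer:
    """Hypothetical common (global) representation for the consolidated view."""
    id: int
    first: str
    last: str
    since: date


def from_crm(r: dict) -> Customer:
    # CRM stores "Last, First" and ISO dates.
    last, first = (part.strip() for part in r["name"].split(","))
    return Customer(r["cust_id"], first, last, date.fromisoformat(r["since"]))


def from_erp(r: dict) -> Customer:
    # ERP stores zero-padded string keys and German-style dates (dd.mm.yyyy).
    since = datetime.strptime(r["SINCE"], "%d.%m.%Y").date()
    return Customer(int(r["CUSTNO"]), r["FIRST"], r["LAST"], since)


def consolidate(crm: list[dict], erp: list[dict]) -> dict[int, Customer]:
    """Map both sources to the common schema and merge overlapping records by id.

    Conflicts are resolved naively (assumption): the ERP record overwrites the CRM one.
    """
    view: dict[int, Customer] = {}
    for c in map(from_crm, crm):
        view[c.id] = c
    for c in map(from_erp, erp):
        view[c.id] = c
    return view


global_view = consolidate(crm_records, erp_records)
for cid in sorted(global_view):
    print(global_view[cid])
```

Running the script prints three consolidated customers rather than four source records, because the two representations of customer 42 are mapped to the same global record. A real integration flow would additionally have to resolve genuinely conflicting attribute values and propagate changes back to the sources (synchronization), which this sketch omits.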
