Cost-Based Optimization of Integration Flows - Datenbanken ...
2 Preliminaries and Existing Techniques
In this chapter, we survey existing techniques in order to give a comprehensive
overview of the state of the art in optimizing integration flows. We start with a
classification of integration approaches. Subsequently, we generalize the system architecture
of typical integration platforms and review the modeling, execution, and optimization
of integration flows. We then classify approaches to cost-based optimization and
adaptive query processing in different system categories. Finally, we introduce the
notation used for integration flows, including their transactional requirements, and define
the running example integration flows of this thesis.
2.1 Integration Flows
Integration (from Latin integer = complete) refers to the assembly of many parts into a
single composite. In computer science, integration is used in the sense of combining local
integration objects (systems, applications, data, or functions) by means of a certain integration
technology. For several reasons, which we discuss in this section, the integration and
interoperability of heterogeneous and distributed components, applications, and systems
is one of the broadest and most important research areas in computer science.
Historically, database technology itself was introduced with the goal of integrated enterprise
data management in the sense of so-called enterprise databases [Bit05]. Unfortunately,
a comprehensive and future-oriented database design spanning multiple application
programs has not been achieved. As a result, enterprise data today
is inherently distributed across many different systems and applications. In this context,
we often observe the problems of (1) non-disjointly distributed data across these systems,
(2) heterogeneous data representations, and (3) data propagation workflows that are implicitly
defined by the applications. Despite the permanent goal of homogeneity and the
trend towards homogeneous services and exchange formats, these problems will
remain in the future due to the diversity of application requirements, performance overheads
for ensuring homogeneity (e.g., XML exchange formats), the continuous development of new
technologies, which always causes heterogeneity with regard to legacy technologies, as well
as organizational aspects such as autonomy and privacy. In consequence, data must be
synchronized across those distributed and heterogeneous applications and systems.
This historically grown situation of enterprise data management leads directly to
the major goals of integration. First, distributed and heterogeneous data creates the
requirement for data consolidation and homogeneity in order to provide a consistent global
view over all operational systems. Second, non-disjointly distributed data and implicit
data propagation workflows give rise to the need for interoperability in terms of interactions
and data synchronization between autonomous systems and applications. Third, in order
to achieve availability (high availability and disaster recovery) as well as performance
and scalability (location-based access, load balancing, and virtualization), technically distributed
subsystems must be integrated and synchronized as well.