25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


2.1 Integration Flows

to automatically specify the semantics of ETL jobs in a platform-independent manner. The Orchid project is restricted to the generation of ETL flows. In addition, Jörg and Deßloch addressed the generation of incremental ETL jobs [JD08, JD09] and subsequently investigated the optimization of these incremental ETL jobs [BJ10].

In contrast, the METL project by Albrecht and Naumann focuses on the combination of the generation and model management of ETL flows [AN08]. There, platform-independent and tool-independent operators for model management were introduced. In contrast to generic model management [BM07] for schema mappings, the order, type, and configuration of data transformation steps must be taken into account. These requirements are addressed with specific high-level operators and an ETL management platform [AN09].

In addition to these ETL flow approaches, our GCIP framework (Generation of Complex Integration Processes) [BHLW09a] addresses the modeling of platform-independent integration flows [BHLW08e], the flow generation for arbitrary integration platforms (e.g., FDBMS, EAI, ETL), the application of optimization techniques [BHLW08f, BHLW08g, BBH+08a, BBH+08b] during model-driven generation, and finally the deployment of these generated integration flows [BHLW09b]. For this purpose, a hierarchy of platform-independent, platform-specific, and tool-specific models is used.

While all of these approaches address at most two aspects of integration flows, namely the flow specification and schema definitions, Mazon et al. defined the multi-dimensional model-driven architecture for the development of data warehouses [MTSP05]. There, the aspects (1) data sources, (2) ETL flows, (3) multi-dimensional data warehouse design, (4) customization code, and (5) application code as well as their inter-influences are taken into account. This general framework was recently extended by data merging and data customization aspects for data mining [KZOC09].

D. Declarative Integration Flow Modeling

In contrast to the commonly used imperative integration flows, which are modeled in a prescriptive manner, the Demaq project [BMK08] models declarative integration flows in a descriptive manner by using dependable XML message queue definitions (basic, time-based, gateway), where the processing logic is described in terms of a declarative rule language. As a foundation, a declarative Queue Definition Language (QDL) and a Queue Manipulation Language (QML) based on the XQuery update facility are used [BKM07]. Furthermore, Demaq introduced the concept of slices, which are specifications of logical message groups in the form of virtual queues. Despite the concept of dependable and time-based queues, complex procedural aspects, i.e., temporal dependencies, and complex control flows are hard to model. However, for message-centric integration flows, reasonable performance, scalability, and transactional properties are achieved [BK09].
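To make the contrast with imperative modeling more concrete, the following Python sketch illustrates the general idea of rule-driven message queues and slices. It is only an illustration under stated assumptions: the class `QueueEngine`, its methods, and the example queues are hypothetical names chosen here and do not reproduce Demaq's QDL or QML syntax or its actual semantics.

```python
from collections import defaultdict, deque


class QueueEngine:
    """Toy engine (hypothetical): rules are declared per queue, not sequenced by hand."""

    def __init__(self):
        self.queues = defaultdict(deque)   # queue name -> pending messages
        self.history = defaultdict(list)   # queue name -> every message ever enqueued
        self.rules = defaultdict(list)     # queue name -> list of (condition, action)

    def rule(self, queue, condition, action):
        # Declare what should happen to messages of a queue that satisfy a condition.
        self.rules[queue].append((condition, action))

    def enqueue(self, queue, message):
        self.queues[queue].append(message)
        self.history[queue].append(message)

    def slice(self, queue, key, value):
        # A slice viewed as a virtual queue: a logical group of messages sharing a key.
        return [m for m in self.history[queue] if m.get(key) == value]

    def run(self):
        # Process pending messages until no queue holds unprocessed messages.
        progress = True
        while progress:
            progress = False
            for queue in list(self.queues):
                while self.queues[queue]:
                    msg = self.queues[queue].popleft()
                    progress = True
                    for condition, action in self.rules[queue]:
                        if condition(msg):
                            for target, out in action(msg):
                                self.enqueue(target, out)


engine = QueueEngine()
# Rules state dependencies between queues; no explicit control flow is modeled.
engine.rule("orders", lambda m: m["amount"] > 100,
            lambda m: [("review", {**m, "flag": "large"})])
engine.rule("orders", lambda m: m["amount"] <= 100,
            lambda m: [("approved", m)])

engine.enqueue("orders", {"id": 1, "customer": "a", "amount": 250})
engine.enqueue("orders", {"id": 2, "customer": "b", "amount": 40})
engine.enqueue("orders", {"id": 3, "customer": "a", "amount": 10})
engine.run()

customer_a = engine.slice("orders", "customer", "a")
```

In this sketch the overall flow emerges from the declared rules alone. A strict temporal ordering across several queues would have to be encoded indirectly through additional rules and messages, which hints at why complex control flows are hard to express in this style.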

While Demaq describes the overall flow with dependencies between queues, the Orchid project [DHW+08] uses declarative schema mappings between source and target schemas and generates imperative ETL flows. To this extent, Orchid also uses declarative modeling semantics despite the resulting imperative ETL flow specifications.
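The step from a declarative mapping to an imperative flow can be sketched as follows. This is a minimal illustration, not Orchid's mapping language or generation algorithm: the mapping format, the function names `generate_etl_steps` and `run_steps`, and the table names are all assumptions made for this example.

```python
def generate_etl_steps(mapping):
    """Turn a declarative mapping (hypothetical format) into an ordered list of steps."""
    steps = [("extract", mapping["source"])]
    for target_col, (source_col, transform) in mapping["columns"].items():
        steps.append(("project", source_col, target_col, transform))
    steps.append(("load", mapping["target"]))
    return steps


def run_steps(steps, tables):
    """Execute the generated steps one after another against in-memory tables."""
    rows, out = [], []
    for step in steps:
        if step[0] == "extract":
            rows = tables[step[1]]
            out = [{} for _ in rows]
        elif step[0] == "project":
            _, src, dst, fn = step
            for row, result in zip(rows, out):
                result[dst] = fn(row[src])
        elif step[0] == "load":
            tables[step[1]] = out
    return tables


# Declarative part: only the correspondence between source and target is stated.
mapping = {
    "source": "customers_src",
    "target": "customers_dw",
    "columns": {
        "name": ("full_name", str.strip),
        "country": ("country_code", str.upper),
    },
}

tables = {"customers_src": [{"full_name": " Ada ", "country_code": "de"}]}
steps = generate_etl_steps(mapping)   # imperative part: an explicit step sequence
run_steps(steps, tables)
```

The mapping itself says nothing about execution order. The order, type, and configuration of the steps only appear in the generated step list.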

Finally, the modeled integration flows are deployed into the execution environment using pre-defined exchange formats (e.g., XML). From this point on, the deployed integration flows are identified by logical names and executed many times. However, there are several side effects from modeling to execution.
