Cost-Based Optimization of Integration Flows - Datenbanken ...
2.1 Integration Flows
complex queries, Publish/Subscribe systems usually execute huge numbers of fairly simple
queries. On the other hand, there are ETL tools that follow a time-based or data-driven
event model. All three system categories conceptually use a hub-and-spoke or bus topology;
in both cases, a central integration platform is used. Finally, all system types
in the application area of information integration are query-based with respect to the
specification method used for modeling integration tasks. ETL tools are a special case
because they often use integration flows as the specification method as well.
The categories of application integration and process integration refer to a more loosely
coupled type of integration, where integration flows are used as the specification method and
message-oriented flows are hierarchically composed. The main distinction between the two
is that application integration refers to the integration of heterogeneous systems and applications,
while process integration refers to a business-process-oriented integration of
homogeneous services (e.g., Web services). Both are classified as materialized integration
approaches because messages are propagated to and physically stored by the target
systems. However, application integration is data-centric in that it focuses on efficiently exchanging
data between the involved applications, while process integration focuses more on
procedural aspects, that is, on controlling the overall business process and its involved
systems. In this context, there are many different system types, such as (near)
real-time ETL tools (that use the data-driven event model), MOM systems (that use
standard messaging infrastructures such as Java Message Service), EAI systems, BPEL
engines (Business Process Execution Language), and Web Service Management Systems
(WSMS). Note that those system categories have increasingly converged in the past
[Sto02, HAB+05] in the form of overlapping functionalities [Sto02]. For example, standards
from the area of process integration such as BPEL are also partially used to specify
application integration tasks.
Finally, GUI integration (Graphical User Interface) describes the uniform and integrated
visualization of (or the access to) heterogeneous and distributed data sources. Portals
provide a single system for accessing heterogeneous data sources and applications, where
data is only integrated for visualization purposes. In contrast, mashups dynamically compose
Web content (feeds, maps, etc.) to create small applications with a stronger focus
on content integration. See Aumueller and Thor [AT08] for a detailed
classification of existing mashup approaches. Both portals and mashups are classified as
virtual integration approaches that use a hierarchical topology, and their specification is
mainly user-interface-oriented.
The exclusive scope of this thesis is the category of integration flows. As a result, the
proposed approaches can be applied to process integration, application integration, and
partially to information integration as well.
2.1.2 System Architecture for Integration Flows
The integration of highly heterogeneous systems and applications that require fairly complex
procedural aspects makes it almost impossible to realize these integration tasks using
traditional techniques from the area of distributed query processing [Kos00] or replication
techniques [CCA08, PGVA08]. Consequently, those complex integration tasks are
typically modeled and executed as imperative integration flow specifications.
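To make the notion of an imperative integration flow specification more concrete, the following minimal sketch shows one flow written as an explicit sequence of steps: a schema translation, a content-based branch, and a write to a target system. It is an illustration only and not taken from the thesis; the names `Message`, `translate`, and `order_flow`, the field mapping, and the routing threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """A message exchanged between systems (hypothetical structure)."""
    payload: dict


def translate(msg: Message, mapping: dict) -> Message:
    """Rename source fields to target fields according to a mapping."""
    return Message({mapping.get(k, k): v for k, v in msg.payload.items()})


def order_flow(msg: Message, targets: dict) -> str:
    """Imperative flow: translate, branch on content, write to one target."""
    # Step 1: schema translation from the source to the target format
    # (the mapping is an assumption made for this example).
    out = translate(msg, {"cust": "customer_id", "amt": "amount"})
    # Step 2: explicit control flow, here a content-based switch
    # (the threshold of 1000 is arbitrary).
    target = "erp" if out.payload["amount"] >= 1000 else "crm"
    # Step 3: materialize the message in the chosen target system,
    # modeled here as an in-memory list.
    targets[target].append(out.payload)
    return target


targets = {"erp": [], "crm": []}
routed_to = order_flow(Message({"cust": 42, "amt": 1500}), targets)
print(routed_to, targets[routed_to])
```

In contrast to a declarative query, the order of steps and the branching are spelled out explicitly, which is what allows procedural aspects to be expressed across heterogeneous systems.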
Based on the specific characteristics of integration flows, a typical system architecture
has evolved in the past. This architecture is commonly used by the major EAI products
such as SAP eXchange Infrastructure (XI) / Process Integration (PI) [SAP10], IBM