Cost-Based Optimization of Integration Flows - Datenbanken ...
2.3 Integration Flow Meta Model
access to heterogeneous systems and applications. There, the proprietary external messages
and data representations are transformed into the described internal message meta
model. In detail, the group of interaction-oriented operators includes the operators shown
in Table 2.1. In contrast, the data-flow- and control-flow-oriented operators are
used as local processing steps within the integration platform; they do
not interact with external systems. Both groups of operators are shown
in Table 2.2 and Table 2.3, respectively.
The instance-based plan execution has several implications for all operators. First, the
operators use materialized intermediate results in the form of local message variables.
Second, the data flow is implicitly given by those input and output variables.
Moreover, we distinguish between external integration flow descriptions (e.g., BPEL),
internal plans (logical representation) and internal compiled plans (physical representation).
Here, the term plan is a shorthand for internal plans. In Section 2.4, we present
several use cases that are used as example plans throughout the thesis.
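The two implications of instance-based execution noted above (materialized message variables and an implicit data flow) can be illustrated with a minimal sketch. The plan encoding, the function `execute_instance`, and the variable names `msg1` to `msg3` are assumptions made for illustration only; they are not the plan representation used in this thesis.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical operator encoding: (input variable names, output variable name, function).
Operator = Tuple[List[str], str, Callable[..., dict]]


def execute_instance(plan: List[Operator], inbound_msg: dict) -> Dict[str, dict]:
    """Run one plan instance for a single inbound message.

    Every intermediate result is materialized as a named message variable,
    so the data flow follows from the variable names alone.
    """
    variables: Dict[str, dict] = {"msg1": inbound_msg}
    for inputs, output, fn in plan:
        variables[output] = fn(*(variables[name] for name in inputs))
    return variables


# Illustrative plan with two local processing steps.
plan: List[Operator] = [
    (["msg1"], "msg2", lambda m: {**m, "price": m["price"] * 2}),
    (["msg2"], "msg3", lambda m: {k: m[k] for k in ("id", "price")}),
]

result = execute_instance(plan, {"id": 7, "price": 10, "note": "x"})
```

In this sketch, the second operator depends on the first only because it reads `msg2`; no explicit edge between the operators is stored.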
2.3.2 Transactional Properties
There are several transactional properties of integration flows that must be guaranteed
under all circumstances. In this section, we discuss different problems that can occur while
executing an integration flow, how they are typically addressed in integration
platforms, and what this implies for the cost-based optimization of integration flows. For
more details, see our analysis of problem categories [BHLW08a].
The most important risk of executing integration flows is the loss of messages
when using a simple send-and-forget execution principle.
Problem 2.1 (Message Lost). Assume that the stream of incoming messages M is collected
using transient (in-memory) inbound message queues. If the server of the
integration platform breaks down, all messages not yet sent to the target systems are lost.
This is a problem because often the messages cannot be restored by the source systems.
As shown in Section 2.1.2, this problem is typically addressed by persistently storing all
incoming messages at the inbound server side of the integration platform. The resulting
implication is that all integration platforms follow a store-and-forward principle in order to
guarantee that each received message will be successfully delivered to the external systems.
Thus, if a failure occurs, the stored messages are used to resume the state of execution.
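The store-and-forward principle can be sketched as follows. This is a minimal illustration under stated assumptions: the class `StoreAndForwardQueue`, the table name `inbound`, and the use of SQLite as the persistent store are hypothetical choices and do not describe any particular integration platform.

```python
import os
import sqlite3
import tempfile


class StoreAndForwardQueue:
    """Hypothetical persistent inbound queue backed by an SQLite file."""

    def __init__(self, path: str) -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS inbound ("
            "id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
        )
        self.db.commit()

    def receive(self, body: str) -> int:
        # Store: the message is committed to disk before it is acknowledged.
        cur = self.db.execute("INSERT INTO inbound (body) VALUES (?)", (body,))
        self.db.commit()
        return cur.lastrowid

    def pending(self) -> list:
        return self.db.execute("SELECT id, body FROM inbound ORDER BY id").fetchall()

    def forward_all(self, send) -> None:
        # Forward: a message is removed only after `send` returned without error.
        for msg_id, body in self.pending():
            send(body)
            self.db.execute("DELETE FROM inbound WHERE id = ?", (msg_id,))
            self.db.commit()


# Simulated breakdown: the process "dies" after receiving two messages.
path = os.path.join(tempfile.mkdtemp(), "inbound.db")
queue = StoreAndForwardQueue(path)
queue.receive("order-1")
queue.receive("order-2")
queue.db.close()

# After restart, the stored messages are still available and can be forwarded.
recovered = StoreAndForwardQueue(path)
delivered = []
recovered.forward_all(delivered.append)
```

Note that in this sketch a breakdown between `send(body)` and the committed `DELETE` would cause the message to be forwarded again after restart, which is exactly the situation discussed next.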
A failure or server breakdown can occur at an arbitrary point in time. Thus, there might
be operators and interactions with external systems that have already been successfully
finished, while other operators have not. When re-executing the complete integration flow,
the following problem arises.
Problem 2.2 (Message Double Processing). Assume that a server breakdown has occurred during
the execution of plan instance p_1. Recall that typically, each interaction with an
external system is an atomic transaction. Thus, there might be successfully finished operators
and currently unfinished operators. Furthermore, if the external system does not
support transactional behavior, there might be partially successful interactions with external
systems. If we re-execute the plan instance p_1 as p′_1, we might send the same message
twice to an external system.
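Problem 2.2 can be reproduced with a small simulation. The function `run_plan`, the exception `Breakdown`, and the lists standing in for external systems are assumptions for illustration; the sketch only demonstrates the problem and does not implement a recovery model.

```python
class Breakdown(Exception):
    """Simulated server breakdown during plan execution."""


def run_plan(message, targets, fail_before=None):
    """Send `message` to each target system in order.

    If `fail_before` is set, a breakdown is raised before the operator
    at that index runs, leaving earlier sends completed.
    """
    for index, target in enumerate(targets):
        if fail_before == index:
            raise Breakdown(f"breakdown before operator {index}")
        target.append(message)  # atomic interaction with one external system


system_a, system_b = [], []

# Plan instance p_1: the first send succeeds, then the server breaks down.
try:
    run_plan("m1", [system_a, system_b], fail_before=1)
except Breakdown:
    pass

# Naive re-execution p'_1 of the complete plan.
run_plan("m1", [system_a, system_b])
```

After the re-execution, `system_a` holds the message twice while `system_b` holds it once, which is the double processing described above.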
In order to tackle this problem, specific recovery models for integration flows are used,<br />
where we distinguish between two types. First, there is the compensation-based transac-<br />