Cost-Based Optimization of Integration Flows - Datenbanken ...

2 Preliminaries and Existing Techniques

2.5 Summary and Discussion

To summarize, we classified existing work on specifying integration tasks, where we mainly distinguish query-based, integration-flow-based, and user-interface-oriented approaches. Due to the emerging requirements of complex integration tasks that (1) stretch beyond simple read-only applications, (2) involve many types of heterogeneous systems and applications, and (3) require fairly complex procedural aspects, imperative integration flows are increasingly used. Hence, we further classified the modeling, execution, and optimization of these integration flows in detail according to a generalized reference system architecture of an integration platform for integration flows. Typically, an integration flow is modeled as a hierarchy of sequences with control-flow semantics. The control-flow semantics also subsumes implicit data-flow semantics by using instance-local, materialized intermediates in the form of variables.

With regard to the optimization of such integration flows, mainly rule-based optimization approaches (optimize-once) have been proposed so far. This optimization model has two major drawbacks. First, adaptation to changing workload characteristics is impossible because the flow is optimized only once, during the initial deployment. Second, many cost-based optimization decisions cannot be made statically in a rule-based fashion.

In contrast to the rule-based optimization of integration flows, there are numerous approaches to adaptive query processing in different application areas. However, these approaches are tailor-made for specific system types and their underlying assumptions about execution characteristics. For example, plan-based adaptation in DBMS assumes long-running queries over finite data sets, while continuous-query-based adaptation in DSMS assumes continuous queries over infinite tuple streams. In contrast to these system types, integration flows are deployed once and executed many times, where many independent instances, each with a rather small amount of data, are executed over time. In conclusion, the major research question is whether we can exploit context knowledge of integration flows in order to design a tailor-made optimization approach that takes these specific characteristics into account.

As a formal foundation, we defined the basic notation in the form of a meta model for integration flows, including a message meta model that covers all static data aspects and a flow meta model that precisely defines the plan execution characteristics as well as interaction-, control-flow-, and data-flow-oriented operators. This meta model reflects the common modeling and execution semantics of integration flows as well as their specific transactional requirements; thus, all results of this thesis can be seamlessly applied to other meta models as well. Furthermore, we specified example integration flows within the context of the two major use cases of horizontal and vertical integration. These example flows represent the main characteristics and different facets of integration flows; hence, they are used as running examples throughout the whole thesis.

Putting it all together, there are existing approaches for query-based, integration-flow-based, and UI-oriented integration. From the perspective of optimization, there exist tailor-made techniques for adaptive query processing. In contrast, the optimization of integration flows is mainly rule-based. Thus, the focus and novelty of this thesis is the cost-based optimization of integration flows, which is strongly required in order to address the high performance demands when executing integration flows.
3 Fundamentals of Optimizing Integration Flows

In this chapter, we introduce the fundamentals of a novel optimization framework for integration flows [BHW+07, BHLW08f, BHLW08g, BHLW09a] in order to enable arbitrary cost-based optimization techniques. This framework is tailor-made for integration flows with control-flow execution semantics because it exploits the major integration-flow-specific characteristic of being deployed once and executed many times. Furthermore, it tackles the specific problems of missing statistics, changing workload characteristics, and imperative flow specifications, while also ensuring the required transactional properties.

The core idea of the overall cost-based optimization framework for integration flows is incremental statistics maintenance in combination with asynchronous, inter-instance plan re-optimization. In order to take into account both data-flow- and control-flow-oriented operators, we specify the necessary dependency analysis as well as a novel hybrid cost model. Furthermore, we introduce the periodical re-optimization that includes the core transformation-based optimization algorithm as well as specific approaches for search space reduction (such as a join reordering heuristic), influencing workload adaptation sensibility, and handling of correlated data. Subsequently, we present selected concrete optimization techniques (such as the reordering/merging of switch paths, early selection application, or the rewriting of sequences and iterations to parallel flows) to illustrate the rewriting of plans. Finally, the evaluation shows that significant performance improvements are possible with fairly low optimization overhead.

3.1 Motivation and Problem Description

The motivation for designing a tailor-made optimization approach for integration flows is their specific characteristic of being deployed once and executed many times, which can be exploited for efficient re-optimization.
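
To make the deploy-once, execute-many idea more concrete, the following is a minimal Python sketch, not the framework of this thesis. It maintains one statistic incrementally (a hit counter per switch path) while flow instances are executed, and it reorders the switch paths between instances after a fixed number of executions. The names `SwitchPath`, `SwitchPlan`, and `reopt_period` are hypothetical. The sketch assumes mutually exclusive path predicates, so that reordering does not change the external behavior, and it re-optimizes synchronously between two instances rather than asynchronously.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SwitchPath:
    """One path of a switch operator (hypothetical illustration type)."""
    name: str
    predicate: Callable[[dict], bool]
    hits: int = 0  # incrementally maintained statistic


@dataclass
class SwitchPlan:
    """A deployed plan that consists of a single switch operator."""
    paths: List[SwitchPath]
    reopt_period: int = 100  # assumption: re-optimize every N instances
    executed: int = 0        # number of executed flow instances
    evaluations: int = 0     # predicate evaluations, used as a cost proxy

    def execute(self, message: dict) -> str:
        """Execute one instance: take the first path whose predicate holds."""
        chosen = "default"
        for path in self.paths:
            self.evaluations += 1
            if path.predicate(message):
                path.hits += 1  # update statistics during execution
                chosen = path.name
                break
        self.executed += 1
        # Re-optimize between instances, never within a running instance.
        if self.executed % self.reopt_period == 0:
            self.reoptimize()
        return chosen

    def reoptimize(self) -> None:
        """Move frequently taken paths to the front.

        This is only behavior-preserving under the stated assumption
        that the path predicates are mutually exclusive.
        """
        self.paths.sort(key=lambda p: p.hits, reverse=True)
```

In this sketch, a plan whose last path matches most messages pays for evaluating all earlier predicates until the first re-optimization; afterwards, the most frequently taken path is tested first. A rule-based, optimize-once approach cannot make this decision at deployment time because the path frequencies are not yet known.
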
Moreover, integration flows are specified with control-flow semantics (imperative) in order to enable the execution of complex procedural integration tasks.

Problem 3.1 (Imperative Integration Flows). When rewriting imperative flow specifications, the data flow and the control flow (in the sense of restrictive temporal dependencies) must be taken into account in order to ensure semantic correctness. Here, semantic correctness means that the external behavior (data aspects and temporal order) must not be changed.

The majority of existing flow optimization approaches [LZ05, VSS+07, BJ10, BABO+09] apply rule-based optimizations only (optimize-once) using, for example, algebraic equivalences. There, rewriting decisions are made statically, only once, during the initial deployment of an integration flow. However, this is inadequate due to the following problem: