20.01.2014 Views

Master Thesis - ICS

Master Thesis - ICS

Master Thesis - ICS

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Computer Science Department<br />

Antonis Misargopoulos<br />

For years now, research has been done on the design, implementation and performance of<br />

parallel distributed database systems. Teradata [94], Budda [95], HC186-16 [96],<br />

GAMMA [97], and XPRS [98] are examples of systems that actually were implemented.<br />

The performance evaluation of these systems though is mainly limited to simple queries<br />

that involve no more than one or two join operations. Recent developments in the<br />

direction of support of nonstandard applications, the use of complex data models, and the<br />

availability of high-level interfaces tend to generate complex queries that may contain<br />

larger numbers of joins between relations. Consequently, the development of execution<br />

strategies for the parallel evaluation of multi-join queries has drawn the attention of the<br />

scientific community. A high-performance scheduler in general promotes the<br />

performance of individual applications by optimizing performance measurements such as<br />

minimal execution time or speed-up. The strategy of efficient and optimized query<br />

execution that is adopted is very challenging research problem.<br />

One central component of a query optimizer is its search strategy or enumeration<br />

algorithm. The enumeration algorithm of the optimizer determines which plans to<br />

enumerate, and the classic enumeration algorithm is based on Dynamic Programming<br />

proposed in IBM’s System R project [145].<br />

Besides that, the a-priori resource allocation and management is a hard-solved and<br />

challenging problem, as well. It is extremely important for the researchers and the<br />

distributed database designers to be aware of what and how many resources of the Grid<br />

architecture are involved in the execution of a given query. Communication and<br />

computation costs are two very important factors that designate the proper computation<br />

resources for this execution.<br />

This work explores this aspect of service-based computing and resource<br />

management. We study the various data replication policies that could be followed and<br />

focus on how we can optimize queries processing technology with service-based Grids<br />

and how we can make resource allocation or co-allocation more efficient and effective.<br />

Especially, regarding the case of no data replication, we designed and implemented a<br />

high-performance dynamic application scheduler for relational linear join queries over a<br />

Grid architecture. According to the characteristics of the resources of the Grid, the<br />

scheduler takes an initial join query and designates the proper query plan execution for<br />

2

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!