a Grid Computing System - Utopia
a Grid Computing System
Radu Niculita
February-August 2002
Computer System Laboratory, Computer Science, School of Computing, National University of Singapore
Automatic Control and Computers Faculty, Politehnica University of Bucharest
Advisor: A/P Teo Yong Meng
Advisor: Prof. Dr. Ing. Nicolae Tapus
“There is nothing more difficult to take in hand, more perilous to conduct, or more uncertain in its success than to take the lead in the introduction of a new order of things...but nothing is more thrilling.”
Niccolo Machiavelli
Abstract

ALiCE is a middleware that supports generic grid application development and deployment. It is built in Java using Sun Microsystems' JavaSpaces technology and is designed with platform independence, scalability, modularity, performance and programmability in mind. The system presented in this document is built on the concepts of the previous version of ALiCE, developed at the National University of Singapore.

This document presents the design and the features of the ALiCE system. The need to redesign ALiCE arose from the practical requirements of a flexible and functional grid computing middleware system, with an emphasis on scalability and modularity.

Although the programming templates that the programmer uses are largely the same and the basic concepts are unchanged, so porting old applications is easy, the new system is entirely rewritten from scratch. The design is much more scalable and very easy to build on, so future developments are possible with minimum effort. The new design also permits support for non-Java applications.

The means of communication in the new design are generic, involving live objects moving freely through the system, which leaves room for any kind of communication and synchronization inside the system and between new ALiCE applications.
Contents

I. Introduction
1. Introduction
1.1. Motivation of Grid systems
1.2. Related Work
1.2.1. Legion
1.2.2. GLOBE
1.2.3. Globus
1.2.4. Condor
1.2.5. SETI@home
1.2.6. Distributed.Net
1.3. Summary

II. Design, Architecture and Implementation
2. ALiCE Design Overview
2.1. Design Goals
2.2. The Advantages of the New Architecture
2.3. Design Decisions
2.3.1. Java, Jini and JavaSpaces
3. The Basic Building Block - ONTA (Object Network Transfer Architecture)
3.1. How Does ONTA Work?
3.2. The Object Writer
3.3. The Object Repository
3.4. The Remote Object Loader
3.5. The Object Loader
3.6. The File Manager
3.7. The Protocols
3.7.1. The Protocol Server
3.7.2. The Protocol Client
4. ALiCE Architecture and Implementation
4.1. An Overview of the Components of the System
4.1.1. Three-tier architecture
4.1.2. System Components Overview
4.2. The Communication Between Components of ALiCE
4.2.1. Communication Through Object References
4.2.2. Communication Through Messages
4.2.3. General Communication Scheme for an ALiCE Application
4.3. System Components
4.3.1. The Common Components
4.3.2. The Consumer
4.3.3. The Resource Broker
4.3.4. The Producer and the Task Producer
4.3.5. The Data Server

III. Sample Applications and Performance Testing
5. Examples of ALiCE Applications
5.1. Matrix Multiplication
5.2. Ray Tracing
5.3. DES Key Cracker
5.4. Protein Matching
6. Performance Testing
6.1. The Test Bed
6.2. Experiments
6.2.1. Performance Evolution with Variance of Task Size
6.2.2. Varying the Number of Producers
6.2.3. Overhead Variation with Task Size for Direct Result Delivery
6.2.4. Overhead Variation with Task Size for Delivery of Results Through Resource Broker
6.2.5. Performance Comparison with the Old Version of ALiCE

IV. ALiCE GRID Programming Model
7. Developing ALiCE Applications
7.1. The Model
7.2. Template Features
7.2.1. The Task Generator Template
7.2.2. The Task Template
7.2.3. The Result Collector Template
7.2.4. Data Files Usage
7.2.5. Inter-task Communication
7.3. Simple application examples
7.3.1. Simple Example and Data File Usage
7.3.2. Simple Inter-Task Communication and Spawning a New Task from a Task
8. ALiCE Programming Templates
8.1. The Task Generator Template
8.2. The Result Collector Template
8.3. The Task Template
9. Conclusions
9.1. Summary
9.2. Future work
Part I.
Introduction
1. Introduction

1.1. Motivation of Grid systems

Grid computing, the harnessing of the immense computational resources provided by geographically dispersed computers connected via a network into one large parallel machine (also called a grid or a metacomputer), has intrigued the minds of researchers for many years.

The basic idea is quite simple: a large part of the computing power available today remains unused, from supercomputers that run at as little as 40% of their capacity to home computers that are used only for desktop applications and hence waste much of their computational power. The number of computers connected to the Internet more than quadrupled between 1999 and 2002, from 43 million to a staggering 190 million (as of February 2002), and the trend continues. These numbers present ample opportunities for the Internet to be used as a powerful distributed computing system that can compete with and best any single-machine computer available today.

Projects such as SETI@home, Genome@home, the Globus Computational Grid, MIT's Bayanihan and Milan, among many others, have not only contributed ideas on how grid computing systems can be implemented, but have also demonstrated the potential of grid computing systems.

A grid computing system has several advantages. The first is the huge amount of available resources, on an intranet, on a cluster and especially on the Internet. Current and past projects have demonstrated that it is highly efficient to harness the idle cycles of hundreds or thousands of machines for a distributed application. Though this is very promising, the world of grid computing is still at its beginning and research in this field is in its infancy.

ALiCE offers a new approach, based on distributing persistent Java objects through the network, and thus offers much more expressive power than other systems available today.

With all its advantages, grid computing comes with many challenges that a grid system developer faces. The first challenge is the adaptive, non-uniform and unpredictable nature of the network. In a grid computing system, machines can leave and join the system at any time. A grid computing system, therefore, must be able to adapt to the dynamic behavior of the available resources.
Another challenge is imposed by the nature of the applications that can be executed on a grid computing system. Not all applications can be parallelized to a degree that lets them run on a distributed system, and some applications are inherently sequential. Even the best-suited applications need implementations especially developed to run on a grid computing system. In this light, a challenge arises from the need to provide a general middleware and a feasible grid programming model, which should make application development possible and easy.

A third challenge that a grid computing system faces is security. With the abundance of resources that the Internet provides comes the question of how to utilize those resources in a secure manner, especially when running on an infrastructure as large as the Internet. Employing a single machine that is not connected to the Internet gives users the comfort of not having to worry about malicious users from other hosts. In an Internet-based grid computing system, however, this is not true. Therefore, it is of crucial importance that grid computing systems provide means to ensure that security precautions are taken.

Yet another challenge is offering users the best performance possible. In a system running such varied applications on such heterogeneous resources, guaranteed performance is not possible, but making the best use of the resources is a must. This challenge translates into trying to obtain the best possible performance for the users of the grid computing system despite the heterogeneous nature of the available resources. Although some intranets consist of relatively homogeneous resources, others, and as an extreme example the Internet, are made up of computers with varying configurations and capabilities, connected by networks of varying latency, speed and reliability. Therefore, it is not trivial to answer the question of which machine the grid computing system should send a computational task to.
1.2. Related Work

1.2.1. Legion

Legion [16], from the University of Virginia, is an object-based grid computing middleware that provides a single address space (distributed shared memory) for all nodes to use as a medium for exchanging objects. Legion is totally distributed: the system has no centralized node of any kind that functions as a manager or coordinator of the shared memory. Objects are sent from one physical address space to another via message passing.

It is designed for wide area networks and supports a variety of programming languages, such as C++, Fortran and Mentat, as well as Parallel Virtual Machine (PVM) and Message Passing Interface (MPI). Though it is claimed to be scalable, it is dependent on the UNIX platform.
1.2.2. GLOBE

GLOBE [15] is a Java-based grid computing middleware developed at Vrije University, Netherlands. Like Legion, it provides a single address space for nodes to use as a distributed shared memory. Unlike Legion, however, GLOBE nodes keep a local object that functions as a representative of the remote object in local physical memory. The GLOBE system allows the sharing not only of computational resources, but of any resource.
1.2.3. Globus

The Globus [4, 14] project at Argonne National Lab focuses on building a toolkit, a middleware, for building grid computing systems. It provides several basic lower-level services that simplify the design of higher-level services that serve as a metacomputer. These services include naming services, security, and resource management.

Globus is the most well-known grid middleware available today. Even though it is much lighter, ALiCE has some definite advantages over Globus, such as platform independence, ease of use and administration, and better control over resources. ALiCE also targets the home user more than the large systems that Globus targets.
1.2.4. Condor

Condor [17], developed at the University of Wisconsin, is a grid computing system used to harness the idle cycles of computers residing in an intranet. Condor provides a set of libraries that a C program can link against. Through these libraries, the program gains access to checkpointing and remote system call mechanisms. Condor supports job migration and quality-of-service specifications by allowing users to specify a list of preferences and requirements. Requirements specify the minimum resources needed to execute the job, whereas preferences specify the ideal amount of resources that the job would want to run on. Despite the tremendous advantages that a Condor system can provide, it is limited to the NT and UNIX platforms.
1.2.5. SETI@home
SETI@home [12] is part of the SETI (Search for Extraterrestrial Intelligence) program, which tries to find intelligent patterns in the radio waves received from space. It is an application that runs as a screen saver on the machines of anyone who is willing to offer the idle cycles of his or her computer to process those signals.
1.2.6. Distributed.Net

Distributed.Net [13] is a project that develops specific distributed applications for key cracking, working on the same principle as SETI@home. The project has had considerable success, completing the crack of a 56-bit DES key, which led to the conclusion that a 56-bit key provides too little security.
1.3. Summary

In this section we briefly present the content of this report. The report is organized in four parts.

Part one is the introductory part and ends with this summary.

The second part is the main part, containing the design and architecture details of the ALiCE grid computing system, as well as a general presentation of the implementation, starting with an overview of the architecture. Next, we present the basic building block of ALiCE, ONTA, our library for transferring objects over the network. In the last chapter of the second part we present the details of the architecture and the implementation of each of ALiCE's components.

The third part is dedicated to presenting some sample applications, along with the results of the tests we conducted on the system.

The last part presents the programming model of ALiCE applications, consisting of a programmer's manual for ALiCE application developers.
Part II.
Design, Architecture and Implementation
2. ALiCE Design Overview

At this time there are many experimental grid systems, and much research is focused on grid computing. Since this is a very promising field of computer science that is still at its beginning, having many approaches is very useful and productive. We aim to address some of the deficiencies of the existing implementations.

In parallel and distributed systems, there are two main parallel paradigms used to model the system:

the master-slave paradigm - The master-slave programming model consists of a master program that controls the overall function of the application and several independent slave sub-programs whose task is to do computations for the master program; this model is also known as the task farming model;

the peer-to-peer paradigm - In this model there is no central control or entity that has a centralized view; the model consists of a number of totally independent tasks that work together to reach a result.

In addition to those, there are a number of approaches that further refine these paradigms, such as single program multiple data, data pipelining, divide and conquer, and speculative parallelism.

Our approach is to use a hybrid paradigm. The approach we took is closer to the master-slave model, since we do have a central program and several slave sub-programs that do the computations. However, we adopt the good parts of the peer-to-peer model by permitting each slave program to create its own sub-slave programs. This permits deploying more complicated parallel algorithms, including divide-and-conquer parallel algorithms. From this point of view, we are closer to the peer-to-peer model. However, we have only one entry point into the application, so the model mainly follows the master-slave paradigm.
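The hybrid scheme can be sketched with plain Java threads: a master farms out independent tasks, and any task may spawn further sub-tasks, giving the divide-and-conquer shape described above. This is only an illustration of the paradigm; the class and method names are not part of ALiCE.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Toy task farm: a master submits slave tasks, and each slave may
// spawn its own sub-slaves, as in the hybrid master-slave model.
public class TaskFarm {
    final ExecutorService pool = Executors.newFixedThreadPool(4);
    final AtomicInteger sum = new AtomicInteger();

    // A "task" does a unit of slave computation and may fork children.
    Future<?> submit(int depth, int value) {
        return pool.submit(() -> {
            sum.addAndGet(value);            // the slave computation
            if (depth > 0) {                 // peer-to-peer flavour: spawn sub-slaves
                try {
                    submit(depth - 1, value).get();
                    submit(depth - 1, value).get();
                } catch (Exception e) { throw new RuntimeException(e); }
            }
        });
    }

    public static void main(String[] args) throws Exception {
        TaskFarm farm = new TaskFarm();
        farm.submit(2, 1).get();             // the master is the single entry point
        farm.pool.shutdown();
        System.out.println(farm.sum.get());  // 1 + 2 + 4 tasks ran, each added 1 -> 7
    }
}
```

A single root submission fans out into a tree of tasks, yet there is only one entry point, which is exactly the compromise between the two paradigms described above.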
2.1. Design Goals

The following are the main goals of the ALiCE system.
Flexibility, Modularity, Scalability and Functionality Based on the ideas in the previous version of ALiCE, the new version is entirely redesigned. Although the old version came with some very good ideas that are still used, the functionality, performance and scalability are greatly enhanced. The previous version was unable to run more than one application at a time, and some very important functions were not implemented. Also, the design was not very modular and lacked the flexibility needed in a grid computing system.

The new design focuses most on functionality and scalability. The system is now fully functional, ready to be deployed in real-life conditions. It supports multiple applications and multiple clients, and it has a very scalable implementation.
Platform Independence for Java Applications Some grid computing systems are restricted by the operating system or hardware platform they run on, limiting the resources that the grid computing system can harness. We feel that such restrictions defeat the purpose of having a grid computing system. The Internet, as an example, consists of machines of diverse platforms, with different operating systems running on different hardware configurations. ALiCE is platform independent and can scale to harness all the resources on the Internet. This platform independence comes from using Java, a cross-platform language based on a virtual machine that has implementations for almost any computer platform in existence today.
Generic Infrastructure Support Some grid computing systems are restricted to the certain applications they are built for. These are systems specifically designed to solve certain problems that are computationally intractable on single-machine systems. Such grid computing systems, though very useful for the specific problems they address, are incapable of addressing problems they were not designed to solve.

ALiCE is a generic runtime infrastructure on which users can deploy any application. This is achieved through the use of programming templates.
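To give a rough idea of what template-based programming looks like, the sketch below uses simplified stand-in interfaces (the real ALiCE template signatures are presented in chapters 7 and 8): the developer fills in a task generator, a task body, and a result collector, and the runtime drives them.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the three ALiCE programming templates;
// the real signatures appear in the programming-model chapters.
interface Task extends Serializable { Serializable execute(); }
interface TaskGenerator { List<Task> generate(); }
interface ResultCollector { void collect(Serializable result); }

// A trivial "application": square the numbers 1..3.
public class SquaresApp implements TaskGenerator, ResultCollector {
    final List<Serializable> results = new ArrayList<>();

    public List<Task> generate() {
        List<Task> tasks = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            final int n = i;
            tasks.add(() -> n * n);   // each task is an independent unit of work
        }
        return tasks;
    }

    public void collect(Serializable result) { results.add(result); }

    public static void main(String[] args) {
        SquaresApp app = new SquaresApp();
        // A real runtime would ship each task to a remote producer; here we run locally.
        for (Task t : app.generate()) app.collect(t.execute());
        System.out.println(app.results); // [1, 4, 9]
    }
}
```

Because the tasks are serializable objects, the same application code works whether the runtime executes them locally or ships them across the network.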
Generic Communication Support Many grid computing systems, as well as the previous version of ALiCE, force the user to use a particular protocol to transfer files and information over the network. In a world in which security is a big issue, a more flexible approach is needed, in order neither to expose user data to others nor to overload the system with unnecessary cryptography. For this, ALiCE offers support for user-developed communication protocols, which can be plugged in as modules without even restarting the system. There are, however, a number of built-in protocols already developed that should suit any security needs a user could impose.
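A pluggable transfer protocol could look like the following sketch. The interface and class names here are hypothetical illustrations, not the actual ALiCE protocol classes (those are described in sections 3.7.1 and 3.7.2): each protocol implements one contract, and the system selects implementations by name at runtime, so a stronger cipher can be registered without a restart.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plug-in contract: a protocol turns a payload into wire
// bytes and back, so encryption can be swapped in at runtime.
interface TransferProtocol {
    byte[] encode(byte[] payload);
    byte[] decode(byte[] wire);
}

// A do-nothing protocol and a trivial XOR "cipher" as stand-ins.
class PlainProtocol implements TransferProtocol {
    public byte[] encode(byte[] p) { return p.clone(); }
    public byte[] decode(byte[] w) { return w.clone(); }
}
class XorProtocol implements TransferProtocol {
    private static final byte KEY = 0x5A;
    public byte[] encode(byte[] p) {
        byte[] out = p.clone();
        for (int i = 0; i < out.length; i++) out[i] ^= KEY;
        return out;
    }
    public byte[] decode(byte[] w) { return encode(w); } // XOR is its own inverse
}

public class ProtocolRegistry {
    private final Map<String, TransferProtocol> plugins = new HashMap<>();
    public void register(String name, TransferProtocol p) { plugins.put(name, p); }
    public TransferProtocol get(String name) { return plugins.get(name); }

    public static void main(String[] args) {
        ProtocolRegistry reg = new ProtocolRegistry();
        reg.register("plain", new PlainProtocol());
        reg.register("xor", new XorProtocol());  // "plugged in" while the system runs
        byte[] msg = "hello".getBytes();
        TransferProtocol p = reg.get("xor");
        System.out.println(new String(p.decode(p.encode(msg)))); // hello
    }
}
```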
Non-Java Support Even though using Java as the language for developing ALiCE applications has many benefits, the most important being platform independence, Java is not the only language in use today. Many programmers still develop applications in other languages, like C or C++, and asking them all to migrate to Java in order to use a grid computing system is unreasonable. So ALiCE includes support for the C language in this version, as well as provisions in the design for developing support for other languages.
Performance Execution Having an abundance of resources available would not be of great advantage unless we can make good use of those resources. Therefore, we want ALiCE to be able to provide performance for the applications that run on it. There are also provisions for further enhancements, like developing new scheduling algorithms.
Ease of Setup and Maintenance From the user's perspective, it is inconvenient and undesirable to have a system that is difficult to set up and maintain. Considering that ALiCE is meant to be used by many users, possibly located in differing geographical locations, it is very important that ALiCE be easy to set up and maintain.

ALiCE is developed mainly as an application, so there is no need for special privileges or for any inside knowledge of the machine it is deployed on. The user just runs a program.
Anonymity and Security In an untrusted environment such as the Internet, it is of vital importance that we do not disclose information that may be used by malicious users. For this reason, ALiCE nodes do not have information about other nodes. Also, only authenticated nodes are allowed in the ALiCE system.
2.2. The Advantages of the New Architecture

Although the previous version of ALiCE comes with some brilliant and innovative ideas, the design and the implementation of the system are to some extent faulty, as they lack functionality and do not take full advantage of the possibilities opened by the live-object migration technique. The main concept is the same: the system is built around a central point, which is JavaSpaces from Sun Microsystems; but, as with all the other parts of the system, changing from JavaSpaces to another distributed shared memory implementation can be done without affecting the rest of the system. In fact, in the later stages of development, we switched to using GigaSpaces instead of JavaSpaces without requiring any change whatsoever to the rest of the system.
The new system still lacks the support needed in the fields of security and scheduling, but those are beyond the scope of this design. Security, as well as support for different schedulers, can be developed and added to the system later, with little or no change to the actual code. The system now supports multiple applications and multiple clients at the same time. The combination of the two is also supported, meaning one can have multiple applications submitted by the same client at the same time, or even distinct instances of the same application.

The new architecture and implementation aim to create a system which is functional, high-performance, flexible and open at the same time. Some major advantages of the new approach are:
the ability to move live objects through the system very easily

The main focus of the ALiCE approach to grid computing is the possibility of migrating a live object from one machine in the system to another. This opens a whole world of possibilities. The architecture is designed in such a way that object transfer is done in a very general manner, so adding new means of communication by way of live object transfer is easy. This means that one can do anything from simple tasks like synchronization and message communication to complex tasks like behavioral transfer, object request/delivery of any kind, and even creating new tasks from inside other tasks, opening the way to a direct approach for solving divide-and-conquer problems. The serialization/transfer/reload of objects is done through a general library developed especially for that purpose, with ease of use, portability and performance in mind. For more details, see the chapter about ONTA (chapter 3).
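The serialize/transfer/reload cycle can be illustrated with plain JDK serialization. ONTA additionally ships the class files needed to reload the object; this sketch assumes the class is already available on the receiving side, and the names used are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class LiveObjectTransfer {
    // A small live object whose state survives the trip.
    static class Counter implements Serializable {
        int value;
        void inc() { value++; }
    }

    // Serialize an object graph to bytes, as if sending it over the network.
    static byte[] freeze(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) { oos.writeObject(o); }
        return bos.toByteArray();
    }

    // Reload the object at the "other end" of the connection.
    static Object thaw(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Counter c = new Counter();
        c.inc(); c.inc();                        // live state: value == 2
        Counter moved = (Counter) thaw(freeze(c));
        moved.inc();                             // keeps computing after the "move"
        System.out.println(moved.value);         // 3
    }
}
```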
improved scalability

Since we are developing a grid programming system, scalability was a major concern in the design. We kept in mind that this is a totally distributed system, so new components of any kind can be added easily and with little or no overhead. The growth of the system, which is inherent since it is a grid system, is in this way sustained and imposes no change to the implementation or the design. This means you can add new resource brokers, new task producers and new data servers on the fly without adding stress to the system. No entity is aware of the presence of other entities of the same kind in the system, so the architecture is inherently distributed and highly scalable.
adaptability

The system design includes plug-in-like support for new protocols and runtime supports. That means that new protocols for file transfer (the files are the objects in serialized form) can be added on the fly, without restarting the system. The ALiCE developer will in this way be able to deploy new security techniques over network traffic and fix security holes without redeploying the system.
performance

Performance was a big issue in the system design. The main concern was that we are dealing with a very big system and that we are using a central distributed shared memory, namely JavaSpaces; we tried to keep the footprint of transfers in JavaSpaces to a minimum, so transferring a file or an object between parts of the system means just placing a reference to it in JavaSpaces, the reference being very small. Also, in order to improve performance, we use a multi-threaded model, which is detailed later in this report. There are some other measures to improve performance, like having a file manager that handles all file storing and retrieval (with caching support planned for the future) and the possibility to deploy a multiple resource broker system.
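The reference-passing idea is easy to quantify: the entry placed in the shared space carries only a locator, while the bulky payload travels out of band. A toy comparison (the field and class names are illustrative, not ALiCE's actual reference format):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ReferenceVsPayload {
    // What a reference-passing design puts in the space: just a locator.
    static class ObjectRef implements Serializable {
        String id; String host; int port;
        ObjectRef(String id, String host, int port) {
            this.id = id; this.host = host; this.port = port;
        }
    }

    // Size of an object's serialized form, i.e. what would cross the space.
    static int serializedSize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) { oos.writeObject(o); }
        return bos.size();
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = new byte[1_000_000];                 // the "real" object data
        ObjectRef ref = new ObjectRef("task-42", "10.0.0.7", 9000);
        System.out.println("payload bytes:   " + serializedSize(payload));
        System.out.println("reference bytes: " + serializedSize(ref));
        // Only the tiny reference ever passes through the shared space.
    }
}
```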
modularity and ease of future development

The whole architecture and implementation are very modular and the code is well organized, so future development will be easy. Also, the modules are as independent as possible, with well-defined interfaces, so changes inside a module should not affect other modules.
2.3. Design Decisions

The design of ALiCE involves several decisions that have been made to fulfill our design objectives.
2.3.1. Java, Jini and JavaSpaces

The Java language was chosen for ALiCE for various reasons. Java is a platform-independent language, and the Java Virtual Machine has been implemented on various platforms, allowing different platforms to share executables.

The second reason is the popularity of the Java language itself. Choosing a popular language for a grid computing system allows users to learn how to build applications for ALiCE quickly and easily.

Thirdly, Java provides various technologies that aid the development of a distributed system such as ALiCE. These technologies include Jini and JavaSpaces. Jini is a set of Java APIs that facilitate the building and deployment of distributed systems. Jini provides the "plumbing" that takes care of common but difficult parts of distributed systems.
Jini consists of a programming model and a runtime infrastructure. The programming model helps developers build distributed systems that are reliable, even though the underlying network is unreliable. The runtime infrastructure makes it easy to add, locate, access, and remove services from the network.
JavaSpaces is a Jini service that provides a distributed shared memory for Jini-enabled devices on the network. JavaSpaces helps simplify communication, coordination and sharing of Java Objects among the Jini-enabled devices.
Figure 2.1.: JavaSpaces Technology from Sun Microsystems
JavaSpaces provides persistent storage of Objects that are accessible by various machines connected over the network. These machines can be given access to write Objects into the JavaSpace as well as read, modify, or remove these Objects from it. Figure 2.1 demonstrates these operations.
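As a rough sketch of these three operations, and emphatically not of the real Jini API, consider a toy in-memory space where entries are maps and a template matches when every non-null template field agrees with the entry:

```java
import java.util.*;

// Toy stand-in for a JavaSpace: no Jini, no persistence, no transactions --
// just the write/read/take semantics described above.
public class ToySpace {
    private final List<Map<String, Object>> entries = new ArrayList<>();

    public synchronized void write(Map<String, Object> entry) { entries.add(entry); }

    // An entry matches a template when every non-null template field equals the entry's field.
    private static boolean matches(Map<String, Object> tmpl, Map<String, Object> e) {
        for (Map.Entry<String, Object> f : tmpl.entrySet())
            if (f.getValue() != null && !f.getValue().equals(e.get(f.getKey()))) return false;
        return true;
    }

    public synchronized Map<String, Object> read(Map<String, Object> tmpl) {  // copy stays in space
        for (Map<String, Object> e : entries) if (matches(tmpl, e)) return e;
        return null;
    }

    public synchronized Map<String, Object> take(Map<String, Object> tmpl) {  // removes from space
        for (Iterator<Map<String, Object>> it = entries.iterator(); it.hasNext(); ) {
            Map<String, Object> e = it.next();
            if (matches(tmpl, e)) { it.remove(); return e; }
        }
        return null;
    }
}
```

The null-as-wildcard matching mirrors the JavaSpaces template idea: a template with only a `type` field set retrieves any entry of that type.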
GigaSpaces The GigaSpaces Synchronization and Coordination Platform is a software infrastructure for information collaboration, aimed at enterprise distributed applications and Web services. The platform is an implementation of Sun Microsystems' JavaSpaces technology.
In the current stage of development we are using GigaSpaces for ALiCE. The main advantage over Sun Microsystems' implementation of JavaSpaces is that it is much faster and more reliable. The downside is that it is a commercial product.
3. The Basic Building Block - ONTA (Object Network Transfer Architecture)
As mentioned before, the whole ALiCE system revolves around moving objects and classes around the system. To support this, we developed a library designed to take a live object or a class, put it in an archive file and then load it back at the other end of a network connection. ONTA (the Object Network Transfer Architecture) does just that, offering the ALiCE core developer a general API to serialize and save objects together with their associated classes, and thus implement object persistence over the network. ONTA also uses a generic way to transport the serialized objects over the network, that is, a protocol model which is as general as possible. In fact, a protocol can be added to the system at any time, on the fly, making the system very flexible and modular. A new protocol is composed of two parts, the server side and the client side, each being a class implementing a simple and generic interface. ONTA handles protocol retrieval and the addition of new protocols to the system, meaning it dynamically loads them, transfers the client side to where it is needed and registers them.
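The plug-in protocol idea can be sketched roughly as follows; the interface and method names here are assumptions for illustration, not ALiCE's actual API:

```java
import java.util.*;

// Hedged sketch of pluggable protocols: each protocol contributes a server part
// and a client part behind small generic interfaces, and clients can be
// registered at runtime after being loaded dynamically.
public class ProtocolRegistry {
    public interface ProtocolServer {
        void serve(java.io.File f) throws java.io.IOException;
    }
    public interface ProtocolClient {
        java.io.File download(String host, String path) throws java.io.IOException;
    }

    private final Map<String, ProtocolClient> clients = new HashMap<>();

    // Called when a new protocol's client class has been downloaded and loaded.
    public void register(String name, ProtocolClient c) { clients.put(name, c); }

    public ProtocolClient lookup(String name) { return clients.get(name); }
}
```

A downloader would look up the protocol named inside a file reference and delegate the transfer to whatever client is registered under that name.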
There are six components inside ONTA:
- the Object Writer, which handles serializing objects and creating JAR archive files containing all that is needed to retrieve an object or class after the file has been transported over the network;
- the Object Repository, which stores the JAR archive files, advertises them to be downloaded by remote object loaders and introduces new protocols into the system; it can also send messages to other parts of the system;
- the Remote Object Loader, which basically just handles retrieving file references and downloading the files; it also retrieves the messages sent to the machine it runs on by others;
- the Object Loader, which restores a saved object from a file;
- the File Manager, which handles file naming and storage on the local disk;
- the protocols, which are moved around the system when needed.
All the communication in ALiCE is done through JavaSpaces. Thus, the space is also the means of synchronization between the components. This is, as far as we know, the first time that JavaSpaces technology has been used for grid computing. The fit is a natural one, since JavaSpaces provides exactly what is needed to communicate and synchronize efficiently in a distributed system: a distributed shared memory implemented over the network as a Jini service. Tests were also conducted using another implementation of JavaSpaces, namely GigaSpaces from GigaSpaces Technologies. The results were very promising, both in terms of reliability and of speed.
3.1. How Does ONTA Work?
A diagram showing how a live object is transferred from one machine to another is presented in figure 3.1 and explained next.
Figure 3.1.: Object transfer through ONTA. (1) The Object Repository serializes the object to a file; (2) it writes a file reference or message into JavaSpace; (3) the Remote Object Loader takes the reference or message from the space; (4) it downloads the file; (5) it loads the object from the file.
ONTA is actually an infrastructure used to freely move live objects over the network, with a strong emphasis on scalability and the ability to sustain high work loads in terms of the number, size and diversity of objects. Basically, any object that implements the Serializable interface (directly or through another interface or class it extends) can be sent over the network using ONTA.
The approach that we took in addressing the problem of remote object loading was to implement a mechanism ourselves rather than use the RMI mechanism, for two main reasons. First, we handle the transport of files that contain the objects, not the objects themselves. This means that the objects are not required to be in memory all the time, thus permitting the deployment of a grid computing system, which would be very limited by using RMI. At the same time, by not keeping the objects in memory and by putting references in JavaSpace instead of objects, the footprint in JavaSpace is very small and of constant size, thus the limit on how many references can be in JavaSpace at the same time is very high, above what one would encounter in a grid computing system; hence the scalability of ALiCE. The second reason for not using the Java RMI mechanism is that it lacks the security that a grid computing system needs. Java RMI uses the HTTP protocol to download files, with no possibility to change this. The security of HTTP is almost nonexistent, compared to the flexible approach of plug-in user-developed protocols used in ONTA.
There are two kinds of objects that we transfer through the space for inter-component communication in ALiCE:
- Object References - these are actually references to files containing a serialized object, together with all the classes needed to load the object. The reference to a file consists of an AliceURL object, which is actually a tuple of three string fields, one identifying the protocol, one the host (its IP address) and one the file location on that machine. There are other fields inside the reference, such as a destination field that names the address of the host that this reference is intended to reach, a type field, some additional identification fields and the application ID field;
- Messages - this type of object is used to exchange any kind of information that does not involve files or serialized objects; to implement different kinds of messages, this class is extended in several other classes.
Inside both of these kinds of objects there is an application ID. This means that all the objects transferred through the space in ALiCE carry a tag that relates them to an application submitted into the system. This is very useful for addressing and for keeping track of things.
Each application is uniquely identified in the system and this ID is used for all the objects related to the application. An application ID is actually the URL of the file created when the application is first submitted to the system, and this ID is unique system-wide, since the file manager of ONTA generates a unique file name for each object advertised by the Object Repository; this, together with the host's IP, is a unique identifier.
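A minimal sketch of such a three-field reference tuple follows; the class and field names are ours for illustration, and the real AliceURL may differ:

```java
import java.util.Objects;

// Sketch of the protocol/host/file tuple described above (names are assumptions).
public class AliceUrl {
    public final String protocol, host, file;

    public AliceUrl(String protocol, String host, String file) {
        this.protocol = protocol; this.host = host; this.file = file;
    }

    // host + file is unique system-wide: the file manager never reuses a file
    // name on one machine, and the host IP distinguishes machines.
    @Override public boolean equals(Object o) {
        if (!(o instanceof AliceUrl)) return false;
        AliceUrl u = (AliceUrl) o;
        return protocol.equals(u.protocol) && host.equals(u.host) && file.equals(u.file);
    }
    @Override public int hashCode() { return Objects.hash(protocol, host, file); }
    @Override public String toString() { return protocol + "://" + host + "/" + file; }
}
```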
Next, we will present each component of the ONTA system individually. In what follows, we present the implementation of ONTA in the case of Java applications. For non-Java applications, all that differs are the Object Writer and the Object Loader, which are built into the language support for that application. The ALiCE system is designed and implemented in such a way that support for new languages can be added to the system without changing the already existing components.
3.2. The Object Writer
In a grid computing system, network overhead is a problem. With machines contributing to the grid spread over a wide geographical area, some network connections could be slow, so we decided to transfer all the files over the network in a compressed form. That is, for each live object that is transferred through ONTA, the system creates a JAR archive that stores the object and all the classes it needs to be restored. At the site where the object is restored, first all the classes are dynamically loaded and then the actual serialized object is deserialized.
In many ways, saving the classes that are needed by an object is straightforward: one just needs to go through all the class references, starting at the class that the object is an instance of, and save all these classes.
We can distinguish two cases of object transfer in ALiCE: the transfer of live objects and the transfer of just the classes that are needed to instantiate an object. The latter is the case when a new application is submitted into the system. In this case, the programmer supplies the class files needed for the application, but there is no instance of any of those classes that needs to be created at the consumer site and then transmitted to the resource broker or to any producer. Instead, the class file for the task generator, as well as all the classes that are referred to from inside the task generator class, should be transferred to the task producer, and the task generator should be instantiated and run there. The need to transfer live instances of classes arises more often, being the case of the instantiated tasks that are transferred to the producers, of the results coming back to the consumer or of the user objects that the tasks are communicating through.
The usage of the ObjectWriter class is quite simple: when instantiated, this class will create a temporary JAR archive file that will store the classes and perhaps the live object. After instantiation, the API of the class consists of these public methods:
public void addFile(String _fileName);
public void addClass(String _classFileName);
public void addClass(Object _obj);
public void addObject(Object _obj);
public File getJar();
After all the intended files, classes and objects are added, the user can get a File object for the archive by calling the getJar() method.
The addFile() call is very straightforward: it adds the named file to the archive as a byte stream.
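Assuming the archive is an ordinary JAR, a minimal ObjectWriter-like sketch using the JDK's JarOutputStream might look as follows; this is an illustration of the mechanism, not the ALiCE implementation:

```java
import java.io.*;
import java.util.jar.*;

// Minimal sketch of an ObjectWriter-style archiver (method names follow the API
// above, but entry names and structure are assumptions).
public class MiniObjectWriter {
    private final File jar;
    private final JarOutputStream out;

    public MiniObjectWriter() throws IOException {
        jar = File.createTempFile("onta", ".jar");      // temporary archive file
        out = new JarOutputStream(new FileOutputStream(jar));
    }

    // Serializes one object into an archive entry. ObjectOutputStream writes the
    // object's fields (and reachable objects), but not its code -- the class
    // files would have to be added as separate entries.
    public void addObject(Object obj) throws IOException {
        out.putNextEntry(new JarEntry("object.ser"));
        ObjectOutputStream oos = new ObjectOutputStream(out);
        oos.writeObject(obj);
        oos.flush();            // flush, but do not close: that would close the jar
        out.closeEntry();
    }

    public File getJar() throws IOException {
        out.close();
        return jar;
    }
}
```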
The second call adds a class to the archive, given the filename of the class file. We should stress that the first class added to the archive in this manner will be the one instantiated by the ObjectLoader as the object carried in this archive (e.g. when packing an application, the first class added to the archive must be the TaskGenerator class). This method will also traverse all the references deriving from the given class file. Finding the references inside a class file is possible because all the references are saved in the class file by the Java compiler. This support is provided by a very simple parser that obtains the list of class references by analyzing the given class file. This amounts to interpreting the content of the file by following the class file structure, as published by Sun. It is done through another class named ClassFile, which returns a linked list of files when its method GetClasses() is called, after an instance of ClassFile has been initialized with the class file given to the object writer.
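To make the idea concrete, here is a sketch of such a parser: it scans the constant pool of a class file, as defined by the published class file format, and collects the names referenced by CONSTANT_Class entries. It is a simplified stand-in for ClassFile, not its actual code:

```java
import java.io.*;
import java.util.*;

// Every class a .class file refers to appears as a CONSTANT_Class entry in the
// constant pool, pointing at a Utf8 entry holding the internal class name.
public class ClassRefs {
    public static List<String> referencedClasses(InputStream in) throws IOException {
        DataInputStream d = new DataInputStream(in);
        if (d.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
        d.readInt();                                   // minor + major version
        int count = d.readUnsignedShort();             // constant_pool_count
        String[] utf8 = new String[count];
        List<Integer> classIdx = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            int tag = d.readUnsignedByte();
            switch (tag) {
                case 1: utf8[i] = d.readUTF(); break;               // CONSTANT_Utf8
                case 7: classIdx.add(d.readUnsignedShort()); break; // CONSTANT_Class
                case 8: case 16: case 19: case 20:
                    d.readUnsignedShort(); break;                   // String, MethodType, Module, Package
                case 15: d.readUnsignedByte(); d.readUnsignedShort(); break; // MethodHandle
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    d.readInt(); break;                             // 4-byte entries
                case 5: case 6: d.readLong(); i++; break;           // Long/Double take two slots
                default: throw new IOException("unknown constant pool tag " + tag);
            }
        }
        List<String> names = new ArrayList<>();
        for (int idx : classIdx) names.add(utf8[idx]);
        return names;
    }
}
```

Running this on any compiled class yields its own name, its superclass and every class it references, which is exactly the input the traversal described next needs.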
The addObject() call just creates a new entry inside the archive and serializes the object by means of an ObjectOutputStream. Note that this call also saves all the objects contained inside the one written with the writeObject() call. But this only saves a snapshot of the live object, that is, of its fields, not the code. In order to be able to restore the live object, one should also save all the code for that object and for all the objects contained in its fields, in the fields of those fields and so on; this is done by starting with the class the object is an instance of, through the call addClass(Object _obj), the third call given above.
To see how this is done, some preliminaries first. The references in each object to other objects form a hierarchy of classes that is a directed graph. We should save all the classes for each of those references. The issue is traversing the graph of object references in such a way as to reach all the classes and save all of them, while at the same time not getting into cycles and not saving the same class twice if two instances of it are encountered. Let us consider an example to be more explicit. Suppose we must save an instance of class A, that is, a live object of that class, given the following definitions:
class A {
    B ab;
    D ad;
    ....
}
class B {
    C bc;
    D bd;
    ....
}
class C {
    A ca;
    D cd;
    ....
}
class D {
    B db;
    ....
}
Figure 3.2.: Example of an object's references hierarchy
The hierarchy of references created by this object being saved to the JAR archive is presented in figure 3.2. Starting with the object that instantiates class A, we should traverse all the references and save class B and class D. We stress here that the references are all resolved at compile time by Java, even the ones referring to an interface implementation (e.g. interf1 Obj = new Implementation()), and hence all the classes needed for an object can be found starting from an analysis of the class file for the class that the object is an instance of. Then we should check all the references in those classes too. Checking the references in class B, we end up saving class C, and then, checking the references of this class, we could end up in a loop. This should be avoided. At first glance, it looks as though we should avoid re-inspecting previously traversed objects rather than previously traversed classes, because other instances of the same class could lead to different points in the hierarchy. That is not the case: different instances of the same class will lead us to the same classes, so this is a static traversal of the class hierarchy graph. Even if there are multiple instances of the same class in the hierarchy, we only need to traverse the first instance encountered.
The implementation does an iterative depth-first traversal of the graph, keeping track of nodes in a stack and remembering all the classes that have already been saved. Actually, considering that we do not continue traversing any node that has been previously inspected, we conclude that the structure can be thought of as a tree, so we are doing an iterative depth-first traversal of a general tree. Since the algorithm for this is standard, we will not go into further detail about the implementation. We save first the classes that are referred to deepest in the tree, so that, when the hierarchy of classes is restored, there should be no delegation necessary for classes not yet loaded.
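A cycle-safe traversal over the Figure 3.2 hierarchy can be sketched as follows; note that this illustrative version visits classes in pre-order rather than deepest-first:

```java
import java.util.*;

// The class hierarchy of Figure 3.2 as a reference graph; the traversal saves
// each class exactly once, so the cycle C -> A does not loop.
public class ClassGraphWalk {
    static final Map<String, List<String>> REFS = Map.of(
        "A", List.of("B", "D"),
        "B", List.of("C", "D"),
        "C", List.of("A", "D"),   // C refers back to A: a cycle
        "D", List.of("B"));

    public static List<String> classesToSave(String root) {
        List<String> saved = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(root);
        seen.add(root);
        while (!stack.isEmpty()) {
            String c = stack.pop();
            saved.add(c);
            for (String ref : REFS.getOrDefault(c, List.of()))
                if (seen.add(ref)) stack.push(ref);   // visit each class exactly once
        }
        return saved;
    }
}
```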
A problem not yet solved is that of circular references. Since the delegation of class name resolution is done by the JVM and we have no control over it, we have no way to intercept delegations for circular references and resolve them; to be able to do so would mean making changes inside the JVM.
3.3. The Object Repository
The object repository is actually just an interface that hides JavaSpace from the rest of the system and provides logical means of putting object references and messages into the space. The implementation lets the components of the system that use the object repository create their own references to put in the space. This is because there is no representation of a reference general enough to be used by every language support that is or will be implemented in ALiCE. Together with the remote object loader, the object repository is the central point of ONTA.
Basically, the API of the object repository consists of calls that advertise different types of references and messages:
public AliceURL advertiseCode(ObjectReference or, String protoName)
public AliceURL advertiseTask(ObjectReference taskToSchedule)
public AliceURL advertiseTaskToSchedule(ObjectReference or, String protoName)
public AliceURL advertiseUserObject(ObjectReference or, String protoName)
public AliceURL advertiseProtocol(String name, String dir, String fileClient, String fileServer)
public void reAdvertiseProtocol(String name, String toWho)
public AliceURL advertiseData(String dataFileName, AliceURL appl, String protocol)
public AliceURL advertiseResult(ObjectReference or, String protoName)
public void sendMessage(Message msg)
Most of these calls do the same thing, namely take an object reference from the caller, set the protocol inside it and put it in JavaSpace. The methods that are a little more complicated are presented in the following paragraphs. All the methods that advertise a reference to a file return an AliceURL pointing to the file that has just been advertised. This is most useful for the advertiseCode call, which will actually return the application ID, as presented earlier.
The method advertising a new protocol, public AliceURL advertiseProtocol(String name, String dir, String fileClient, String fileServer), dynamically loads the class for the server side of the protocol, instantiates it and runs it. The protocol is also registered with the ONTA registry. The reAdvertiseProtocol call sends a protocol on request, if that protocol is locally known and has not yet reached the component that tries to download something from the machine on which the call is made. This call is directed to the particular machine that requests the protocol. The remote object loader thread handles protocol readvertisement.
The only other method that does something extra is public AliceURL advertiseData(String dataFileName, AliceURL appl, String protocol), which first locates a data server. Each time a data server thread starts, it advertises in JavaSpace, through a special message, that it is up and ready to receive data files. After a data server is located, the reference to the data file is created with that machine as the destination and the reference is written in JavaSpace.
3.4. The Remote Object Loader
Through the use of the download protocols and of the Object Loader (see section 3.5), the remote object loader is responsible for bringing objects to the local machine over the network and restoring them there. The process of bringing to life an object saved on another machine has two parts: the download of the serialized form of the object and the dynamic loading of the object itself, once the serialized form is available on the local machine. The first of these tasks is carried out by the remote object loader, that is, downloading a serialized object over the network. As presented in this chapter, for each object saved, ONTA keeps a single JAR archive file that contains the bytecode for the object and all the classes needed to restore it. Since each serialized object is actually a file, the process of transporting it over the network translates into transporting a file over the network, and this is basically what the remote object loader does. It is designed, like the whole ONTA system, with flexibility, modularity and security in mind. The downloading of a file translates into two steps: the first is retrieving an object reference from JavaSpace and the second is actually getting the file. The first step is achieved by calling one of the next two methods:
public ObjectReference getObjectReference (ObjectReference template)
public ObjectReference waitObjectReference (ObjectReference template)
For maximum flexibility, the call takes as a parameter a template for an ObjectReference to take from the space. This is because, for example, different language supports will need special information, not general information, inside the object reference, and hence we cannot implement the creation of templates here; instead, we let the programmer of a runtime support use his own kind of references and handle the templates himself. The difference between the above two calls is that the first one is non-blocking, using a time-out, while the second one blocks until a reference that matches the given template is found in the space.
Once the reference to an object (actually to the file that contains the serialized form of the object) is obtained, the download of the file, using the protocol named inside the reference, can easily be done with the call:
public File doDownload (ObjectReference ref)
The remote object loader also provides the means to receive messages (from the class Message or from a class extending it) from JavaSpace, with the methods:
public Message takeMessage (Message template)
public Message readMessage (Message template)
public Message tryMessage (Message template)
The first two calls block until a message matching the template is received, the difference between them being that takeMessage removes the message from the space while readMessage only reads it and returns a copy, leaving the original in the space. The third method is non-blocking and removes a message matching the template from the space, if such a message exists.
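The blocking versus non-blocking distinction can be sketched with a toy message box; the real implementation goes through JavaSpaces and matches templates, so the names and structure here are illustrative only:

```java
import java.util.*;

// Toy queue showing the takeMessage/tryMessage distinction: take blocks until a
// message arrives, try returns immediately (null when nothing is available).
public class MessageBox {
    private final Deque<String> messages = new ArrayDeque<>();

    public synchronized void post(String msg) {
        messages.add(msg);
        notifyAll();                           // wake any blocked takers
    }

    public synchronized String takeMessage() throws InterruptedException {
        while (messages.isEmpty()) wait();     // block until a message arrives
        return messages.poll();
    }

    public synchronized String tryMessage() {  // non-blocking variant
        return messages.poll();
    }
}
```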
The remote object loader has one other important function: retrieving new protocols advertised in JavaSpace by other parts of the system. To achieve this, a special thread runs and waits for an object reference containing a new protocol to arrive in the space; when one is found, it reads it, downloads the file, dynamically loads the client side of the protocol, instantiates it and registers it with the ONTA registry for further use in the download process.
3.5. The Object Loader
The object loader is the component of ONTA that does the delicate job of returning a live object, given a JAR archive file that contains a serialized object and all the classes needed to restore it. There are two other classes involved in the dynamic loading of classes in ALiCE, namely OntaClassLoader, a class loader class, and MyObjectInputStream, an extension of the ObjectInputStream class, necessary to intercept the class resolution requests issued by Java when loading an object by means of an ObjectInputStream.
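The interception of class resolution can be sketched with a small ObjectInputStream subclass in the spirit of MyObjectInputStream; this is a sketch of the standard resolveClass-override technique, not ALiCE's actual class:

```java
import java.io.*;

// Routes class resolution during deserialization through a chosen ClassLoader
// (e.g. the loader that holds classes extracted from a downloaded JAR) instead
// of the default caller's loader.
public class LoaderAwareStream extends ObjectInputStream {
    private final ClassLoader loader;

    public LoaderAwareStream(InputStream in, ClassLoader loader) throws IOException {
        super(in);
        this.loader = loader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            return Class.forName(desc.getName(), false, loader);  // delegate to our loader
        } catch (ClassNotFoundException e) {
            return super.resolveClass(desc);   // fall back for primitives etc.
        }
    }
}
```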
The dynamic loading mechanism in Java is one of the most powerful tools available to the developer. To understand the mechanism, some preliminaries first. Class loaders are a powerful mechanism for dynamically loading software components on the Java platform. They are unusual in supporting all of the following features: laziness, type-safe linkage, user-defined extensibility, and multiple communicating name spaces. The purpose of class loaders is to support dynamic loading of software components on the Java platform. The unit of software distribution is a class. Classes are distributed using a machine-independent, standard, binary representation known as the class file format. The representation of an individual class is referred to as a class file. Class files are produced by Java compilers, and can be loaded into any Java virtual machine. The Java virtual machine uses class loaders to load class files and create class objects. Class loaders are ordinary objects that can be defined in Java code. They are instances of subclasses of the class ClassLoader. A Java application may use several different kinds of class loaders to manage various software components. In Java, a class is defined by two components: its name and its class loader. So, two classes with the same name, but loaded by different class loaders, will be different.
The OntaClassLoader class loads classes either directly from a class file or from a JAR archive. The class loader is a tool used by the ObjectLoader class in the process of restoring an object. There are two use cases for the object loader, the first being the one in which we restore an object that was saved from a live instance (the cases of serializing tasks at the task generator's site or results at the producer's site). The second use case is when there is no actual live object to transmit, so only class files are put in the archive. This is the case when sending the application from the consumer to the task producer's site - the application starts only when the task generator starts, after being sent to the task producer. In this latter case, since there is no live instance serialized in the file, the first class saved inside the archive will be instantiated, and the result of loading the "object" from that archive will be this instance. Corresponding to the two use cases, the API of the object loader consists of two main methods, together with a clean-up call that will be explained a little later:
public Object loadFromSavedClass (File f)
public Object loadFromSavedObject (File f)
public void cleanUp (File f)
Internally, the main method used is getClasses(File f), which retrieves all the class file entries from the JAR archive and dynamically loads them with a new instance of the OntaClassLoader.
The restoration of the byte code for a saved instance of an object is done through the MyObjectInputStream class, which calls the ObjectInputStream methods but intercepts name resolution delegation for the classes encountered while restoring the object, for they should be delegated to the class loader that loaded them, which is an instance of the OntaClassLoader class. This poses the problem of saving the class loader for each class that was dynamically loaded, retrieving the right class loader, and also cleaning the loaders up when they are no longer needed: since we are dealing heavily with mobile code, not cleaning these stored class loaders would mean that the memory footprint of the system grows without bound. Also, we need to retrieve the class loader for a specific class whenever we try to save to a new archive a class that was loaded by one of our class loaders. For example, at the task producer, we load, say, a Result class, as it is required for the instantiation of the task generator (through a link in the tree of references - see the ObjectWriter, section 3.2). When we try to save the serialized task to an archive to be delivered to a producer, we end up needing the Result class. Since this class was loaded by an OntaClassLoader, a call to Class.forName to restore the class (in order to find the location of the class file on the disk) also requires providing the class loader that loaded the class; otherwise the call would be unsuccessful. To solve the problem of storing the class loaders, we use one class loader per restored object. So, for example, when retrieving a task from a JAR archive at a producer, we use one class loader for all the classes needed and associate this class loader with the instance of the restored task. When a class is needed from a previously used class loader, we have either the object that the class is related to (when saving new serialized object archives) or the class loader itself (when restoring an object - when reloading an object we have first loaded all the classes inside the archive using exactly the class loader that is needed). The problem of retrieving class loaders is thus resolved. As for cleaning up these class loaders, this is achieved, for all the references to class loaders saved in relation to a JAR archive, by calling the cleanUp (File f) method presented above. This should be done when the classes from that file are no longer necessary (e.g. after a task has returned its result or after a task generator has returned). All the class loaders are saved inside a hash table that is stored in the RemoteObjectLoader class.
3.6. The File Manager
In ALiCE, all the serialized objects, as well as new protocols, are stored in files. Since the job of serializing/deserializing objects and of transferring them over the network is handled by ONTA, one important component of ONTA is the file manager, which keeps the files in an organized manner. Also, the file manager is designed so that future development of the file storing system (e.g. implementation of caching or other similar mechanisms) can be isolated from the other parts of the system.
Since we can have multiple applications using the same filename, or even different instances of the same application using the same files, and also taking into account that the filename for each archive file containing an application should be unique, the approach we took is to create a new filename for each file that the file manager stores.
There are two situations in which a file is stored with the help of the file manager: when a<br />
new archive is created as the result of an object being serialized through the object writer and<br />
3. The Basic Building Block - ONTA (Object Network Transfer Architecture)<br />
when a file is downloaded by ONTA from the network. These two situations are different in the
sense that when storing an archive after using the object writer, a local temporary file has already been
created, and it should just be renamed to a unique file name. When downloading a file, a unique
filename should first be obtained and then this file name should be used to store the downloaded file.
For handling these cases the API consists of two calls:
public String put (File f)<br />
public String getFileName(AliceURL appl)<br />
All the files are stored starting at a root directory that is selected through a string parameter
passed to the constructor of the file manager when the class is instantiated; this directory is
specified by the user through the GUI. Since creating too many files inside the same directory is
an issue, the ONTA file manager uses a hierarchical approach, so there are never more than a
maximum number of files inside a directory. If that number is reached, a new subdirectory is
created and all further files under the same root directory will be stored in that subdirectory.
We do this in order to minimize the file system overhead induced by a large number of files
residing in the same directory, at the cost of an additional level of directory indirection.
The put(File f) method stores a local temporary file under a unique filename inside the ONTA root
directory. It actually just obtains a new file name, creates a new directory if needed (if the current
one is full - that is, the maximum number of files has been reached) and moves the file there.
The second method, getFileName(AliceURL appl), should be used by all client protocols
to store a file when it is downloaded. Since all the references contain the application that they
are related to inside the object reference taken from the space, the file manager has the ability
to store all the files downloaded on behalf of an application in the same directory. This greatly
improves the file structure organization and makes it much easier to clean up unused
files. So, given an application ID (in the form of an AliceURL), this call will return a
unique filename in a directory unique for that application (if this directory does not exist, it will
be created). Storing files for different applications in different directories also resolves the issue of
files with the same filename being downloaded into the same directory.
To clean up files that are no longer used, the file manager offers two calls:
public void markNotUsed(String name)<br />
public void deleteUnused()<br />
The first method adds a file with a given name (the name returned by one of the two calls
presented above should be used) to a list of files that are no longer needed. When deleteUnused()
is called, all the files in the list are physically deleted from the disk.
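The naming scheme described above can be sketched as follows. This is a simplified illustration of the idea, not the actual ALiCE code; the class name, the per-directory limit and the file name pattern are chosen for the example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the file manager's naming scheme: every file
// gets a fresh unique name under a root directory, and a new
// subdirectory is opened whenever the current one fills up.
public class FileManagerSketch {
    private static final int MAX_FILES = 100; // per-directory limit (example)
    private final File root;
    private File currentDir;
    private int filesInCurrentDir = 0;
    private int dirCounter = 0;
    private long fileCounter = 0;
    private final List<String> unused = new ArrayList<String>();

    public FileManagerSketch(String rootPath) {
        root = new File(rootPath);
        currentDir = root;
        root.mkdirs();
    }

    // Hand out a fresh unique path, opening a new subdirectory if full.
    public synchronized String newFileName() {
        if (filesInCurrentDir >= MAX_FILES) {
            currentDir = new File(root, "sub" + (dirCounter++));
            currentDir.mkdirs();
            filesInCurrentDir = 0;
        }
        filesInCurrentDir++;
        return new File(currentDir, "obj" + (fileCounter++) + ".jar").getPath();
    }

    // Defer deletion: just remember that the file is no longer needed.
    public synchronized void markNotUsed(String name) { unused.add(name); }

    // Physically delete everything that was marked as unused.
    public synchronized void deleteUnused() {
        for (String name : unused) new File(name).delete();
        unused.clear();
    }
}
```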
3.7. The Protocols<br />
One of the principal focuses of ONTA is plug-in protocol support. This means that new protocols
can be added to the system on the fly, so security fixes or new security levels for file
transfer can be implemented without restarting the system or any part of it. Developing a new
protocol is very straightforward, as all that is needed is to implement two classes, one for the client
side and one for the server side. Each of these classes should implement a corresponding interface.
In the system there is only one hard-coded protocol, needed to transfer new protocols (a protocol
known beforehand by all parties is necessary to permit communication in the first place).
So, to add a new protocol, the programmer should implement these two classes and advertise
them with the advertiseProtocol method from the object repository. What needs to be implemented
on the server and on the client side of the protocol is presented in the following subsections.
We stress here that protocols are identified by their name system-wide, so one should not
deploy a protocol if another protocol with the same name exists. Since no central information
system has yet been developed, this is a big flaw in the system, as there is no way to tell whether
another protocol with the same name exists somewhere else in the system. This could end up with the client
side of one protocol trying to communicate with the server side of another protocol. For this
reason, adding protocols to the system should, for now, be done under centralized human control.
3.7.1. The Protocol Server<br />
The server side of the protocol mainly consists of one method, called acceptConnections, which
should be implemented in a server manner, that is, it waits for connections and, when a connection
is requested, it serves that connection and waits for a new one. Since the connections are
initiated by the client side of the protocol, there is no guideline on how to implement the requests
or the communication protocol.
The interface for the protocol server is actually an abstract class that should be extended (it
also contains the name of the protocol, which is set by the system; that is why it is not an interface):
public abstract class ProtocolServer {
    public abstract void acceptConnections();
    public abstract void pushFileSupport();
}
There is one other method that can be implemented, the pushFileSupport() method. This
method is called whenever a protocol server is registered and started, together with acceptConnections().
If it is desirable to implement a way for connections to be initiated from inside,
because the machine the protocol runs on is behind a firewall, this method should be implemented.
There is a corresponding pushFileSupport() method on the client side of the protocol to support this
approach. The idea is that some firewalls don't allow connections initiated from outside. If this
is the case, a server could not function properly behind that firewall without changing the rules of
the firewall, which is usually not desired and in some cases cannot be done. In this case, another
approach should be used, supported by the push file support. In this approach, a download should
be implemented by the following steps:
1. When a download is required, put a special message in JavaSpace to request that file on the
client side of the protocol, then wait for a connection; waiting for a connection is possible by
implementing a server mechanism (inside the client) with the help of the pushFileSupport()
method, which will be called when the client is instantiated;
2. On the server side, wait for requests in the form of messages in JavaSpace, in a loop located inside
a thread that should be started from the pushFileSupport method; important: do not place
an infinite (while (true)) loop inside the call itself, as this would block the system. When such
a request is received, open a connection to the client side of the protocol located on the
machine that requested the download and push the file to that machine;
3. On the client side, when a connection is initiated to the pushFileSupport server, download
the file and store it locally, as if it had been downloaded normally.
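The wait-loop pattern from step 2 can be sketched as below: pushFileSupport() itself only spawns a serving thread and returns immediately, and the thread polls with a timeout so it can be shut down. This is an illustration only; the BlockingQueue stands in for the JavaSpace request messages, and all names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of the pushFileSupport pattern: the method must not loop
// forever itself (that would block the system), so it only starts a
// daemon thread that does the blocking wait-loop.
public class PushFileSupportSketch {
    private final BlockingQueue<String> requests = new LinkedBlockingQueue<String>();
    private volatile boolean running = false;
    public final List<String> served =
            Collections.synchronizedList(new ArrayList<String>());

    // Called once when the protocol server is registered.
    public void pushFileSupport() {
        running = true;
        Thread server = new Thread(new Runnable() {
            public void run() {
                while (running) {
                    try {
                        // Poll with a timeout so the loop can be stopped.
                        String host = requests.poll(100, TimeUnit.MILLISECONDS);
                        if (host != null) pushFileTo(host);
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }
        });
        server.setDaemon(true);
        server.start();
    }

    // A download request arriving as a message (stand-in for JavaSpace).
    public void request(String host) { requests.add(host); }

    public void stop() { running = false; }

    // In ALiCE this would open a connection to the requesting machine
    // and push the file; here it only records the host.
    protected void pushFileTo(String host) { served.add(host); }
}
```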
The default protocol offers no push file support.<br />
3.7.2. The Protocol Client<br />
The client side of the protocol is responsible for downloading a file from the server side of the
protocol when this is requested by calling the download method on an instance of the protocol
client class. The ProtocolClient class is an interface that should be implemented by any protocol
developed. The interface is:
public interface ProtocolClient {
    public File download(AliceURL url, AliceURL appl)
        throws IOException, FileNotFoundException;
    public void pushFileSupport();
}
The main method that should be implemented is the download() method. It receives two
parameters from the system, the first being the actual AliceURL of the file that should be downloaded
(protocol name, host address and file name on that host). The other parameter is the application on
behalf of which the file is downloaded, taken from the reference in JavaSpace that led to this
download. This is necessary in order to pass it on to the file manager (see section 3.6) to obtain a file
name under which to put the downloaded file. Then the call should return a File object representing the
downloaded file. An implementation could look like:
public File download(AliceURL url, AliceURL appl) throws
        IOException, FileNotFoundException {
    String fileName = ALICE.getFileManager().getFileName(appl);
    request_file_to_server(url.getHost(), url.getFile());
    File f = new File(fileName);
    receive_file_from_server(f);
    return f;
}
The push file support should be implemented following the directions and with the functionality
presented in subsection 3.7.1.
4. ALiCE Architecture and Implementation
A distributed system, especially a grid computing system, has several components residing at various
locations, working together for a common purpose. Each of these components has
its own unique roles and responsibilities. The components of ALiCE together provide
a functional and performant middleware for a general grid computing environment.
4.1. An Overview of the Components of the System
4.1.1. Three-tier architecture<br />
The ALiCE system must support three basic functions: allowing users to submit the applications<br />
that they wish to run, allowing users to contribute the computational power of their machines to<br />
the ALiCE system, and lastly resource management - matching resource demand with available<br />
resources.<br />
The same approach as in the previous version of ALiCE is taken, with some new elements. Basically,
we are using a three-tier architecture consisting of three elements: the consumer, which
submits new applications into the system; the resource broker, which does resource management
and scheduling with a centralized view of the system; and the producers, which execute the code
supplied by the applications. There is an additional component used for data files, the data server.
This can be the same machine as the resource broker.
4.1.2. System Components Overview
The components of the system have changed a little, since the architecture and the approach changed.
There are some new components and some old ones have different roles. All the components are
presented in figure 4.1 and the functionality of each of them is explained in the following.
[Figure 4.1.: ALiCE components - consumers, producers, resource brokers, data servers and Java/C task producers (for Sparc Solaris, Intel Solaris, Intel Linux and Intel Windows), connected through the Internet / LAN]
The connection between the components is made through the network: either a Local Area Network,
if the work environment is a cluster, or the Internet, if the system is deployed over wide area
connections. All the communication is done through JavaSpace and the means of communication
will be further detailed later on.
The components of the ALiCE GRID middleware are:<br />
The Consumer<br />
The consumer is the component submitting applications to the system. It can be any machine that is
connected to the ALiCE system through a LAN or through the Internet and that runs the ALiCE
consumer/producer components and GUI (ultimately, any machine connected to the Internet can
use the ALiCE GRID system). This means that the user will use a GUI to submit a file containing
the ALiCE application in a form specific to the language it is written in. For Java, this file is a JAR
archive which should contain at least one task generator class and one result collector class (see
chapter 3 for more details about the programming model). The task generator is transported inside
the ALiCE system in order for tasks to be generated, initialized and sent to the producers. The result
collector is executed at the consumer and it receives the results generated by the tasks created by the
task generator.
The consumer is also the point from which new protocols and new runtime supports can be
added to the whole system. Although the plug-in support for new protocols was tested and works,
some tuning is still badly needed, as well as a unified support for using the new protocols and
selecting among them for security purposes. The plug-in support for new languages is not
fully deployed yet.
The Producer<br />
The producer is a machine that has volunteered its cycles to run ALiCE applications. The producer
will receive tasks from the ALiCE system in the form of serialized live objects, will dynamically
load them and execute them. The results obtained from each task will be sent back so they can be received
by the consumer that originally submitted the application.
The producer and the consumer can actually be the same machine; this is the most usual
case, when someone who volunteers to run applications from others also wants to run his/her own
applications in ALiCE. In order to support this, the GUI for the system is unified for the producer
and the consumer.
The Resource Broker<br />
The resource broker is the central point of the system. Basically, the only thing it does is scheduling.
The scheduling is needed for many reasons. The first is to have control over resource
allocation and usage, since we are dealing with multiple concurrent applications in the system
at the same time.
Then there are the objective needs imposed by supporting languages other than Java. In contrast
with the portability and platform independence of the Java programming language, other
languages are platform-dependent and even library-dependent. The scheduler should then choose
an appropriate platform for the producer that should run the application. There are two types of
scheduling done at the resource broker's site: application scheduling and task scheduling.
Even though there are many approaches to scheduling, we chose to have a centralized scheduler,
with options to do part of the scheduling in a distributed fashion by means of pattern matching when
retrieving objects from JavaSpace.
For more information about the resource broker and the scheduling, see section 3.3.3.<br />
The Task Producer<br />
The task producer is a machine that is part of the ALiCE core (but need not be so - it can be
outside; see the detailed chapter about system components) and is meant to run the task generator
classes of the applications. This will generate tasks, which will be scheduled by the resource
broker and then downloaded by the producers directly from the task producer. The separation of
this machine from the resource broker (in the previous version, they were running on the same
machine) was done for two principal reasons:
- since we are supporting non-Java applications, those applications are platform-dependent and
not all of them can be run at the resource broker;
- in order to separate and isolate the central point of ALiCE, the resource broker, from any
alien code. Since the task producer runs code submitted by consumers, we don't have total control
over what that code does. Even with strongly enforced security and code safety measures, we can't
guarantee total security. So the decision was made to run the code on another machine and in this
way achieve total safety of the resource broker.
Each task producer runs either Java code (which can be run on any platform) or code
compiled for the platform that the task producer offers.
The Data Server<br />
The data server is a machine dedicated to data file storage. Any data file used by an application
can be submitted to the data server. From inside any task, the programmer can obtain access to a
data file submitted for the application that generated the task. Through the reference obtained,
the task can read or write chunks of any size from that file, from 1 byte to the size of the whole file. For
more details, see the discussion of the data server and data files in section 3.3.5.
4.2. The Communication Between Components of ALiCE
In ALiCE, all communication is initiated and supported through JavaSpace. This means that there
will be no communication between parts of the system that does not leave a trace in the space. In
future development, this will help with implementing a central accounting and monitoring scheme
that will be able to register all the communications that take place in the system.
JavaSpace is also used as the means to synchronize the components of the system.
Since the API of JavaSpace places at our disposal a set of blocking calls, we can use those calls to
synchronize different threads running on different machines in the ALiCE grid computing system.
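The synchronizing effect of a blocking take can be illustrated with a purely local analogue: the consuming thread blocks until another thread writes a matching entry, exactly as an ALiCE component blocks on JavaSpace. This is an illustration only; the SynchronousQueue below stands in for the space, and the entry content is invented for the example.

```java
import java.util.concurrent.SynchronousQueue;

// Local illustration of synchronization through a blocking take():
// the reader blocks until the writer's entry becomes available.
public class SpaceSyncSketch {
    public static String rendezvous() {
        final SynchronousQueue<String> space = new SynchronousQueue<String>();
        Thread writer = new Thread(new Runnable() {
            public void run() {
                try {
                    space.put("result-ready"); // the "write" into the space
                } catch (InterruptedException ignored) { }
            }
        });
        writer.start();
        try {
            return space.take(); // blocks until the writer's entry arrives
        } catch (InterruptedException e) {
            return null;
        }
    }
}
```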
All the objects transferred through JavaSpace in ALiCE fall into one of two categories: file references
or messages. These two categories are represented by the classes ObjectReference and
Message, which are the core elements of data transfer in ONTA. Corresponding to the two kinds of
objects, we have two types of communication: one implies the transfer of serialized objects through
the network, implementing a means to have mobile code in our grid system. The other one is just
for transferring information between components of the system: to advertise capabilities and resources,
to synchronize parts of ALiCE and for any other purpose whose final goal is the
transfer of a specific piece of information from one machine in ALiCE to another. In the following,
we will present each of these two kinds of communication in detail.
4.2.1. Communication Through Object References<br />
Communication between components of the system through the use of Object References is
implemented by ONTA. The steps involved in transferring an object from one part of the system to
another are presented in figure 3.1. For example, let's say that component A has an object Obj
that it wants to send to B, another component of the system. The stages in this operation are:
1. A serializes Obj and obtains a file containing the byte code for the live object, together with
all the classes that Obj needs, packed inside a JAR archive;
2. A creates an ObjectReference object corresponding to the file containing the serialized Obj
and writes it in JavaSpace;
3. a specialized thread in B that was waiting for an Object Reference of the type sent by A
takes the reference from the space;
4. B downloads the actual file containing the serialized object directly from A, using the protocol
indicated in the reference it took from the space;
5. B deserializes Obj from the downloaded file and gets the live instance for future use.
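The serialize / advertise / fetch roundtrip above can be sketched locally as follows. This is a simplified illustration: in ALiCE the file is a JAR that also holds the needed classes, and the reference travels through JavaSpace; here a plain serialized file and a small record stand in for both, and all names are invented for the example.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Local sketch of the object transfer: A serializes an object to a
// file and publishes a reference to it; B follows the reference and
// gets the live instance back.
public class ReferenceTransferSketch {
    // Minimal stand-in for an ObjectReference: the file's location.
    public static class Ref {
        public final String path;
        public Ref(String path) { this.path = path; }
    }

    // Steps 1-2, at component A: serialize Obj and advertise it.
    public static Ref advertise(Serializable obj, String path) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(path));
        try { out.writeObject(obj); } finally { out.close(); }
        return new Ref(path);
    }

    // Steps 4-5, at component B: "download" the file and deserialize.
    public static Object fetch(Ref ref) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new FileInputStream(ref.path));
        try { return in.readObject(); } finally { in.close(); }
    }
}
```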
Basically, an ObjectReference is a pointer to a file containing a serialized object. It also contains
some information needed for differentiated matching of different reference types and some other
fields needed by ALiCE to handle the objects. The fields in an ObjectReference instance are:
URL - this field points to the location of the file containing the serialized object. It is not an
actual java.net.URL instance (this would pose too much overhead, with unneeded information),
but an instance of the class AliceURL, which contains just three string fields:
– host - the IP address (or the host name) of the host holding the file;
– protocol - the string ID identifying the protocol that should be used to download this
reference (for more on protocols, see section 2.7);
– file - the full path of the file containing the serialized object on the
machine that advertised it.
Type - this field helps differentiate between the different kinds of objects referenced by ObjectReference
instances in JavaSpace. The types currently supported by ALiCE are:
– application - for the transfer of a new application reference between a consumer and a
resource broker for scheduling;
– code - for the actual file transfer of a new application from the consumer to the task
producer, after the scheduler chose a task producer to run the task generator of that
application;
– data - for the transfer of data files from the consumer to the data server;
– protocol - for transferring new protocols from any component of the system to any
other component of the system;
– taskToSchedule - for sending a reference to a task that is ready to be scheduled from
the task producer to the resource broker;
– task - for the file transfer of a serialized task from the task producer to a producer after
scheduling by the resource broker;
– result - for transferring a serialized result from the producer that produced the result
back to the result collector running at the consumer that started the application, either
directly or through a result manager running on a resource broker's machine;
– userObject - this type of reference is used to point to files containing serialized objects
used in the communication between the components of an ALiCE application; these
objects are sent and requested by the application, from within the user code.
Application - usually, every reference to an object is related to an application, being a result
of the application, a task of the application, the application itself or a user object. Keeping
track of which reference is part of which application is very important to the system from
multiple points of view, such as identifying which results will be delivered to a result collector,
identifying which user objects are related to an application, or for purposes like security and accounting.
Also, this identification is important in order to be able to clean the space of all the references
to an application if this is needed or requested. The ID is also a URL, namely the URL that
the application was advertised with by the consumer that sent it into the system. This ID is
unique system-wide, given that it is composed of the IP address of the machine (which
uniquely identifies that machine among all others) and of the file name that the
application was contained in, which is also unique (see section 2.6).
Destination - most of the references are intended to be sent to one specific machine that
runs ALiCE, with some exceptions (such as when an application is first submitted and any resource
broker could take it for scheduling). In order to be able to do this, an addressing scheme
should be in place. Since one machine can only run one instance of ALiCE at a given time,
we can safely use IP addresses (which are guaranteed to be unique for
each computer). The destination field in an object reference will then contain the IP address
(or the DNS name) of the machine that this reference is intended for. No instance
of ALiCE will take the reference other than the one running on the computer with the given IP
address.
Identification - this field is a string identifier that serves different purposes, depending on
what kind of object reference it is. The most common use for this field is to name the
language that the referenced code is written in, in the case of all binary code transported in ALiCE
(for now, this is limited to Java and C); this is the case for all the reference types except
the protocol type. In this latter case, the identification field contains the string that identifies
the protocol name.
Platform information - there are some additional fields in an object reference that are used
in the case of references pointing to code objects (applications, tasks, results and so on); these
serve the purpose of defining the platform that the code can run on. There are two string
fields in this category, one defining the processor and one the OS that the referenced binary code
can run on. Depending on these fields and the language of the code, a producer can decide
whether or not it can run the code contained in the object pointed to by the reference.
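The fields listed above, and the wildcard-style matching JavaSpace performs when a component takes a reference by template, can be sketched as follows. The field names follow the text, but this is only a model: the matching code illustrates the idea, it is not the Jini matching engine.

```java
// Sketch of the fields an ObjectReference carries, plus
// JavaSpace-style template matching: a null field in the template
// matches anything, a non-null field must be equal.
public class ObjectReferenceSketch {
    public String host, protocol, file;  // AliceURL parts
    public String type;                  // application, code, task, ...
    public String application;           // owning application's ID
    public String destination;           // intended machine, or null
    public String identification;        // language or protocol name
    public String processor, os;         // platform information

    // Template matching on the fields a taker typically filters by.
    public boolean matches(ObjectReferenceSketch tmpl) {
        return (tmpl.type == null || tmpl.type.equals(type))
            && (tmpl.destination == null || tmpl.destination.equals(destination))
            && (tmpl.application == null || tmpl.application.equals(application));
    }
}
```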
4.2.2. Communication Through Messages<br />
Messages are used to communicate small amounts of information from one component of the system
to another. The basic class that contains a message is defined in Message.java and contains just
four fields:
source - the IP address or the host name of the machine that sent the message;
destination - the IP address or the host name of the machine that the message is addressed
to;
application - the application ID of the application that the message is related to; there are
cases when a message has no connection to an application (e.g. a message advertising the
presence of a data server), in which case the application field is null;
type - an integer defining the type of the message; this is the field that is indispensable for
matching against a template when taking messages from JavaSpace.
The message class is just a frame to build on, so for implementing specific inter-component communication
in ALiCE, the Message class is extended to contain fields specific to that message
type. When creating a new message extension class, the programmer should also define a unique
identifier as a constant in the Message class itself.
There are many examples of communication through messages in ALiCE. It is used for
advertising information (the presence of a data server, the handler of a data
file, the scheduler of an application etc.), for request-reply purposes (asking for the number of results
ready when results are stored at a resource broker, a producer getting the templates to run with,
etc.) or even for sending simple messages on behalf of an application (the sendStringMessage -
getStringMessage mechanism between a result collector and a task generator).
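Extending the message frame as described can be sketched like this. The concrete message, its field names and the constant's value are invented for the example; only the shape (four base fields plus a type constant defined in the Message class itself) follows the text.

```java
// Sketch of extending the Message frame class for a specific
// inter-component message.
public class MessageSketch {
    public static class Message {
        // Unique type identifiers live as constants in Message itself.
        public static final int TYPE_RESULT_COUNT_REQUEST = 7; // example value
        public String source, destination, application;
        public Integer type; // Integer (not int) so null can act as a wildcard
    }

    // A concrete message: a consumer asks a resource broker how many
    // results are stored for its application.
    public static class ResultCountRequest extends Message {
        public String replyTo; // where the broker should answer
        public ResultCountRequest(String src, String dst, String appl) {
            source = src; destination = dst; application = appl;
            type = TYPE_RESULT_COUNT_REQUEST;
        }
    }
}
```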
4.2.3. General Communication Scheme for an ALiCE Application<br />
[Figure 4.2.: ALiCE ObjectReference transfers - (1) application, (2) code, (3) taskToSchedule, (4) task, (5) result (option 1: direct; option 2: through the resource broker), (A) data and (B) data chunks, exchanged between the consumer, resource broker, task producer, producer and data server]
In this section we present the algorithm that is followed for running an application in the ALiCE
system. All the steps in running an application are illustrated in figure 4.2 by following all the
object references that are put into JavaSpace. There are many additional messages that are
transferred between the components of the system. We will look in detail at each component of the
system in section 3.3, System Components.
The general algorithm of running an ALiCE application is presented next:<br />
1. A user submits a new application into the system from a consumer machine, in the form of
a JAR archive containing all the classes needed, including at least a task generator class and a
result collector class; for full functionality at least one task class
and one result class should also be included.
2. The result collector class is dynamically loaded, instantiated and started at the consumer.<br />
3. An object reference with the type TYPE_APPLICATION and with no specified destination<br />
is placed in the space.<br />
4. The reference is taken by a resource broker, which calls a scheduling function that will choose
a task producer to run the task generator of the application. A new object reference with the
type TYPE_CODE and with the chosen task producer as destination is created, pointing to the
original application archive at the site of the consumer that submitted the application.
5. The task producer that was chosen to execute the task generator of this application takes<br />
the reference that was destined to it from JavaSpace and downloads the file containing the<br />
application from the consumer.<br />
6. The task producer dynamically loads the task generator class and all the classes needed by
the task generator (through references, these also include the task classes and the
result classes), instantiates it, then starts it. This will initiate the process of creating tasks.
7. Each time a new task is created and submitted into the system by the application for processing
(through the process() call from the task generator), the task is first serialized and
added to a JAR archive file that will contain the serialized form of the object, together with
all the classes used and referred to by this object (including the result class). Then a new reference
with the type TASK_TO_SCHEDULE is instantiated, pointing to the JAR archive just
created, with no destination, and the reference is placed in JavaSpace.
8. A resource broker takes the reference and calls the scheduler to choose a producer that should
run the task referred to by the reference; this producer can be changed later without
affecting the reference (see the subsection about the resource broker in the System Components
section). A new object reference with the type TYPE_TASK, pointing to the file containing
the serialized task on the task producer's machine and destined for the chosen producer, is placed
in JavaSpace.
9. The producer that was chosen at step 8 takes the reference from the space and downloads<br />
the file from the task producer. By using the Object Loader (see section 2.5), the object is<br />
dynamically loaded, together with all the classes needed by the object. The newly obtained<br />
task instance is started by calling the execute() method on it.<br />
10. The execution of the task returns an Object, which is the result of the task. The result<br />
is serialized and placed in a JAR archive, and a new object reference with the type<br />
TYPE_RESULT is created, pointing to that file, with the destination set according<br />
to the result delivery mode chosen when the application was submitted. The result is<br />
sent to the resource broker that scheduled the application if the results are handled<br />
through the resource broker, or directly to the result collector of the application, running at<br />
the consumer, if the result delivery mode is direct delivery.<br />
11. If the result delivery mode is direct delivery, the result manager running at the consumer that<br />
submitted the application takes the reference to the result from JavaSpace, downloads the file<br />
containing the result and dynamically loads the result object, storing it locally in memory.<br />
When the result collector requests a new result, this object is returned to it.<br />
12. If the results are sent back through the resource broker:<br />
a) The resource broker takes the reference from JavaSpace, downloads the file containing<br />
the serialized result pointed to by the reference and stores it on secondary<br />
storage.<br />
b) The result collector running at the consumer decides to check for results and sends a<br />
message to the resource broker asking how many results are stored there for the<br />
application this result collector runs for. If there are results ready, the<br />
result collector can get them one by one (so as not to fill the JavaSpace with result<br />
references). To get a new result, the result collector puts a request for it in<br />
JavaSpace.<br />
c) The result manager running on the resource broker’s machine takes the request (in the<br />
form of a message) from JavaSpace and finds a file containing a serialized result that<br />
belongs to the respective application. A new object reference with the type TYPE_RESULT<br />
is created pointing to that file on the resource broker’s machine, with the destination<br />
set to the consumer machine that runs the result collector, and is written in JavaSpace.<br />
d) The result manager running at the consumer that submitted the application takes the<br />
reference to the result from JavaSpace, downloads the file containing the result and<br />
dynamically loads the result object, storing it locally in memory. When the result<br />
collector requests a new result, this object is returned to it.<br />
13. When the application’s result collector returns, all the references that are related to the ap-<br />
plication that has just terminated are removed from JavaSpace.<br />
4.3. System Components<br />
This section explains the functionality and the implementation of each component in the ALiCE<br />
grid computing system. Every component consists mainly of threads performing various<br />
functions, some common to all the components, some specific to either the<br />
consumer, the resource broker, the producer/task producer or the data server.<br />
ALiCE is implemented as a multi-threaded application. The system is designed<br />
in such a way that the components are separated only by threads, so the same Java Virtual<br />
Machine can even run all the components at the same time. This means that it is possible<br />
to run a consumer and a producer on the same machine at the same time, or (more usefully) a<br />
resource broker and a data server on the same machine. All this is possible as long as<br />
running in the same JVM does not pose a problem to the functionality of the system.<br />
4.3.1. The Common Components<br />
There are some parts of the system that are common to all the components. The main common<br />
subsystem is ONTA, which is used for object transfer by the consumers, the resource brokers, as<br />
well as the producers, task producers an data servers. ONTA is composed of some objects, some<br />
servers and a thread to collect new protocols. All the objects that are instantiated and used in all<br />
the components of ALiCE, as well as the common threads are presented next.<br />
Every time an ALiCE instance is started, there is an initialization phase that is done first, by<br />
calling a static init method on the ALICE class. This does some initialization of objects that are<br />
used by any instance of ALiCE, and starts some threads.<br />
The Java Space<br />
In ALiCE, we use one common and unique space to transfer object references and messages.<br />
We tried the system with both JavaSpaces and GigaSpaces and, although it is a commercial<br />
implementation, GigaSpaces is much faster and more stable.<br />
Any instance of ALiCE will first obtain a reference to the space by doing a static lookup. We<br />
do not use the discovery protocol, since we want control over the use of the space, so the<br />
user of ALiCE should provide the location of the space when the system starts up. Any operation<br />
of writing, taking or reading objects to/from the space is done using the reference obtained<br />
during the lookup at system initialization time; this reference can be obtained with a<br />
static call on the ALICE class.<br />
We do not use a JavaSpace class directly, but a wrapper for the calls to the space, meaning that<br />
any call to take, write, read etc. objects from/to JavaSpace is actually a call to a method in the<br />
FastJavaSpace class implemented in ALiCE. The purpose of wrapping these calls is to be able to<br />
control access to the space. This is required for implementing security over who performs<br />
operations on the space, as well as for implementing a centralized monitoring/accounting<br />
manager.<br />
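The wrapper idea can be sketched as follows. This is a hypothetical, simplified stand-in: a LinkedList plays the role of the real JavaSpaces/GigaSpaces backend, and the names only echo the FastJavaSpace class described above; the point is that every operation funnels through one class, where access control or accounting can later be hooked in.<br />

```java
import java.util.LinkedList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the FastJavaSpace idea: every write/take/read on the
// space goes through this single wrapper, giving one place to add access
// control, monitoring or accounting. A LinkedList stands in for the real
// JavaSpaces/GigaSpaces backend.
class FastJavaSpace {
    private final List<Object> entries = new LinkedList<>();
    private long writes = 0;                 // trivial accounting hook

    synchronized void write(Object entry) {
        writes++;
        entries.add(entry);
    }

    // Removes and returns the first entry matching the template, or null.
    synchronized Object take(Predicate<Object> template) {
        for (Object e : entries) {
            if (template.test(e)) { entries.remove(e); return e; }
        }
        return null;
    }

    // Returns a matching entry without removing it, or null.
    synchronized Object read(Predicate<Object> template) {
        for (Object e : entries) if (template.test(e)) return e;
        return null;
    }

    synchronized long writeCount() { return writes; }
}
```

A real implementation would delegate to the space reference obtained during the static lookup at initialization time instead of to the local list.<br />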
The ONTA System<br />
The ONTA system is common to any instance of ALiCE, be it a consumer, a resource broker, a<br />
producer, a task producer, a data server or a combination of these; it has several components that<br />
are initialized and started by the initialization procedure of ALiCE.<br />
The File Manager - this controls the file naming and storage in ALiCE; it is instantiated<br />
with a root directory at system startup. For more details, see section 2.6;<br />
The Object Repository - through calls to methods of this class, it is possible to place<br />
object references or messages in the space, using a nominated protocol;<br />
The Remote Object Loader - this class provides the methods for taking object references<br />
and messages from the space, as well as for downloading the files referred to by object<br />
references. Once instantiated, this class also starts the new protocol download thread.<br />
This thread waits for an object reference pointing to a new protocol to appear in JavaSpace.<br />
When such a reference arrives, the thread downloads the file containing the<br />
code for the protocol and dynamically loads and instantiates it. This provides server<br />
support by starting the thread for the new protocol server, as well as an instance of the<br />
protocol client, giving the means to download files using the new protocol. After starting the<br />
protocol, it is registered with the ONTA registry;<br />
The ONTA Registry - there is an instance of the ONTA registry on each machine running<br />
ALiCE. It stores all the known protocols, and both the client-side and the server-side<br />
instances can be retrieved from it when needed. The registry<br />
also handles the unique protocol naming by not allowing two protocols with the same name<br />
to register.<br />
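The registry's behavior can be sketched as below. This is a hypothetical stand-in (plain Objects represent the protocol client and server sides, and the names are illustrative); it only shows the two properties described above: per-name storage of both sides of a protocol, and refusal of duplicate names.<br />

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the ONTA registry: one instance per machine, storing
// the client and server side of every known protocol under a unique name.
// register() refuses a second protocol with an already-used name.
class OntaRegistry {
    static final class Protocol {
        final Object client;   // stands in for the protocol's download client
        final Object server;   // stands in for the protocol's server thread
        Protocol(Object client, Object server) {
            this.client = client;
            this.server = server;
        }
    }

    private final Map<String, Protocol> protocols = new HashMap<>();

    // Returns false (and changes nothing) if the name is already taken,
    // which is how unique protocol naming is enforced.
    synchronized boolean register(String name, Object client, Object server) {
        if (protocols.containsKey(name)) return false;
        protocols.put(name, new Protocol(client, server));
        return true;
    }

    synchronized Protocol lookup(String name) { return protocols.get(name); }
}
```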
The Server Threads<br />
At the instantiation of the remote object loader, all the protocols built into the system are<br />
started and registered with the ONTA registry; there is a thread running for each protocol<br />
the system knows. The server side of each protocol built into ONTA consists of a thread<br />
started during the system initialization phase. In addition, for each new protocol<br />
received from another ALiCE instance during normal operation, a thread is started<br />
on every ALiCE instance.<br />
Each protocol thread serves the requests made by the client side of the same protocol<br />
running on another machine, which uses the respective protocol to download a file containing<br />
a serialized object or a data file.<br />
The Shutdown Thread<br />
Common to all ALiCE instances is a thread used for graceful system shutdown. The prob-<br />
lem is that if an instance of ALiCE is stopped by killing its process, there will<br />
be leaks in the system, such as references and messages in the space belonging to a dead<br />
application, as well as unused files that were not deleted. To provide a clean shutdown, a<br />
special thread listens on a predefined TCP/IP port for a connection. If such a connection<br />
is initiated and the shutdown is confirmed by receiving a certain string, all the references<br />
related to the host running the ALiCE instance that received the shutdown request are removed<br />
and all the locally stored files are deleted.<br />
This also provides the means to do a remote shutdown of an ALiCE instance, if desired,<br />
because the communication is implemented using TCP/IP sockets.<br />
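The shutdown listener can be sketched as follows. This is a hypothetical stand-in: the confirmation string and the cleanup action are illustrative (in ALiCE the cleanup would remove this host's references from the space and delete its local files), and the class name is not from the original code.<br />

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of the graceful-shutdown thread: it listens on a TCP
// port and, when a connection delivers the agreed confirmation string, runs a
// cleanup action before the process exits.
class ShutdownListener extends Thread {
    private final ServerSocket server;
    private final String confirmation;
    private final Runnable cleanup;

    ShutdownListener(int port, String confirmation, Runnable cleanup) throws Exception {
        this.server = new ServerSocket(port);  // port 0 picks a free port
        this.confirmation = confirmation;
        this.cleanup = cleanup;
        setDaemon(true);
    }

    int port() { return server.getLocalPort(); }

    @Override public void run() {
        try (Socket s = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream()))) {
            // Only a matching confirmation string triggers the cleanup.
            if (confirmation.equals(in.readLine())) cleanup.run();
        } catch (Exception ignored) {
        }
    }
}
```

Because the trigger is an ordinary TCP connection, the same mechanism also allows a remote shutdown, as the text notes.<br />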
4.3.2. The Consumer<br />
The consumer is the component of ALiCE that has two main purposes: it submits new applications<br />
into the system and it starts the result collector, providing the means to get new results from the<br />
system as they are produced by the producers.<br />
Each language supported for ALiCE applications (Java and C support are implemented at the<br />
time this report is written) has its own consumer class, which extends the alice.consumer.Consumer<br />
class. To make other language support as easy as possible to develop, all that a consumer<br />
needs to implement by extending this class is the method:<br />
Figure 4.3.: An example of an ALiCE consumer instance (a Consumer with JavaConsumer and<br />
native-C CConsumer subclasses; result collectors for applications 1..n, each with its result<br />
retrieval threads; results stored at the resource broker)<br />
public void start(File pack, String taskGenerator, String resultCollector, LinkedList dataFiles,<br />
Boolean resByRB, int threads, String proto);<br />
This method is called by the GUI each time a new application is submitted into the sys-<br />
tem, on the consumer implementation instance for the language the application is written in. The<br />
method starts the result collector of the application by dynamically loading and instantiating the<br />
class provided by the user in the archive. That class should extend the class<br />
alice.result.ResultCollector. By extending this class, the result collector implementation gets access<br />
to the calls it needs to collect the actual results (inherited from the ResultCollector class), like<br />
getNewResult() or getResultsNoReady().<br />
The implementation of the Consumer class for any language should create a new thread to run<br />
the result collector, initialize it with the data it needs, start it and<br />
then return. The ResultCollector class therefore extends the java.lang.Thread class.<br />
This thread model is imposed so that result collectors for multiple applications can be<br />
run inside the same ALiCE instance by repeatedly calling the start() method on new consumer<br />
instances.<br />
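The thread model above can be sketched as follows. This is a hypothetical, simplified stand-in for the alice.result.ResultCollector class: only the Thread inheritance and the collect() entry point are taken from the text; the field and method names are illustrative.<br />

```java
// Hypothetical sketch of the thread model: ResultCollector extends Thread, so
// the language-specific Consumer subclass can load a user class, call start()
// to run it in its own thread, and return immediately. Several result
// collectors for different applications can therefore run in one JVM.
abstract class ResultCollector extends Thread {
    protected String applicationId;

    // Initialization performed by ALiCE before the collector is started.
    void init(String applicationId) { this.applicationId = applicationId; }

    // Entry point of the user's result collector.
    abstract void collect();

    @Override public final void run() { collect(); }
}

// A trivial user-supplied collector, standing in for a real one that would
// loop over getNewResult() calls.
class DemoCollector extends ResultCollector {
    volatile boolean collected = false;

    @Override void collect() { collected = true; }
}
```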
For the Java language, the class extending the Consumer class is named JavaConsumer. The GUI<br />
instantiates this class for each application that is submitted into the system and calls the start()<br />
method on each new instance, providing the specific parameters needed for each application. Mul-<br />
tiple applications can run at the same time (this translates into many result collectors for different<br />
applications, or even different instances of the same application) on the same ALiCE consumer in<br />
one Java Virtual Machine.<br />
The structure of an ALiCE consumer instance is presented in figure 4.3.<br />
Starting a New Application<br />
For each new Java application, a new instance of the class JavaConsumer is created and then the<br />
start() method is called on that instance, providing the application’s specific parameters. This<br />
creates a new thread that is also implemented in the JavaConsumer class.<br />
The thread will first unpack the archive file containing the application; any new Java application<br />
should be submitted into the system as a JAR archive containing all the classes needed and used<br />
by the application. The user should also provide the class names for the task generator and the<br />
result collector. If there are any data files used by the application, they should be advertised by<br />
specifying the file names in a linked list provided as a parameter to the start() method. There are<br />
two other parameters sent to the start() method, both regarding result collection. One of<br />
them indicates the number of result collecting threads that should be started. The other<br />
indicates the result delivery mode, which is chosen by the user that submits the application. For<br />
more details see the Result Collection paragraph below.<br />
The consumer will first choose a running directory. Running directories are predefined loca-<br />
tions that the classes of an application are unpacked into and run from. These directories should<br />
also be in the CLASSPATH environment variable for the result collector to work. The record of<br />
used running directories is kept statically by the superclass of all consumer classes, the Consumer<br />
class. After choosing a running directory, the Java consumer copies the entries from the JAR<br />
archive to that directory, one by one; if there are any directory entries in the archive, these<br />
subdirectories are also created in the chosen running directory and the entries in them are<br />
copied to the right location. After this, a new ALiCE JAR archive is created by the Object<br />
Writer, containing the task generator and all the classes it needs and refers to. Then a new object<br />
reference with the type TYPE_APPLICATION, not destined to anyone, pointing to the newly<br />
created archive, is written in JavaSpace, where it will be taken by a resource broker that will schedule<br />
the application. If the linked list of data files contains entries, each entry is advertised to a<br />
data server; the location of the data server is found by reading a message advertising a<br />
data server in the space. The data server downloads all advertised data files and has them<br />
available for future use by the tasks of the application.<br />
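The JAR-unpacking step described above can be sketched as follows. This is a hypothetical stand-in (the class name is not from the original code); it shows entries being copied one by one into the running directory, with subdirectories recreated so that the classes land where the class loader and CLASSPATH expect them.<br />

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical sketch of unpacking an application archive into a chosen
// running directory, recreating any subdirectories found in the archive.
class ArchiveUnpacker {
    static void unpack(Path jarPath, Path runningDir) throws Exception {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                Path target = runningDir.resolve(entry.getName()).normalize();
                if (!target.startsWith(runningDir)) continue; // skip escaping entries
                if (entry.isDirectory()) {
                    Files.createDirectories(target);          // recreate subdirectory
                } else {
                    Files.createDirectories(target.getParent());
                    try (InputStream in = jar.getInputStream(entry)) {
                        Files.copy(in, target);               // copy the entry's bytes
                    }
                }
            }
        }
    }
}
```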
Next, the result collector class from the archive is dynamically loaded, instantiated and<br />
initialized, and the collect() method is called on the new instance. This method is the entry<br />
point to the result collector of the application.<br />
The Result Collector<br />
The result collector is one of the main components of an application, the one that does the re-<br />
sult collection and visualization. Any result collector for an application should extend the class<br />
alice.result.ResultCollector. As mentioned before, for each application this class is dynamically<br />
loaded, initialized and run by ALiCE. Starting the result collector simply means calling the<br />
collect() method, its entry point.<br />
Initializing the result collector means calling the init() method from the ResultCollector class,<br />
which any result collector of an application inherits by extending this class. The initial-<br />
ization sets the right values for some fields in the result collector subclass instance, like the<br />
application ID of the application this result collector belongs to (so it knows which results to take<br />
from the space or to request from the resource broker that stores them), the result delivery mode<br />
and the running directory that the result collector runs in. Also, at the initialization of the result<br />
collector, the user specifies (in the parameters of the Consumer.start() method) the number of result<br />
retrieval threads to start. These threads are intended to enhance the performance of the result col-<br />
lector: a number of threads listen for new results in JavaSpace and, as soon as a result is<br />
ready and an object reference to it is found in the space, the result is downloaded by one of these<br />
threads and stored locally on the machine that runs the result collector. If the application<br />
starts a very large number of tasks, and hence a large number of results is expected to arrive at a<br />
high rate, having more threads doing the result retrieval work is useful: with only one thread, the<br />
bottleneck would be that while one result is downloaded, others could be ready with nobody to<br />
download them. With multiple threads, many results can be downloaded at the same time and they<br />
are available to the result collector at a faster rate.<br />
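The retrieval threads can be sketched as follows. This is a hypothetical stand-in: a BlockingQueue plays the role of JavaSpace, download() is a placeholder for the real file transfer through ONTA, and the class and method names are illustrative. It shows the key point above: several threads block on the space and append downloaded results to a local vector from which the collector is served.<br />

```java
import java.util.Vector;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the result retrieval threads: each thread waits for a
// result reference in the space (a BlockingQueue stands in for JavaSpace),
// "downloads" the result it points to, and stores it in a local vector.
class ResultRetriever {
    private final BlockingQueue<String> space;          // result references for this app
    private final Vector<String> localResults = new Vector<>();

    ResultRetriever(BlockingQueue<String> space) { this.space = space; }

    void startThreads(int n) {
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String ref = space.take();       // wait for a result reference
                        localResults.add(download(ref)); // fetch and store locally
                    }
                } catch (InterruptedException e) { /* shut down */ }
            });
            t.setDaemon(true);
            t.start();
        }
    }

    // Placeholder for the actual file transfer and object loading.
    private String download(String ref) { return "result-for-" + ref; }

    int resultsReady() { return localResults.size(); }

    // Returns and removes the first locally stored result, or null if none.
    String getNewResult() {
        return localResults.isEmpty() ? null : localResults.remove(0);
    }
}
```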
So, how do the methods from the ResultCollector class that return new results, or the number<br />
of new results, work? The functionality differs depending on the result delivery mode.<br />
If the results are delivered directly to the consumer, the results are available locally as soon as<br />
they are downloaded from the producers by the result retrieval threads. The number of results<br />
equals the number of elements in the local vector storing the result objects, and getting a new result<br />
means just returning (and removing) the first element of this vector. This result delivery mode is<br />
recommended, as it is much faster and imposes much less overhead on the system.<br />
If the results are delivered through the resource broker, the files containing them are stored<br />
there. Finding out the number of results that are ready means sending a message to the resource<br />
broker that stores the results for this application and waiting for a reply containing the number<br />
of results ready. Getting a new result translates into sending a special request message and<br />
then waiting for a corresponding object reference to be placed by the resource broker in the space.<br />
We only put one result object reference at a time in the space, in order to limit the space<br />
occupied in JavaSpace by these references. If an application had, say, 1000<br />
tasks that returned results, it would be unacceptable to keep all the references to them in the space<br />
until the application gets them. This result delivery mode should be chosen by applications<br />
that will take a long time to have all the results ready, when the consumer does not have a permanent<br />
connection to the Internet and will come back on-line later to get the results.<br />
The Consumer Threads<br />
To sum up the information in this section: the threads running on any ALiCE consumer<br />
instance, beside the ones common to all ALiCE components, are one thread running<br />
the result collector for each application; each of these threads starts a variable, user-specified<br />
number of result retrieval threads at the initialization of the result collector. Each of these<br />
latter threads gets new results for the application and stores them locally.<br />
The Consumer GUI<br />
The Consumer GUI, presented in figure 4.4, provides the user with easy access to new<br />
application submission and to the system output, as well as each application’s output. For<br />
each new application submitted, a new tab is added to the consumer GUI panel.<br />
4.3.3. The Resource Broker<br />
The resource broker is the central component of ALiCE, the one that runs the scheduler. This is<br />
the point where accounting and monitoring will be implemented, as all tasks and applications go<br />
through the scheduler.<br />
The approach taken in implementing the resource broker is very scalable, as there can be many<br />
resource brokers in the system, without any of them being aware of the presence of the others.<br />
When a task or application needs scheduling, one of the running resource brokers takes the<br />
reference from the space and handles it. This means that the references are scheduled much<br />
faster than when using just one resource broker.<br />
Figure 4.4.: The Consumer GUI<br />
An important feature of the design of the resource broker is that it is totally isolated from any<br />
alien code. Even though ALiCE is a grid middleware and deals heavily with mobile code from<br />
outside sources, the model keeps all the code that runs at the resource broker’s site inside<br />
ALiCE classes. This means that no outside code is run on the resource broker. The previous<br />
version of ALiCE ran the task generators on the resource broker, and this posed a very<br />
high security risk, which has been completely eliminated by moving the task generator execution<br />
to the task producers. In this way the resource broker, which is a centralized point and hence a<br />
single point of failure in the system, is much more secure.<br />
The resource broker also handles the results for applications that have chosen the resource-<br />
broker-stored result delivery mode.<br />
The Scheduler<br />
As mentioned before, the main purpose of the resource broker is running the scheduler. The<br />
scheduling process has two components: scheduling new applications to task producers and schedul-<br />
ing new tasks to producers. This translates into two types of object references that should be taken<br />
from JavaSpace (TYPE_APPLICATION for new applications and TYPE_TASK_TO_SCHEDULE<br />
for new tasks); then the appropriate machine should be chosen and new references (with the types<br />
TYPE_CODE and TYPE_TASK respectively) should be created, with the destination set to the<br />
chosen machines, and written in the space.<br />
Figure 4.5.: Tagging Producers (timeline of events: when a producer comes on-line, it creates a<br />
capabilities list and sends it through JavaSpace to the resource broker, which creates a templates<br />
list and sends it back; the producer then starts its task execution threads and begins executing<br />
user code; when the producer needs to be re-tagged, the broker creates a new templates list and<br />
the producer updates the templates used to take references from the space)<br />
The scheduling process looks simple, but it is not quite so. We should keep in mind that we are<br />
dealing with a highly dynamic system, namely a grid computing system, which has a property not<br />
common to other computing systems: it has at its disposal a large number of resources that are<br />
changing rapidly. In this environment the classical approach to scheduling, which is to take a new<br />
task and tag it to be run on some work engine, is not feasible any more. It would mean that if we<br />
delegate a task to be run by some producer, tag it to reflect this and write it in JavaSpace, and<br />
that producer later goes down (which happens very often in a grid system), the resource broker<br />
would have to take the reference back and re-tag it, causing tremendous overhead.<br />
Another approach would be to group the producers in previously established groups and tag<br />
a task to such a group of producers.<br />
But this is not a good approach either, as it means developing a complex algorithm for producer<br />
grouping, and we could end up having no producers to run a task, even though there are<br />
available producers in the system.<br />
The approach we implemented is not to tag the task references written in the space for a<br />
particular producer, but rather to tag a producer to take particular references from the space. In<br />
this approach we can change the destination of a task just by delegating another producer to run it,<br />
without modifying the reference we put in JavaSpace. This approach also permits the grouping<br />
of producers, but this time dynamically, and means that we can change which tasks a producer<br />
will run without restarting the producer and without any influence on the already scheduled tasks<br />
in JavaSpace. So if, say, we delegate one producer to run a certain task and then the<br />
producer goes down, we can simply tag another producer to run that task.<br />
The implementation of this approach relies on the usage of JavaSpace for object reference<br />
transfer. A producer takes from the space only references that match a certain template. Besides<br />
the application it belongs to, an object reference contains a series of fields that can differentiate<br />
between references linked to the same language and the same platform. And if a new scheduler is<br />
developed that needs more information inside an object reference, the ObjectReference class can<br />
be extended at any time with new fields, information that could be used by, or only be known<br />
to, the scheduler.<br />
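Template matching on object references can be sketched as follows, in the style of JavaSpaces matching, where a template's null fields match anything and non-null fields must be equal. This is a hypothetical, minimal stand-in for the real ObjectReference class; the field names are illustrative.<br />

```java
// Hypothetical sketch of template matching on object references: a producer
// tagged with a set of templates takes from the space only the references
// that match one of them.
class ObjectReference {
    final String type;          // e.g. "TYPE_TASK"
    final String language;      // e.g. "java"
    final String destination;   // host tagged to take this reference, if any

    ObjectReference(String type, String language, String destination) {
        this.type = type;
        this.language = language;
        this.destination = destination;
    }

    // Does this reference match the given template? Null template fields
    // match anything, non-null fields must be equal.
    boolean matches(ObjectReference template) {
        return (template.type == null || template.type.equals(type))
            && (template.language == null || template.language.equals(language))
            && (template.destination == null || template.destination.equals(destination));
    }
}
```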
So, to tag a producer to run specific tasks, all we need to do is provide that producer with some<br />
templates to use when matching object references to tasks from JavaSpace. When a producer<br />
starts up, it first advertises its capabilities, that is, a list of byte code it knows how to execute.<br />
Each capability is actually a collection of three strings: one naming the language the producer<br />
has runtime support for, one the processor of the machine it runs on and the third the OS it<br />
runs under. After the capabilities are advertised, the resource broker creates for each producer a<br />
list of templates and sends those templates back to it. Once it receives the first templates, the<br />
producer is ready to work and starts the task executor threads. The process is illustrated in<br />
figure 4.5. If at a later time the resource broker decides to change the templates for a producer,<br />
that is, to re-tag it to run some other kind of tasks, all it needs to do is create a new list of<br />
templates, put it in a special message and send the message to the producer. Each<br />
producer has a special thread that waits for template updates; when the message is received, it<br />
automatically updates the templates the producer runs with.<br />
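The tagging exchange above can be sketched as follows. This is a hypothetical stand-in: the class names and the string form of the templates are illustrative, and the message transport through JavaSpace is left out; the sketch only shows a producer advertising its capabilities and having its template list replaced at run time without a restart.<br />

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of producer tagging: the producer advertises its
// capabilities (language, processor, OS), the resource broker answers with a
// list of templates, and a later update message replaces the templates the
// producer matches tasks against.
class TaggedProducer {
    static final class Capability {
        final String language, processor, os;
        Capability(String language, String processor, String os) {
            this.language = language; this.processor = processor; this.os = os;
        }
    }

    private final List<Capability> capabilities;
    // CopyOnWriteArrayList lets the task-executor threads keep reading while
    // the template-update thread rewrites the list.
    private final List<String> templates = new CopyOnWriteArrayList<>();

    TaggedProducer(List<Capability> capabilities) { this.capabilities = capabilities; }

    // Sent to the resource broker when the producer comes on-line.
    List<Capability> advertise() { return capabilities; }

    // Called by the template-update thread when a broker message arrives.
    void updateTemplates(List<String> newTemplates) {
        templates.clear();
        templates.addAll(newTemplates);
    }

    // Would decide whether a reference in the space is taken by this producer.
    boolean accepts(String taskTag) { return templates.contains(taskTag); }
}
```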
Summing all this up, we can describe the components of a resource broker. They are presented<br />
in figure 4.6 and are described next:<br />
The Scheduler Module<br />
The scheduler module is the most important module in the resource broker. It does the task of<br />
choosing the producers that should run any new task submitted into the system and the task<br />
producers to run new applications.<br />
The design of the scheduler permits having multiple scheduler implementations, although only<br />
one implementation can be used at a time. To implement a new scheduler, the ALiCE programmer<br />
should just extend the Scheduler class. This class consists of two methods and a running thread.<br />
The methods are the calls used to schedule a new application/task, that is, to return a string<br />
containing the IP address or host name of the machine that was delegated to run the application/task<br />
referred to by the object reference passed as a parameter. The scheduler that will be used is chosen<br />
at the start of the ALiCE instance for the resource broker.<br />
The scheduler module consists of a thread that waits for an object reference for a new<br />
application or a new task to appear in JavaSpace. When such a reference is retrieved, one of the two<br />
scheduling functions is called with the current reference as a parameter, using the instance of the<br />
currently chosen scheduler; which of the two scheduling functions is called depends on the reference<br />
being an application reference or a task-to-schedule reference.<br />
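The Scheduler contract can be sketched as follows. This is a hypothetical, simplified stand-in for the real Scheduler class (here the references are plain strings and the names are illustrative); a trivial round-robin implementation shows how a new scheduler plugs in by extending the class.<br />

```java
import java.util.List;

// Hypothetical sketch of the Scheduler contract: an implementation returns
// the host delegated to run the application or task behind a reference.
abstract class Scheduler {
    // Returns the host chosen to run the application behind this reference.
    abstract String scheduleApplication(String applicationRef);

    // Returns the host chosen to run the task behind this reference.
    abstract String scheduleTask(String taskRef);
}

// A trivial round-robin scheduler over the known producers, standing in for
// a real implementation backed by the information system database.
class RoundRobinScheduler extends Scheduler {
    private final List<String> producers;
    private int next = 0;

    RoundRobinScheduler(List<String> producers) { this.producers = producers; }

    private synchronized String nextProducer() {
        String host = producers.get(next);
        next = (next + 1) % producers.size();
        return host;
    }

    @Override String scheduleApplication(String ref) { return nextProducer(); }
    @Override String scheduleTask(String ref) { return nextProducer(); }
}
```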
In the process of delegating new applications to task producers and new tasks to producers,<br />
the scheduler module needs an information repository to keep track of the producers in<br />
the system. This will be used for accounting in future developments, but for now it is used just for<br />
scheduling. All this information is kept in the Information System Database.<br />
The Producer Tagging Module<br />
The purpose of this module is to send templates and template updates to the producers. When a<br />
producer comes on-line, it advertises its capabilities, and the role of the resource broker is to send<br />
the templates to work with back to the producer. This task is accomplished by the producer tagging<br />
module, implemented as the run method of the running scheduler. Since tagging and re-<br />
tagging the producers depend on the chosen scheduler, the producer tagging module is entirely<br />
implemented in each scheduler implementation.<br />
If the scheduler decides to change the templates for a particular producer, all it needs to do is<br />
send an appropriate message to that producer with a list of updated templates. The new templates<br />
replace the old ones in that producer and are used as soon as they are received. The<br />
change of templates can be the result of a change in the topology of the system (a producer<br />
joining/leaving) or of a new application with a higher priority being submitted into the<br />
system.<br />
The Result Manager Module<br />
The result manager module is the module that stores the results of the applications that chose to<br />
have the results delivered back through the resource broker.<br />
4. ALiCE Architecture and Implementation<br />
When a new application is scheduled, if the result delivery mode is resource-broker-stored, the<br />
resource broker will also put a message in JavaSpace stating that all the results for that application<br />
should be sent back to it. Any producer that has a result ready for such an application will look for<br />
and read this message, and will send the result to the resource broker that<br />
scheduled the application.<br />
The resource broker will download any such result and store the file references for all the results<br />
in an internal hash table of vectors, indexed with the application ID.<br />
When an application’s result collector later decides to retrieve results, this is done via a<br />
request-reply approach. First, the result collector can find out from the resource broker how many<br />
results are ready for the application it runs for, by sending a special request message to the resource<br />
broker that stores the results (the address of this resource broker can be found by reading the<br />
message through which it advertised that it will hold the results for that application). Inside the<br />
resource broker, the result manager module thread will reply to any such request message with a<br />
reply message.<br />
After finding out how many results are ready, the result collector can get the results one by one<br />
by placing (for each of the results) a new result request message in JavaSpace, destined for the<br />
resource broker storing the results. In reply to this request message, the result manager module<br />
of the resource broker will write into the space an appropriate object reference that points to the<br />
file locally stored by the result manager.<br />
The Information System Database<br />
Any scheduler other than the eager scheduler needs a database of all the producers available,<br />
recording which of them are free. Since the eager scheduler was the only scheduler implemented<br />
at the time this report was written, this component of the resource broker is not yet implemented.<br />
The information system database should also be used for monitoring and especially for accounting<br />
purposes.<br />
Implementing a Scheduler<br />
Implementing a scheduler for ALiCE means extending the Scheduler abstract class, which consists<br />
of two abstract methods:<br />
public abstract class Scheduler extends Thread {<br />
    public abstract String scheduleTaskGenerator(ObjectReference code);<br />
    public abstract String scheduleTask(ObjectReference taskToSched);<br />
}<br />
Figure 4.6.: Resource Broker components (the scheduler module with its pluggable schedulers — eager, round-robin or any other — and the Information System Database, the producer tagging module and the result manager module, exchanging references and messages through JavaSpace/GigaSpace)<br />
The first method, scheduleTaskGenerator(), is called by the broker whenever a new application<br />
reference is retrieved through the space. The call should return, in the form of a string, the IP address<br />
or the host name of the task producer chosen to run the task generator for the new application. The<br />
second method, scheduleTask(), should return the address/name of the producer delegated<br />
to run any new task to schedule that was taken by the broker from the space. These methods should<br />
make use of an information system that should also be implemented as part of the scheduler.<br />
The scheduler should also implement the public void run() method from java.lang.Thread. This<br />
thread is started by the broker and is the one that should send the new-templates messages to<br />
any new producer, as a reply to the message carrying the list of its capabilities. The scheduler<br />
can then make changes to the templates a particular producer uses by sending a new message to<br />
update its templates.<br />
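For instance, a round-robin scheduler (shown in figure 4.6 but not implemented at the time of writing) would need little more than a rotating selection over the registered producers. The sketch below shows only that selection logic, with plain strings standing in for producer addresses; it is an assumption about how such a scheduler might pick producers, not ALiCE code.<br />

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative round-robin producer selection for a hypothetical
// round-robin scheduler; assumes at least one producer is registered
// before next() is called.
public class RoundRobinSelector {
    private final List<String> producers = new ArrayList<>();
    private int cursor = 0;

    public synchronized void register(String address) {
        producers.add(address);
    }

    // Returns producer addresses in strict rotation.
    public synchronized String next() {
        String chosen = producers.get(cursor);
        cursor = (cursor + 1) % producers.size();
        return chosen;
    }

    public static void main(String[] args) {
        RoundRobinSelector s = new RoundRobinSelector();
        s.register("ws00"); s.register("ws01"); s.register("ws02");
        for (int i = 0; i < 4; i++) System.out.println(s.next());
    }
}
```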
Example: the Eager Scheduler<br />
The only scheduler implemented at the time of writing this report is the eager scheduler. It is<br />
the simplest possible approach to scheduling: it does no tagging; instead, it creates the most<br />
general templates and sends them to each producer, instructing it to run any task or task<br />
generator it knows how to run.<br />
When a new application or task is scheduled, there is no particular destination returned, but<br />
rather a special wild-card string from the ObjectReference class that will match any producer’s<br />
address. Thus, all the producers will run both tasks and task generators, and they will begin running a<br />
new task or task generator as soon as they are free.<br />
We present next the source code for this scheduler implementation, since it is very basic and<br />
can be used as a hint of what one would need in order to implement a new scheduler:<br />
package alice.broker;<br />
import alice.onta.common.*;<br />
import alice.runtime.*;<br />
import alice.ALICE;<br />
public class EagerScheduler extends Scheduler {<br />
    public String scheduleTaskGenerator(ObjectReference code) {<br />
        return ObjectReference.toAnyone;<br />
    }<br />
    public String scheduleTask(ObjectReference taskToSched) {<br />
        return ObjectReference.toAnyone;<br />
    }<br />
    public void run() {<br />
        while (true) {<br />
            Message rpmTemplate = new Message(null, null, null,<br />
                Message.TYPE_REGISTER_PRODUCER);<br />
            RegisterProducerMessage msg = (RegisterProducerMessage)<br />
                ALICE.getRemoteObjectLoader().takeMessage(rpmTemplate);<br />
            ProducerTemplateMessage ptm = new ProducerTemplateMessage(ALICE.myIP(), msg.source);<br />
            ObjectReference templ;<br />
            for (int i = 0; i < msg.cap.size(); i++) {<br />
                Capability c = (Capability)(msg.cap.elementAt(i));<br />
                templ = new ObjectReference(null, ObjectReference.TYPE_CODE, c.language);<br />
                templ.proc = c.processor; templ.os = c.os;<br />
                ptm.addTemplate(templ);<br />
                templ = new ObjectReference(null, ObjectReference.TYPE_TASK, c.language);<br />
                templ.proc = c.processor; templ.os = c.os;<br />
                ptm.addTemplate(templ);<br />
            }<br />
            ALICE.getObjectRepository().sendMessage(ptm);<br />
        }<br />
    }<br />
}<br />
4.3.4. The Producer and the Task Producer<br />
The producers are the components of ALiCE that execute the user code of the applications<br />
submitted into the system. There are two kinds of producers: the task producers and the actual<br />
producers.<br />
The task producers are the machines that execute task generators. These machines are usually<br />
(but need not be) in the central node of ALiCE, under the same control as the resource broker.<br />
In the previous version of ALiCE, the task generators were executed on the same machine that<br />
the resource broker was running on. The current version separates the machine that runs a task<br />
generator from the machine that runs the resource broker, for three main reasons:<br />
non-Java support - In order to support execution of applications that are written in a language<br />
other than Java, we must deal with problems like platform and OS requirements, as other<br />
languages do not run in a virtual machine as Java does; so it would not be possible to run<br />
the task generators for the non-Java applications all on the same machine that the resource<br />
broker is running on;<br />
security - The resource broker is the central point of ALiCE, where scheduling takes place;<br />
it is the only component of ALiCE whose failure will make the whole system fail. To run<br />
task generators on the same machine as the resource broker would mean executing alien,<br />
possibly malicious, application code on that machine. Even with strong code safety<br />
and isolation techniques, the system could not be entirely safe. The best approach is not to<br />
run any alien code on the resource broker’s machine;<br />
performance - Since the resource broker is the centralized point of ALiCE, it becomes a<br />
bottleneck if it does not perform scheduling very fast and respond to a large number<br />
of requests to schedule tasks and applications in a short period of time. Further loading<br />
the resource broker by running task generators on it would mean an additional performance<br />
decrease.<br />
The difference between the producers and the task producers is made only by the templates that<br />
they use to retrieve references from the space; since those templates are given to the<br />
producers by the resource broker (see section 3.3.3), the code for the producer and for the task<br />
producer is exactly the same. The difference between the two is made by the resource broker,<br />
which decides whether to nominate a producer to run tasks, task generators or both.<br />
The task producer should be under the same control as the resource broker, since it is a vital<br />
point of the system. Given the architecture implemented by ALiCE, we cannot afford to have<br />
a task producer go off-line and come back on-line at arbitrary times. This is because, until all the tasks are<br />
downloaded by producers, the task producer must be up and running to be able to deliver the<br />
files containing the serialized tasks on request by a producer.<br />
The structure of a producer and its components are presented in figure 4.7. The components<br />
are explained next.<br />
Figure 4.7.: Producer/Task producer components (the RTS manager, RTS registry, RTS update thread and templates update thread, plus the per-language execution threads of the JavaRuntimeSupport and CRuntimeSupport)<br />
The RTS Manager RTS stands for runtime support. ALiCE is designed in such a way that<br />
support for new languages can be added very easily; the aim is to be able to do it on-the-fly,<br />
like with the protocols in ONTA, without even having to restart the system. The RTS manager<br />
handles the behaviour, initialization and start-up of a producer. It first initializes all the runtime<br />
supports that are built into the system (for now, this is the case for Java and C) by instantiating, for<br />
each of them, the class that extends the RuntimeSupport class for that language.<br />
Then, it creates a list of the capabilities of all the runtime supports and advertises the list<br />
through a message in JavaSpace. A resource broker will take that message and deliver to the producer<br />
a list of templates. The templates will later be used by the execution threads to take object<br />
references from the space. The RTS manager will not start the execution threads until it<br />
obtains a list of initial templates.<br />
The RTS manager will then start the templates update thread and a number of execution threads<br />
for each runtime support implemented.<br />
The RTS Update Thread This thread is implemented in the RTSManager class and has the<br />
purpose of receiving new runtime supports that are sent by other instances of ALiCE through<br />
object references in JavaSpace. Though this is not completely implemented and functional yet,<br />
the final purpose is to make an extremely flexible interface to the runtime support system that will<br />
permit adding support for new languages in the form of plug-ins. This thread will take each new<br />
runtime support class advertised through an object reference in JavaSpace by some other machine<br />
in ALiCE, dynamically load it, instantiate it and start it. The runtime supports will be stored in the<br />
RTS registry.<br />
The RTS Registry The runtime support registry has two purposes. The first purpose is to store<br />
the templates that are used by all the execution threads to retrieve object references to tasks/task<br />
generators from JavaSpace. These templates are updated whenever a new list of templates is received<br />
by the producer. The update is initiated and handled by the templates update thread.<br />
Although the templates are centrally stored in the RTS registry, it would be a bottleneck for all<br />
the execution threads to just hold references to them and use the object in the registry, as we would<br />
need synchronized access to this object, which would mean that only one execution thread could<br />
read the templates at a time. For this reason, each runtime support for each language<br />
keeps its own copy of the templates, obtained by cloning the list in the RTS registry.<br />
These clones are also updated whenever a new templates list is received.<br />
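The cloning idea can be sketched as follows; this is a simplified model (ALiCE’s actual template and registry classes differ), showing only why a per-runtime-support copy avoids contention on the shared list.<br />

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the RTS registry: a master template list plus
// independent snapshots, so execution threads read their own copy
// instead of synchronizing on the shared registry for every read.
public class TemplateRegistry {
    private final List<String> master = new ArrayList<>();

    // Called by the templates update thread when a new list arrives.
    public synchronized void update(List<String> newTemplates) {
        master.clear();
        master.addAll(newTemplates);
    }

    // Each runtime support refreshes its private clone from here.
    public synchronized List<String> snapshot() {
        return new ArrayList<>(master);
    }

    public static void main(String[] args) {
        TemplateRegistry reg = new TemplateRegistry();
        reg.update(List.of("java-task", "java-taskgen"));
        List<String> mine = reg.snapshot();   // private copy for one runtime support
        reg.update(List.of("c-task"));        // a later update...
        System.out.println(mine);             // ...does not disturb the earlier clone
        System.out.println(reg.snapshot());
    }
}
```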
The second purpose of the RTS registry is to keep a list of the runtime supports currently available<br />
to the instance of ALiCE it is running under. All the runtime supports received dynamically<br />
will be registered with the RTS registry. Also, the RTS registry is the one actually starting each<br />
runtime support after it is registered.<br />
The Templates Update Thread As explained in section 3.3.3 about scheduling, the approach<br />
we took in scheduling is to tag the producers to take a certain category of object references from<br />
JavaSpace for execution of the referred objects. We need to be able to change the tag of a producer<br />
at any time, by changing the templates it uses. For that purpose the RTS manager creates the<br />
templates update thread and starts it at system start-up on every producer.<br />
The templates update thread will wait to receive via JavaSpace any message destined to it that<br />
contains a list of new templates. As soon as such a message is received, the new templates will<br />
replace the old ones, and cloned copies of the list of new templates will replace the old templates for<br />
each runtime support implemented.<br />
The general algorithm of a producer is presented in figure 4.9.<br />
The Runtime Supports Each runtime support is actually just a class extending<br />
java.lang.Thread. The RTS manager will receive as a parameter, at start-up of an ALiCE<br />
producer, the number of execution threads for each language supported. This will be the number of<br />
execution threads of each language started by each runtime support.<br />
Usually we need more than one thread per producer for the execution of tasks/task generators. To<br />
see why, take the example of a task producer that has only one thread to run, say,<br />
Java task generators. If it starts executing a task generator that blocks waiting for an external event<br />
that will take a very long time to happen, or if the task generator just loops forever, the task producer<br />
would be blocked: it would not be able to run anything, even though it is not doing anything useful<br />
either. By having more threads in a producer executing the mobile code, we provide more<br />
efficiency and more performance. By making the number of threads variable and specifiable at<br />
producer start-up, we make the producer very flexible, so it can adapt to the limitations of<br />
the machine it runs on.<br />
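The effect of running several execution threads per language can be modelled with a small pool of workers that poll for work with a short timeout. In this sketch a BlockingQueue stands in for the JavaSpace take operation; the class and its parameters are illustrative, not ALiCE code.<br />

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Models n execution threads polling a "space" for tasks with a short
// timeout; with several workers, one slow task cannot stall the rest.
public class ExecutionPool {
    public static int runAll(int threads, int tasks) throws InterruptedException {
        BlockingQueue<Runnable> space = new LinkedBlockingQueue<>();
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) space.add(done::incrementAndGet);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    Runnable task;
                    // Poll with a short timeout, as the execution threads do;
                    // a null result here means no matching work is left.
                    while ((task = space.poll(100, TimeUnit.MILLISECONDS)) != null) {
                        task.run();
                    }
                } catch (InterruptedException ignored) { }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAll(4, 20)); // prints 20
    }
}
```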
Each runtime support contains as a field the list of its capabilities; a capability is actually a<br />
class consisting of three strings (language, processor and OS) that defines a kind of code that<br />
this runtime support knows how to run. The list of capabilities for the whole producer that is<br />
advertised by the RTS manager is formed by gathering all the capabilities from all the runtime<br />
supports instantiated.<br />
The Execution Threads As mentioned, for each language supported, a number of execution<br />
threads will be started. What each of these threads does is very simple. It traverses the list of<br />
templates of the producer that was sent to it by a resource broker. For each template that matches<br />
a capability from the list of the runtime support it belongs to, it tries to retrieve an object<br />
reference matching that template from JavaSpace, waiting for it for a very short period of<br />
time.<br />
If such an object reference is found, it is taken from the space and the object it refers to is<br />
downloaded and dynamically loaded. First, some initializations are done on the dynamically<br />
loaded object (like tagging it with the application it belongs to). Depending on whether it is a task<br />
or a task generator, a different method is called on the new instance via the Java reflection API.<br />
This is the entry point into the mobile alien code.<br />
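The reflective entry-point call can be sketched like this. The class and method names are illustrative (in ALiCE the class bytes would come from a network class loader, and the actual entry-point method names may differ); here a local nested class stands in for downloaded user code.<br />

```java
import java.lang.reflect.Method;

// Demonstrates calling an entry-point method on a dynamically loaded
// class via the reflection API. SampleTask stands in for downloaded
// user code; the "execute" method name is an assumption for this sketch.
public class ReflectiveEntry {
    public static class SampleTask {
        public Object execute() { return 42; }
    }

    public static Object runTask(String className) throws Exception {
        Class<?> cls = Class.forName(className);            // dynamic load
        Object task = cls.getDeclaredConstructor().newInstance();
        Method entry = cls.getMethod("execute");            // entry point into alien code
        return entry.invoke(task);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runTask("ReflectiveEntry$SampleTask")); // prints 42
    }
}
```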
If the thread executed a task generator, it will just return. But if it executed a task, a result is<br />
collected, as returned by the execute() method of the task as a Java Object. The result is then<br />
advertised in the space to the resource broker that scheduled the application or to the consumer that<br />
is running the result collector for the application, depending on the result delivery mode chosen.<br />
The Producer GUI<br />
Figure 4.8.: The Producer GUI<br />
The Producer GUI, presented in figure 4.8, provides the user with easy access to<br />
producer information and a way to observe the system output.<br />
4.3.5. The Data Server<br />
The data server is a new component added to ALiCE. In the old version, data was transferred<br />
packed inside objects, or the files had to be shared via NFS or even be present as a copy on each<br />
producer machine. This is unacceptable and not feasible for a grid computing system.<br />
The architecture of the new ALiCE version has a special part dedicated to data files. There<br />
exists a component in ALiCE that handles these files, the data server.<br />
The data server can be run on a separate machine (let’s say a machine with vast secondary<br />
storage space) or can be run on the same machine as the resource broker, which is more usual.<br />
The data server consists of two components: the data files manager and the server itself.<br />
Data Server Issues In a grid computing system, where we are dealing with applications that<br />
are compute-intensive, the amount of data used is often large and the data files just as large.<br />
In this case the main problem that arises is how to transfer a large file over an unreliable network<br />
connection such as the Internet. The approach we took is to store the data files at a central location and<br />
then provide the application developer with the means to read or write chunks of each file.<br />
This approach means that the user can choose the amount of data that is transferred in a single<br />
burst, setting the right trade-off between the overhead imposed by initiating multiple transfers to<br />
get a small amount of data each time and the unreliable nature of the connections. The user might<br />
even choose to read the whole file in one go, making the chunk as large as the file size.<br />
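Chunked access of this kind can be sketched with RandomAccessFile; the offset and chunk-size parameters mirror what the text describes, but the class and method names here are illustrative, not the actual DataFile API.<br />

```java
import java.io.*;

// Reads an arbitrary chunk (offset, length) of a file, modelling the
// chunked transfers described above. Returns fewer bytes when the
// requested chunk runs past the end of the file.
public class ChunkReader {
    public static byte[] readChunk(File f, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            raf.seek(offset);
            byte[] buf = new byte[length];
            int n = raf.read(buf);                  // may be < length at end of file
            if (n < length) {
                byte[] shorter = new byte[Math.max(n, 0)];
                System.arraycopy(buf, 0, shorter, 0, shorter.length);
                return shorter;
            }
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("alice-data", ".bin");
        f.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write("0123456789".getBytes());
        }
        System.out.println(new String(readChunk(f, 3, 4))); // prints 3456
    }
}
```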
Another benefit of our approach is that the whole data file is not distributed to all the producers<br />
that need it. Instead, the application can partition the data for the problem up front, decide what<br />
chunk of data each task needs, and then each task will retrieve just the data it needs. This imposes<br />
the lowest data-transfer overhead possible and adds flexibility.<br />
The Data Files Manager The data files manager is the component of the system that gets new<br />
advertisements of data files from consumers and then downloads them. It consists of a thread that<br />
takes the object references from JavaSpace, downloads each file and then stores it locally. The data<br />
files are stored in a directory that has the same name as the application ID of the application<br />
the data files were submitted for. Since the application IDs are unique, this way we make sure<br />
there will be no conflicts if different applications have data files with the same file name.<br />
When the data files manager starts up, it places a message in the space stating that a data<br />
server is present on the machine it runs on. Every time a consumer submits a data file, it<br />
will first look for such a message to know whom to send the data file to.<br />
After each data file is downloaded, the data files manager puts a message in the space to advertise<br />
the fact that it stores that data file for the application it was submitted for. When a task or task<br />
generator later needs access to that file, it will look for a message from the data server that holds<br />
the file and construct a new instance of the DataFile class that points to that location, all through<br />
some static calls from the Data class. The file reference obtained this way will be used by the user<br />
code to read from and write to that file. For more details on data file usage, see section 4.2.4.<br />
The Server The server is a simple implementation of raw TCP/IP socket transfer. It listens for<br />
connections on a port. A connection is initiated by method calls on a DataFile instance<br />
from inside some user code. Since the data file object is already initialized and knows the location<br />
of the data file (that is, the data server address and the full filename), the request will open a<br />
connection to the data server machine on the preset TCP port and send the file name as a<br />
request, followed by the request type and the parameters for the operation (chunk size, offset in<br />
file etc.). The server responds by sending or receiving the required chunk raw, through the<br />
opened connection.<br />
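The request described above can be sketched as a simple framed message. The field order (file name, request type, chunk size, offset) follows the description in the text, but the exact encoding is an assumption; ALiCE’s real wire format is not documented here.<br />

```java
import java.io.*;

// Illustrative encoding/decoding of a data-server request frame:
// file name, request type, chunk size and file offset.
public class DataRequest {
    public final String fileName;
    public final int type;        // e.g. 0 = read, 1 = write (assumed codes)
    public final int chunkSize;
    public final long offset;

    public DataRequest(String fileName, int type, int chunkSize, long offset) {
        this.fileName = fileName; this.type = type;
        this.chunkSize = chunkSize; this.offset = offset;
    }

    // Serializes the request in the order the text describes.
    public byte[] encode() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeUTF(fileName);
        out.writeInt(type);
        out.writeInt(chunkSize);
        out.writeLong(offset);
        return bos.toByteArray();
    }

    public static DataRequest decode(byte[] frame) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame));
        return new DataRequest(in.readUTF(), in.readInt(), in.readInt(), in.readLong());
    }

    public static void main(String[] args) throws IOException {
        DataRequest r = DataRequest.decode(
            new DataRequest("app42/input.dat", 0, 4096, 8192).encode());
        System.out.println(r.fileName + " type=" + r.type
            + " chunk=" + r.chunkSize + " off=" + r.offset);
    }
}
```

In a full implementation the encoded frame would be written to the data server's TCP socket and the chunk bytes read back on the same connection.<br />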
Figure 4.9.: The Producer Algorithm<br />
The flowchart in figure 4.9 summarizes the producer’s operation: at start-up the producer gets the<br />
list of capabilities from each runtime support, advertises a message containing the list of all<br />
capabilities, waits for the list of initial templates from a resource broker and then starts the<br />
execution threads. From then on, whenever new templates are received, the templates list is<br />
updated; whenever a new object reference matching a template appears in JavaSpace, the file<br />
containing the serialized object is downloaded and the new object is initialized. A task generator<br />
is simply executed; for a task, the result is, depending on the result delivery mode, either returned<br />
to the consumer that submitted the application (direct delivery) or sent to the resource broker that<br />
scheduled the application (resource-broker-stored delivery).<br />
Part III.<br />
Sample Applications and Performance<br />
Testing<br />
5. Example of ALiCE Applications<br />
5.1. Matrix Multiplication<br />
The matrix multiplication is an application that takes in an integer n and computes the result of the<br />
multiplication of two nxn matrices. The two nxn matrices are randomly generated by the application<br />
itself.<br />
The matrix multiplication application is designed such that the number of tasks generated for<br />
every application execution is exactly n, where n is the parameter above. The integer n represents<br />
the problem size as well as the task size. (A problem consists of many tasks.) The purpose<br />
of this design is that with the increase of n, both the overall problem size of the application<br />
and the task size of the application increase. This property is desirable since we wish to<br />
test the performance of ALiCE under conditions of varying task sizes. The experiments and results<br />
can be observed in Chapter 6.<br />
Next, we present the algorithm for the Task Generator and the Result Collector components of<br />
the ALiCE Matrix Multiplication application.<br />
ALGORITHM ALiCE_MM( n )<br />
TASK_GENERATOR<br />
1: A ← new Matrix of size nxn<br />
2: B ← new Matrix of size nxn<br />
3: Initialize (A)<br />
4: Initialize (B)<br />
5: for x in 1 to n<br />
6: T ← new TASK containing (row x of A, matrix B, and x)<br />
7: send T to Resource Broker<br />
8: endfor<br />
RESULT_COLLECTOR<br />
1: C ← new Matrix of size nxn<br />
2: for j in 1 to n<br />
3: RESULT R ← incoming Result from Resource Broker<br />
4: C[R.x] = R.result_array<br />
5: endfor<br />
TASK_EXECUTE (Ax, B, x)<br />
1: n ← Ax.length<br />
2: result_array ← new array of n elements<br />
3: for j in 1 to n<br />
4: for m in 1 to n<br />
5: result_array[j] += Ax[m] * B[m][j]<br />
6: endfor<br />
7: endfor<br />
8: RESULT R ← new Result<br />
9: insert result_array into R<br />
10: return R<br />
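The TASK_EXECUTE step amounts to one row-times-matrix product. In Java it might look like the sketch below (an illustration using the conventional index order C[x][j] = Σ_m A[x][m]·B[m][j], not the actual ALiCE task class):<br />

```java
// One matrix-multiplication task: multiplies row x of A by matrix B,
// producing row x of the result matrix C.
public class RowTask {
    public static double[] execute(double[] rowOfA, double[][] b) {
        int n = b[0].length;
        double[] result = new double[n];
        for (int j = 0; j < n; j++) {
            for (int m = 0; m < rowOfA.length; m++) {
                result[j] += rowOfA[m] * b[m][j];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] b = { {1, 0}, {0, 1} };          // identity matrix
        double[] row = { 3, 4 };
        double[] out = execute(row, b);
        System.out.println(out[0] + " " + out[1]);  // prints 3.0 4.0
    }
}
```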
5.2. Ray Tracing<br />
Ray tracing is a method for producing views of a virtual three-dimensional scene on a computer.<br />
It tries to mimic the actual physical effects associated with the propagation of light. A ray tracer<br />
calculates the pixel value of a given coordinate in an image by tracing a path of light as it bounces<br />
off or is refracted through surfaces. This involves many calculations that may take too long for<br />
one machine to handle; hence, grid computing environments such as ALiCE are used to speed up<br />
the rendering of ray-traced images.<br />
The ray tracing application that we develop takes in two parameters m and n, and divides the<br />
task of rendering a 1024x768 sized image into several chunks of size m x n each. These chunks<br />
are then calculated separately and the results of the calculation of these chunks will be displayed<br />
on the screen.<br />
Unlike the Matrix Multiplication application, the problem size of the Ray Tracing application<br />
is fixed. Changing the parameters m and n only varies the task size, and hence the number<br />
of tasks the problem has.<br />
Next we present the overall algorithm for the ALiCE ray tracing sample application.<br />
Figure 5.1.: ALiCE Ray Tracing Visualizer<br />
ALGORITHM ALiCE_TRACE(m, n)<br />
TASK_GENERATOR<br />
1: for x in 0 to 480 step n<br />
2: for y in 0 to 640 step m<br />
3: TASK T ← new Task containing data of the rectangle (x,y) to (x+n-1, y+m-1) and the (x,y) coordinate<br />
4: send T to Resource Broker<br />
5: endfor<br />
6: endfor<br />
RESULT_COLLECTOR<br />
1: for x in 0 to 480 step n<br />
2: for y in 0 to 640 step m<br />
3: Result R ← collectResult from Resource Broker<br />
4: display R on screen<br />
5: endfor<br />
6: endfor<br />
TASK_EXECUTE<br />
1: compute rendering output<br />
2: Result R ← new Result<br />
3: Insert rendering output into R<br />
4: return R<br />
A screenshot of the ALiCE RayTrace visualizer while the application is still working is presented<br />
in Figure 5.1.<br />
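With the loop bounds from the pseudocode (x stepped by n over the height, y stepped by m over the width), the number of tasks for given m and n can be computed directly. This small sketch assumes those 640x480 bounds and exclusive upper limits; it is an illustration, not ALiCE code.<br />

```java
// Counts ray-tracing tasks produced by the two nested chunking loops:
// ceil(height/n) rows of chunks times ceil(width/m) columns of chunks.
public class ChunkCount {
    public static int tasks(int width, int height, int m, int n) {
        int rows = (height + n - 1) / n;   // ceil(height / n)
        int cols = (width + m - 1) / m;    // ceil(width / m)
        return rows * cols;
    }

    public static void main(String[] args) {
        System.out.println(tasks(640, 480, 64, 48)); // prints 100
    }
}
```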
5.3. DES Key Cracker<br />
Figure 5.2.: ALiCE DES key cracker<br />
The DES key cracker is a massively parallel application that tries to find a DES key by brute<br />
force, doing an exhaustive search in the key space. Each task is given an interval of keys in which<br />
to look for the solution.<br />
The application we develop takes in as parameters the length k of the key in bits (this actually<br />
determines the key search space) and the size t of each task, given in terms of how many keys<br />
each task searches. The number of tasks is given by the ratio between the maximum key<br />
possible and the task size.<br />
This application is very useful for seeing how ALiCE scales up, given the fact that we can<br />
vary the problem size as well as the task size.<br />
The general algorithm for the DES Key Cracker application is presented next and a screenshot<br />
of the viewer GUI is presented in Figure 5.2.<br />
ALGORITHM ALiCE_DES_CRACKER(k, t)<br />
TASK_GENERATOR<br />
1: Generate random key to look for<br />
2: Encrypt a preset short message using the generated key<br />
3: for i in 0 to 2^k/t do<br />
4: T ← new TASK searching from i*t to (i+1)*t, given the encrypted and the unencrypted messages generated in step 2<br />
5: send T to the Resource Broker<br />
6: endfor<br />
RESULT_COLLECTOR<br />
1: for i in 0 to 2^k/t do<br />
2: if key found, display result<br />
3: endfor<br />
TASK_EXECUTE<br />
1: Result R ← new Result<br />
2: for x in startKey to endKey do<br />
3: Encrypt the message received using the key x<br />
4: If the result of the encryption is equal to the encrypted message received, the key has been found, it is x; put x into R<br />
5: endfor<br />
6: if key not found, return NOT_FOUND_RESULT<br />
7: else return R<br />
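The task generator's partitioning of the key space into per-task intervals can be sketched as follows; this is an illustration using long arithmetic (so k must stay below 63 here), not the actual ALiCE task generator.<br />

```java
import java.util.ArrayList;
import java.util.List;

// Splits a 2^k key space into half-open intervals of t keys each, one
// interval per task, as the DES cracker's task generator does.
public class KeySpacePartition {
    public static List<long[]> intervals(int k, long t) {
        long total = 1L << k;                 // 2^k keys; requires k < 63
        List<long[]> out = new ArrayList<>();
        for (long start = 0; start < total; start += t) {
            out.add(new long[] { start, Math.min(start + t, total) }); // [start, end)
        }
        return out;
    }

    public static void main(String[] args) {
        List<long[]> parts = intervals(8, 64);        // 256 keys, 64 per task
        System.out.println(parts.size());             // prints 4
        long[] last = parts.get(parts.size() - 1);
        System.out.println(last[0] + ".." + last[1]); // prints 192..256
    }
}
```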
5.4. Protein Matching<br />
The protein matching application is a complex application written for ALiCE by Yew Kwong NG,<br />
ngyewkwo@comp.nus.edu.sg.<br />
Bioinformatic applications, which usually involve massively large volumes of chromosome<br />
data stored in geographically distributed databases, are good candidates for execution on a grid<br />
computing platform. The sequence comparison approach adopted here is the Smith-Waterman<br />
dynamic programming algorithm.<br />
Figure 5.3.: Protein Matching for ALiCE application visualizer<br />
The objective of this toolkit is to allow a user, typically a computational biologist, to obtain<br />
optimal alignments of specific query sequences against each gene<br />
sequence stored in known databases, which would otherwise be extremely tedious if performed<br />
manually. The scale of the problem can be very large, since we are considering computations<br />
involving possibly tens of thousands of gene sequences scattered in different nodes on the web.<br />
This application is presented as an example of a very practical and complex application<br />
developed for ALiCE. It is not used in the performance tests, but is rather an illustration of the<br />
power of ALiCE.<br />
A screenshot of the visualizer after collecting the results is presented in Figure 5.3.<br />
6. Performance Testing<br />
Based on a set of tests we carried out, this chapter presents our preliminary performance results.<br />
Aside from the experiments presented here, we carried out a number of other tests, like stress tests<br />
on JavaSpace and GigaSpace (and we concluded that we should use GigaSpace :-) ) and tests<br />
outside the actual test environment.<br />
Tests also included trying ALiCE on Windows and Solaris machines, as well as on other<br />
networks than the cluster environment that we used as the main testbed.<br />
6.1. The Test Bed<br />
Our experiments were carried out on a cluster of twenty-four nodes, shown in figure 6.1.<br />
Sixteen (named ws00 to ws15) are Intel PII 400MHz with 256MB of RAM, and eight (named<br />
ws17 to ws24) are Intel PIII 866MHz with 256MB of RAM. These nodes are connected to each<br />
other via a 100Mbps switch. All nodes run the RedHat Linux release 7.0 (Guinness) distribution,<br />
based on the 2.2.16-22 Linux kernel.<br />
ALiCE is developed using the Java TM 2 Software Development Kit versions 1.3.1 and 1.4.0<br />
and the Jini Starter Kit version 1.2. We used the GigaSpace TM Platform 2.0 at the current stage<br />
of development and testing. For development and experiments, we use the Java TM 2 Runtime<br />
Environment, Standard Edition, with the Java TM HotSpot Server and Client Virtual Machines,<br />
build 1.3.1_03-b03, mixed mode. The HotSpot Server Virtual Machine is used for the Resource<br />
Broker and Producer nodes. The Consumer nodes make use of the HotSpot Client Virtual Machine.<br />
During our tests the need arose for a machine with more memory to hold GigaSpace, so<br />
we moved 128 MB of RAM from ws04 to ws21, which held GigaSpace for all our experiments.<br />
We also tried to run GigaSpace on a Sun Ultra 30 station, based on an UltraSparc II 266MHz,<br />
but this machine proved to be much too slow in terms of processor power. The conclusion<br />
is that if GigaSpace is held on only one machine, that machine should be a powerful one<br />
with lots of memory.<br />
Figure 6.1.: Cluster-based experiment environment<br />
6.2. Experiments<br />
We have run a vast series of experiments to see how ALiCE performs under different conditions,<br />
in different environments and with different workloads. In this section we present the most<br />
important tests conducted, as well as the conclusions and the relevant results of these tests.<br />
6.2.1. Performance Evolution with Variance of Task Size<br />
Objectives: To observe the overhead imposed by varying the task size for a problem of a fixed<br />
total size.<br />
Platform:<br />
- GigaSpace on ws21<br />
- Resource Broker on ws10<br />
- Task Producer on ws20<br />
- 5 Producers on ws17, ws18, ws19, ws22 and ws23<br />
- Consumer on ws24<br />
Test Application: Ray Tracing<br />
Methodology: We ran the Ray Tracing application for an image of 1024x768 pixels (fixed<br />
problem size) on a set of 5 producers. We varied the size of each task to observe the influence<br />
of the number of tasks on the application execution time.<br />
Test Results:<br />
Chunk Size Tasks Time (seconds)<br />
20x20 1976 1230<br />
30x30 875 285<br />
40x40 494 270<br />
60x60 234 117<br />
80x80 130 115<br />
100x100 88 112<br />
140x140 48 107<br />
180x180 30 115<br />
200x200 24 120<br />
250x250 15 141<br />
300x300 12 135<br />
350x350 9 158<br />
400x400 6 163<br />
Graphical representation of results:<br />
Analysis of results: What we observed confirms our intuition that as the task size increases<br />
(so the number of tasks decreases), the overhead decreases. By having fewer tasks, the<br />
network/JavaSpace overhead decreases, and so does the execution time, up to a certain point.<br />
After that point, the execution time begins to increase slightly again.<br />
This is caused by two factors. One is that the number of tasks is not always a multiple of<br />
the number of producers, so there are times (towards the end of the application runtime) when<br />
some producers are idle. The other factor is that not all the tasks are of equal size in terms of<br />
computational time. Even if the chunks are equal, some tasks will require more computation<br />
than others. If the more computation-intensive tasks are taken last (we are using eager<br />
scheduling), then again some producers will remain idle. This explains the anomalous increase<br />
in execution time at the upper end of the task-size scale.<br />
6.2.2. Varying the Number of Producers<br />
Objectives: To explore the possible speedup that can be obtained by adding more producers,<br />
and to understand the factors that might limit the speedup.<br />
Platform:<br />
- Resource Broker on ws02<br />
- Task Producer on ws03<br />
- Producers on ws04-ws23<br />
- Consumer on ws24<br />
Test Application: Ray Tracing<br />
Methodology: We measure the application runtime of ALiCE with several producer<br />
configurations, using the normal eager scheduling algorithm to execute the Ray Tracing test<br />
application with the same task size (140x140 chunks for a 1024x768 image, which amounts<br />
to 48 tasks). We compare the speedup obtained via the use of ALiCE against the sequential<br />
runtime of the Ray Tracing test application.<br />
Test Results:<br />
Producers Time (seconds)<br />
1 491<br />
2 246<br />
3 177<br />
4 132<br />
5 111<br />
6 101<br />
7 98<br />
8 96<br />
9 93<br />
10 78<br />
11 68<br />
12 67<br />
13 66<br />
14 59<br />
15 65<br />
16 63<br />
17 63<br />
18 61<br />
Graphical Representation of Results:<br />
Analysis of results: Our experiments have shown that ALiCE improves the execution time<br />
of the ray tracing application by a significant factor. The best performance was obtained<br />
with 14 producers (an improvement of 88%).<br />
The fact that, at some point, increasing the number of producers leads to an increase in the<br />
total execution time can be explained by the machine running GigaSpace running out of<br />
physical memory and starting to use swap, which in turn led to an accelerated decrease<br />
in performance.<br />
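The improvement figure above is simple arithmetic over the table: speedup is the one-producer runtime divided by the best runtime, and the percentage improvement follows from the same two numbers. A minimal check, using the measured values T(1) = 491 s and T(14) = 59 s from the table:

```java
// Sketch: speedup and percentage improvement from the measured runtimes above.
public class Speedup {
    public static void main(String[] args) {
        double t1 = 491.0;   // runtime with 1 producer, in seconds (from the table)
        double t14 = 59.0;   // best runtime, obtained with 14 producers
        double speedup = t1 / t14;                     // how many times faster
        double improvement = 100.0 * (t1 - t14) / t1;  // percent of time saved
        System.out.printf("speedup=%.1f improvement=%.0f%%%n", speedup, improvement);
    }
}
```

This prints a speedup of about 8.3 on 14 producers, i.e. the 88% improvement quoted above.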
6.2.3. Overhead Variation with Task Size for Direct Result Delivery<br />
Objectives: To explore the percentage of the total execution time represented by overhead<br />
as the task size varies, maintaining a fixed total problem size. In this test the overhead is<br />
measured for the direct result delivery mode, i.e. the results are delivered directly from the<br />
producers to the consumer.<br />
Platform:<br />
- GigaSpace on ws21<br />
- Resource Broker on ws02<br />
- Task Producer on ws23<br />
- Producers on ws17, ws18, ws19 and ws20<br />
- Consumer on ws24<br />
Test Application: DES Key Cracker<br />
Methodology: We measure the application runtime of ALiCE with several task size<br />
configurations, using the normal eager scheduling algorithm to execute the DES Key Cracking<br />
test application on the same key length, 25 bits, using a fixed number of 4 producers.<br />
The result delivery mode was direct delivery, so the results came back directly from the<br />
producers to the consumer.<br />
Test Results: In the next table we present the measurements made during this test:<br />
TpT - Average time of execution per task, including the overhead;<br />
TpT-O - Average time of execution per task, excluding the overhead;<br />
T - Total execution time;<br />
CT - Computational time out of the total execution time;<br />
OT - Overhead time out of the total execution time;<br />
O (%) - The percentage of the execution time represented by the overhead.<br />
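As a sanity check on how these columns relate, here is the 400-task row of the table below recomputed; the only assumption is that the 4 producers work fully in parallel, so T = TpT × Tasks / Producers, and likewise for CT:

```java
// Sketch: reconstructing one table row (400 tasks) from the per-task times,
// assuming the 4 producers run fully in parallel.
public class OverheadRow {
    public static void main(String[] args) {
        int tasks = 400, producers = 4;
        double tptNoOverhead = 0.807;  // TpT-O: seconds per task, excluding overhead
        double tpt = 1.008;            // TpT: seconds per task, including overhead
        double t  = tpt * tasks / producers;           // T: total execution time
        double ct = tptNoOverhead * tasks / producers; // CT: computational part
        double ot = t - ct;                            // OT: overhead part
        double oPct = 100.0 * ot / t;                  // O(%): overhead fraction
        System.out.printf("T=%.1f CT=%.1f OT=%.1f O=%.1f%%%n", t, ct, ot, oPct);
    }
}
```

This reproduces the 400-task row: T = 100.8 s, CT = 80.7 s, OT = 20.1 s, O ≈ 19.9%.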
Tasks TpT - O TpT T CT OT O(%)<br />
4 69.69 73.6 73.6 69.69 3.92 5.3<br />
8 34.96 38.64 77.28 69.91 7.37 9.5<br />
20 14.02 14.93 74.67 70.1 4.57 6.1<br />
40 7.07 7.78 77.82 70.72 7.1 9.1<br />
80 3.62 3.88 77.58 72.34 5.24 6.8<br />
100 2.78 3.04 76.03 69.68 6.35 8.4<br />
140 2.04 2.12 77.56 71.3 6.27 8.1<br />
200 1.45 1.59 79.45 72.5 7 8.8<br />
300 1.016 1.209 90.68 76.2 14.48 16<br />
400 0.807 1.008 100.8 80.7 20.1 19.9<br />
800 0.403 0.997 199.4 80.6 118.8 59.6<br />
1000 0.324 0.996 249 81 168 67.5<br />
Graphical Representation of Results:<br />
The first graph presents the fractions of the execution time represented by actual computation<br />
and by overhead, for various task sizes.<br />
The second graph presents the increase of the percentage that the overhead represents in the<br />
total running time as the task size decreases and the number of tasks increases.<br />
Analysis of Results: The conclusion of this test was, as expected, that the overhead increases<br />
as the number of tasks increases and the size of each task decreases. The more interesting<br />
part was the trend of this increase. Up to a certain point, the overhead increases very<br />
slowly with the number of tasks; after that point is reached, the percentage that the overhead<br />
represents of the total execution time increases steeply.<br />
All these tests lead us to the conclusion that the choice of the task size and the partitioning<br />
of the problem are extremely important to how an application performs on ALiCE. For this<br />
reason, programmers should pay close attention to problem partitioning issues.<br />
6.2.4. Overhead Variation with Task Size for Delivery of Results<br />
Through Resource Broker<br />
Objectives: To explore the percentage of the total execution time represented by overhead<br />
as the task size varies, maintaining a fixed total problem size. In this test the overhead is<br />
measured in the case where the results are delivered through the resource broker.<br />
Platform:<br />
- GigaSpace on ws21<br />
- Resource Broker on ws02<br />
- Task Producer on ws23<br />
- Producers on ws17, ws18, ws19 and ws20<br />
- Consumer on ws24<br />
Test Application: DES Key Cracker<br />
Methodology: The same as in the previous test (section 6.2.3), but with the result delivery mode<br />
set so that the results are delivered to the consumer through the resource broker.<br />
Test Results:<br />
In the next table we present the measurements made during this test:<br />
TpT - Average time of execution per task, including the overhead;<br />
TpT-O - Average time of execution per task, excluding the overhead;<br />
T - Total execution time;<br />
CT - Computational time out of the total execution time;<br />
OT - Overhead time out of the total execution time;<br />
O (%) - The percentage of the execution time represented by the overhead.<br />
Tasks TpT - O TpT T CT OT O(%)<br />
4 67.97 77.86 77.86 67.97 9.86 12.7<br />
8 34.26 41.74 83.48 68.52 14.96 17.9<br />
20 13.77 14.36 71.8 63.85 7.95 11.1<br />
40 6.88 7.66 76.6 68.8 7.8 10.2<br />
80 3.47 4.72 94.4 69.4 25 26.5<br />
100 2.81 4.96 124 70.25 53.75 43.3<br />
140 2.065 4.538 158.83 72.28 86.55 54.5<br />
200 1.458 4.987 249.35 72.9 176.45 70.8<br />
300 1.016 5.083 381.23 76.2 305.03 80<br />
400 0.779 5.523 552.3 77.9 474.4 85.9<br />
800 0.425 5.276 1055.2 85 790.2 91.9<br />
1000 0.326 5.309 1327.2 81.5 1245.7 93.9<br />
Graphical Representation of Results:<br />
The first graph presents the fractions of the execution time represented by actual computation<br />
and by overhead, for various task sizes.<br />
The second graph presents the increase of the percentage that the overhead represents in the<br />
total running time as the task size decreases and the number of tasks increases.<br />
Comparison Between Result Delivery Modes: As expected, the overhead in the case of<br />
results being delivered through the resource broker is bigger than in the case of direct result<br />
delivery. It is very important, though, to notice that this overhead tends to increase much<br />
more rapidly when delivering the results through the resource broker. This is explained<br />
first by the fact that the resource broker acts as a huge bottleneck when there are many<br />
results, as they are delivered one by one from the same machine (the resource broker),<br />
whilst in the case of direct delivery the results are received from many machines; moreover,<br />
there are many threads running at the result collector, retrieving result objects in the<br />
background. The comparison between the overheads is presented in the next figure:<br />
Analyzing the overheads in the two cases leads to the conclusion that result delivery through<br />
the resource broker should be used only when it is absolutely needed, as in the case where the<br />
consumer cannot stay on-line to wait for all the results and comes back on-line at a later time<br />
to check for new results.<br />
6.2.5. Performance Comparison with the Old Version of ALiCE<br />
We did some performance comparisons between the last version of ALiCE and the new<br />
implementation. The results have shown that at very low system load, the older version<br />
did better. This can easily be explained by the fact that the old version's scalability was<br />
very poor: all the objects were kept in memory and the classes were transferred using RMI and<br />
the codebase property. This is faster, but at higher load the system would not only tend to be<br />
very slow (as memory would run out and objects would go into swap), but would crash once<br />
a certain load was reached. For a grid computing system such behavior is totally unacceptable,<br />
as a grid system used in real life deals with extremely high loads. The new version is extremely<br />
scalable, as neither objects nor classes are kept in memory, but rather on secondary storage,<br />
as serialized files. That is why, for high loads, even with one application running, the new<br />
system did better than the old version.<br />
To conclude this comparison, we can say that the new version is a truly scalable system whose<br />
execution time varies linearly with the system load. Also, since no objects or user classes are<br />
kept in memory, the system can be loaded up to the only limit of the space occupied by<br />
references in JavaSpace. Since the footprint of a reference in JavaSpace is very small, ALiCE<br />
would not crash even when faced with thousands of nodes in the system working at the<br />
same time.<br />
Part IV.<br />
ALiCE GRID Programming Model<br />
7. Developing ALiCE Applications<br />
7.1. The Model<br />
The ALiCE Programming Template for Java applications follows the same model as in the first<br />
version, namely the Task Generator - Result Collector model. The templates are mainly the<br />
same, with some differences and new features, which are presented in this chapter.<br />
The model defines two entities for a parallel ALiCE application:<br />
- Task Generator<br />
The task generator is the entity that creates and initializes the tasks. The tasks are the execution<br />
threads doing the computation-intensive work.<br />
- Result Collector<br />
The result collector is executed at the machine that submits the application, and its role is to<br />
receive the results obtained from the execution of the tasks generated by the task generator.<br />
Essentially, an ALiCE program works through the following steps:<br />
1. A new application, consisting of at least a task generator and a result collector, is submitted<br />
into the system. Optionally, one or more data files can be submitted into the system as<br />
belonging to this application; these files will then be available to the tasks and to the<br />
task generator;<br />
2. The application is downloaded by a resource broker, which finds an appropriate machine to<br />
run the task generator and schedules the application to run there;<br />
3. The application is downloaded by the selected machine and the task generator is dynamically<br />
loaded, instantiated and run; tasks are thus created, and as each one is created a reference<br />
to it is sent to the resource broker in order to schedule it;<br />
4. The resource broker schedules the tasks and sends the references to the designated machines<br />
via JavaSpace;<br />
5. The producer machine downloads a task, runs it, and then the result object obtained is sent<br />
back either to the resource broker or to the consumer that originally submitted the<br />
application; the choice between these two is made when the application is first<br />
submitted.<br />
7.2. Template Features<br />
The ALiCE programming template for Java applications includes the following specifications:<br />
- TaskGenerator, ResultCollector and Task classes, which should be extended by the<br />
programmer;<br />
- Methods used to send new Task objects to ALiCE, either from the TaskGenerator or from<br />
another Task;<br />
- Methods used to retrieve the Results from the system;<br />
- Methods used to submit a data file, to get a reference to it, and to read from and write to it;<br />
- Methods of simple communication between the result collector and the task generator;<br />
- Methods of generic communication between tasks, or between tasks and the task generator.<br />
7.2.1. The Task Generator Template<br />
As mentioned above, the task generator is the entry point into the application. The TaskGenerator<br />
template mainly requires the programmer to create a class that is executed at the task producer's<br />
site and that generates, initializes and submits computational tasks. There should be no<br />
computational part in the task generator (although this is not enforced); all that should be done<br />
there are some initializations and perhaps some message exchange with the result collector.<br />
The actual entry point into the task generator is the method public void main(String[] args),<br />
which should be implemented by any task generator class. We stress here that in the task<br />
generator there should be NO static methods, including the main(String[] args) method. This is<br />
because some information concerning the application that this task generator belongs to is stored<br />
inside the actual object instance of the task generator, which is created by dynamically loading<br />
the class submitted by the programmer. So, do not make the main method in the task generator<br />
static, and do not create new task generator instances. Besides this entry point, before calling<br />
the main() method on the task generator, the init() method is called. If there are any necessary<br />
initializations to be done before starting the application, the programmer can implement the<br />
method public void init() and perform them inside this call, since it is called on the task<br />
generator object before anything else.<br />
The template also requires the programmer to extend the alice.consumer.TaskGenerator class.<br />
This superclass has the means to send computational tasks into the system, to receive string<br />
messages from the result collector, and to send/receive objects to/from any other component of<br />
the application (result collector or tasks). Also, if desired, the user can get a reference to a data<br />
file and use it to read/write from/to that file. For more details on data file usage, see section 7.2.4.<br />
The method calls the programmer has at his/her disposal in the Task Generator are:<br />
- public void process(Task t) - This is the call that submits new tasks into the system to<br />
be produced (executed); since the purpose of the task generator is obviously to generate<br />
tasks, this is the most important call inside this component. The programmer should create<br />
objects that implement the Task class and then call process() with those objects to start a<br />
computational process on a producer machine, that is, to send the task to ALiCE for<br />
execution;<br />
- public String getStringMessage() - Sometimes it is useful to have a way to send simple<br />
messages from the result collector to the task generator with minimum overhead. This is<br />
the case when the result collector (which is the only part of an application that runs at<br />
the consumer) has a user interface that reads some basic inputs and/or commands, and these<br />
inputs/commands should be transmitted to the task generator. To receive a string sent by the<br />
result collector of an application, the task generator of that application can use this call. To<br />
transfer more complicated structures between parts of the system, use the<br />
requestObject/sendObject mechanism;<br />
- public Object requestObject(String id) - This call is used to request and wait for the<br />
reception of an object identified by the given identifier. This object can be sent by any other<br />
component of the application, that is, either the result collector or a task. For more details<br />
about this mechanism, see section 7.2.5;<br />
- public void sendObject(Object obj, String id) - This is the corresponding call to send an<br />
object identified by the given id to the components of this application.<br />
To better understand this mechanism, take a look at the examples presented in section 7.3 of<br />
this part and also at the model programming templates in Part V.<br />
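The overall shape of a task generator can be sketched as follows. This is a self-contained illustration, not ALiCE code: the real alice.consumer.TaskGenerator and Task superclasses are replaced here by minimal stand-ins (the stub process() only queues locally, whereas the real one ships the task to the resource broker), and RangeGenerator/RangeTask are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-ins for the ALiCE template classes, so this sketch runs on
// its own; the real classes live in the alice.* packages.
abstract class Task implements java.io.Serializable {
    public abstract Object execute();
}
abstract class TaskGenerator {
    final List<Task> submitted = new ArrayList<>();
    public void process(Task t) { submitted.add(t); } // real version sends t to ALiCE
    public abstract void init();
    public abstract void main(String[] args);
}

// A hypothetical generator that splits a range into fixed-size task chunks.
// Note: main() and init() are instance methods, NOT static (see text above).
class RangeGenerator extends TaskGenerator {
    public void init() { /* one-time setup, called before main() */ }
    public void main(String[] args) {
        for (int start = 0; start < 100; start += 25)
            process(new RangeTask(start, start + 25));
    }
}
class RangeTask extends Task {
    final int from, to;
    RangeTask(int from, int to) { this.from = from; this.to = to; }
    public Object execute() {            // runs on a producer machine
        long sum = 0;
        for (int i = from; i < to; i++) sum += i;
        return sum;
    }
}

public class GeneratorSketch {
    public static void main(String[] args) {
        RangeGenerator g = new RangeGenerator();
        g.init();                        // in ALiCE, the system makes these
        g.main(args);                    // calls on the loaded instance itself
        System.out.println("tasks=" + g.submitted.size());
    }
}
```

In a real ALiCE application the class would extend alice.consumer.TaskGenerator, and the system, not the programmer, would instantiate it and invoke init() and main().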
7.2.2. The Task Template<br />
The Task template is a class that any task submitted by the programmer should extend. The<br />
extending class should essentially implement the public Object execute() method, which the<br />
producer will call on any task received for execution. The returned object is the result of the<br />
task (also see section 7.2.3).<br />
Besides this, the template offers the means to communicate with other tasks and the means to<br />
create a new task. The programmer can get, from inside a task, a reference to a previously<br />
submitted data file and then use that reference to read or write chunks of data from/to that file.<br />
The calls available from inside a task to achieve these functions are:<br />
- public void process(Task t) - This is the call that submits a new task into the system to<br />
be produced (executed). This is very useful for developing applications in which there are<br />
major data dependencies, or ones in which the sizes of the tasks cannot be calculated in<br />
advance and depend on data computed in another task. Also, this ability to create tasks<br />
from inside other tasks permits the implementation of computational algorithms of the<br />
"divide and conquer" class, which opens up the ALiCE system to a whole new class of<br />
problem solving.<br />
- public Object requestObject(String id) - This call is used to request and wait for the<br />
reception of an object identified by the given identifier. This object can be sent by any other<br />
component of the application, that is, either the result collector or a task. For more details<br />
about this mechanism, see section 7.2.5. Using this inter-task communication system, any<br />
means of synchronization and data-dependency handling is possible, given the fact that the<br />
object requested/submitted can be of any kind (as long as it is serializable, i.e. implements<br />
the java.io.Serializable interface);<br />
- public void sendObject(Object obj, String id) - This is the corresponding call to send an<br />
object identified by the given id to the components of this application, frequently this being<br />
another task with which there is a need for synchronization or a data dependency.<br />
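The divide-and-conquer pattern enabled by calling process() from inside a task can be sketched as below. Again this is a self-contained stand-in, not ALiCE code: a local queue plays the role of the resource broker, and SumTask is a hypothetical task that splits large ranges into subtasks instead of computing them directly.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal stand-in for ALiCE's Task; a local queue replaces the resource
// broker so the divide-and-conquer pattern can actually be run here.
abstract class Task implements java.io.Serializable {
    static final Deque<Task> queue = new ArrayDeque<>(); // stub "scheduler"
    public void process(Task t) { queue.add(t); }        // real call sends t to ALiCE
    public abstract Object execute();
}

// Hypothetical task: sums the integers in [from, to); ranges above a
// threshold split themselves into two subtasks instead of computing.
class SumTask extends Task {
    final int from, to;
    SumTask(int from, int to) { this.from = from; this.to = to; }
    public Object execute() {
        if (to - from > 16) {                 // too big: divide and conquer
            int mid = (from + to) / 2;
            process(new SumTask(from, mid));  // new tasks created from a task
            process(new SumTask(mid, to));
            return null;                      // a split-only task has no result
        }
        long sum = 0;
        for (int i = from; i < to; i++) sum += i;
        return sum;                           // the task's result object
    }
}

public class TaskSketch {
    public static void main(String[] args) {
        Task.queue.add(new SumTask(0, 100));
        long total = 0;
        while (!Task.queue.isEmpty()) {       // stub producer loop
            Object r = Task.queue.poll().execute();
            if (r != null) total += (Long) r;
        }
        System.out.println("total=" + total); // sum of 0..99
    }
}
```

In ALiCE the subtasks would be scheduled on whatever producers are free, and their results would flow back to the result collector rather than being summed locally.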
7.2.3. The Result Collector Template<br />
Result Template<br />
The result can be any kind of object; this means that the result is as generic as it can get, thus<br />
permitting the programmer to implement any data structure inside the results delivered back to<br />
the result collector. The only requirement is that it implements the java.io.Serializable interface,<br />
since it will be transported over the network.<br />
The delivery of results back to the consumer can be done in two ways: directly, or through<br />
the resource broker that originally scheduled the application. Direct delivery is intended for<br />
all the cases where the consumer stays on-line and where the result collector is a thread<br />
running on this machine from the moment the application is submitted to the moment when<br />
all the results have been delivered, without going off-line. This kind of delivery imposes much<br />
less overhead and hence is much quicker. But there are cases when the consumer cannot stay<br />
on-line (e.g. when it uses a dial-up Internet connection). In these cases, it is convenient to have<br />
the results delivered and stored at the resource broker's site until the consumer later comes<br />
back on-line to retrieve them. In this case the result collector should be interrupted and<br />
executed later (or perhaps put in a waiting state) until the connection is available again. The<br />
selection between the first and the second delivery mode is made when the application is first<br />
submitted into the system; from then on, the system handles the chosen mode automatically.<br />
The Result Collector<br />
The ResultCollector template mainly requires the programmer to create a class that can be<br />
executed at the consumer. This implies that the programmer writes a class that extends the<br />
alice.result.ResultCollector class. The main method of this class, which is the entry point of the<br />
result collector and must be implemented by the programmer, is the public void collect() method.<br />
The ResultCollector superclass has the means to obtain a result (if one is ready) for the<br />
application, to find out how many results are ready, and also to send a simple string message to<br />
the task generator of this application, with the uses explained in section 7.2.1. These functions<br />
can be accessed through the following method calls from the class that extends ResultCollector:<br />
- public Object collectResult() - If there is a result available, this call will return the object<br />
that represents that result, as returned by the execute() method of the task that generated<br />
it. If there is no result available, it will return null. To find out the number of results<br />
available (or whether there are any), the programmer should use the following method;<br />
- public int getResultsNoReady() - This call returns the number of results that are available,<br />
i.e. that have already been computed by some tasks;<br />
- public void sendStringMessage(String str) - This can be used to send a string message to<br />
the task generator of this application, to pass some simple parameters or user-input options;<br />
- public String getRunDir() - This method returns the full path of the directory that the<br />
result collector runs in; this directory is chosen dynamically at the start of each application<br />
(see subsection 3.3.2 for more details).<br />
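A result collector built on these calls can be sketched as follows. As before, this is a self-contained stand-in: alice.result.ResultCollector is replaced by a stub backed by a local queue, and SumCollector is a hypothetical collector that waits for a known number of results.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Minimal stand-in for alice.result.ResultCollector, backed by a local
// queue instead of the real delivery machinery.
abstract class ResultCollector {
    final Queue<Object> delivered = new ConcurrentLinkedQueue<>();
    public Object collectResult()  { return delivered.poll(); } // null if none ready
    public int getResultsNoReady() { return delivered.size(); }
    public void sendStringMessage(String s) { /* real version reaches the task generator */ }
    public abstract void collect();  // entry point, called at the consumer
}

// Hypothetical collector: waits for a known number of results and sums them.
class SumCollector extends ResultCollector {
    static final int EXPECTED = 3;
    long total = 0;
    public void collect() {
        int got = 0;
        while (got < EXPECTED) {
            Object r = collectResult();
            if (r == null) continue;     // busy-wait for the sketch only
            total += (Long) r;
            got++;
        }
    }
}

public class CollectorSketch {
    public static void main(String[] args) {
        SumCollector c = new SumCollector();
        // Simulate three task results arriving from the producers.
        c.delivered.add(10L); c.delivered.add(20L); c.delivered.add(12L);
        c.collect();                     // ALiCE calls this on the consumer
        System.out.println("total=" + c.total);
    }
}
```

A real collector would typically poll getResultsNoReady() and sleep between checks rather than spin, since results arrive over the network.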
7.2.4. Data Files Usage<br />
Data files are submitted to a data server at the time a new application is submitted into the<br />
system. The model we implemented uses the random-access file paradigm; this means that from<br />
inside a task or a task generator, the programmer can get access to a file with a special open()<br />
call and then, using the reference returned, do random-access reads or writes to that file.<br />
The first step in data file usage is handled by the system and consists of submitting a data file<br />
for an application to the data server. This is done at the same time the actual code for the<br />
application is submitted. From the point the file is transferred, it is available to the task<br />
generator and to all the tasks of the application. If open is called for a data file that has not yet<br />
been entirely transferred to the data server, the call will block until the transfer completes.<br />
There are two calls for opening a file, one for handling file opening from inside tasks and one<br />
from inside task generators. These two calls are static methods of the alice.data.Data class:<br />
- static public DataFile openFile (String fileName, Task t)<br />
- static public DataFile openFile (String fileName, TaskGenerator tg)<br />
To get a reference to a data file, the actual call from a task or from a task generator looks the<br />
same, having a form similar to: DataFile f = Data.openFile(name, this). The reference to the<br />
calling task/task generator is needed in order to link the data file name to the application that is<br />
using it, since the system handles files with the same names in different applications at the same<br />
time. The name that is passed to the openFile call should be the same as that of the originally<br />
submitted data file, without the path; no directories are allowed for data files at the data server.<br />
This means that if, for example, the data file originally submitted had the path<br />
./data/files/DataFile1.dat at the site of the consumer, then to get a reference to that file after<br />
submitting it to ALiCE as a data file for an application, the programmer should call<br />
Data.openFile(“DataFile1.dat”, this), without using any path.<br />
The object obtained from an openFile call is a reference to the data file, which provides the<br />
following methods:<br />
- public byte[] read (int offset, int length) - This call will read, from the data file represented<br />
by the object it is called on, a chunk of length bytes starting at the given offset in the file.<br />
Since the chunk size is at the discretion of the programmer, the system is very flexible. In<br />
this way, the whole data file can be read at once (this should be done only when the data<br />
file is small), or any part of it, of any size, can be read at once. Each task can thus read<br />
only the data it needs, and can decide which part of the data file it needs based on prior<br />
calculations done inside the same task;<br />
- public void write ( byte[] buffer, int offset, int length) - This is the corresponding call to<br />
write data to a data file. The programmer can write any length of data. The data will be<br />
taken from the byte[] area referred to by the buffer parameter: length bytes will be read<br />
from the buffer and written to the data file starting at the given offset in the file. Since the<br />
programmer has write access to the data file, the data files could be used as a means of<br />
inter-task communication (although they are not intended for this, nor do we advise using<br />
them this way);<br />
- public long length() - This simply returns the actual size, in bytes, of the physical data file<br />
as it occupies the data file server's disk storage.<br />
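The random-access read/write pattern these methods support can be sketched as below. The DataFile class here is a self-contained in-memory stand-in with the same method signatures as above; the real one, obtained via Data.openFile(), talks to the data server.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Minimal in-memory stand-in for the DataFile reference returned by
// Data.openFile(); the real implementation reads/writes at the data server.
class DataFile {
    private byte[] bytes;
    DataFile(byte[] initial) { bytes = initial; }
    public byte[] read(int offset, int length) {           // random-access read
        return Arrays.copyOfRange(bytes, offset, offset + length);
    }
    public void write(byte[] buffer, int offset, int length) {
        if (offset + length > bytes.length)
            bytes = Arrays.copyOf(bytes, offset + length); // grow like a file
        System.arraycopy(buffer, 0, bytes, offset, length);
    }
    public long length() { return bytes.length; }
}

public class DataFileSketch {
    public static void main(String[] args) {
        // From a task this would be: DataFile f = Data.openFile("DataFile1.dat", this);
        DataFile f = new DataFile("chromosome data".getBytes(StandardCharsets.US_ASCII));
        byte[] chunk = f.read(0, 10);     // read only the chunk this task needs
        System.out.println(new String(chunk, StandardCharsets.US_ASCII));
        f.write("DATA".getBytes(StandardCharsets.US_ASCII), 11, 4); // in-place write
        System.out.println("length=" + f.length());
    }
}
```

The point of the chunked read is the one made above: a task handling a slice of a large gene database reads only its own offset range instead of pulling the whole file.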
7.2.5. Inter-task Communication<br />
The design of the new version of the ALiCE system adds the functionality needed to handle<br />
any kind of dependency between tasks, and provides the means for flexible and convenient<br />
communication between parts of the application.<br />
Communication Through User Objects<br />
To communicate between tasks, or between the task generator and tasks, the programmer has<br />
the ability to use any kind of object as the unit being transferred during the communication.<br />
The only requirement for the objects transferred is that they implement the java.io.Serializable<br />
interface. Each type of object used is associated with a string id to differentiate between classes<br />
of objects. The programmer can request, from inside a task, that another task send an object of<br />
a class that is identified by such a given identifier.<br />
There is no addressing implemented in the inter-task communication: on a grid computing
system there is no prior knowledge of where each task will execute, so at the time the
application is written the programmer does not know where each task will be located at
execution time. Hard-coding any kind of task identification into the ALiCE Task class would
have greatly diminished the flexibility of the model. The approach taken is the most general
and flexible one: give the programmer the ability to implement any means of communication,
which we achieve by using generic objects for inter-task communication. In this way the
programmer could, for example, identify a specific task with a type of object carrying a
certain identifier. The methods that the programmer can use to send/receive objects to/from
a task or task generator are:
• public Object requestObject(String id) - This call requests and waits for the reception of
an object defined by the given identifier. The call blocks until such an object is received;

• public void sendObject(Object obj, String id) - This is the corresponding call to send an
object defined by the given id to the components of this application, frequently another task
with which there is a synchronization need or a data dependency.
All user-object communication is done through JavaSpace references. The footprint in the
JavaSpace is always only as large as the reference, so it does not depend on the size of the
object transferred. The programmer therefore does not need to worry about the size of the
objects transferred between components of the application; the only limitation is imposed by
the resources of the machines running the producers, not by network overhead.
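To make the identifier convention concrete, here is a minimal, self-contained sketch of the send/request rendezvous. An in-memory map stands in for the JavaSpace, and the real requestObject blocks until the object arrives, which this sketch omits; the Payload class and the "task1_input" id are invented for illustration.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class ObjectExchangeSketch {
    // A Serializable payload, as required of all transferred objects.
    static class Payload implements Serializable {
        int value;
        Payload(int value) { this.value = value; }
    }

    private final Map<String, Object> space = new HashMap<>();

    // Deposit an object under an agreed string id (cf. ALiCE sendObject).
    void sendObject(Object obj, String id) { space.put(id, obj); }

    // Retrieve the object stored under an id (cf. ALiCE requestObject).
    Object requestObject(String id) { return space.get(id); }

    public static void main(String[] args) {
        ObjectExchangeSketch exchange = new ObjectExchangeSketch();
        // One task sends a result under an id both sides agreed on...
        exchange.sendObject(new Payload(42), "task1_input");
        // ...and another task later requests it by the same id.
        Payload p = (Payload) exchange.requestObject("task1_input");
        System.out.println("received: " + p.value);
    }
}
```

The point of the convention is that the string id, not a task address, is the shared contract between sender and receiver.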
Communication Through Data Files

Although inter-task communication by means of data files is possible, it is not recommended.
This kind of communication can be implemented by using a data file as the point at which to
send/receive communication data, with different tasks doing reads and writes on that file.
The main limitation is that no synchronization is possible except by busy waiting over the
network, which is very slow, incurs high overhead, and is not recommended.
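For completeness, the discouraged pattern looks roughly like the sketch below, where a shared byte array and two threads stand in for a data file accessed over the network; all names are illustrative. In ALiCE every poll of the flag byte would be a network round-trip, which is exactly why the pattern is so costly.

```java
public class BusyWaitSketch {
    // Flag byte at "offset 0" of a pretend shared data file.
    static final byte[] sharedFile = new byte[1];

    public static void main(String[] args) throws InterruptedException {
        // The "writer task" sets the flag after doing some work (cf. f.write).
        Thread writer = new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException e) { return; }
            synchronized (sharedFile) { sharedFile[0] = 1; }
        });
        writer.start();

        // The "reader task" busy-waits on the flag (cf. repeated f.read calls).
        boolean flagSet = false;
        while (!flagSet) {
            synchronized (sharedFile) { flagSet = (sharedFile[0] == 1); }
        }
        System.out.println("flag observed, reader proceeds");
        writer.join();
    }
}
```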
7.3. Simple application examples

This section presents some extremely simple application code examples, to give a glimpse of
how to use the functionality described above.
7.3.1. Simple Example and Data File Usage

The first example generates a task; from inside that task, it opens a data file, writes a
string at a position in the file, reads back what was written, and returns a result containing
this string. The string is then printed by the result collector.
THE RESULT COLLECTOR

import alice.result.*;

public class MyResultCollector extends ResultCollector {
    public void collect() {
        MyResult res = null;
        while (getResultsNoReady() < 1)
            ;  // busy-wait until a result is ready
        res = (MyResult)collectResult();
        System.out.println("String returned: " + res.str);
    }
}

THE RESULT

import java.io.*;

public class MyResult implements Serializable {
    public String str;

    public MyResult() {
        str = null;
    }
}
THE TASK GENERATOR

import alice.consumer.*;

public class MyTaskGenerator extends TaskGenerator {
    public MyTaskGenerator() {
    }

    public void generateTasks() {
        System.out.println("{MyTaskGenerator}: generating TASK");
        Task t = new MyTask();
        process(t);
    }

    public void main(String args[]) {
        this.generateTasks();
    }
}

THE TASK
import alice.consumer.*;
import alice.data.*;
import java.io.*;

public class MyTask extends Task {
    public MyTask() {
    }

    public Object execute() {
        byte[] testW = (new String("OOPS")).getBytes();
        System.out.println("{Task}: Executing task " + this.hashCode());
        DataFile f = Data.openFile("Datafile", this);
        System.out.println("{Task}: data file length: " + f.length());
        f.write(testW, 10, 4);
        byte[] testR = f.read(10, 4);
        String rd = new String(testR);
        System.out.println("{Task}: RESULT OF READING: " + rd);
        MyResult ret = new MyResult();
        ret.str = rd;
        return ret;
    }
}
7.3.2. Simple Inter-Task Communication and Spawning a New Task from a Task

This is a very simple example that generates a task; from inside that task, a new task of a
different kind is generated. The first task then blocks, waiting for an object from the second
one. When the second task executes, it sends an object of the requested kind. The files for
this application are presented below.
MYRESULTCOLLECTOR.JAVA

import alice.result.*;

public class MyResultCollector extends ResultCollector {
    public void collect() {
        MyResult res1, res2;
        while (getResultsNoReady() < 2)
            ;  // busy-wait until both results are ready
        res1 = (MyResult)collectResult();
        res2 = (MyResult)collectResult();
        System.out.println("Number returned: "
            + ((res1.i != -1) ? res1.i : res2.i));
    }
}

MYRESULT.JAVA

import java.io.*;

public class MyResult implements Serializable {
    public int i;

    public MyResult() {
        i = -1;
    }
}
MYTASKGENERATOR.JAVA

import alice.consumer.*;

public class MyTaskGenerator extends TaskGenerator {
    public MyTaskGenerator() {
    }

    public void generateTasks() {
        System.out.println("{MyTaskGenerator}: generating TASK");
        Task t1 = new MyTask1();
        process(t1);
    }

    public void main(String args[]) {
        this.generateTasks();
    }
}
MYTASK1.JAVA

import alice.consumer.*;
import java.io.*;

public class MyTask1 extends Task {
    public MyTask1() {
    }

    public Object execute() {
        System.out.println("{Task}: Executing task1");
        Task t = new MyTask2();
        process(t);
        Dummy d = (Dummy)requestObject("dummy_id");
        MyResult res = new MyResult();
        res.i = d.i;
        return res;
    }
}

MYTASK2.JAVA

import alice.consumer.*;
import java.io.*;

public class MyTask2 extends Task {
    public MyTask2() {
    }

    public Object execute() {
        System.out.println("{Task}: Executing task2");
        Dummy d = new Dummy();
        d.i = 69;
        sendObject(d, "dummy_id");
        MyResult ret = new MyResult();
        return ret;
    }
}
DUMMY.JAVA

import java.io.*;

public class Dummy implements Serializable {
    public int i;

    public Dummy() {
    }
}
8. ALiCE Programming Templates

Important guidelines

• Do NOT use circular references

• Do NOT use static methods for the task generator, the task or the results
8.1. The Task Generator Template

/**
 * ALiCE Task Generator Template
 **/

import alice.consumer.*;
import alice.data.*;

public class TASKGEN_CLASSNAME extends TaskGenerator {

    /**
     * The no-parameters constructor is a must
     **/
    public TASKGEN_CLASSNAME() {}

    public void init() {
        // Place your initialisation code here
    }

    /**
     * main method - DO NOT make it static;
     *             - this is the entry point and the point
     *               where tasks should be generated
     **/
    public void main(String args[]) {
        /*
         * This is where the tasks are generated, usually
         * in a loop
         */

        // To send a task for producing (this should be called
        // for each task):
        TASK_CLASSNAME t = new TASK_CLASSNAME();
        process(t);

        // To open a data file, read and write from/to it:
        DataFile f = Data.openFile("file_name", this);
        READ_BUFF = f.read(POSITION, LENGTH);
        f.write(WRITE_BUFF, POSITION, LENGTH);

        // To send/receive an object:
        OBJECT_CLASSNAME obj = new OBJECT_CLASSNAME();
        sendObject(obj, "snd_str_id");
        OBJECT_CLASSNAME rcvObj =
            (OBJECT_CLASSNAME)requestObject("rcv_str_id");

        // To receive a string message from the result collector:
        String msg = getStringMessage();
    }
} // end class
8.2. The Result Collector Template

/**
 * ALiCE Result Collector Template
 **/

import alice.result.*;

public class RESCOL_CLASSNAME extends ResultCollector {

    // Place variables here

    // Constructor
    public RESCOL_CLASSNAME() {
    }

    public void collect() {
        /*
         * Place here result collection and processing code
         */

        // To obtain the number of results ready, call:
        int resReady = getResultsNoReady();

        // To get a new result, call:
        RES_CLASSNAME res = (RES_CLASSNAME)collectResult();
    }
}
8.3. The Task Template

/**
 * ALiCE Task Template
 **/

import alice.consumer.*;
import alice.data.*;
import java.io.*;

public class TASK_CLASSNAME extends Task {

    // Place variables here

    // Constructor
    public TASK_CLASSNAME() {
    }

    public Object execute() {
        /*
         * This is where you do your calculations.
         * The result can be any kind of object.
         */

        // You can generate and send a new task to
        // be produced:
        O_TASK_CLASSNAME t = new O_TASK_CLASSNAME();
        process(t);

        // To open a data file, read and write from/to it:
        DataFile f = Data.openFile("file_name", this);
        READ_BUFF = f.read(POSITION, LENGTH);
        f.write(WRITE_BUFF, POSITION, LENGTH);

        // To send/receive an object:
        OBJECT_CLASSNAME obj = new OBJECT_CLASSNAME();
        sendObject(obj, "snd_str_id");
        OBJECT_CLASSNAME rcvObj =
            (OBJECT_CLASSNAME)requestObject("rcv_str_id");

        // Return any Serializable result object:
        return RESULT_OBJECT;
    }
}
9. Conclusions

9.1. Summary

In summary, this project achieved the following:

• Design and implementation of a working grid computing system based on Sun Microsystems'
JavaSpaces technology. To allow better control of the system, we implemented a three-tier
architecture consisting of consumers, producers, and resource brokers. We designed and
implemented a scalable, modular, portable and performant system based on a library, also
developed by us, for transferring any live Java objects over the network. The system also
supports other programming languages for application development, such as C and C++;

• Design of a programming template that users can employ to develop grid applications. Our
TaskGenerator-Tasks-ResultCollector programming model allows users to decouple visualization
from computation. We also implemented a mechanism to generate tasks from other tasks, making
it possible to implement peer-to-peer algorithms as well as master-slave ones. In addition,
we developed a very powerful communication system between the components of an application,
based on requesting/sending generic Java objects;

• Design of a data server for reliable and performant distribution of the data needed by
applications running on our system.
9.2. Future work

There are still many things that should be implemented in ALiCE to make it the powerful grid
computing system it can be and aims to become. These include:

• Integrating new scheduling techniques and load-balancing scheduling;

• Implementing a performant fault-tolerant architecture for the producers, possibly including
task migration and pre-emption, together with checkpointing;

• Implementing Quality of Service techniques to prioritize critical applications;

• Developing a centralized accounting and monitoring scheme.
Bibliography

[1] Johan Prawira (2002). ALiCE: Java-based Grid Computing System. Honours thesis, School of
Computing, National University of Singapore.

[2] Lee, Matsuoka, Talia, Sussman, Karonis, Allen and Thomas (2001). A Grid Programming
Primer. Programming Models Working Group, Grid Forum 1, Amsterdam.

[3] Baratloo, Karaul, Kedem and Wyckoff (1996). Charlotte: Metacomputing on the Web. In
Proceedings of the 9th International Conference on Parallel and Distributed Computing
Systems, 1996.

[4] Foster and Kesselman (1997). Globus: A Metacomputing Infrastructure Toolkit.
International Journal of Supercomputing Applications.

[5] Foster, Kesselman and Tuecke (2001). The Anatomy of the Grid: Enabling Scalable Virtual
Organizations. International Journal of Supercomputer Applications, 2001.

[6] Germain, Néri, Fedak and Cappello (2000). XtremWeb: Building an Experimental Platform
for Global Computing. Laboratoire de Recherche en Informatique, Université Paris Sud.

[7] Homburg, P. (2001). The Architecture of a Worldwide Distributed System. Ph.D. thesis,
Vrije Universiteit, Netherlands.

[8] Khunboa, C. and R. Simon (2001). On the Performance of Coordination Spaces for
Distributed Agent Systems. In Proceedings of the IEEE 34th Annual Simulation Symposium,
April 2001, Seattle, Washington, pp. 7-14.

[9] Lee, C.R. (2000). The Design and Implementation of a Computing Engine in ALiCE. Honours
thesis, School of Computing, National University of Singapore.

[10] Sarmenta, L.F.G. (1998). Bayanihan: Web-Based Volunteer Computing Using Java. In
Proceedings of the 2nd International Conference on World-Wide Computing and its Applications
(WWCA'98), Tsukuba, Japan, March 3-4, 1998. Lecture Notes in Computer Science 1368,
Springer-Verlag, 1998, pp. 444-461.

[11] Sarmenta, L.F.G. (2001). Volunteer Computing. Ph.D. thesis, Department of Electrical
Engineering and Computer Science, MIT, March 2001.

[12] SETI@home: http://setiathome.ssl.berkeley.edu

[13] Distributed.Net: http://www.distributed.net

[14] Globus: http://www.globus.org

[15] The GLOBE Project: http://www.cs.vu.nl/~steen/globe/

[16] Legion: http://www.cs.virginia.edu/~legion

[17] Condor: http://www.cs.wisc.edu/condor

[18] IEEE High Performance Distributed Computing (HPDC) Symposium 2001:
http://www-2.cs.cmu.edu/~hpdc

[19] High Performance Computing Symposium 2002: http://wwwteo.informatik.uni-rostock.de/HPC

[20] 3rd International Workshop on Grid Computing: http://www.gridcomputing.org/grid2002