Work Packages
The Project has been divided into five workpackages, each corresponding with a general field withing the Project. Most of the participant groups are involved in the development of all the packages. The group responsible for the package must coordinate the actions of the groups. Each workpackage is indeed divided into tasks, wich are the minimal work unit. A task is assigned to a group, although several groups can collaborate in the consecution of the aims of a particular task.
- Infrastructure deployment
- Middleware: low-level scheduling and fault tolerance
- Middleware: Grid tools, programming environment, monitoring and performance analysis
- Web portal
- Applications
Infrastructure deployment
The main goal of this workpackage is the deployment of a Grid infrastructure, stable enough to allow the test of the applications and middleware software designed within the Project. The election of the middleware has been made under considerations of disponibility, support, stability and compatibility with other future or ongoing projects.
This package also includes the start up of a Certification Authority for authentication purposes.
Moreover, the infrastructure includes support for IPv6 by means of tunneling. The research groups of National University of Asunción, Paraguay, and Complutense University of Madrid, Spain, are connected and performing some test on multicast over IPv6 for resource discovering.
Activities in this package (UPV group is the coordinator of this package)
A Certification Authority has been established based on OpenCA integrated by two uncoupled components: a Registration Authority to admit, verify and check certificate applications, and a Certification Authority to issue certifications. The installation is a Trustix 2.2 distribution with software to support OpenRA and OpenCA.
A Live-CD (alias, Knobus) has been created to simplify the deployment of Globus Toolkit 4 in new sites. It is a Linux Knoopix specifically tailored to deploy basic Globus services, GRAM, GridFTP and MDS in three steps:
- Application for a machine certification obtained from our Certification Authority
- Preparation of a Configuration diskette
- Booting from Knobus together with the configuration diskette
Middleware: low-level scheduling and fault tolerance
Job scheduling is critical for an optimal use of the Grid resources. It is intended the direct use of the resources by the applications, minimizing the middleware layer needed for the coordination of the different tasks. In this way, more flexibility and middleware independence can be achived for designing scheduling strategies.
Secondly, some efficient solutions must be developed in order to provide fault tolerance to the Grid applications. Two aproaches are considered, depending on the kind of parallel application. For strongly coupled intra-cluster applications, the checkpointing approach is the most efficient solution. The aim is to allow the resume of the execution of the faulty application in another node of the Grid by means of application-level portable checkpointing. Lack of portability is a drawback on most of the existent solutions, although it is a very desirable property in a Grid environment. On the other hand, for loosely-coupled inter-cluster applications, such as the master-worker paradigm, a data replication approach would be more convenient. The goal is to detect and recover the system consistency, granting the completion of the job.
Task 1. "Low-level" Task scheduling in a Grid or multicluster (UAB group is the coordinator of this task)
The goal of this task is to improve the speedup of Master-Worker parallel applications, initially coded for only one cluster, when they are run in a multi-cluster environment, through a user transparent adaptation that achieves a given efficiency grade on the use of computational resources.
To reach these goals, this work has been splitted into three levels:
An architecture to arrange multi-cluster resources so that the execution of applications will be scalable, robust, efficient and adaptable. Scalability is a strong must to allow running the application in a multi-cluster with any number of clusters. Robustness is demanded to adapt the application to changes in Internet parameters such as bandwidth, latency, transient faults, etc.
A communication ``middleware'' so a set of clusters can be considered a Multi-cluster, providing reliable communication and being able to make use of the Internet available bandwidth.
A methodology to guide the running process in order to improve the application speedup in the multi-cluster while keeping a given efficiency grade. This methodology must be capable to asses the anticipated execution time and efficiency for the application in the multi-cluster.
The methodology includes an analytical model which, taking into account cluster, interconnection network and application parameters, and applying a computation-communication analysis. It manages to work out execution possibilities in a multi-cluster environment, chooses the most convenient resources and predicts running time and efficiency grades
Task 2. Application-level portable Checkpointing (UDC group is the coordinator of this task)
CPPC (Controller/Precompiler for Portable Checkpointing) is a checkpointing tool focused on the insertion of fault tolerance into long-running message-passing applications. It is designed to allow for execution restart on different architectures and/or operating systems, also supporting checkpointing over heterogeneous systems, such as the Grid. It uses portable code and protocols, and generates portable checkpoint files while avoiding traditional solutions which add an unscalable overhead (such as process coordination or message-logging).
CPPC is made up of a library and a compiler. The library contains routines for variable-level checkpointing. The compiler translates a parallel code annotated with user directives into fault tolerant code with CPPC library calls. It also includes compile-time analysis to automatize the insertion of these directives.
In order to introduce fault-tolerance through CPPC, the code of the application needs to be changed so that it communicates with the runtime library, passing information about variables that need to be dumped in the next checkpoint, where to create the state files, etc. Also, flow-control structures are placed to control the re-execution of certain critical portions of code at restart. This will enable the recovery of certain non-portable parts of data, such as MPI communicators or open files, that cannot be just stored as binary data in a state file.
As the insertion of these function calls and flow-control structures would mean a significant lack of transparency, and hence a more difficult integration, the CPPC runtime library is distributed along with a precompiler that helps the user by translating simple directives into useful code, but also performing analyses to automatically obtain information and therefore relieving the user of time-consuming and error-prone tasks, such as deciding which memory regions need to be dumped upon reaching a checkpoint. Some of the transforms implemented by the CPPC compiler are semantic-directed. These require certain semantic information about which function calls implement certain semantics and how. Since source code in imperative languages is not intrinsically semantic, it is necessary to provide semantic information to the compiler. The mechanism for supplying this information is extensible and not tied to specific implementations of a semantic, fulfiling the portability target present in CPPC.
Task 3. User transparent fault-tolerance based on rollback-recovery protocols (RADIC) (UAB group is the coordinator of this task)
A RADIC (Redundant Array of Independent Checkpoints) architecture is proposed, based on the pessimistic checkpoint restart and log message strategy which by means of a set of protection processes, distributed on every cluster node, takes on responsibility to monitor node faults, periodically saving sound states of the parallel application and, in the event of a failure, rolls back to a former sound state and restart the execution from this time on.
The set of protection processes is subdivided into two groups. The first group is in charge of monitoring the system to detect node faults and implements the protection and fault recovery strategy. The second group is in charge of communicating and monitoring communications among applications processes, and also saving vital information to assist fault recovery.
These two process groups are called Observers and Protectors, respectively and are distributed all over the cluster.
Task 4. Fault-tolerance based on data replication (UAB group is the coordinator of this task)
The goal is to execute a whole work correctly, in a Grid or Multi-cluster system, even when some system elements fail, losing as little already done work as possible (minimum overhead) considering that performance is reduced because of this overhead introduced to tolerate faults and because of losses related to system node faults.
Fault tolerance in a system is achieved by redundancy techniques. Considering that physical redundancy of computing nodes is intrinsic in clusters and Grid/Multi-clusters systems, we are interested in taking advantage of this kind of redundancy in order to, user-transparently, build a functional redundancy based on the Data Replication paradigm. The proposed FTDR (Fault Tolerant Data Replication) algorithm handles the computing and communication resources pool that makes up a multi-cluster system.
An algorithm of Data Replication must mainly solve the problem of replica coherence preservation. As a number of processes are being executed at the same time, the data changes produced must be applied to every replica. The aim of the FTDR system is to assure that there is functional redundancy enough so that the work can be completed even in case of fault occurrences, to detect and diagnose faults affecting to any of the functional elements in the system and to tolerate those faults by reconfiguring the system and recovering the consistency in such a way that the work is correctly completed.
The proposed architecture follows a hierarchical Master/Worker model. Fault tolerance devices are implemented on every cluster, preventing as far as possible Data Replication through Internet.
FTDR implementation will be accomplished by a Middleware responsible of computing protection, the same scheme applied to every cluster. Fault-tolerance will be user transparent.
Middleware: Grid tools, programming environment, monitoring and performance analysis
In order to facilitate the development of Grid applications and to increase their performance, some help tools must be developed. In particular, some tools are already proposed, regarding with application programming environment, monitoring tools and performance analysis.
The programming environment should allow a programmer to develop applications to be executed in the Grid into a cluster or into a number of geographically distributed clusters. It should offer the possibility to hide the communication details.
The monitoring tools will allow to obtain information about the actual behaviour of the different kind of applications, i.e., parametric sequential, parallel intra-cluster and parallel inter-cluster applications.
Also related with this item are the performance analysis tools, wich will allow to identify the bottlenecks in the applications and saturation points in the system.
Task 1. Dynamic Monitoring of Grid Applications (UAB group is the coordinator of this task)
In order to help users to analyze application behavior, an architecture infrastructure is proposed in this work for dynamic monitoring of parallel Grid enabled applications. This monitoring architecture is inspired on MATE components and should be used as a first step toward dynamically automatic tuning environment for Grid applications.
In this work two main approaches are proposed to solve process tracking problem: a system service approach and binary packaging approach. In system service approach the monitoring tool is exposed as a Grid service and it uses Grid information services to establish the communication between the monitoring processes and the client of those processes. The idea is to have the monitoring service enabled on the machines that may be used for execution of application processes. That allows us to expose the monitoring service as a resource within Grid services. In binary packaging approach, Single Program Multiple Data (SPMD) application submission is supported by composing a new application program binary containing all the components necessary to execution and monitoring process of this program. A common "mpirun" submission is an example of SPMD submission. The binary can be delivered to specific resources based on configuration. Each resource may be composed by many machines selected by a local batch scheduler.
Grid systems typically interconnect geographically distributed resources. One major characteristic is that communications among long distance separated resources use shared communication networks. The Internet is frequently used as interconnection backbone. The collateral effect in instrumentation is that the data transition should be minimized because the network may limit the efficiency of the system. To overcome this problem, the application sensor concept is introduced in order to reduce normal event trace generation. Sensors are in-process inserted components that maintain state. Those sensors can be associated to different program execution points and, when the program passes through those points the instrumentation inserted code activates the sensors. Differently from traditional event tracing where the trace is generated on all instrumented execution points, a function timer sensor type is proposed, which reduces the event generation by using a timer sensor installed in the beginning and ending execution points of the function selected to be instrumented. On the beginning point the sensor records the current timestamp and function parameters. Moreover, on the ending point the sensor calculates the time spent for the function and generates the trace data.
In order to correlate trace events from different application processes running on machines distributed over the Grid, all monitoring processes need to have the same time reference information. An alternative infrastructure based on current available worldwide time services which can supply the synchronization needs for event trace ordering has been studied.
Web portal
The Web portal is designed to be the entry point to the Grid services, including authentication, obtention of certificates, access to computational resources and documentation, and launching of the applications.
It is important to spread the results of the Project so that they can be assimilated and used by the scientific community.
Applications
The applications that have been chosen as a testbench are of high interest for the Ibero American region.
The first group of applications are related to Proteomics and Healthcare. This group includes 3-Dimensional Electronic Microscopy and Tomography (3DEMT), and telediagnosis, medical images processing, biomedic processes simulations, data mining, and so on.
GENECODIS is a web-based tool for the ontological analysis of large lists of genes. It can be used to determine biological annotations or combinations of annotations that are significantly associated to a list of genes under study with respect to a reference list. As well as single annotations, this tool allows users to simultaneously evaluate annotations from different sources, for example Biological Process and Cellular Component categories of Gene Ontology.
The second group of applications are related to environmental management, such as wildfire simulations and floods prevention. The parallel computation potential of the Grid can be used to improve the prediction reliability which will then allow to design more effective strategies in the fight against the fire. Another aim is to design an application for prediction and control of floods based on satellite and hyperspectral images.