Ensemble

Ensemble Overview

Ensemble: Recent Research Activities (from 1995)

We have proposed the Ensemble methodology, in which message passing (MP) applications are designed and built by composing modular MP components. We have developed tools for (a) designing and implementing MP components, (b) specifying composition directives, and (c) composing applications.

The Ensemble tools have been developed on top of the most popular MP environments, PVM and MPI (using MPICH, an implementation of MPI), and also on proprietary environments such as Parix. The composed MP applications are pure PVM or MPI programs: they rely only on the APIs themselves and do not use any external environment for process communication. Consequently, Ensemble does not modify the capabilities of the MP APIs and does not interfere with application behavior.

Ensemble reduces the Software Engineering Costs (SEC) incurred in composing application configurations by maintaining a single code base for each of the components involved and reusing the components in different application configurations. Applications may be either SPMD or MPMD, and either regular or irregular. Components are developed separately as independent MP programs that specify local and global communication abstractly (like “formal communication parameters”). Modular processes spawned from any of the modular components may communicate (point-to-point or collectively) with other modular processes, not necessarily spawned from the same component. Applications are composed from modular processes by specifying, for each process, its actual local and collective communications (like “actual communication parameters”). In Ensemble, irregular MPMD applications are naturally supported, and regular SPMD applications are just a special case.
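The split between formal and actual communication parameters can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a component names its ports as strings and that composition binds each port of a modular process to a peer rank; the classes `Component` and `ModularProcess` and the Source/Filter/Sink pipeline are hypothetical and do not reproduce the Ensemble tools or their script syntax.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """A reusable MP component that names its ports abstractly
    (the "formal communication parameters")."""
    name: str
    ports: tuple


@dataclass
class ModularProcess:
    """A process spawned from a component; the composition binds each
    formal port to an actual peer rank (the "actual communication parameters")."""
    rank: int
    component: Component
    bindings: dict = field(default_factory=dict)

    def bind(self, port: str, peer_rank: int) -> None:
        # Reject ports the component never declared.
        if port not in self.component.ports:
            raise ValueError(f"{self.component.name} has no port {port!r}")
        self.bindings[port] = peer_rank

    def peer(self, port: str) -> int:
        # Component code asks only for a port name; the rank comes from the
        # composition, so the same code can run in different topologies.
        return self.bindings[port]


# Hypothetical MPMD pipeline composed from three components: Source -> Filter -> Sink.
source = Component("Source", ("Out",))
filt = Component("Filter", ("In", "Out"))
sink = Component("Sink", ("In",))

procs = [ModularProcess(0, source), ModularProcess(1, filt), ModularProcess(2, sink)]
procs[0].bind("Out", 1)
procs[1].bind("In", 0)
procs[1].bind("Out", 2)
procs[2].bind("In", 1)
```

Because the component code only ever refers to a port such as `"Out"`, the same `Filter` component could be placed in a longer pipeline or a different MPMD configuration by changing the bindings alone.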

Other approaches aiming to simplify the composition of high performance applications have been proposed, either by extending existing languages or by defining composition frameworks based on object-oriented (OO) techniques.

We aim to stay as close as possible to MPI, both syntactically and semantically, so that we retain all its advantages (e.g. wide support, grid-aware implementations), can use existing tools (libraries, analysis, visualization, etc.) and can reuse existing applications. MPI has become the de facto standard for high performance applications, and it appears likely to become the standard for grid-enabled applications and environments as well.

Ensemble on Different Message Passing Environments

Process topologies in message passing applications are implicitly specified, but explicitly programmed within components. Topologies are formed by considering processes as nodes and communication channels between processes as arcs. In point-to-point communication, a communication channel, i.e. the passing of messages from one process to another, is expressed by symmetric calls of send and receive operations in the two processes. Each send or receive operation in one process takes as a parameter the identifier of the other process (e.g. the tid in PVM and the rank in MPI). Other parameters, common to both sides, include an integer value tagging the message and an implicit or explicit context. Process topologies are thus implicitly specified by such pairs of symmetric calls, but must be explicitly programmed by manipulating process identifiers in the calls. If the topology is regular, e.g. a grid or a ring, the designer develops functions that take a process identifier and return the identifiers of the processes it communicates with. These functions are usually parameterised to return the process identifiers for any size of the regular topology.
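The kind of parameterised neighbour function described above can be sketched as follows for a ring and for a non-periodic two-dimensional grid. The function names, the row-major numbering and the use of `None` for a missing neighbour (playing the role `MPI_PROC_NULL` plays in MPI) are assumptions of this sketch; in real MPI code the Cartesian topology routines such as `MPI_Cart_create` and `MPI_Cart_shift` offer similar functionality.

```python
from __future__ import annotations


def ring_neighbours(rank: int, size: int) -> tuple[int, int]:
    """Return the (left, right) neighbours of `rank` in a ring of `size` processes."""
    return (rank - 1) % size, (rank + 1) % size


def grid_neighbours(rank: int, rows: int, cols: int) -> dict[str, int | None]:
    """Return the north/south/west/east neighbours of `rank` in a non-periodic
    rows x cols grid numbered row by row. None marks a missing neighbour on the
    boundary, playing the role of MPI_PROC_NULL."""
    r, c = divmod(rank, cols)
    return {
        "north": rank - cols if r > 0 else None,
        "south": rank + cols if r < rows - 1 else None,
        "west": rank - 1 if c > 0 else None,
        "east": rank + 1 if c < cols - 1 else None,
    }
```

Both functions take the topology size as a parameter, so the same code serves a ring of 4 or 400 processes; what they cannot do is describe a topology that has no such closed-form rule.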

There are three problems with this approach. The first is that the implementation effort depends not only on the application design, but also on the target Message Passing Environment (MPE), such as PVM, MPI or Parix. Each MPE requires different implementation techniques for programming the same design, because of the process management model it assumes. Each MPE is suited to specific types of process topologies, namely those closest to its process management model. For example, PVM favours tree topologies, whereas MPI favours ring or grid topologies. Topologies not well suited to an MPE can certainly be implemented, but they require more complex programming, and parameterizing them requires additional effort.

The second problem is that this approach is only suitable for regular topologies. Many topologies are not globally regular, but only partially or locally regular, or even altogether irregular. In such cases, general functions returning the identifiers of the communicating processes cannot be derived, and consequently ad hoc programming methods are used.
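For an irregular topology no formula like a ring or grid neighbour function exists, so the channels have to be listed explicitly. The sketch below, using a hypothetical five-process topology, derives from such a list the peers each process must send to and receive from; every entry in one table implies a matching entry in the other, mirroring the symmetric send/receive pairs described earlier. It illustrates the ad hoc, table-driven style of programming and is not an Ensemble tool.

```python
from collections import defaultdict

# Hypothetical irregular topology, given as directed channels (sender, receiver).
CHANNELS = [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4), (3, 4)]


def channel_tables(channels):
    """Derive, for every process, the peers it sends to and receives from."""
    sends, recvs = defaultdict(list), defaultdict(list)
    for src, dst in channels:
        sends[src].append(dst)  # src issues a send addressed to dst ...
        recvs[dst].append(src)  # ... so dst needs a matching receive from src
    return dict(sends), dict(recvs)


sends, recvs = channel_tables(CHANNELS)
for rank in range(5):
    print(rank, "sends to", sends.get(rank, []), "receives from", recvs.get(rank, []))
```

Any change to the topology means editing the table, and a program whose calls are generated from it is tied to that one topology.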

The third problem is that programming communication channels as symmetric send and receive calls that directly use specific process identifiers limits the reuse of components, as the processes spawned from them can operate only within specific topologies.

In Ensemble, the design of reusable message passing components is independent of the target MPE, and the components may be used in any topology, whether regular, partially regular or irregular.

Recently we have focused our research on MPI, as it is the de facto standard for MP programming.

Porting MP applications between MPEs

The differences between message passing environments (MPI, PVM, Parix) lead to different implementations of the same application design. We have shown that applications developed under Ensemble can be ported mechanically from one environment to another with minimal and predictable effort.

Ensemble and Problem Solving Environments

We have used Ensemble as a mechanism for composing message passing applications on demand in a meta-computing context. Ensemble is particularly effective when users demand that different process topologies be created out of the same components. We demonstrate this with an application from transaction processing, namely parallel query execution based on the tree pipelining model, where all queries (“applications”) are composed out of the same relational algebra and set operators. Users do not need to know anything about query execution: as far as they are concerned, they submit queries and receive the results, and all execution aspects are transparent.
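The idea of building different query trees out of the same operators can be sketched with Python generators, where each operator pulls rows from its children so that rows flow through the tree in a pipelined fashion. This is a single-process stand-in for the tree pipelining model: in the setting described above each operator would be a modular MP process and each tree edge a message passing channel. The operator set and the sample relations are hypothetical.

```python
from itertools import chain


def scan(relation):
    """Leaf operator: stream the rows of a stored relation."""
    yield from relation


def select(pred, child):
    """Relational selection: pass on only rows satisfying `pred`."""
    for row in child:
        if pred(row):
            yield row


def project(columns, child):
    """Relational projection onto the given column positions."""
    for row in child:
        yield tuple(row[i] for i in columns)


def union(left, right):
    """Set union: emit each distinct row once, as soon as it arrives."""
    seen = set()
    for row in chain(left, right):
        if row not in seen:
            seen.add(row)
            yield row


# Hypothetical sample relations.
R = [(1, "a"), (2, "b"), (3, "c")]
S = [(3, "c"), (4, "d")]

# Two different operator trees ("applications") built from the same operators.
q1 = project((1,), select(lambda row: row[0] >= 2, scan(R)))
q2 = union(scan(R), select(lambda row: row[0] > 3, scan(S)))
```

Neither operator knows the shape of the tree it sits in; the tree is fixed only when a query is composed, which is the property the query application relies on.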

Ensemble and Grids

Grids impose new requirements on program modularity, since applications (possibly regular SPMD) may need to be coupled with other applications developed by different teams. An application may still be required to run independently, possibly as an SPMD program (e.g. an atmospheric model), or to be coupled with other applications (e.g. an ocean model), running together as an MPMD program (e.g. a climate model). Even regular SPMD applications need substantial code modification to be coupled with other applications, and different code modifications are usually necessary for different application configurations. For example, the climate model may be used to model the global earth climate or the more local El Niño phenomenon (different geography). We may also need to couple only the atmospheric and ocean models and later add land and hydrology models. The code modifications required in each case make maintaining a single code base for each individual application (e.g. atmospheric, ocean) a difficult task.

In the Ensemble methodology, message passing components are developed separately; applications, whether regular or irregular, SPMD or MPMD, are composed using an external composition language without modifying the components. Composed applications are pure MPI programs running on MPICH-G2, a grid-enabled implementation of MPI.
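A composition script can be thought of as data that lists which components to instantiate and how many processes each gets. The sketch below assumes a hypothetical script format, an ordered list of `(component, process_count)` pairs, and assigns consecutive ranks to each component, as they would appear in a single `MPI_COMM_WORLD`; in MPI the per-component groups could then be turned into communicators, for example with `MPI_Comm_split`. It is not Ensemble's actual composition language.

```python
def compose(script):
    """Expand an ordered list of (component, process_count) entries into
    per-component groups of consecutive ranks."""
    groups, next_rank = {}, 0
    for component, count in script:
        if count < 1:
            raise ValueError(f"{component}: process count must be positive")
        groups[component] = list(range(next_rank, next_rank + count))
        next_rank += count
    return groups


# The same (hypothetical) components in three configurations; no component code changes.
standalone = [("atmosphere", 4)]
coupled = [("atmosphere", 4), ("ocean", 2)]
extended = coupled + [("land", 1)]
```

The stand-alone atmospheric model and the coupled atmosphere-ocean configuration use the same components; adding a land model is a one-line change to the script, not to any component.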

Formal Support

Components must be compatible for correct program behavior. The semantics of their virtual interfaces should be known before composition; components cannot be used knowing only their virtual envelope. Compatibility is in general a dynamic property and is not restricted to the static compatibility of channel bindings, message types, etc. We are currently developing an environment to test component compatibility and debug composed applications based on formal component specifications, which are used in synergy with program execution.

We have defined descriptions of Colored Petri Nets (CPNs) with scalable interfaces, called template CPNs, to specify the behaviour of scalable reusable program components. From the template CPNs we generate composable CPNs, which are pure CPN descriptions. We have adapted Kindler's Petri net (PN) composition technique to the composition of Ensemble applications as described by the script, which directs the composition. During composition, the static information specified by the script is validated (for example, that the number of communication ports is within range and that port interconnections are compatible). The correspondence between program composition and specification composition is depicted in the following figure. In the middle is the application script, which is used to compose applications (left-hand side). The script is also used by the specification composer to compose application specifications (right-hand side).
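The static checks mentioned above, that port counts fall within the range allowed by a scalable interface and that interconnected ports are compatible, can be sketched as follows. The `PortSpec` fields (direction, message-type label, minimum and maximum number of instances), the data layout and the farmer/worker example are assumptions made for illustration; they do not reproduce the template CPN tools or Kindler's composition technique, which also compose the behavioural specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortSpec:
    direction: str   # "out" or "in"
    msg_type: str    # label of the message type carried by the port
    min_count: int   # a scalable interface allows between min_count ...
    max_count: int   # ... and max_count instances of this port


def validate(interfaces, processes, links):
    """Return a list of static composition errors (empty if none).

    interfaces: component name -> {port name: PortSpec}
    processes:  process name -> (component name, {port name: instances requested})
    links:      list of ((sender process, port), (receiver process, port))
    """
    errors = []
    for proc, (comp, counts) in processes.items():
        for port, n in counts.items():
            spec = interfaces[comp][port]
            if not spec.min_count <= n <= spec.max_count:
                errors.append(f"{proc}.{port}: {n} instances, allowed "
                              f"{spec.min_count}..{spec.max_count}")
    for (src, src_port), (dst, dst_port) in links:
        out_spec = interfaces[processes[src][0]][src_port]
        in_spec = interfaces[processes[dst][0]][dst_port]
        if (out_spec.direction, in_spec.direction) != ("out", "in"):
            errors.append(f"{src}.{src_port} -> {dst}.{dst_port}: direction mismatch")
        elif out_spec.msg_type != in_spec.msg_type:
            errors.append(f"{src}.{src_port} -> {dst}.{dst_port}: "
                          f"{out_spec.msg_type} vs {in_spec.msg_type}")
    return errors


# Hypothetical farmer/worker interfaces.
INTERFACES = {
    "Farmer": {"Task": PortSpec("out", "task", 1, 8)},
    "Worker": {"Task": PortSpec("in", "task", 1, 1)},
}
processes = {
    "F": ("Farmer", {"Task": 2}),
    "W0": ("Worker", {"Task": 1}),
    "W1": ("Worker", {"Task": 1}),
}
links = [(("F", "Task"), ("W0", "Task")), (("F", "Task"), ("W1", "Task"))]
```

Checks of this kind only establish static compatibility; as noted above, compatibility is in general a dynamic property, which is what the behavioural specifications are for.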

We have developed tools for designing template CPNs, generating composable CPNs from them and composing these according to Ensemble scripts. Our effort does not simply aim to support the Ensemble methodology with a formal specification tool. We use Ensemble and its associated tools as a viable means of bridging the gap between the disjoint worlds of specifications and program executions. Usually specifications are obtained before program design and implementation, but this view does not hold in the software composition approach, where programs and their specifications are composed together. In Ensemble in particular, each may be produced independently from the script. In a sense, the composed specifications are the semantics of the composed programs, under the assumption that the component specifications are correct.

To alleviate possible discrepancies between component specifications and component implementations, we use each to test the other. On the one hand, tracing information from the composed application is passed to a simulator of the composed specifications. Thus, the behaviour of the application is not only monitored as it runs, but also actually tested. The programmer need not inspect detailed and confusing charts or visualisations of executions, as the simulation system checks the execution against the specifications, either automatically in the background or by analysing a trace file for erroneous behaviour. Tracing information should always be used in synergy with specification simulation during individual component development. On the other hand, the specification simulator may be used as an advanced breakpoint mechanism that controls the execution of the actual program. Specifications and programs no longer live in disjoint worlds, but are inter-related.
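Testing an execution trace against a specification can be illustrated with a deliberately simplified sketch. Here a deterministic finite-state machine stands in for the simulator of the composed CPN specification, and the trace is a list of event labels; both the `recv`/`send` event names and the pipeline-stage specification are hypothetical. The function returns the position of the first event the specification does not allow, which is also the point where a specification-driven breakpoint would stop the program.

```python
def check_trace(spec, start, trace):
    """Replay trace events against a state-machine specification.

    spec maps (state, event) -> next state. Returns the index of the first event
    the specification does not allow, or None if the whole trace conforms.
    """
    state = start
    for i, event in enumerate(trace):
        nxt = spec.get((state, event))
        if nxt is None:
            return i  # a specification-driven debugger could stop here
        state = nxt
    return None


# Hypothetical specification of a pipeline stage: receive, then send, repeatedly.
STAGE_SPEC = {
    ("idle", "recv"): "busy",
    ("busy", "send"): "idle",
}
```

A CPN simulator differs in that it tracks tokens in many places at once and may allow several enabled transitions, but the principle of replaying observed events and flagging the first disallowed one is the same.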

We believe that in this scheme the extra effort of designing specifications of reusable components is justified, as it assures reliability and reduces the production costs of message passing applications.