For the formal specification of PMI-2.
See PMI v2 API for some discussions about possible designs and some issues.
(WDG - I find it valuable to list objectives and requirements first. Here's an initial list. They can and should be expanded, and the consequences of each understood)
- Scalable - Semantics of operations must permit scalable implementation
- Efficient - Must provide MPI implementation with the information that it needs without requiring potentially expensive steps.
- Complete - Must support all of MPI, including dynamic processes
- Robust - Must handle failures and aborts, including the release of any resources acquired by the MPI application.
- Correct - Must avoid race conditions in the design
- Portable - Must not assume a particular environment such as POSIX
The basic idea is that all interaction with external process and resource managers, as well as the exchange of any information required to contact other processes in the same parallel job, takes place through the process management interface, or PMI.
There are four separate sets of functionality:
- Creating, connecting with, and exiting parallel jobs
- Accessing information about the parallel job or the node on which a process is running
- Exchanging information used to connect processes together
- Exchanging information related to the MPI Name publishing interface
While these can be combined within a single, full-featured process manager, in many cases, each set of services may be provided by a different actor. For example, creating processes may be managed by a system such as PBS or LoadLeveler. The Name publishing service may be accomplished by reading and writing files in a shared directory. Information about the parallel job and the node may be provided by mpiexec, and the connection information may be handled with a scalable, distributed tuple-space system.
There are three groupings of processes that are important in understanding the process manager interface.
- Process
- An MPI process; this is usually an OS process (but need not be; an example would be threads in a language that keeps named globals thread-private by default).
- Job
- This is a collection of processes managed together by a process manager that understands parallel applications. A job contains all of the processes in a single MPI_COMM_WORLD and no more. That is, two processes are in the same job if and only if they are in the same MPI_COMM_WORLD.
- Connected Jobs
- This is a collection of jobs that have established a connection through the use of PMI_Job_Spawn or PMI_Job_Connect. If any process in a job establishes a connection with any process in another job, then all processes in both jobs are connected. That is, connections are established between jobs, not processes. This is necessary to implement the MPI notion of connected processes.
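The job-level nature of connections can be illustrated with a small bookkeeping sketch. It is not part of PMI: the names `job_connections`, `jobs_connect`, `procs_connected`, and `proc_id`, and the `MAX_JOBS` limit, are all hypothetical. It assumes jobs are identified by small integers and that connectedness is transitive, as in the MPI notion of connected processes, so a union-find structure over job ids is enough.

```c
#include <assert.h>

/* Hypothetical bookkeeping, not a PMI data structure. */
#define MAX_JOBS 64

typedef struct {
    int parent[MAX_JOBS]; /* union-find parent link, indexed by job id */
    int njobs;            /* number of job ids in use */
} job_connections;

/* A process is named by its job and its rank within that job (assumed ids). */
typedef struct {
    int job;
    int rank;
} proc_id;

/* Initially every job is connected only to itself. */
void job_connections_init(job_connections *jc, int njobs)
{
    assert(njobs >= 0 && njobs <= MAX_JOBS);
    jc->njobs = njobs;
    for (int i = 0; i < njobs; i++)
        jc->parent[i] = i;
}

/* Find the representative job of the connected set containing `job`. */
int job_root(job_connections *jc, int job)
{
    assert(job >= 0 && job < jc->njobs);
    while (jc->parent[job] != job) {
        jc->parent[job] = jc->parent[jc->parent[job]]; /* path halving */
        job = jc->parent[job];
    }
    return job;
}

/* Record that some process of job_a connected with some process of job_b.
 * Only the job ids matter: the connection is between jobs, not processes. */
void jobs_connect(job_connections *jc, int job_a, int job_b)
{
    int ra = job_root(jc, job_a);
    int rb = job_root(jc, job_b);
    if (ra != rb)
        jc->parent[ra] = rb;
}

/* Two processes are connected exactly when their jobs are in the same set;
 * the ranks play no role in the answer. */
int procs_connected(job_connections *jc, proc_id p, proc_id q)
{
    return job_root(jc, p.job) == job_root(jc, q.job);
}
```

In this sketch a process manager would call `jobs_connect` once per successful PMI_Job_Spawn or PMI_Job_Connect, regardless of which ranks were involved, and every process in both jobs would then report as connected.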
In addition, it is desirable to allow the PMI client interface to be implemented with a dynamically loadable library. This allows an executable to load a version of PMI that is compatible with whatever process management system will be running the application, without requiring the process management systems to implement the same communication (or wire) protocol. The consequence of this is that the pmi.h header file is standardized across all PMI client implementations (in PMI v1, each PMI client implementation could, like MPI, define its own header file).
The PMI interface represents most data as printable characters rather than as raw binary. This simplifies support for systems with heterogeneous data representations and also simplifies the "wire" protocol. The character set for PMI v2 is UTF-8; this is a variable-length representation that contains ASCII as a subset and for which the null byte is always a string terminator. All character data in the PMI v2 interface is in the UTF-8 character set. The rationale for using UTF-8 over ASCII is to avoid problems with internationalization in the case where commands return user-defined error strings.
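As an illustration of the printable-character representation, the following sketch passes an integer value as decimal text instead of raw binary. The helper names `encode_int` and `decode_int` are hypothetical and are not PMI functions; the sketch only assumes that a value travels as a null-terminated string. Because the digits and the sign are ASCII, the text is also valid UTF-8 and does not depend on the byte order or integer width of either side.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write `value` into `buf` as printable decimal text.
 * Returns 0 on success, -1 if the buffer is too small or formatting fails. */
int encode_int(int value, char *buf, size_t buflen)
{
    assert(buf != NULL);
    int n = snprintf(buf, buflen, "%d", value);
    return (n >= 0 && (size_t)n < buflen) ? 0 : -1;
}

/* Hypothetical helper: parse decimal text back into an int.
 * Returns 0 on success, -1 if the text is empty, has trailing characters,
 * or does not fit in an int. */
int decode_int(const char *text, int *value)
{
    assert(text != NULL && value != NULL);
    char *end;
    errno = 0;
    long v = strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE ||
        v < INT_MIN || v > INT_MAX)
        return -1;
    *value = (int)v;
    return 0;
}
```

A sender would call `encode_int` and pass the resulting string along; the receiver calls `decode_int` and rejects anything that is not a well-formed decimal integer, without either side needing to know the other's binary layout.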