CH4 Overall Design
Shortcomings of CH3
MPICH relied on the CH3 device as its primary communication device throughout the "MPICH2" series and part of the "MPICH-3.x" release series. Over time, the device has accumulated a number of hacks to accommodate newer communication models and network architectures, stretching it well beyond what it was originally designed to do. Some of the shortcomings of the CH3 design are listed here:
- VC model: CH3 performs communication in the context of "virtual connections" (VCs), where each peer process has a VC associated with it. This architecture matched networks that relied on a connection-oriented protocol, where VCs were a convenient way to keep track of the connection state and other peer-related information. Over time, VCs have accumulated additional fields, not all of which are equally useful. Some of these can be cleaned up to reduce the size of each VC. There has also been some effort to make the allocation of VCs more dynamic, so that VCs are created only for the processes we actually communicate with. However, neither approach solves the fundamental scalability limitation of the VC structures: their memory footprint grows with the number of peer processes.
- Active-message based Communication Model: In the CH3 device, almost all communication is performed in the context of active messages. Each communication type has a "packet type" associated with it, and each packet type has a packet handler associated with it. Communication relies on attaching a packet header to each message (which makes the message fundamentally noncontiguous, since the packet header is at a separate location from the user data), and on the receiver process invoking a software handler to process the message. In newer MPICH versions, CH3 was modified to support networks that natively implement the MPI matching model required for send/recv communication, but this is a (rather clumsy) workaround layered on top of the existing active-message model. Direct support for newer matching-based networks is not present.
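The two points above can be made concrete with a small toy model. The sketch below is illustrative only: the names toy_vc_t, pkt_header_t, handler_table, vc_table_create, vc_table_bytes, and deliver_packet are hypothetical and are not taken from the MPICH source. It assumes one fixed-size VC per peer, allocated eagerly, and a handler table indexed by packet type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical packet types; CH3's real set is larger and named differently. */
typedef enum { PKT_EAGER_SEND = 0, PKT_RNDV_REQ, PKT_NUM_TYPES } pkt_type_t;

/* Header that travels with every message. It lives in its own buffer,
 * separate from the user data, so each message is noncontiguous. */
typedef struct {
    pkt_type_t type;
    int src_rank;
    int tag;
    size_t data_len;
} pkt_header_t;

/* Toy "virtual connection": per-peer state kept by every process. */
typedef struct {
    int peer_rank;
    int connected;
    uint64_t pkts_received;
    uint64_t bytes_received;
} toy_vc_t;

typedef void (*pkt_handler_fn)(toy_vc_t *vc, const pkt_header_t *hdr,
                               const void *data);

static void handle_eager(toy_vc_t *vc, const pkt_header_t *hdr,
                         const void *data)
{
    (void) data;                 /* a real handler would copy or match here */
    vc->pkts_received++;
    vc->bytes_received += hdr->data_len;
}

static void handle_rndv_req(toy_vc_t *vc, const pkt_header_t *hdr,
                            const void *data)
{
    (void) hdr;
    (void) data;                 /* a rendezvous request carries no payload */
    vc->pkts_received++;
}

/* One software handler per packet type. */
static const pkt_handler_fn handler_table[PKT_NUM_TYPES] = {
    [PKT_EAGER_SEND] = handle_eager,
    [PKT_RNDV_REQ] = handle_rndv_req,
};

/* Memory needed for the VC table: linear in the number of peers. */
size_t vc_table_bytes(int nprocs)
{
    return (size_t) nprocs * sizeof(toy_vc_t);
}

/* Allocate one VC per peer up front. Returns NULL on allocation failure. */
toy_vc_t *vc_table_create(int nprocs)
{
    assert(nprocs > 0);
    toy_vc_t *vcs = calloc((size_t) nprocs, sizeof(*vcs));
    if (vcs == NULL)
        return NULL;
    for (int i = 0; i < nprocs; i++)
        vcs[i].peer_rank = i;
    return vcs;
}

/* Receiver side: look up the sender's VC, then run the handler selected by
 * the packet type. Returns 0 on success, -1 for a bad rank or packet type. */
int deliver_packet(toy_vc_t *vcs, int nprocs, const pkt_header_t *hdr,
                   const void *data)
{
    if (hdr->src_rank < 0 || hdr->src_rank >= nprocs)
        return -1;
    if ((unsigned) hdr->type >= (unsigned) PKT_NUM_TYPES)
        return -1;
    handler_table[hdr->type](&vcs[hdr->src_rank], hdr, data);
    return 0;
}
```

In this model, doubling the number of processes doubles the size of every process's VC table regardless of how many peers it actually talks to, and every incoming message passes through a software handler lookup even on a network that could match the receive in hardware.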