Communicators and Context IDs

From Mpich
Revision as of 23:00, 21 August 2008

This page is intended to serve as an overview of how communicators and context IDs interact. It is still very incomplete.

To be covered:

  • logically what is a ctx id
  • structure of the ctx id value itself
  • structure of the context mask
  • API for allocating/freeing ctx ids
  • known issues, esp portability

What Is A Context ID?

When MPI receives a message and matches it against MPI_Recv requests, it compares the message's envelope to the MPI_Recv's envelope. The envelope is the triple of (source, tag, communicator). The source and tag are explicitly integers, yet the communicator is a logical construct indicating a particular communication context. In MPICH2 this context is implemented via an additional tag field known as the context id. It's worth remembering that there is no wild card matching for communicators.
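Here is a minimal sketch of the matching rule described above. The `envelope_t` type, `envelope_matches`, and the `SKETCH_ANY_*` wildcard values are hypothetical names for illustration; real code uses `MPI_ANY_SOURCE`/`MPI_ANY_TAG` and MPICH2's own request structures. The point it shows is that source and tag may be wildcarded, but the context ID must always match exactly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wildcard values for this sketch only; real MPI code uses
 * MPI_ANY_SOURCE and MPI_ANY_TAG from mpi.h. */
#define SKETCH_ANY_SOURCE (-1)
#define SKETCH_ANY_TAG    (-2)

/* Simplified envelope: (source, tag, context id). The context id stands in
 * for the communicator. */
typedef struct {
    int source;
    int tag;
    uint16_t context_id;
} envelope_t;

/* Returns true if an incoming message matches a posted receive. Source and
 * tag may be wildcards on the receive side; the context id has no wildcard
 * and must be equal. */
static bool envelope_matches(const envelope_t *posted, const envelope_t *msg)
{
    if (posted->context_id != msg->context_id)
        return false;
    if (posted->source != SKETCH_ANY_SOURCE && posted->source != msg->source)
        return false;
    if (posted->tag != SKETCH_ANY_TAG && posted->tag != msg->tag)
        return false;
    return true;
}
```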

The MPICH2 context ID is a 16-bit integer field that is structured as follows:

XXXXXXXXXYYYYYZZ

In this crude diagram each character represents a bit. The context ID has three fields, indicated by letter:

Mask Word Index (X)
This is the index into the context ID mask (explained below).
Bit Index (Y)
The index of the bit within that mask word that this ID refers to.
Context Type Suffix (Z)
This is used to indicate different communication contexts within a communicator. For example, user point-to-point messages (MPI_Send/MPI_Recv) occur in a different context than collective messages (MPI_Bcast, etc.). This is also explained further below.

The actual type of a context ID is MPIR_Context_id_t, which is typedefed to int16_t.
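The XXXXXXXXXYYYYYZZ layout can be expressed as a few shifts and masks. The following is a sketch, not the commutil.c code. The helper names (`ctx_make`, `ctx_word`, `ctx_bit`, `ctx_suffix`) are hypothetical, and the sketch deliberately uses `uint16_t` rather than the `int16_t` behind MPIR_Context_id_t.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths from the XXXXXXXXXYYYYYZZ diagram: 9-bit mask word index,
 * 5-bit bit index, 2-bit context type suffix. */
#define CTX_SUFFIX_BITS 2
#define CTX_BIT_BITS    5

/* Pack (word, bit, suffix) into a 16-bit context id. */
static uint16_t ctx_make(unsigned word, unsigned bit, unsigned suffix)
{
    assert(word < 512 && bit < 32 && suffix < 4);
    return (uint16_t)((word << (CTX_BIT_BITS + CTX_SUFFIX_BITS)) |
                      (bit << CTX_SUFFIX_BITS) | suffix);
}

/* Unpack the three fields again. */
static unsigned ctx_word(uint16_t id)   { return id >> (CTX_BIT_BITS + CTX_SUFFIX_BITS); }
static unsigned ctx_bit(uint16_t id)    { return (id >> CTX_SUFFIX_BITS) & 0x1f; }
static unsigned ctx_suffix(uint16_t id) { return id & 0x3; }
```

With this packing, bit 31 of words 0, 1 and 2 gives 124, 252 and 380. These match the ID prefixes quoted under Problems and Gotchas.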

FIXME XXX DJG I think that the context_id code is broken because it uses right shifts of a potentially negative number to obtain the Mask Word Index. In C, right-shifting a negative signed value is implementation-defined, and common compilers fill the vacated high bits with copies of the sign bit. As a result, any ID with the top bit set (half of the 16-bit space) yields the wrong word index. We should change MPIR_Context_id_t to uint16_t.
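The following sketch illustrates the problem described in the FIXME. Both functions are hypothetical helpers, and the shift count of 7 comes from the 9/5/2 field layout above. The signed version's result for a negative ID depends on the compiler. The unsigned version always shifts in zero bits.

```c
#include <stdint.h>

/* Word index from a signed 16-bit id. When the top bit is set, the id is
 * promoted to a negative int, and right-shifting a negative value is
 * implementation-defined; common compilers sign-extend, giving a negative
 * (invalid) index instead of 256..511. */
static int word_index_signed(int16_t id)
{
    return id >> 7;
}

/* Same computation on an unsigned 16-bit id: the result is always the
 * top 9 bits, in the range 0..511. */
static unsigned word_index_unsigned(uint16_t id)
{
    return id >> 7;
}
```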

Context ID Mask

The context ID mask is a bit vector that is used to keep track of which context IDs have been allocated. In the current code it is an array of MAX_CONTEXT_MASK (256) 32-bit unsigned ints, for a total of 8192 bits, one per context ID prefix. Each process has its own mask, and its state may vary from process to process depending on communicator membership patterns.
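Here is a minimal sketch of the mask's shape, not the commutil.c definition. It assumes that a set bit means "free", which is what the "03fffff8ffffffff..." debugging output under Problems and Gotchas suggests. It also assumes, as described there, that the three predefined communicators occupy bits 0-2 of word 0. `SKETCH_MASK_WORDS` stands in for MAX_CONTEXT_MASK, and the other names are hypothetical. Using `uint32_t` makes the 32-bit word width explicit instead of relying on `unsigned int`.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_MASK_WORDS 256   /* stands in for MAX_CONTEXT_MASK */

/* One bit per context id prefix; a set bit means the prefix is free. */
static uint32_t sketch_mask[SKETCH_MASK_WORDS];

_Static_assert(SKETCH_MASK_WORDS * 32 == 8192, "256 words x 32 bits");

/* Mark everything free, then reserve bits 0-2 of word 0 for the
 * predefined communicators. */
static void sketch_init_mask(void)
{
    memset(sketch_mask, 0xff, sizeof sketch_mask);
    sketch_mask[0] &= ~(uint32_t)0x7;
}

/* Count free prefixes (clears the lowest set bit each iteration). */
static unsigned sketch_count_free(void)
{
    unsigned n = 0;
    for (int i = 0; i < SKETCH_MASK_WORDS; i++)
        for (uint32_t w = sketch_mask[i]; w; w &= w - 1)
            n++;
    return n;
}
```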

Mask Access And Multi-threading

Talk about critical sections, the local context mask, and lowestContextId. XXX DJG finish this section

Problems and Gotchas

There are several issues and things to watch out for when working on the context ID code in commutil.c.

  • The current code expects that unsigned int values are 32 bits or larger. The comments imply that it needs exactly 32-bit unsigned ints, but it looks like we lucked out and it should work with larger sizes as well. This needs to be cleaned up in the current code.
  • 8192 is too many elements: in order to shift left two bits you end up overflowing the 16-bit value. This should be cut in half to 4096.
  • IDs are allocated from the lowest available mask integer index but the highest available bit index within that integer. This leads to a nice looking pattern when the mask is viewed as a hex string via the MPIR_ContextMaskToStr function but a strange ordering of ID values (124, 120, 116, ..., 0, 252, 248, ..., 128, 380, etc).
  • While new IDs are allocated in the fashion described just above, the three default communicators (MPI_COMM_WORLD, MPI_COMM_SELF, and MPIR_ICOMM_WORLD) take up bits 0-2 of word 0 (prefixes 0, 4, and 8). In contrast, the first context ID allocated after MPI_Init will be bit 31 of word 0 (id prefix 124). This works out OK, it's just surprising when you are debugging and get "03fffff8ffffffff..." when you print out the mask field. It wouldn't hurt to change this to something less surprising if we get the time.
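The allocation order described in the last two items can be reproduced with a small model. This is a sketch under the same assumptions as before: a set bit means free, and bits 0-2 of word 0 are reserved at init. All names are hypothetical, and the hex dump only imitates what MPIR_ContextMaskToStr is described as printing.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MASK_WORDS 256   /* stands in for MAX_CONTEXT_MASK */

static uint32_t alloc_mask[SKETCH_MASK_WORDS];   /* set bit = free prefix */

static void alloc_init_mask(void)
{
    memset(alloc_mask, 0xff, sizeof alloc_mask);
    alloc_mask[0] &= ~(uint32_t)0x7;   /* predefined communicators: bits 0-2 */
}

/* Take the lowest word that has a free bit, then the highest free bit in
 * that word. Returns the id prefix (suffix bits zero), or -1 if full. */
static int alloc_prefix(void)
{
    for (int i = 0; i < SKETCH_MASK_WORDS; i++) {
        if (alloc_mask[i] == 0)
            continue;
        for (int b = 31; b >= 0; b--) {
            if (alloc_mask[i] & ((uint32_t)1 << b)) {
                alloc_mask[i] &= ~((uint32_t)1 << b);
                return (i << 7) | (b << 2);
            }
        }
    }
    return -1;
}

/* Hex dump of the first nwords words, 8 hex digits per word. */
static void alloc_mask_to_str(char *buf, size_t len, int nwords)
{
    size_t off = 0;
    if (len > 0)
        buf[0] = '\0';
    for (int i = 0; i < nwords && i < SKETCH_MASK_WORDS && off + 8 < len; i++)
        off += (size_t)snprintf(buf + off, len - off, "%08" PRIx32, alloc_mask[i]);
}
```

After init, six allocations return 124, 120, 116, 112, 108 and 104. At that point the first two words print as "03fffff8ffffffff". Once bits 31 through 3 of word 0 are used up, the next prefix is 252 (word 1, bit 31).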

Context Type Suffix

The last two bits of the ID are used to indicate different communication contexts within a communicator. Point-to-point and collective communication occur in separate contexts and use a different suffix to form different context IDs.
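Here is a sketch of how a suffix turns one communicator's base ID into separate per-context IDs. The page does not yet say what each of the four values means, so the enum names and the values 0 and 1 below are hypothetical. The sketch only shows that two contexts of the same communicator end up with distinct IDs, so their messages can never match each other.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical suffix assignments for illustration only. */
enum ctx_suffix {
    SKETCH_SUFFIX_PT2PT = 0,   /* user point-to-point traffic */
    SKETCH_SUFFIX_COLL  = 1    /* collective traffic */
};

/* Form the id used on the wire from a base id whose suffix bits are zero. */
static uint16_t ctx_with_suffix(uint16_t base_id, enum ctx_suffix s)
{
    assert((base_id & 0x3) == 0);
    return (uint16_t)(base_id | (uint16_t)s);
}
```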

There are four possible values; what does each one mean?

XXX DJG finish this section

Context ID API

In src/mpi/comm/commutil.c:

static char *MPIR_ContextMaskToStr(void)
static void MPIR_Init_contextid(void)
int MPIR_Get_contextid(MPID_Comm *comm_ptr, MPIR_Context_id_t *context_id)
int MPIR_Get_intercomm_contextid( MPID_Comm *comm_ptr, MPIR_Context_id_t *context_id, MPIR_Context_id_t *recvcontext_id)

XXX DJG finish this section

When and How Context IDs Are Selected For Communicators

Predefined Communicators

XXX DJG finish this section

Intracommunicators

XXX DJG finish this section

Intercommunicators

XXX DJG finish this section