Debugger Message Queue Access

From Mpich

Latest revision as of 16:25, 10 November 2012


Communicators

The model that the debugger interface uses is organized around (virtual) separate message queues for each communicator. The debugger interface therefore requires a list of active communicators. Because communicator construction is already a relatively heavyweight operation, the extra cost of maintaining this list is negligible, so it is maintained whether or not debugger support is enabled.

The structure MPIR_Comm_list contains a pointer to the head of the list, along with a sequence number. The sequence number is incremented any time a change is made to the list; this allows the debugger to detect when it needs to rebuild its internal representation of the active communicators.
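
The sequence-number check can be illustrated with a small sketch. Everything here is hypothetical and only modeled on the description above: the `demo_` types and functions are not the MPICH definitions, and a real debugger reads the list from the target process's memory rather than calling functions in the same process.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a communicator object. */
typedef struct demo_comm {
    int context_id;
    struct demo_comm *next;      /* next communicator in the list */
} demo_comm;

/* Modeled on the description of MPIR_Comm_list: a head pointer plus a
 * sequence number. The field names are assumptions. */
typedef struct demo_comm_list {
    int sequence_number;
    demo_comm *head;
} demo_comm_list;

/* Library side: every change to the list bumps the sequence number. */
void demo_list_add(demo_comm_list *list, demo_comm *comm)
{
    assert(list != NULL && comm != NULL);
    comm->next = list->head;
    list->head = comm;
    list->sequence_number++;
}

void demo_list_remove(demo_comm_list *list, demo_comm *comm)
{
    assert(list != NULL && comm != NULL);
    for (demo_comm **pp = &list->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == comm) {
            *pp = comm->next;    /* unlink the entry */
            comm->next = NULL;
            list->sequence_number++;
            return;
        }
    }
}

/* Debugger side: a cached summary of the list. */
typedef struct demo_debugger_view {
    int seen_sequence_number;    /* sequence number at the last rebuild */
    int comm_count;
} demo_debugger_view;

/* Rebuilds the cached view only if the list changed since the last call.
 * Returns 1 if a rebuild happened, 0 if the cache was still current. */
int demo_view_refresh(demo_debugger_view *view, const demo_comm_list *list)
{
    assert(view != NULL && list != NULL);
    if (view->seen_sequence_number == list->sequence_number)
        return 0;
    view->comm_count = 0;
    for (const demo_comm *c = list->head; c != NULL; c = c->next)
        view->comm_count++;
    view->seen_sequence_number = list->sequence_number;
    return 1;
}
```

Initializing `seen_sequence_number` to a value the list can never hold (for example -1) forces the first refresh to build the view.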

The list of all communicators is maintained by two routines in src/mpi/comm/commutil.c: MPIR_Comm_create (creates a new communicator) and MPIR_Comm_release (frees a communicator if the reference count is zero). The structure that contains this list is defined in this file and the variable name is MPIR_All_communicators. This is a global variable to allow the debugger to easily find it.

These routines in turn call two macros,

MPIR_COMML_FORGET
MPIR_COMML_REMEMBER

that are defined in mpiimpl.h to call routines whose implementations are in src/mpi/debugger/dbginit.c. These routines are

void MPIR_CommL_remember( MPID_Comm * );
void MPIR_CommL_forget( MPID_Comm * );
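
As a rough sketch of this layering, the following shows create and release routines that reach the list-maintenance routines only through macros. All `demo_`/`DEMO_` names, the reference-count field, and the list layout are assumptions made for illustration; they are not copied from mpiimpl.h or commutil.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical communicator with a reference count. */
typedef struct demo_comm {
    int ref_count;
    struct demo_comm *next;
} demo_comm;

typedef struct demo_comm_list {
    int sequence_number;
    demo_comm *head;
} demo_comm_list;

/* Global, so that a debugger could look the variable up by name. */
demo_comm_list demo_all_communicators = {0, NULL};

/* List-maintenance routines (the role played by MPIR_CommL_remember and
 * MPIR_CommL_forget in the text). */
void demo_commL_remember(demo_comm *comm)
{
    comm->next = demo_all_communicators.head;
    demo_all_communicators.head = comm;
    demo_all_communicators.sequence_number++;
}

void demo_commL_forget(demo_comm *comm)
{
    for (demo_comm **pp = &demo_all_communicators.head; *pp != NULL;
         pp = &(*pp)->next) {
        if (*pp == comm) {
            *pp = comm->next;
            demo_all_communicators.sequence_number++;
            return;
        }
    }
}

/* The macro layer: callers never name the routines directly. */
#define DEMO_COMML_REMEMBER(comm) demo_commL_remember(comm)
#define DEMO_COMML_FORGET(comm)   demo_commL_forget(comm)

/* Creates a communicator and registers it in the global list. */
demo_comm *demo_comm_create(void)
{
    demo_comm *comm = malloc(sizeof *comm);
    if (comm == NULL)
        return NULL;
    comm->ref_count = 1;
    comm->next = NULL;
    DEMO_COMML_REMEMBER(comm);
    return comm;
}

/* Drops one reference; frees the communicator only when the count
 * reaches zero. Returns 1 if the communicator was freed, 0 otherwise. */
int demo_comm_release(demo_comm *comm)
{
    assert(comm != NULL && comm->ref_count > 0);
    if (--comm->ref_count > 0)
        return 0;
    DEMO_COMML_FORGET(comm);
    free(comm);
    return 1;
}
```

Routing the calls through macros keeps the call sites in the communicator code unchanged if the underlying routines are renamed or replaced.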

Receive Queues

The receive queues are part of the device implementation; in the CH3 device, they are implemented in the file src/mpid/ch3/src/ch3u_recvq.c. Normally, this file does not export the queues directly (the head and tail pointers are static). To support message queue debugging, the variables MPID_Recvq_posted_head_ptr and MPID_Recvq_unexpected_head_ptr are exported; these are pointers to the variables that hold the heads of those two lists.
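
The export technique can be sketched as follows, assuming a simplified request type. The `demo_` names are hypothetical and only one queue (posted receives) is shown; the point is that the queue variables stay `static` while a single exported pointer gives a reader access to the head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified receive request. */
typedef struct demo_request {
    int tag;
    struct demo_request *next;
} demo_request;

/* Private to this file: not visible to the linker under these names. */
static demo_request *recvq_posted_head = NULL;
static demo_request *recvq_posted_tail = NULL;

/* Exported: the address of the private head variable. A reader that
 * dereferences this pointer always sees the current head. */
demo_request **const demo_recvq_posted_head_ptr = &recvq_posted_head;

/* Appends a request to the posted-receive queue. */
void demo_recvq_post(demo_request *req)
{
    assert(req != NULL);
    req->next = NULL;
    if (recvq_posted_tail != NULL)
        recvq_posted_tail->next = req;
    else
        recvq_posted_head = req;
    recvq_posted_tail = req;
}

/* Debugger-style reader: walks the queue through the exported pointer
 * only, without touching the static variables by name. */
int demo_count_posted(void)
{
    int n = 0;
    for (const demo_request *r = *demo_recvq_posted_head_ptr; r != NULL;
         r = r->next)
        n++;
    return n;
}
```

Exporting a pointer to the head variable, rather than a copy of the head, means the exported symbol never has to be updated when the queue changes.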

Send Queues

The definition of the "send queue" is somewhat vague. One definition is that it is the queue of user-created MPI_Requests that have not been completed with an MPI completion call such as MPI_Wait. Other definitions could include any pending send operations in the communication layer; this could include pending blocking sends in a multi-threaded application.

MPICH uses the first definition. It is implemented with two functions (really macros, so that they can easily be made into no-ops when debugger support isn't included) that add user send requests to, and remove them from, a separate list of requests.

The strategy is to add send requests to the special send queue when they are created within the nonblocking MPI send routines (e.g., MPI_Isend). Requests are removed when the MPIR_Request_complete routine is called (this routine is in src/mpi/pt2pt/mpir_request.c). The list of send requests is maintained by the routines MPIR_Sendq_remember and MPIR_Sendq_forget, which are defined in src/mpi/debugger/dbginit.c.
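
A minimal sketch of this strategy, under stated assumptions: the configuration flag `DEMO_HAVE_DEBUGGER_SUPPORT`, the `demo_` request type, and the stand-ins for the send and completion routines are all hypothetical. Only the pattern (remember on creation, forget on completion, macros that can compile to no-ops) is taken from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical configuration flag; set to 0 to compile the hooks away. */
#define DEMO_HAVE_DEBUGGER_SUPPORT 1

/* Hypothetical, simplified send request. */
typedef struct demo_request {
    int dest;
    int tag;
    struct demo_request *sendq_next;   /* link in the debugger send queue */
} demo_request;

/* Head of the separate list of user send requests. */
demo_request *demo_sendq_head = NULL;

void demo_sendq_remember(demo_request *req)
{
    req->sendq_next = demo_sendq_head;
    demo_sendq_head = req;
}

void demo_sendq_forget(demo_request *req)
{
    for (demo_request **pp = &demo_sendq_head; *pp != NULL;
         pp = &(*pp)->sendq_next) {
        if (*pp == req) {
            *pp = req->sendq_next;     /* unlink the completed request */
            req->sendq_next = NULL;
            return;
        }
    }
}

#if DEMO_HAVE_DEBUGGER_SUPPORT
#define DEMO_SENDQ_REMEMBER(req) demo_sendq_remember(req)
#define DEMO_SENDQ_FORGET(req)   demo_sendq_forget(req)
#else
/* No-ops: the argument is evaluated and discarded. */
#define DEMO_SENDQ_REMEMBER(req) ((void)(req))
#define DEMO_SENDQ_FORGET(req)   ((void)(req))
#endif

/* Stand-in for a nonblocking send routine: the request is recorded at
 * the point where it is created. */
void demo_isend(demo_request *req, int dest, int tag)
{
    assert(req != NULL);
    req->dest = dest;
    req->tag = tag;
    req->sendq_next = NULL;
    DEMO_SENDQ_REMEMBER(req);
}

/* Stand-in for the request-completion routine: the request leaves the
 * send queue when the user completes it. */
void demo_request_complete(demo_request *req)
{
    assert(req != NULL);
    DEMO_SENDQ_FORGET(req);
}

/* Number of send requests a debugger would currently display. */
int demo_sendq_length(void)
{
    int n = 0;
    for (const demo_request *r = demo_sendq_head; r != NULL;
         r = r->sendq_next)
        n++;
    return n;
}
```

With the flag set to 0, the send and completion paths contain no queue bookkeeping at all, which is the reason for using macros rather than direct function calls.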

Testing

The file src/mpi/debugger/tvtest.c (along with the support file src/mpi/debugger/dbgstub.c) serves two purposes: it tests the interface, and it provides a simple example of the operations performed by a debugger that uses this interface to access and display the message queues.