The Progress Engine
Revision as of 15:53, 13 May 2014
The progress engine is a series of lists of MPI internal tasks, such as sends, receives, and collectives.
Algorithm Pseudo Code
do {
    /* make progress on receiving */
    /* make progress on sending */
    /* make progress on LMTs */
    /* make progress on NBC schedules */
} while (is_blocking);
Receive Progress
Send Progress
LMT Progress
Nonblocking Collective Schedule List
The Schedule List
The schedule list is a two-level structure: the first level is a linked list of struct MPIDU_Sched, and each struct MPIDU_Sched holds an array of struct MPIDU_Sched_entry. The code is located in mpid_sched.c. The list is defined as a static variable, all_schedules, in this file.
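The layout described above can be sketched in a few lines of C. This is a simplified illustration, not the code in mpid_sched.c: the names sketch_entry, sketch_sched, sketch_list and all three helper functions are hypothetical, and the real structures carry many more fields.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct MPIDU_Sched_entry. */
struct sketch_entry {
    int payload;                    /* placeholder for the real per-entry data */
};

/* Hypothetical stand-in for struct MPIDU_Sched. */
struct sketch_sched {
    size_t num_entries;             /* entries currently in use */
    size_t size;                    /* allocated capacity of the array */
    struct sketch_entry *entries;   /* second level: array of entries */
    struct sketch_sched *next;      /* first level: doubly linked list */
    struct sketch_sched *prev;
};

/* Hypothetical stand-in for the all_schedules variable. */
struct sketch_list {
    struct sketch_sched *head;
};

/* Add one entry to a schedule, growing the array when it is full.
 * Returns 0 on success and -1 if the allocation fails. */
static int sketch_add_entry(struct sketch_sched *s, int payload)
{
    if (s->num_entries == s->size) {
        size_t new_size = s->size ? 2 * s->size : 4;
        struct sketch_entry *tmp =
            realloc(s->entries, new_size * sizeof(struct sketch_entry));
        if (tmp == NULL)
            return -1;
        s->entries = tmp;
        s->size = new_size;
    }
    s->entries[s->num_entries++].payload = payload;
    return 0;
}

/* Append a schedule at the tail of the first-level list. */
static void sketch_append(struct sketch_list *l, struct sketch_sched *s)
{
    s->next = NULL;
    if (l->head == NULL) {
        s->prev = NULL;
        l->head = s;
        return;
    }
    struct sketch_sched *tail = l->head;
    while (tail->next != NULL)
        tail = tail->next;
    tail->next = s;
    s->prev = tail;
}

/* Walk both levels and count every entry of every schedule. */
static size_t sketch_total_entries(const struct sketch_list *l)
{
    size_t total = 0;
    for (const struct sketch_sched *s = l->head; s != NULL; s = s->next)
        total += s->num_entries;
    return total;
}
```

A caller would zero-initialize a sketch_sched, fill it with sketch_add_entry, and then hand it to sketch_append, which mirrors the split between building a schedule and starting it.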
Data Structures & APIs
The following data structures are used in the progress engine, together with the APIs that access them.
The Schedule List
Functions that Directly Access all_schedules
MPIDU_Sched_are_pending
Read only. Checks whether all_schedules is empty.
MPID_Sched_next_tag
Read only. Checks the tags in all_schedules.
MPID_Sched_start
Appends a schedule to the schedule list.
MPL_DL_APPEND(all_schedules.head, s);
MPIDU_Sched_progress_state
Processes the schedule list and deletes a schedule once all of its entries have been processed.
/* process the list */
MPL_DL_FOREACH_SAFE(state->head, s, tmp) {
    for (i = s->idx; i < s->num_entries; ++i) {
        /* process entries */
    }
    if (s->idx == s->num_entries) {
        MPL_DL_DELETE(state->head, s);
    }
}
MPIDU_Sched_progress_state is called by MPIDU_Sched_progress.
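The delete-while-iterating pattern used by MPIDU_Sched_progress_state can be illustrated without the MPL macros. The sketch below is an assumption-laden stand-in: prog_sched, prog_state, prog_delete and prog_pass are hypothetical names, and each pass simply pretends that one entry per schedule completes. The point is that the next pointer is saved before the current node may be unlinked, which is the kind of protection a "safe" foreach macro provides.

```c
#include <stddef.h>

/* Hypothetical, reduced schedule: only the fields the loop needs. */
struct prog_sched {
    int idx;                        /* index of the first unfinished entry */
    int num_entries;                /* total number of entries */
    struct prog_sched *prev;
    struct prog_sched *next;
};

struct prog_state {
    struct prog_sched *head;
};

/* Unlink s from the doubly linked list headed by st->head. */
static void prog_delete(struct prog_state *st, struct prog_sched *s)
{
    if (s->prev != NULL)
        s->prev->next = s->next;
    else
        st->head = s->next;
    if (s->next != NULL)
        s->next->prev = s->prev;
    s->prev = NULL;
    s->next = NULL;
}

/* One pass over the list. Returns the number of schedules removed. */
static int prog_pass(struct prog_state *st)
{
    int removed = 0;
    struct prog_sched *s = st->head;
    while (s != NULL) {
        struct prog_sched *tmp = s->next;   /* saved before s may be unlinked */
        if (s->idx < s->num_entries)
            ++s->idx;                       /* assumption: one entry completes per pass */
        if (s->idx == s->num_entries) {
            prog_delete(st, s);             /* all entries processed */
            ++removed;
        }
        s = tmp;
    }
    return removed;
}
```

Calling prog_pass repeatedly drains the list: a schedule with one entry disappears after the first pass, one with two entries after the second.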
The Schedule Object
MPID_Sched_create
Create a new schedule object.
MPIDU_Sched_add_entry
Add an entry to a schedule.
The Schedule Entries
Each schedule has an array of entries. The data structure of an entry is as follows:
struct MPIDU_Sched_entry {
    enum MPIDU_Sched_entry_type type;
    enum MPIDU_Sched_entry_status status;
    int is_barrier;
    union {
        struct MPIDU_Sched_send send;
        struct MPIDU_Sched_recv recv;
        struct MPIDU_Sched_reduce reduce;
        struct MPIDU_Sched_copy copy;
        /* nop entries have no args */
        struct MPIDU_Sched_cb cb;
    } u;
};
type tells the progress engine how to handle an entry. Each type is processed differently (see the MPIDU_Sched_start_entry and MPIDU_Sched_progress_state sections below). The entry types are:
enum MPIDU_Sched_entry_type {
    MPIDU_SCHED_ENTRY_INVALID_LB = 0,
    MPIDU_SCHED_ENTRY_SEND,
    MPIDU_SCHED_ENTRY_RECV,
    MPIDU_SCHED_ENTRY_REDUCE,
    MPIDU_SCHED_ENTRY_COPY,
    MPIDU_SCHED_ENTRY_NOP,
    MPIDU_SCHED_ENTRY_CB,
    MPIDU_SCHED_ENTRY_INVALID_UB
};
status tracks the stage a schedule entry has reached. SEND and RECV entries change their status from NOT_STARTED to STARTED, then to COMPLETE. REDUCE, COPY and CB entries change directly from NOT_STARTED to COMPLETE in MPIDU_Sched_start_entry.
enum MPIDU_Sched_entry_status {
    MPIDU_SCHED_ENTRY_STATUS_NOT_STARTED = 0,
    MPIDU_SCHED_ENTRY_STATUS_STARTED,
    MPIDU_SCHED_ENTRY_STATUS_COMPLETE,
    MPIDU_SCHED_ENTRY_STATUS_FAILED,  /* indicates a failure occurred while executing the entry */
    MPIDU_SCHED_ENTRY_STATUS_INVALID  /* indicates an invalid entry, or invalid status value */
};
is_barrier controls the order in which the entries of a schedule are started. Entries that follow an entry marked is_barrier are not started until that entry and all entries before it have completed. The flag is checked in MPIDU_Sched_continue and MPIDU_Sched_progress_state.
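The effect of is_barrier on the start order can be sketched as a small loop. This is a simplified model written for illustration, not the code from mpid_sched.c: bar_entry, bar_sched, bar_continue and the is_comm flag are hypothetical, and communication entries are assumed to stay in the STARTED state until something outside the loop marks them complete.

```c
#include <stddef.h>

enum bar_status { BAR_NOT_STARTED = 0, BAR_STARTED, BAR_COMPLETE };

/* Hypothetical, reduced entry. */
struct bar_entry {
    enum bar_status status;
    int is_barrier;     /* later entries must wait for this one and its predecessors */
    int is_comm;        /* 1 for SEND/RECV-like entries, 0 for entries that finish at once */
};

struct bar_sched {
    int idx;            /* index of the first entry that is not yet complete */
    int num_entries;
    struct bar_entry *entries;
};

/* Start as many entries as the barriers allow.
 * Returns the number of entries started by this call. */
static int bar_continue(struct bar_sched *s)
{
    int started = 0;
    for (int i = s->idx; i < s->num_entries; ++i) {
        struct bar_entry *e = &s->entries[i];
        if (e->status == BAR_NOT_STARTED) {
            /* communication stays in flight, everything else completes immediately */
            e->status = e->is_comm ? BAR_STARTED : BAR_COMPLETE;
            ++started;
        }
        /* advance idx only across a contiguous prefix of completed entries */
        if (i == s->idx && e->status >= BAR_COMPLETE)
            ++s->idx;
        /* stop at a barrier while it, or anything before it, is unfinished */
        if (e->is_barrier && (e->status < BAR_COMPLETE || s->idx != i + 1))
            break;
    }
    return started;
}
```

With a schedule of a pending send marked is_barrier followed by a copy, the first call starts only the send. Once the send is marked complete, the next call moves past the barrier and runs the copy.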
MPIDU_Sched_start_entry
Entries of every type are started inside MPIDU_Sched_start_entry.
It is reached through the following call chain:
MPID_Sched_start -> MPIDU_Sched_continue -> MPIDU_Sched_start_entry
For entries of type REDUCE, COPY, or CB, the status changes from NOT_STARTED to COMPLETE during the call.
For entries of type SEND or RECV, the status changes from NOT_STARTED to STARTED.
MPIDU_Sched_progress_state
Only SEND and RECV entries are handled inside MPIDU_Sched_progress_state, because only entries of these two types pass through three statuses instead of two, as mentioned before.
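The two-step and three-step status paths described above can be written as a tiny state machine. The sketch is illustrative only: st_entry, st_start_entry and st_progress_entry are hypothetical names, the completion of a pending send or receive is simulated, and treating NOP like the other immediately completing types is an assumption that the text above does not state.

```c
enum st_type { ST_SEND, ST_RECV, ST_REDUCE, ST_COPY, ST_NOP, ST_CB };
enum st_status { ST_NOT_STARTED = 0, ST_STARTED, ST_COMPLETE };

/* Hypothetical, reduced entry: type and status only. */
struct st_entry {
    enum st_type type;
    enum st_status status;
};

/* Models the start step: communication entries become STARTED,
 * all other entries finish immediately. */
static void st_start_entry(struct st_entry *e)
{
    switch (e->type) {
    case ST_SEND:
    case ST_RECV:
        e->status = ST_STARTED;     /* request is now in flight */
        break;
    case ST_REDUCE:
    case ST_COPY:
    case ST_CB:
    case ST_NOP:                    /* assumption: NOP has no work to wait for */
        e->status = ST_COMPLETE;
        break;
    }
}

/* Models the progress step: only STARTED send/recv entries are touched.
 * Here the pending request is simply assumed to have finished. */
static void st_progress_entry(struct st_entry *e)
{
    if ((e->type == ST_SEND || e->type == ST_RECV) && e->status == ST_STARTED)
        e->status = ST_COMPLETE;
}
```

The progress step is a no-op for REDUCE, COPY and CB entries because they never rest in the STARTED state.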
MPID_Sched_barrier
MPID_Sched_barrier sets is_barrier to 1 on the entry that precedes it in the schedule.
Algorithm
The progress engine lets non-blocking calls overlap computation and communication. To achieve this overlap, a non-blocking call only adds a schedule to the progress engine and returns immediately.
The question is: when is the progress engine called? See the following examples:
- MPI_Comm_idup
The code snippet looks like this:
for (i = 0; i < NUM_ITER; i++)
    MPI_Comm_idup(MPI_COMM_WORLD, &comms[i], &req[i]);
MPI_Waitall(NUM_ITER, req, MPI_STATUSES_IGNORE);
When MPI_Comm_idup is called, it registers a callback function, gcn_helper, in the progress engine. This function is not called until MPI_Waitall is called. The call stack is:
gcn_helper
MPIDU_Sched_start_entry
MPIDU_Sched_continue
MPIDU_Sched_progress_state
MPIDU_Sched_progress
MPIDI_CH3I_Progress
MPIR_Waitall_impl
MPI_Waitall
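The deferred behaviour in this example, where the callback is registered by the non-blocking call but runs only once a waiting call drives the progress engine, can be mimicked with a small queue. Everything here is hypothetical and MPI-free: dq_nonblocking_start stands in for a call such as MPI_Comm_idup, dq_wait_all for MPI_Waitall, and dq_bump for a callback like gcn_helper.

```c
#include <stddef.h>

typedef void (*dq_callback)(void *arg);

/* One registered piece of deferred work. */
struct dq_pending {
    dq_callback fn;
    void *arg;
};

#define DQ_MAX 16

static struct dq_pending dq_queue[DQ_MAX];  /* hypothetical stand-in for the schedule list */
static int dq_count = 0;

/* Non-blocking start: only records the work and returns.
 * Returns 0 on success and -1 if the queue is full. */
static int dq_nonblocking_start(dq_callback fn, void *arg)
{
    if (dq_count == DQ_MAX)
        return -1;
    dq_queue[dq_count].fn = fn;
    dq_queue[dq_count].arg = arg;
    ++dq_count;
    return 0;
}

/* Blocking wait: this is where the recorded callbacks actually run. */
static void dq_wait_all(void)
{
    for (int i = 0; i < dq_count; ++i)
        dq_queue[i].fn(dq_queue[i].arg);
    dq_count = 0;
}

/* Example callback: increments the integer it is given. */
static void dq_bump(void *arg)
{
    ++*(int *)arg;
}
```

If dq_nonblocking_start is called three times with dq_bump and the same counter, the counter stays at zero until dq_wait_all runs, after which it reads three.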