MPI+Argobots


Revision as of 14:37, 27 August 2015

As the core count of many-core processors keeps increasing, MPI+X is becoming a promising programming model for large-scale SMP clusters. It has the potential to exploit both intra-node and inter-node parallelism with appropriate execution units and granularity.

Argobots is a low-level threading/task infrastructure developed by a joint effort of Argonne National Laboratory, University of Illinois at Urbana-Champaign, University of Tennessee, Knoxville and Pacific Northwest National Laboratory. It provides a lightweight execution model that combines low-latency thread and task scheduling with optimized data-movement functionality.

One benefit of Argobots is that it provides asynchrony and overlap to MPI. The idea is to issue multiple blocking MPI calls at the same time from multiple ULTs. If an MPI call blocks in ULT A, the MPI runtime detects this and context-switches to another ULT to make progress on other blocking calls. Once the other ULTs have finished their execution, control switches back to ULT A so that it can continue. In this way, the CPU stays busy doing useful work instead of waiting on the blocking call.

However, the two-level parallelism of MPI+X introduces new problems, such as lock contention between threads inside MPI. To avoid unnecessary locks between execution units, MPI+Argobots explicitly controls context switches between User Level Threads (ULTs) and Execution Streams (ESs). When switching between ULTs in the same ES, no lock is needed.
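
The yield-on-block behavior described above can be illustrated with a toy model in plain C. This is only a sketch under simplifying assumptions: it uses neither MPI nor Argobots, simulated time is a counter that advances by one on every scheduling attempt, and the names sim_ult and run_single_es are hypothetical, introduced only for this illustration. Each simulated ULT makes one "blocking call" that can complete once the counter reaches ready_at; a ULT whose call cannot complete yet yields to the next ULT in the same simulated ES.

```c
#include <assert.h>

/* One simulated ULT (hypothetical type, not part of Argobots). */
typedef struct {
    int ready_at; /* simulated time at which its blocking call can return */
    int done;     /* set to 1 once the ULT has finished */
} sim_ult;

/*
 * Run n simulated ULTs round-robin on a single simulated ES.
 * A ULT whose blocking call is not ready yet yields to the next ULT
 * instead of stalling the ES. The completion order is written to
 * order[0..n-1]; the return value is the number of yields.
 */
int run_single_es(sim_ult *ults, int n, int *order)
{
    int tick = 0;     /* simulated time */
    int finished = 0; /* number of ULTs that have completed */
    int yields = 0;   /* context switches caused by blocking */
    int cur = 0;      /* index of the ULT being scheduled */

    assert(ults && order && n > 0);

    while (finished < n) {
        sim_ult *u = &ults[cur];
        if (!u->done) {
            if (tick >= u->ready_at) {
                /* The blocking call can complete: the ULT finishes. */
                u->done = 1;
                order[finished++] = cur;
            } else {
                /* Still blocked: yield to another ULT in the same ES. */
                yields++;
            }
            tick++;
        }
        cur = (cur + 1) % n;
    }
    return yields;
}
```

For example, with ready_at values of 3, 0 and 1 for ULTs 0, 1 and 2, ULT 0 yields once, ULTs 1 and 2 complete in the meantime, and ULT 0 completes last. Because only one simulated ULT runs at a time, the model needs no lock around its shared counters, which mirrors the single-ES case.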

Build MPI+Argobots

Git repos:

Argobots read-only clone URL: git://git.mcs.anl.gov/argo/argobots.git
mpich-dev read-only clone URL: git://git.mpich.org/mpich-dev.git

To contribute to Argobots and MPI+Argobots, please contact Dr. Pavan Balaji.

Build Argobots

Follow the instructions in https://collab.mcs.anl.gov/display/ARGOBOTS/Getting+and+Building to build Argobots.

$ export INSTALL_PATH=/path/to/install
$ git clone --origin argobots git://git.mcs.anl.gov/argo/argobots.git argobots
$ cd argobots
$ ./autogen.sh
$ ./configure --prefix=$INSTALL_PATH
$ make -j 4
$ make install

Build MPICH

MPI+Argobots is currently under development in the mpich-dev repository. To get the source code, run:

$ git clone --origin mpich-dev git://git.mpich.org/mpich-dev.git mpich-dev
$ cd mpich-dev
$ git checkout mpi-argobots

Set the paths needed to link against the Argobots library.

export LD_LIBRARY_PATH=$INSTALL_PATH/lib:$LD_LIBRARY_PATH
export LIBRARY_PATH=$INSTALL_PATH/lib:$LIBRARY_PATH
export C_INCLUDE_PATH=$INSTALL_PATH/include:$C_INCLUDE_PATH

Compile.

$ ./autogen.sh
$ CFLAGS="-I$INSTALL_PATH/include" ./configure --prefix=$INSTALL_PATH --enable-threads=multiple --with-thread-package=argobots
$ make -j 8
$ make install

Because Argobots supports both execution streams (ESs) and user-level threads (ULTs), MPICH is compiled with "--enable-threads=multiple". At run time, you choose whether multiple ESs are needed by passing the thread level MPIX_THREAD_ULT or MPI_THREAD_MULTIPLE to MPI_Init_thread. MPIX_THREAD_ULT means there will be only one ES per process, with multiple ULTs in that ES. MPI_THREAD_MULTIPLE places no restriction on the number of ESs and ULTs.

Build and Run MPI+Argobots Examples

Set PATH to use the newly installed mpicc and mpiexec.

export PATH=$INSTALL_PATH/bin:$PATH

Run examples.

cd mpich-dev/test/mpi/threads/argobots
make
mpiexec -n 2 ./hello_abt

Example of MPI+Argobots

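ABT_init(argc, argv);
MPI_Init_thread(&argc, &argv, MPIX_THREAD_ULT, &provided); /* or MPI_THREAD_MULTIPLE */
/* Argobots calls */
MPI_Finalize();
ABT_finalize(); /* must come after MPI_Finalize */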

This is a template for MPI+Argobots applications. Note that ABT_finalize must be called after MPI_Finalize: MPI+Argobots makes Argobots calls inside MPI, so Argobots must not be finalized before MPI. Also, because some users may need to make Argobots calls after finalizing MPI, users must finalize Argobots themselves by calling ABT_finalize.

New Thread Level for ULT: MPIX_THREAD_ULT

We propose another thread level [2] for MPI and thread integration: MPIX_THREAD_ULT. At this level, there is only one ES per process, with multiple ULTs in that ES. Because ULTs in the same ES do not execute concurrently, no lock is needed when entering or exiting MPI calls. In addition, when a ULT yields, it yields to other ULTs in the same ES, whereas with MPI_THREAD_MULTIPLE it yields to other ESs.

MPI_Init_thread(&argc, &argv, MPIX_THREAD_ULT, &provided);

References

1. Argobots Home, https://collab.mcs.anl.gov/display/ARGOBOTS/Argobots+Home

2. Huiwei Lu, Sangmin Seo, and Pavan Balaji. MPI+ULT: Overlapping Communication and Computation with User-Level Threads. The 2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC '15), New York, USA, August 24-26, 2015.