As core number of CPU processors keeps increasing, MPI+X becomes a promising programming model for large scale SMP clusters. It has the potential to utilizing both intra-node and inter-node parallelism with appropriate execution unit and granularity.
Argobots is a low-level threading/task infrastructure developed by a joint effort of Argonne National Laboratory, University of Illinois at Urbana-Champaign, The University of Tennessee, Knoxville and Pacific Northwest National Laboratory. It provides a lightweight execution model that combines low-latency thread and task scheduling with optimized data-movement functionality.
However, the two-level parallelism of MPI+X introduces new problems such as lock contention in MPI between threads. To avoid unnecessary locks between execution units, MPI+Argobots will explicitly control the context switch between User Level Threads (ULT) and Execution Streams (ES). When switching between ULTs in the same ES, no lock is needed.
Another benefit of using Argobots with MPI is to provide asynchrony/overlap with ULTs. The idea is to make multiple MPI blocking calls at the same time in multiple ULTs, if one MPI call is blocked in ULT A, MPI runtime will detect it and context switch to another ULT to make progress on other blocking calls. Once other ULTs finished their execution, they will switch back to ULT A to continue its execution. In this way, we can keep the CPU busy doing useful work instead of waiting the blocking call.