Parallel domain decomposition for particle methods: Part 1
Introduction
For parallel particle codes that have to be written quickly (while retaining flexibility), the
task-based parallelism approach doesn’t always work well. The usual approach that is taken
in those situations is some sort of domain decomposition and a lot of associated fine-grained
code for communication between processes. One tries to strike the appropriate balance between
communication and computation while making sure that the computation is load-balanced. As a
rule of thumb, less communication is better.
One approach (among many) for parallelizing a particle-based code, starting from a serial
version, is:
Creating the particles on the root/master process.
Scattering the particles to various processes.
Communicating ghost regions at processor boundaries.
Migrating particles that have crossed processor boundaries to the appropriate process.
In the interest of simplicity, we ignore the communication of interparticle forces.
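To fix ideas, here is a rough sketch (not code from the original implementation) of how these
four steps might be arranged in a driver program; all function names below are placeholders:

```cpp
#include <boost/mpi.hpp>

// Placeholder stubs for the four phases listed above.
void createParticles()        { /* build the initial particle set on the root  */ }
void scatterParticles()       { /* send each patch its particles               */ }
void exchangeGhostParticles() { /* communicate patch-boundary (ghost) regions  */ }
void migrateParticles()       { /* hand over particles that crossed a boundary */ }

int main(int argc, char* argv[])
{
  boost::mpi::environment env(argc, argv);
  boost::mpi::communicator world;

  if (world.rank() == 0) {
    createParticles();          // step 1: on the root/master process only
  }
  scatterParticles();           // step 2: distribute particles to the patches

  const int numSteps = 100;     // illustrative
  for (int step = 0; step < numSteps; ++step) {
    exchangeGhostParticles();   // step 3: at every time step
    // ... compute forces and move the particles (ignored here) ...
    migrateParticles();         // step 4: after the particles have moved
  }
  return 0;
}
```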
Creating and scattering particles
Particles are created on the master process (P0) and then transferred to other parallel
processes during the “scatter” operation. In the animation below, we assume that there
are nine processes - P0 through P8. The domain is decomposed into nine squares and the
contents of each square are sent to the appropriate process.
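The mapping from a particle position to its destination patch is not spelled out above; one
simple possibility, assuming a rectangular domain and a uniform grid of patches (all names in
this sketch are illustrative, not from the original code), is:

```cpp
#include <algorithm>
#include <array>
#include <cmath>

using IntVec = std::array<int, 3>;

// Sketch: map a particle position to the rank of the patch (process) that
// owns it, assuming a uniform decomposition of a rectangular domain.
int ownerRank(const std::array<double, 3>& pos,
              const std::array<double, 3>& domainLo,
              const std::array<double, 3>& domainHi,
              const IntVec& numPatches)
{
  IntVec patch;
  for (int dir = 0; dir < 3; ++dir) {
    double width = (domainHi[dir] - domainLo[dir]) / numPatches[dir];
    int index = static_cast<int>(std::floor((pos[dir] - domainLo[dir]) / width));
    // Clamp particles that sit exactly on the upper domain boundary.
    patch[dir] = std::min(std::max(index, 0), numPatches[dir] - 1);
  }
  // Row-major patch ordering (the same convention MPI_Cart_rank uses by default).
  return (patch[0] * numPatches[1] + patch[1]) * numPatches[2] + patch[2];
}
```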
MPI implementation
A possible MPI implementation of the scattering process is described below. For convenience
we use the boost::mpi wrappers around MPI calls in most cases. However, some MPI calls
do not have associated Boost wrappers, and for those we use the MPI calls directly.
MPI setup
The first step is to set up the MPI communicator and determine the rank (and MPI coordinates
in a virtual Cartesian topology) of the current process:
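The original listing is not reproduced here, but a minimal sketch of such a setup with
boost::mpi might look like the following (the 3×3×1 patch grid and the variable names are
assumptions for illustration):

```cpp
#include <array>
#include <boost/mpi.hpp>
#include <mpi.h>

using IntVec = std::array<int, 3>;

int main(int argc, char* argv[])
{
  // Boost.MPI handles MPI_Init/MPI_Finalize through RAII.
  boost::mpi::environment env(argc, argv);
  boost::mpi::communicator world;

  // Create a virtual Cartesian topology; boost::mpi has no wrapper for
  // this, so we fall back to the raw MPI call.
  IntVec numPatches = {{3, 3, 1}};   // e.g. nine patches for nine processes
  IntVec periodic   = {{0, 0, 0}};   // non-periodic in all directions
  MPI_Comm cartComm;
  MPI_Cart_create(world, 3, numPatches.data(), periodic.data(),
                  /*reorder=*/1, &cartComm);

  // Rank and Cartesian coordinates of the current process.
  int rank;
  MPI_Comm_rank(cartComm, &rank);
  IntVec coords;
  MPI_Cart_coords(cartComm, rank, 3, coords.data());

  return 0;
}
```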
In the above, IntVec is a std::array<int, 3>.
The scatter operation
In the scatter operation, the particles are assigned to each patch and then
sent to the appropriate patches using the asynchronous isend operation:
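Again, the original listing is omitted; the following is a sketch of one way the scatter
could be written with boost::mpi (the Particle stand-in, the scatterParticles name, and the
tag handling are assumptions, not the original code):

```cpp
#include <memory>
#include <vector>
#include <boost/mpi.hpp>
#include <boost/serialization/shared_ptr.hpp>
#include <boost/serialization/vector.hpp>

// A minimal stand-in for the Particle class (assumption: the real class
// carries more data and provides its own serialization code).
struct Particle {
  double pos[3];
  template <class Archive>
  void serialize(Archive& ar, const unsigned int /*version*/) {
    ar & pos;
  }
};

using ParticleP      = std::shared_ptr<Particle>;
using ParticlePArray = std::vector<ParticleP>;

// Scatter: the root process sends each patch its bucket of particles with
// non-blocking isend; the other processes post a matching recv.
void scatterParticles(const boost::mpi::communicator& world,
                      const std::vector<ParticlePArray>& patchBuckets,
                      ParticlePArray& myParticles,
                      int tag = 0)
{
  if (world.rank() == 0) {
    std::vector<boost::mpi::request> requests;
    for (int dest = 1; dest < world.size(); ++dest) {
      requests.push_back(world.isend(dest, tag, patchBuckets[dest]));
    }
    myParticles = patchBuckets[0];   // the root keeps its own patch
    boost::mpi::wait_all(requests.begin(), requests.end());
  } else {
    world.recv(0, tag, myParticles);
  }
}
```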
Here ParticlePArray is a std::vector<ParticleP> and ParticleP is
a std::shared_ptr<Particle>. The Particle class contains particle
data. For simplicity, we do not consider the performance implications
of an array of structures (as used in this implementation) versus
a structure of arrays (which is typically more efficient).
Remarks
In the next part of this series, we will discuss two approaches for inter-patch communication
for particle-based simulations.