[gmx-users] System Blowing up when more than one MPI thread is used

Mayank Vats vatsm at rpi.edu
Fri Mar 22 15:43:57 CET 2019


Hi,
I am trying to do a simple simulation in GROMACS 2018.6 of a protein in
TIP3P water with the AMBER99 force field. It's a 14 nm cubic box with two
protein chains and 87651 water molecules, 269525 atoms in total. I am able
to energy minimise to an Fmax < 500, and then perform NVT and NPT
equilibration for 100 ps each at 300 K and 1 bar. I am facing issues in the
production run (dt = 2 fs, for 5 ns).
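For reference, the run length works out as follows in the production .mdp
(5 ns at a 2 fs time step); all other settings are omitted here:

    dt     = 0.002    ; 2 fs time step
    nsteps = 2500000  ; 2500000 * 2 fs = 5 ns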
A little bit about the hardware I'm using: my local workstation has 8 CPU
cores and 1 GPU. I'm also connecting to a Power9 system, where I have
access to 1 node with 160 CPU cores and 4 GPUs. I have built the GROMACS
version for the P9 system with GMX_OPENMP_MAX_THREADS=192, so it can run
using more than the default maximum of 64 OpenMP threads.
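The P9 build was configured roughly like this (the install path and other
options are just placeholders; the only non-default setting relevant here
is the thread limit, and the plain gmx binary is a separate build without
-DGMX_MPI=ON):

    cmake .. -DGMX_MPI=ON -DGMX_GPU=ON \
             -DGMX_OPENMP_MAX_THREADS=192 \
             -DCMAKE_INSTALL_PREFIX=/path/to/gromacs-2018.6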
I do not have a clear understanding of how MPI, ranks, etc. work, which
is why I'm here.
Now the issue, with information from the log files:
On my *local workstation*, I run *gmx mdrun* and it uses 1 MPI thread, 8
OpenMP threads, and the one GPU available is assigned two GPU tasks:
------------------------------------------------------------------------------------------------------------
Using 1 MPI thread
Using 8 OpenMP threads

1 GPU auto-selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
   PP:0,PME:0
------------------------------------------------------------------------------------------------------------
This runs and completes successfully.
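In other words, the defaults on the workstation amount to something like
the following (the -deffnm name is just a placeholder for my actual file
names):

    gmx mdrun -deffnm md -ntmpi 1 -ntomp 8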
On the *P9 system*, where I have one node with the configuration mentioned
above, I have run *gmx_mpi mdrun*, which uses 1 MPI thread, 160 OpenMP
threads and 1 GPU, as follows:
------------------------------------------------------------------------------------------------------------
Using 1 MPI thread
Using 160 OpenMP threads

1 GPU auto-selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
   PP:0,PME:0
------------------------------------------------------------------------------------------------------------
This, too, completes successfully.
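In terms of explicit options, that run is roughly equivalent to launching
(again, -deffnm is a placeholder, and in reality I let mdrun pick the
thread count itself):

    mpirun -np 1 gmx_mpi mdrun -deffnm md -ntomp 160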
However, I wanted to be able to use all the available GPUs for the
simulation on the *P9 system*, and tried *gmx mdrun*, which gave this in
the log file:
------------------------------------------------------------------------------------------------------------
Using 32 MPI threads
Using 5 OpenMP threads per tMPI thread

On host bgrs02 4 GPUs auto-selected for this run.
Mapping of GPU IDs to the 32 GPU tasks in the 32 ranks on this node:

PP:0,PP:0,PP:0,PP:0,PP:0,PP:0,PP:0,PP:0,PP:1,PP:1,PP:1,PP:1,PP:1,PP:1,PP:1,PP:1,PP:2,PP:2,PP:2,PP:2,PP:2,PP:2,PP:2,PP:2,PP:3,PP:3,PP:3,PP:3,PP:3,PP:3,PP:3,PP:3
------------------------------------------------------------------------------------------------------------
This gives me warnings saying:
"One or more water molecules cannot be settled.
Check for bad contacts and/or reduce the timestep if appropriate."
"LINCS warnings.."
and finally exits with this:
------------------------------------------------------------------------------------------------------------
Program:     gmx mdrun, version 2018.6
Source file: src/gromacs/ewald/pme-redistribute.cpp (line 282)
MPI rank:    12 (out of 32)

Fatal error:
891 particles communicated to PME rank 12 are more than 2/3 times the
cut-off
out of the domain decomposition cell of their charge group in dimension x.
This usually means that your system is not well equilibrated.
------------------------------------------------------------------------------------------------------------
I looked up the error in the documentation as suggested, and it says that
this is an indication of the system blowing up. I don't understand how the
same input configuration completes the production run successfully when
using one MPI thread, but fails with a 'blowing up' message when using
more MPI threads.
Are there arguments that I should be using, for example something like the
command sketched below? Or should I build GROMACS on the P9 in a different
way? Or is it an artifact of my system itself, and if so, any suggestions
on what to change?
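To be concrete, this is the kind of thing I was guessing at for a 4-GPU
run, with explicit ranks and a GPU task mapping, but I have no idea
whether the numbers are sensible (they are just my guess, not taken from
the documentation):

    gmx mdrun -deffnm md -ntmpi 4 -ntomp 40 -npme 0 -gputasks 0123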
I can provide more information if needed. Any help will be appreciated.
Thanks,
Mayank

