[gmx-users] Multi-node GPU runs crashing with a fork() warning

Thu May 22 00:14:02 CEST 2014

Hey Folks,

I'm attempting to run simulations on a multi-node GPU cluster, and my
simulations are crashing after flagging an Open MPI fork() warning:

------------------------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:          lngpu019 (PID 11549)
  MPI_COMM_WORLD rank: 18

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
------------------------------------------------------------------------------------------

I saw a similar mailing-list post about this sort of issue from September
2013, but the thread had no resolution.

   - Each node of our cluster has 12 Intel cores and 6 NVIDIA Tesla
   C2050 GPUs.

   - We call: mpirun -machinefile nodes.txt -npernode 6 mdrun_mpi

   - I compiled GROMACS on one of the compute nodes with the C2050s.
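
For reference, the warning text names the mpi_warn_on_fork MCA parameter.
Below is a minimal sketch of how our launch line might look with that
parameter set through Open MPI's --mca option. I am assuming nothing else
about the setup changes (same nodes.txt, same mdrun_mpi binary). This would
only silence the message; I would not expect it to explain or fix the crash
itself.

```shell
# Sketch only: same launch line as in the list above, with the fork()
# warning disabled via the MCA parameter named in the warning text.
# Assumption: nodes.txt and mdrun_mpi are exactly as in our current setup.
LAUNCH="mpirun --mca mpi_warn_on_fork 0 -machinefile nodes.txt -npernode 6 mdrun_mpi"

# Print the command for inspection; drop the echo to actually run it.
echo "$LAUNCH"
```

The command is held in a variable and echoed so it can be checked before
being submitted to the cluster.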

We also have a few nodes with newer NVIDIA K20 GPUs. When we compile
GROMACS on these nodes, we can run the code across multiple nodes and GPUs
without any errors.

I don't know whether the fork() warning is directly related to the crash,
or whether there might be obscure, device-specific object files outside my
build directory that I should delete. Any insight you folks could provide
to help me solve this issue would be appreciated.

Thanks,

-- 
Thomas O'Connor
Graduate Research Assistant
MCS IGERT Fellow

Department of Physics & Astronomy
The Johns Hopkins University
3701 San Martin Drive
Baltimore, MD 21218
toconnor at jhu.edu
410.516.8587