[gmx-users] Gromacs 5.1 and 5.1.1 crash in REMD

Krzysztof Kuczera kkuczera at ku.edu
Tue Nov 17 18:11:21 CET 2015


Hi
I am trying to run a temperature-exchange REMD simulation with GROMACS 
5.1 or 5.1.1, and my job crashes in a way that is difficult to explain:
- the MD part works fine
- the crash occurs at the first replica-exchange attempt
- the error log contains a number of messages of the type shown below, 
which I suppose means that the MPI communication did not work

NOTE: Turning on dynamic load balancing

Fatal error in MPI_Allreduce: A process has failed, error stack:
MPI_Allreduce(1421).......: MPI_Allreduce(sbuf=0x7fff5538018c, rbuf=0x28b2070, count=3, MPI_FLOAT, MPI_SUM, comm=0x84000002) failed
MPIR_Allreduce_impl(1262).:
MPIR_Allreduce_intra(497).:
MPIR_Bcast_binomial(245)..:
dequeue_and_set_error(917): Communication error with rank 48

Fatal error in MPI_Allreduce: Other MPI error, error stack:
MPI_Allreduce(1421)......: MPI_Allreduce(sbuf=0x7fff31eb660c, rbuf=0x2852c00, count=3, MPI_FLOAT, MPI_SUM, comm=0x84000001) failed
MPIR_Allreduce_impl(1262):
MPIR_Allreduce_intra(497):
MPIR_Bcast_binomial(316).: Failure during collective

Fatal error in MPI_Allreduce: Other MPI error, error stack:
MPI_Allreduce(1421)......: MPI_Allreduce(sbuf=0x7fff2e54068c, rbuf=0x31e35a0, count=3, MPI_FLOAT, MPI_SUM, comm=0x84000001) failed
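
For reference, here is a minimal standalone test (my own sketch, not 
part of GROMACS) that exercises the same kind of call that fails above: 
an MPI_Allreduce with count=3, MPI_FLOAT and MPI_SUM on a split 
communicator. Compiling it with mpicc and running it across the same 
node set should show whether the MPI installation itself can complete 
collectives on sub-communicators:

/* Minimal MPI_Allreduce sanity check (my own sketch, not GROMACS code).
 * Mimics the failing pattern: count=3, MPI_FLOAT, MPI_SUM on a
 * sub-communicator created with MPI_Comm_split. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Split the ranks into two groups, loosely mimicking per-replica
     * sub-communicators. */
    MPI_Comm split;
    MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &split);

    float sbuf[3] = { 1.0f, 2.0f, 3.0f };
    float rbuf[3];
    MPI_Allreduce(sbuf, rbuf, 3, MPI_FLOAT, MPI_SUM, split);

    if (rank == 0)
    {
        printf("Allreduce OK: %g %g %g\n", rbuf[0], rbuf[1], rbuf[2]);
    }

    MPI_Comm_free(&split);
    MPI_Finalize();
    return 0;
}

If this small test fails in the same way on the same nodes, the problem 
would seem to be in the MPI installation or interconnect rather than in 
mdrun.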


Recently compiled, slightly older versions such as 5.0.6 do not show 
this behavior.
I have tried updating to the latest cmake, compiler, and MPI versions 
on our system, but that does not change anything.
Does anyone have suggestions on how to fix this?

Thanks
Krzysztof

-- 
Krzysztof Kuczera
Departments of Chemistry and Molecular Biosciences
The University of Kansas
1251 Wescoe Hall Drive, 5090 Malott Hall
Lawrence, KS 66045
Tel: 785-864-5060 Fax: 785-864-5396 email: kkuczera at ku.edu
http://oolung.chem.ku.edu/~kuczera/home.html


