[gmx-users] Gromacs 5.1 and 5.1.1 crash in REMD
Krzysztof Kuczera
kkuczera at ku.edu
Tue Nov 17 18:11:21 CET 2015
Hi
I am trying to run a temperature-exchange REMD simulation with GROMACS
5.1 or 5.1.1, and the job crashes in a way that is difficult to explain:
- the MD part works fine
- the crash occurs at the first replica-exchange attempt
- the error log contains a number of messages of the type below, which I
take to mean that the MPI communication failed:
NOTE: Turning on dynamic load balancing
Fatal error in MPI_Allreduce: A process has failed, error stack:
MPI_Allreduce(1421).......: MPI_Allreduce(sbuf=0x7fff5538018c, rbuf=0x28b2070, count=3, MPI_FLOAT, MPI_SUM, comm=0x84000002) failed
MPIR_Allreduce_impl(1262).:
MPIR_Allreduce_intra(497).:
MPIR_Bcast_binomial(245)..:
dequeue_and_set_error(917): Communication error with rank 48
Fatal error in MPI_Allreduce: Other MPI error, error stack:
MPI_Allreduce(1421)......: MPI_Allreduce(sbuf=0x7fff31eb660c, rbuf=0x2852c00, count=3, MPI_FLOAT, MPI_SUM, comm=0x84000001) failed
MPIR_Allreduce_impl(1262):
MPIR_Allreduce_intra(497):
MPIR_Bcast_binomial(316).: Failure during collective
Fatal error in MPI_Allreduce: Other MPI error, error stack:
MPI_Allreduce(1421)......: MPI_Allreduce(sbuf=0x7fff2e54068c, rbuf=0x31e35a0, count=3, MPI_FLOAT, MPI_SUM, comm=0x84000001) failed
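For reference, the run is launched in the standard multi-simulation way,
roughly as sketched below; the replica count, directory names, and
exchange interval are placeholders rather than the exact values from
this job:

    # one directory per replica, each holding its own topol.tpr at its
    # target temperature (remd_*, 64 ranks and -replex 1000 are placeholders)
    mpirun -np 64 gmx_mpi mdrun -multidir remd_* -replex 1000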
Slightly older versions that I compiled recently, such as 5.0.6, do not
show this behavior.
I have tried updating to the latest cmake, compiler, and MPI versions on
our system, but that does not change anything.
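For completeness, a typical MPI-enabled 5.1.x build is configured
roughly as below; the install prefix and compiler wrappers are
placeholders, and this is only a sketch rather than my exact configure
line:

    # placeholders: adjust prefix and MPI compiler wrappers to the local system
    cmake .. -DGMX_MPI=on -DGMX_BUILD_OWN_FFTW=ON \
             -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx \
             -DCMAKE_INSTALL_PREFIX=/opt/gromacs-5.1.1
    make && make install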
Does anyone have suggestions on how to fix this?
Thanks
Krzysztof
--
Krzysztof Kuczera
Departments of Chemistry and Molecular Biosciences
The University of Kansas
1251 Wescoe Hall Drive, 5090 Malott Hall
Lawrence, KS 66045
Tel: 785-864-5060 Fax: 785-864-5396 email: kkuczera at ku.edu
http://oolung.chem.ku.edu/~kuczera/home.html