[gmx-developers] parallelization-dependent crashes in CVS Gromacs

Berk Hess hessb at mpip-mainz.mpg.de
Thu Jun 26 16:25:35 CEST 2008


Hi,

I have found and fixed the problem.

With 1D domain decomposition, some non-bonded interactions could be missed
for triclinic cells in which the box-vector components b(x) and c(y) are
both non-zero. This is the case for truncated octahedrons, but not for
rhombic dodecahedrons.
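
The condition above can be checked for a given box with a few lines of
Python. This is only a minimal sketch, not code from GROMACS: the function
names are hypothetical, and the box vectors are assumptions taken from the
usual GROMACS convention (a along x, b in the xy-plane) for a truncated
octahedron and for the xy-square form of the rhombic dodecahedron, with
image distance d. Other orientations of the same cells are not considered.

```python
import math


def truncated_octahedron(d):
    """Assumed box vectors (a, b, c) of a truncated octahedron, image distance d."""
    a = (d, 0.0, 0.0)
    b = (d / 3.0, 2.0 * math.sqrt(2.0) * d / 3.0, 0.0)
    c = (-d / 3.0, math.sqrt(2.0) * d / 3.0, math.sqrt(6.0) * d / 3.0)
    return a, b, c


def rhombic_dodecahedron_xy_square(d):
    """Assumed box vectors (a, b, c) of a rhombic dodecahedron (xy-square form)."""
    a = (d, 0.0, 0.0)
    b = (0.0, d, 0.0)
    c = (d / 2.0, d / 2.0, math.sqrt(2.0) * d / 2.0)
    return a, b, c


def has_bx_and_cy(box, tol=1e-6):
    """Return True if both b(x) and c(y) are non-zero within the tolerance."""
    _, b, c = box
    return abs(b[0]) > tol and abs(c[1]) > tol


if __name__ == "__main__":
    boxes = [
        ("truncated octahedron", truncated_octahedron(1.0)),
        ("rhombic dodecahedron (xy-square)", rhombic_dodecahedron_xy_square(1.0)),
    ]
    for name, box in boxes:
        # Report whether the box meets the b(x) != 0 and c(y) != 0 condition.
        print(name, has_bx_and_cy(box))
```

With these assumed vectors, the truncated octahedron has b(x) = d/3 and
c(y) = sqrt(2)/3 d, so it meets the condition, while the xy-square
dodecahedron has b(x) = 0 and does not.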

Berk.

Peter Kasson wrote:
> I've been encountering crashes in CVS gromacs depending on the 
> parallelization.  If I run a CVS-up-to-date mdrun (or any of various 
> snapshots within the past two months), I get a rapid crash as detailed 
> below.  However, if I run with either -rdd 1.6 [somewhat arbitrarily 
> chosen], -dd 2 2 1, or single-processor mdrun, the job completes 
> successfully.  I have also had errors on similar small test systems 
> where the error is a nsgrid failure (again on parallel but not 
> single-processor).  Larger test systems have been working ok for me, 
> although I haven't tried just replicating this box.
>
> Any ideas?  Has anyone else encountered something similar?
> (If you want input files, drop me a line.)
> Thanks,
> --Peter
>
> mpiexec -np 4 /array10/software/gmx/src/kernel/mdrun -v -dlb -deffnm 
> frame0
>
> [...]
>
> Reading file frame0.tpr, VERSION 3.3.99_development_20080208 (single 
> precision)
> Note: tpx file_version 54, software version 56
> Loaded with Money
>
> Making 1D domain decomposition 4 x 1 x 1
>
> starting mdrun 'CMG in water'
> 1000000 steps,   4000.0 ps.
> step 0
>
> t = 0.144 ps: Water molecule starting at atom 28978 can not be settled.
> Check for bad contacts and/or reduce the timestep.
>
> Back Off! I just backed up step36b_n1.pdb to ./#step36b_n1.pdb.1#
>
> Back Off! I just backed up step36c_n1.pdb to ./#step36c_n1.pdb.1#
> Wrote pdb files with previous and current coordinates
>
> t = 0.148 ps: Water molecule starting at atom 28978 can not be settled.
> Check for bad contacts and/or reduce the timestep.
>
> Back Off! I just backed up step37b_n1.pdb to ./#step37b_n1.pdb.1#
>
> Back Off! I just backed up step37c_n1.pdb to ./#step37c_n1.pdb.1#
> Wrote pdb files with previous and current coordinates
>
> t = 0.152 ps: Water molecule starting at atom 22435 can not be settled.
> Check for bad contacts and/or reduce the timestep.
>
> Back Off! I just backed up step38b_n1.pdb to ./#step38b_n1.pdb.1#
>
> Back Off! I just backed up step38c_n1.pdb to ./#step38c_n1.pdb.1#
> Wrote pdb files with previous and current coordinates
>
> t = 0.156 ps: Water molecule starting at atom 9895 can not be settled.
> Check for bad contacts and/or reduce the timestep.
>
> Back Off! I just backed up step39b_n0.pdb to ./#step39b_n0.pdb.1#
>
> t = 0.156 ps: Water molecule starting at atom 22435 can not be settled.
> Check for bad contacts and/or reduce the timestep.
>
> Back Off! I just backed up step39b_n1.pdb to ./#step39b_n1.pdb.1#
>
> Back Off! I just backed up step39c_n1.pdb to ./#step39c_n1.pdb.1#
>
> Back Off! I just backed up step39c_n0.pdb to ./#step39c_n0.pdb.1#
> Wrote pdb files with previous and current coordinates
> Wrote pdb files with previous and current coordinates
>
> t = 0.160 ps: Water molecule starting at atom 9895 can not be settled.
> Check for bad contacts and/or reduce the timestep.
>
> Back Off! I just backed up step40b_n0.pdb to ./#step40b_n0.pdb.1#
>
> t = 0.160 ps: Water molecule starting at atom 39988 can not be settled.
> Check for bad contacts and/or reduce the timestep.
>
> Back Off! I just backed up step40b_n1.pdb to ./#step40b_n1.pdb.1#
>
> Back Off! I just backed up step40c_n1.pdb to ./#step40c_n1.pdb.1#
>
> Back Off! I just backed up step40c_n0.pdb to ./#step40c_n0.pdb.1#
> Wrote pdb files with previous and current coordinates
> Wrote pdb files with previous and current coordinates
>
> -------------------------------------------------------
> Program mdrun, VERSION 3.3.99_development_200800503
> Source code file: pme.c, line: 510
>
> Fatal error:
> 1 particles communicated to PME node 1 are more than a cell length out 
> of the domain decomposition cell of their charge group
> -------------------------------------------------------