[gmx-developers] Problem with simulation in 8 nodes
Berk Hess
hessb at mpip-mainz.mpg.de
Mon Aug 4 14:51:46 CEST 2008
Hi,
This looks like a bug.
Can you mail me the tpr file?
Berk.
Jose Duarte wrote:
> I'm running a simulation with the latest version of gromacs from CVS.
> My protein is 90 residues long, I add waters and ions as usual and
> then perform energy minimization, position restrained equilibration
> and a molecular dynamics run. This all works perfectly fine on 1 cpu
> (standard mdrun executable) and on 4 and 6 cpus (using mdrun_mpi), but
> mysteriously fails on 8 cpus. I've tried this on several setups: using
> lam-mpi in linux on a single multi-core box, using lam-mpi on several
> nodes of a cluster, using open-mpi on a multi-core Mac. I'm always
> getting exactly the same behaviour: all works fine on 4 or 6 cpus but
> fails on 8. Gromacs is compiled with default parameters (single
> precision).
>
> The problem itself comes in the energy minimization step. I run a
> pretty standard EM with PME for electrostatic interactions. This is
> the error message when running mdrun:
>
> ##########
> Making 2D domain decomposition 4 x 2 x 1
> Steepest Descents:
> Tolerance (Fmax) = 1.00000e+01
> Number of steps = 5000
>
> A list of missing interactions:
> G96Angle of 1304 missing -1
> Proper Dih. of 510 missing -2
> Improper Dih. of 409 missing -1
>
> -------------------------------------------------------
> Program mdrun_mpi, VERSION 3.3.99_development_20080718
> Source code file: domdec_top.c, line: 88
>
> Software inconsistency error:
> Some interactions seem to be assigned multiple times
>
> -------------------------------------------------------
>
> Error on node 3, will try to stop all the nodes
> Halting parallel program mdrun_mpi on CPU 3 out of 8
> ##########
>
>
>
> The one thing I notice different in this case compared to running on 4
> or 6 cpus is that in those cases the domain decomposition is 1D
> instead of 2D, no idea if that's relevant.
>
> Actually, looking at the log file produced by mdrun, the simulation
> seems to run properly until step 403, after which this error is reported:
>
>
> ##########
> Not all bonded interactions have been properly assigned to the domain
> decomposition cells
>
> A list of missing interactions:
> G96Angle of 1304 missing -1
> Proper Dih. of 510 missing -2
> Improper Dih. of 409 missing -1
> ##########
>
>
> I have also tried to run the same procedure on another protein but the
> problem doesn't arise at all, so it seems to be related to that
> particular protein. I can send the pdb file if that's helpful.
>
> Any ideas? Is this a bug?
>
> Thanks
>
> Jose
>
>
>
> Jose M. Duarte
> Max Planck Institute for Molecular Genetics
> Ihnestr. 63-73
> 14195 Berlin
> Germany