[gmx-users] Re: Possible bug: energy changes with the number of nodes for energy minimization

Stephen Cox stephen.cox.10 at ucl.ac.uk
Wed May 30 13:42:50 CEST 2012


Hi Justin,

Thanks for getting back and posting the links.


> On 5/29/12 6:22 AM, Stephen Cox wrote:
> > Hi,
> >
> > I'm running a number of energy minimizations on a clathrate supercell and
> > I get quite significantly different values for the total energy depending
> > on the number of MPI processes / number of threads I use. More
> > specifically, some numbers I get are:
> >
> > #cores      energy
> > 1        -2.41936409202696e+04
> > 2        -2.43726425776809e+04
> > 3        -2.45516442350804e+04
> > 4        -2.47003944216983e+04
> >
> > #threads    energy
> > 1        -2.41936409202696e+04
> > 2        -2.43726425776792e+04
> > 3        -2.45516442350804e+04
> > 4        -2.47306458924815e+04
> >
> >
> > I'd expect some numerical noise, but these differences seem too large for
> > that.
>
> The difference is only 2%, which, by MD standards, is quite good, I'd say ;)
> Consider the discussion here:
>
> http://www.gromacs.org/Documentation/Terminology/Reproducibility
>

I agree that for MD this wouldn't be too bad, but I'd expect energy
minimization to reach very nearly the same local minimum from a given
starting configuration. I want to compute a binding curve for my clathrate
and compare it with DFT values for the binding energy (amongst other
things), and the energy difference between runs on different numbers of
cores is rather significant for this purpose.

Furthermore, if I calculate only the energy for nsteps = 0 (i.e. a
single-point energy for identical structures), I get the same trend as
above, with both MPI and OpenMP and with both domain and particle
decomposition. Surely there shouldn't be such a large difference in energy
for a single-point calculation?

> To an extent, the information here may also be relevant:
>
> http://www.gromacs.org/Documentation/How-tos/Extending_Simulations#Exact_vs_binary_identical_continuation
>
> > Before submitting a bug report, I'd like to check:
> > a) if someone has seen something similar;
>
> Sure.  Energies can be different due to a whole host of factors (discussed
> above), and MPI only complicates matters.
>
> > b) should I just trust the serial version?
>
> Maybe, but I don't know that there's evidence to say that any of the above
> tests are more or less accurate than the others.  What happens if you run
> with mdrun -reprod on all your tests?
>

Running with -reprod produces the same trend as above. If it were numerical
noise, I would expect the numbers to fluctuate around some average value,
not to follow a definite trend in which the energy decreases as the number
of cores/threads increases...
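To make the "definite trend" concrete, here is a small sketch that only does arithmetic on the energies quoted above (assumed to be in kJ/mol, the unit GROMACS reports energies in). It computes the change in total energy for each added process/thread and the overall spread relative to the serial value. The helper names `step_differences` and `relative_spread` are hypothetical, made up for this illustration; the script says nothing about the cause of the shift.

```python
# Total energies quoted in this thread (assumed kJ/mol, GROMACS's energy unit).
# Keys are the number of MPI processes or threads used for the run.
mpi_runs = {
    1: -2.41936409202696e+04,
    2: -2.43726425776809e+04,
    3: -2.45516442350804e+04,
    4: -2.47003944216983e+04,
}
thread_runs = {
    1: -2.41936409202696e+04,
    2: -2.43726425776792e+04,
    3: -2.45516442350804e+04,
    4: -2.47306458924815e+04,
}


def step_differences(runs):
    """Return E(n) - E(n-1) for each consecutive process/thread count n."""
    counts = sorted(runs)
    return {n: runs[n] - runs[prev] for prev, n in zip(counts, counts[1:])}


def relative_spread(runs):
    """Largest minus smallest energy, as a percentage of the 1-core value."""
    values = list(runs.values())
    return 100.0 * (max(values) - min(values)) / abs(runs[1])


for label, runs in (("MPI", mpi_runs), ("threads", thread_runs)):
    diffs = step_differences(runs)
    print(label, {n: round(d, 4) for n, d in diffs.items()},
          f"spread = {relative_spread(runs):.2f}%")
```

On these numbers the thread runs drop by an almost constant step of roughly 179.0 kJ/mol per added thread, which looks like a systematic per-thread offset rather than random noise; the 4-process MPI run breaks that pattern (a step of roughly 148.75 kJ/mol). The overall spread is about 2.1% (MPI) and 2.2% (threads), consistent with the "only 2%" figure quoted above.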


>
> > c) have I simply done something stupid (grompp.mdp appended below);
> >
>
> Nope, looks fine.
>
> -Justin

Thanks again for getting back to me.