[gmx-users] Re: Possible bug: energy changes with the number of nodes for energy minimization
Stephen Cox
stephen.cox.10 at ucl.ac.uk
Wed May 30 17:51:46 CEST 2012
Hi Justin and Mark,
Thanks once again for getting back.
I've found the problem - it's actually a known bug already:
http://redmine.gromacs.org/issues/901
The dispersion correction is multiplied by the number of processes (I found
this after taking a closer look at my md.log files to see where the energy
was changing)! I guess this means I should use the serial version for
meaningful binding energies. It also looks like it will be fixed for
version 4.5.6.
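
As a quick sanity check on the numbers quoted below, here is a minimal Python sketch (not part of GROMACS; the variable names are purely illustrative) that takes the successive differences of the energies in the "#threads" table. If one term were being counted once per process, as the bug report describes, the energy should change by the same amount for each added thread. The script only tests for that constant step; it does not identify which energy term is responsible.

```python
# Energies copied from the "#threads energy" table quoted in this thread.
energies = {
    1: -2.41936409202696e+04,
    2: -2.43726425776792e+04,
    3: -2.45516442350804e+04,
    4: -2.47306458924815e+04,
}

counts = sorted(energies)

# Change in total energy for each additional thread.
steps = [energies[b] - energies[a] for a, b in zip(counts, counts[1:])]

# A term counted once per thread would give a constant step.
# The 1e-6 tolerance is an arbitrary choice for this illustration.
spread = max(steps) - min(steps)
is_linear = spread < 1e-6

for n, step in zip(counts[1:], steps):
    print(f"{n - 1} -> {n} threads: step = {step:.6f}")
print("constant step per thread:", is_linear)
```

With the thread data the step comes out the same for every added thread to well within the tolerance, which is what a per-process multiplication of a single term would produce. Under that assumption the 1-thread value is the one in which the term is counted once.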
Thank you again, I really appreciate your help.
Steve
> On 30/05/2012 9:42 PM, Stephen Cox wrote:
> > Hi Justin,
> >
> > Thanks for getting back and posting the links.
> >
> >
> > On 5/29/12 6:22 AM, Stephen Cox wrote:
> > > Hi,
> > >
> > > I'm running a number of energy minimizations on a clathrate supercell and I get
> > > quite significantly different values for the total energy depending on the
> > > number of mpi processes / number of threads I use. More specifically, some
> > > numbers I get are:
> > >
> > > #cores energy
> > > 1 -2.41936409202696e+04
> > > 2 -2.43726425776809e+04
> > > 3 -2.45516442350804e+04
> > > 4 -2.47003944216983e+04
> > >
> > > #threads energy
> > > 1 -2.41936409202696e+04
> > > 2 -2.43726425776792e+04
> > > 3 -2.45516442350804e+04
> > > 4 -2.47306458924815e+04
> > >
> > >
> > > I'd expect some numerical noise, but these differences seem too large for that.
> >
> > The difference is only 2%, which by MD standards, is quite good,
> > I'd say ;)
> > Consider the discussion here:
> >
> >
> > I agree for MD this wouldn't be too bad, but I'd expect energy
> > minimization to get very close to the same local minimum from a given
> > starting configuration. The thing is I want to compute a binding curve
> > for my clathrate and compare to DFT values for the binding energy
> > (amongst other things), and the difference in energy between different
> > number of cores is rather significant for this purpose.
>
> Given the usual roughness of the PE surface to which you are minimizing,
> some variation in end point is expected.
>
> >
> > Furthermore, if I only calculate the energy for nsteps = 0 (i.e. a
> > single point energy for identical structures) I get the same trend as
> > above (both mpi/openmp with domain/particle decomposition). Surely
> > there shouldn't be such a large difference in energy for a single
> > point calculation?
>
> nsteps = 0 is not strictly a single-point energy, since the constraints
> act before EM step 0. mdrun -s -rerun will give a single point. This
> probably won't change your observations. You should also be sure you're
> making observations with the latest release (4.5.5).
>
> If you can continue to observe this trend for more processors
> (overallocating?), then you may have evidence of a problem - but a full
> system description and an .mdp file will be in order also.
>
> Mark
>
> >
> > http://www.gromacs.org/Documentation/Terminology/Reproducibility
> >
> > To an extent, the information here may also be relevant:
> >
> >
> http://www.gromacs.org/Documentation/How-tos/Extending_Simulations#Exact_vs_binary_identical_continuation
> >
> > > Before submitting a bug report, I'd like to check:
> > > a) if someone has seen something similar;
> >
> > Sure. Energies can be different due to a whole host of factors
> > (discussed
> > above), and MPI only complicates matters.
> >
> > > b) should I just trust the serial version?
> >
> > Maybe, but I don't know that there's evidence to say that any of
> > the above tests
> > are more or less accurate than the others. What happens if you
> > run with mdrun
> > -reprod on all your tests?
> >
> >
> > Running with -reprod produces the same trend as above. If it was
> > numerical noise, I would have thought that the numbers would fluctuate
> > around some average value, not follow a definite trend where the
> > energy decreases with the number of cores/threads...
> >
> >
> > > c) have I simply done something stupid (grompp.mdp appended below);
> > >
> >
> > Nope, looks fine.
> >
> > -Justin
> >
> > Thanks again for getting back to me.
> >
> >
>
> ------------------------------
>
> Message: 2
> Date: Wed, 30 May 2012 07:51:02 -0400
> From: "Justin A. Lemkul" <jalemkul at vt.edu>
> Subject: Re: [gmx-users] Re: Possible bug: energy changes with the
> number of nodes for energy minimization
> To: Discussion list for GROMACS users <gmx-users at gromacs.org>
> Message-ID: <4FC609A6.4090702 at vt.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>
>
> On 5/30/12 7:42 AM, Stephen Cox wrote:
> > Hi Justin,
> >
> > Thanks for getting back and posting the links.
> >
> >
> > On 5/29/12 6:22 AM, Stephen Cox wrote:
> > > Hi,
> > >
> > > I'm running a number of energy minimizations on a clathrate supercell and I get
> > > quite significantly different values for the total energy depending on the
> > > number of mpi processes / number of threads I use. More specifically, some
> > > numbers I get are:
> > >
> > > #cores energy
> > > 1 -2.41936409202696e+04
> > > 2 -2.43726425776809e+04
> > > 3 -2.45516442350804e+04
> > > 4 -2.47003944216983e+04
> > >
> > > #threads energy
> > > 1 -2.41936409202696e+04
> > > 2 -2.43726425776792e+04
> > > 3 -2.45516442350804e+04
> > > 4 -2.47306458924815e+04
> > >
> > >
> > > I'd expect some numerical noise, but these differences seem too large for that.
> >
> > The difference is only 2%, which by MD standards, is quite good, I'd say ;)
> > Consider the discussion here:
> >
> >
> > I agree for MD this wouldn't be too bad, but I'd expect energy minimization to
> > get very close to the same local minimum from a given starting configuration.
> > The thing is I want to compute a binding curve for my clathrate and compare to
> > DFT values for the binding energy (amongst other things), and the difference in
> > energy between different number of cores is rather significant for this purpose.
> >
>
> I think the real issue comes down to how you're going to calculate binding
> energy. I would still expect that with sufficient MD sampling, the differences
> should be small or statistically insignificant given the nature of MD
> calculations. EM will likely be very sensitive to the nature of how it is run
> (MPI vs. serial, etc) since even the tiny rounding errors and other factors
> described below will cause changes in how the EM algorithm proceeds. For most
> purposes, such differences are irrelevant as EM is only a preparatory step for
> more intense calculations.
>
> > Furthermore, if I only calculate the energy for nsteps = 0 (i.e. a single point
> > energy for identical structures) I get the same trend as above (both mpi/openmp
> > with domain/particle decomposition). Surely there shouldn't be such a large
> > difference in energy for a single point calculation?
> >
>
> That depends. Are you using the same .mdp file, just setting "nsteps = 0"? If
> so, that's not a good test. EM algorithms will make a change at step 0, the
> magnitude of which will again reflect the differences you're seeing. If you use
> the md integrator with a zero-step evaluation, that's a better test.
>
> -Justin
>
>
>