[gmx-users] Re: Possible bug: energy changes with the number of nodes for energy minimization

Szilárd Páll szilard.pall at cbr.su.se
Fri Jun 1 11:16:45 CEST 2012


Or you can just use the git version.
--
Szilárd


On Wed, May 30, 2012 at 5:51 PM, Stephen Cox <stephen.cox.10 at ucl.ac.uk> wrote:
> Hi Justin and Mark,
>
> Thanks once again for getting back.
>
> I've found the problem - it's actually a known bug already:
>
> http://redmine.gromacs.org/issues/901
>
> The dispersion correction is multiplied by the number of processes (I found
> this after taking a closer look at my md.log files to see where the energy
> was changing)! I guess this means I should use the serial version for
> meaningful binding energies. It also looks like it will be fixed in version
> 4.5.6.
>
> Thank you again, I really appreciate your help.
>
> Steve
>
>
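A quick arithmetic check of the numbers quoted further down is consistent with this diagnosis. The sketch below takes the "#threads" energies from the original post and shows that each extra thread shifts the total by the same amount, as expected if one term is counted once per process. The model E(N) = E(1) + (N - 1) * E_extra and the unit (kJ/mol, the GROMACS default) are assumptions made for illustration; they are not taken from the GROMACS source.

```python
# Total energies copied from the "#threads" table in the original post
# (assumed to be in kJ/mol, the GROMACS default energy unit).
energies = {
    1: -2.41936409202696e+04,
    2: -2.43726425776792e+04,
    3: -2.45516442350804e+04,
    4: -2.47306458924815e+04,
}


def successive_differences(table):
    """Return E(n+1) - E(n) for consecutive thread counts."""
    counts = sorted(table)
    return [table[b] - table[a] for a, b in zip(counts, counts[1:])]


diffs = successive_differences(energies)
step = sum(diffs) / len(diffs)  # mean shift per additional thread

# Assumed model: the serial run is correct and every additional
# thread adds one spurious copy of the same energy term.
e_serial = energies[1]
for n in sorted(energies):
    model = e_serial + (n - 1) * step
    residual = energies[n] - model
    print(f"{n}  reported={energies[n]:.6f}  model={model:.6f}  "
          f"residual={residual:+.1e}")

print(f"shift per extra thread: {step:.6f}")
```

The shift is about -179.0 per extra thread and the residuals are many orders of magnitude smaller, which is why the trend looks systematic rather than like rounding noise. The 4-core MPI value from the "#cores" table does not follow this line and is left out of the sketch.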
>>
>> On 30/05/2012 9:42 PM, Stephen Cox wrote:
>> > Hi Justin,
>> >
>> > Thanks for getting back and posting the links.
>> >
>> >
>> >     On 5/29/12 6:22 AM, Stephen Cox wrote:
>> >     > Hi,
>> >     >
>> >     > I'm running a number of energy minimizations on a clathrate
>> >     supercell and I get
>> >     > quite significantly different values for the total energy
>> >     depending on the
>> >     > number of mpi processes / number of threads I use. More
>> >     specifically, some
>> >     > numbers I get are:
>> >     >
>> >     > #cores      energy
>> >     > 1        -2.41936409202696e+04
>> >     > 2        -2.43726425776809e+04
>> >     > 3        -2.45516442350804e+04
>> >     > 4        -2.47003944216983e+04
>> >     >
>> >     > #threads    energy
>> >     > 1        -2.41936409202696e+04
>> >     > 2        -2.43726425776792e+04
>> >     > 3        -2.45516442350804e+04
>> >     > 4        -2.47306458924815e+04
>> >     >
>> >     >
>> >     > I'd expect some numerical noise, but these differences seem too
>> >     large for that.
>> >
>> >     The difference is only 2%, which by MD standards, is quite good,
>> >     I'd say ;)
>> >     Consider the discussion here:
>> >
>> >
>> > I agree for MD this wouldn't be too bad, but I'd expect energy
>> > minimization to get very close to the same local minimum from a given
>> > starting configuration. The thing is I want to compute a binding curve
>> > for my clathrate and compare to DFT values for the binding energy
>> > (amongst other things), and the difference in energy between different
>> > number of cores is rather significant for this purpose.
>>
>> Given the usual roughness of the PE surface to which you are minimizing,
>> some variation in end point is expected.
>>
>>
>> >
>>
>> > Furthermore, if I only calculate the energy for nsteps = 0 (i.e. a
>> > single point energy for identical structures) I get the same trend as
>> > above (both mpi/openmp with domain/particle decomposition). Surely
>> > there shouldn't be such a large difference in energy for a single
>> > point calculation?
>>
>> nsteps = 0 is not strictly a single-point energy, since the constraints
>> act before EM step 0. mdrun -s -rerun will give a single point. This
>> probably won't change your observations. You should also be sure you're
>> making observations with the latest release (4.5.5).
>>
>>
>> If you can continue to observe this trend for more processors
>> (overallocating?), then you may have evidence of a problem - but a full
>> system description and an .mdp file will be in order also.
>>
>> Mark
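To repeat the single-point comparison along the lines suggested here, one could run `mdrun -rerun` on the same structure with different thread counts and compare the resulting logs. The sketch below only assembles the command lines and does not execute anything. The file names `topol.tpr` and `conf.gro` and the output naming scheme are hypothetical placeholders; `-nt`, `-s`, `-rerun`, `-e` and `-g` are mdrun options in the 4.5 series, but check `mdrun -h` for your build.

```python
def rerun_command(n_threads, tpr="topol.tpr", structure="conf.gro"):
    """Build an mdrun argument list for a rerun on a fixed structure.

    File names are hypothetical placeholders, not taken from the thread.
    """
    tag = f"rerun_nt{n_threads}"
    return [
        "mdrun",
        "-nt", str(n_threads),   # number of threads (threaded builds)
        "-s", tpr,               # run input file
        "-rerun", structure,     # recompute energies for this structure
        "-e", f"{tag}.edr",      # separate energy file per thread count
        "-g", f"{tag}.log",      # separate log file per thread count
    ]


commands = [rerun_command(n) for n in (1, 2, 3, 4)]
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run` on a machine with GROMACS installed; comparing the energy terms in the four log files then shows which term changes with the thread count.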
>
>
>> >
>> >
>> >     http://www.gromacs.org/Documentation/Terminology/Reproducibility
>> >
>> >     To an extent, the information here may also be relevant:
>> >
>> >
>> > http://www.gromacs.org/Documentation/How-tos/Extending_Simulations#Exact_vs_binary_identical_continuation
>> >
>> >     > Before submitting a bug report, I'd like to check:
>> >     > a) if someone has seen something similar;
>> >
>> >     Sure.  Energies can be different due to a whole host of factors
>> >     (discussed
>> >     above), and MPI only complicates matters.
>> >
>> >     > b) should I just trust the serial version?
>> >
>> >     Maybe, but I don't know that there's evidence to say that any of
>> >     the above tests
>> >     are more or less accurate than the others.  What happens if you
>> >     run with mdrun
>> >     -reprod on all your tests?
>> >
>> >
>> > Running with -reprod produces the same trend as above. If it was
>> > numerical noise, I would have thought that the numbers would fluctuate
>> > around some average value, not follow a definite trend where the
>> > energy decreases with the number of cores/threads...
>> >
>> >
>> >     > c) have I simply done something stupid (grompp.mdp appended
>> > below);
>> >     >
>> >
>> >     Nope, looks fine.
>> >
>> >     -Justin
>> >
>> > Thanks again for getting back to me.
>> >
>> >
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Wed, 30 May 2012 07:51:02 -0400
>> From: "Justin A. Lemkul" <jalemkul at vt.edu>
>> Subject: Re: [gmx-users] Re: Possible bug: energy changes with the
>>        number  of      nodes for energy minimization
>> To: Discussion list for GROMACS users <gmx-users at gromacs.org>
>> Message-ID: <4FC609A6.4090702 at vt.edu>
>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>
>>
>>
>>
>> On 5/30/12 7:42 AM, Stephen Cox wrote:
>> > Hi Justin,
>> >
>> > Thanks for getting back and posting the links.
>> >
>> >
>> >     On 5/29/12 6:22 AM, Stephen Cox wrote:
>> >      > Hi,
>> >      >
>> >      > I'm running a number of energy minimizations on a clathrate
>> > supercell and
>> >     I get
>> >      > quite significantly different values for the total energy
>> > depending on the
>> >      > number of mpi processes / number of threads I use. More
>> > specifically, some
>> >      > numbers I get are:
>> >      >
>> >      > #cores      energy
>> >      > 1        -2.41936409202696e+04
>> >      > 2        -2.43726425776809e+04
>> >      > 3        -2.45516442350804e+04
>> >      > 4        -2.47003944216983e+04
>> >      >
>> >      > #threads    energy
>> >      > 1        -2.41936409202696e+04
>> >      > 2        -2.43726425776792e+04
>> >      > 3        -2.45516442350804e+04
>> >      > 4        -2.47306458924815e+04
>> >      >
>> >      >
>> >      > I'd expect some numerical noise, but these differences seem too
>> > large for
>> >     that.
>> >
>> >     The difference is only 2%, which by MD standards, is quite good, I'd
>> > say ;)
>> >     Consider the discussion here:
>> >
>> >
>> > I agree for MD this wouldn't be too bad, but I'd expect energy
>> > minimization to
>> > get very close to the same local minimum from a given starting
>> > configuration.
>> > The thing is I want to compute a binding curve for my clathrate and
>> > compare to
>> > DFT values for the binding energy (amongst other things), and the
>> > difference in
>> > energy between different number of cores is rather significant for this
>> > purpose.
>> >
>>
>> I think the real issue comes down to how you're going to calculate binding
>> energy.  I would still expect that with sufficient MD sampling, the
>> differences
>> should be small or statistically insignificant given the nature of MD
>> calculations.  EM will likely be very sensitive to the nature of how it is
>> run
>> (MPI vs. serial, etc) since even the tiny rounding errors and other
>> factors
>> described below will cause changes in how the EM algorithm proceeds.  For
>> most
>> purposes, such differences are irrelevant as EM is only a preparatory step
>> for
>> more intense calculations.
>
>
>>
>> > Furthermore, if I only calculate the energy for nsteps = 0 (i.e. a
>> > single point
>> > energy for identical structures) I get the same trend as above (both
>> > mpi/openmp
>> > with domain/particle decomposition). Surely there shouldn't be such a
>> > large
>> > difference in energy for a single point calculation?
>> >
>>
>> That depends.  Are you using the same .mdp file, just setting "nsteps =
>> 0"?  If
>> so, that's not a good test.  EM algorithms will make a change at step 0,
>> the
>> magnitude of which will again reflect the differences you're seeing.  If
>> you use
>> the md integrator with a zero-step evaluation, that's a better test.
>
>
>>
>> -Justin
>>
>>
>
>


