[gmx-users] Re: Possible bug: energy changes with the number of nodes for energy minimization

Mark Abraham Mark.Abraham at anu.edu.au
Wed May 30 13:52:06 CEST 2012

On 30/05/2012 9:42 PM, Stephen Cox wrote:
> Hi Justin,
> Thanks for getting back and posting the links.
>     On 5/29/12 6:22 AM, Stephen Cox wrote:
>     > Hi,
>     >
>     > I'm running a number of energy minimizations on a clathrate
>     supercell and I get
>     > quite significantly different values for the total energy
>     depending on the
>     > number of mpi processes / number of threads I use. More
>     specifically, some
>     > numbers I get are:
>     >
>     > #cores      energy
>     > 1        -2.41936409202696e+04
>     > 2        -2.43726425776809e+04
>     > 3        -2.45516442350804e+04
>     > 4        -2.47003944216983e+04
>     >
>     > #threads    energy
>     > 1        -2.41936409202696e+04
>     > 2        -2.43726425776792e+04
>     > 3        -2.45516442350804e+04
>     > 4        -2.47306458924815e+04
>     >
>     >
>     > I'd expect some numerical noise, but these differences seem too
>     large for that.
>     The difference is only 2%, which, by MD standards, is quite good,
>     I'd say ;)
>     Consider the discussion here:
> I agree for MD this wouldn't be too bad, but I'd expect energy 
> minimization to get very close to the same local minimum from a given 
> starting configuration. The thing is I want to compute a binding curve 
> for my clathrate and compare to DFT values for the binding energy 
> (amongst other things), and the difference in energy between different 
> number of cores is rather significant for this purpose.

Given the usual roughness of the PE surface to which you are minimizing, 
some variation in end point is expected.

> Furthermore, if I only calculate the energy for nsteps = 0 (i.e. a 
> single point energy for identical structures) I get the same trend as 
> above (both mpi/openmp with domain/particle decomposition). Surely 
> there shouldn't be such a large difference in energy for a single 
> point calculation?

Running with nsteps = 0 does not strictly give a single-point energy, 
since the constraints are applied before EM step 0. mdrun -s -rerun will 
give a true single point. This probably won't change your observations, 
though. You should also make sure you're making observations with the 
latest release (4.5.5).

If you continue to observe this trend with more processors 
(overallocating?), then you may have evidence of a problem - but a full 
system description and an .mdp file will also be needed.
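
As a quick check of the trend itself, here is a minimal sketch in Python that takes the energies from the table quoted above and prints the change in energy for each added thread or MPI process. The numbers are copied from the post (units as reported there); the helper name energy_steps and the dictionary layout are just illustrative choices, not anything from GROMACS.

```python
# Energies copied verbatim from the table quoted in this thread,
# keyed by the number of threads / MPI processes used.
threads = {
    1: -2.41936409202696e+04,
    2: -2.43726425776792e+04,
    3: -2.45516442350804e+04,
    4: -2.47306458924815e+04,
}
cores = {
    1: -2.41936409202696e+04,
    2: -2.43726425776809e+04,
    3: -2.45516442350804e+04,
    4: -2.47003944216983e+04,
}


def energy_steps(energies):
    """Return the energy change between consecutive process counts."""
    counts = sorted(energies)
    return [energies[b] - energies[a] for a, b in zip(counts, counts[1:])]


thread_steps = energy_steps(threads)
core_steps = energy_steps(cores)

# Relative difference between the 1-thread and 4-thread energies.
rel_spread = abs(threads[4] - threads[1]) / abs(threads[1])

print("per-thread steps:", [f"{s:.6f}" for s in thread_steps])
print("per-core steps:  ", [f"{s:.6f}" for s in core_steps])
print(f"relative spread, 1 vs 4 threads: {rel_spread:.2%}")
```

With the numbers as posted, each added thread changes the energy by the same amount to well within 1e-6, and the MPI series shows the same step except between 3 and 4 processes. A constant increment like that is not what random round-off noise would look like, which is why reporting the same table for more processors, along with the system description and .mdp file, would be useful.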


>     http://www.gromacs.org/Documentation/Terminology/Reproducibility
>     To an extent, the information here may also be relevant:
>     http://www.gromacs.org/Documentation/How-tos/Extending_Simulations#Exact_vs_binary_identical_continuation
>     > Before submitting a bug report, I'd like to check:
>     > a) if someone has seen something similar;
>     Sure.  Energies can be different due to a whole host of factors
>     (discussed
>     above), and MPI only complicates matters.
>     > b) should I just trust the serial version?
>     Maybe, but I don't know that there's evidence to say that any of
>     the above tests
>     are more or less accurate than the others.  What happens if you
>     run with mdrun
>     -reprod on all your tests?
> Running with -reprod produces the same trend as above. If it was 
> numerical noise, I would have thought that the numbers would fluctuate 
> around some average value, not follow a definite trend where the 
> energy decreases with the number of cores/threads...
>     > c) have I simply done something stupid (grompp.mdp appended below);
>     >
>     Nope, looks fine.
>     -Justin
> Thanks again for getting back to me.
