[gmx-users] mdrun -rerun does not reproduce itself

Mark Abraham mark.j.abraham at gmail.com
Tue Apr 26 22:48:30 CEST 2016


Hi,

Thanks (yet again) for your vigilance!

On Tue, Apr 26, 2016 at 7:44 PM Christopher Neale <
chris.neale at alum.utoronto.ca> wrote:

> Dear Users:
>
> I find that running gromacs 5.1.2 mdrun -rerun many times gives different
> energies for some components.


Indeed. This is the intended behaviour.

Running floating-point code on different hardware is essentially guaranteed
to be non-reproducible, because the order of operations will be different.
In particular, accumulating hundreds of force components happens in a
different order, and that can cause different round-off effects. You can
certainly rotate your box 90 degrees and get a different energy. You can
minimize the size of the effect by running in double precision (which rules
out GPUs and gives up about half the speed on CPUs). We could re-write the
code to accumulate in fixed precision (as AMBER now does), which would give
reproducibility, subject to whatever rounding happens in practice (but this
is useless except for verification, or permitting certain kinds of
optimizations). Or you can run with mdrun -reprod, which is intended to
eliminate all the differences that are possible. (IIRC, that is not
implemented on GPUs, either.) All these implementations compute different
approximations to the same numerical result. The open question is what
quality of approximation is acceptable in practice. Merely implementing
reproducibility doesn't address the question at all. The answer is unclear
- one is typically anyway accepting error in
* whatever choice for the order of force and energy floating-point
reductions the code uses in practice
* how likely it is that particle pairs not in the short-range list might
drift inside the short-range interaction buffer (but this can't happen in
reruns)
* how faithfully PME models the full periodic electrostatics (ewald_rtol,
FFT grid size, and spline interpolation order matter here),
* the approximate enforcement of holonomic bond constraints (typical both
for SHAKE and LINCS), and
* using a model physics with static point partial charges (etc.)
so the answer has to be empirical - at what point do you start seeing
relevant differences in observables, and how much additional computer time
(= reduced sampling) are you prepared to pay to correct that? It does at
first seem reasonable to require that total energy (or the appropriate
conserved quantity for the integrator) is in fact conserved over the course
of the integration, but one still has to decide what tolerance is
acceptable, and I'm aware of no study that makes a defensible conclusion...

On GPUs I find that its the SR interactions (both LJ and q) that are
> inconsistent.


I believe that's because the scheduling of short-ranged kernels (and thus
the order of the accumulation) is up to the CUDA runtime (but maybe Szilard
will correct me).

On CPU only, I find that its the Coul. recip. that is inconsistent.


That is likely caused by FFTW's auto-tuning deciding on a different codelet
because whatever machine activity happens alongside mdrun was higher or
lower each run. This is suppressed by mdrun -reprod.


> On CPU only with NPME=1 I find consistency but I am surprised still by how
> much the coulombic SR depends on the approach (see differences in average
> value between the 3 approaches). Note that the issue here is not that the
> rerun energies differ from the runtime energies, but that mdrun rerun is
> not entirely reproducible compared to itself.


It's expected, normal, and intended. If there's a problem someone can show,
then we can discuss it. :-)

Also note that differences in the Coul. SR but not the LJ SR  changes are
> reflected in the last column, which is the total potential energy...
> strange, and possibly off-topic to this post, but I note it in case some of
> these differences might just be output related.


Single-precision floating-point arithmetic is only accurate to about one
part in 10^-7, so the round-off accumulated in the component (Coul. SR) of
largest magnitude tends to dominate the round-off in the total.


> There's also the difference between rows 5 and 6, where none of the
> potential energy components changes, but the total does ch
>  ange.
>

I can't think of a reason for that one.


>
> ### [GPU TEST] Note the differences down column 8 (LJ SR) and column 9 (q
> SR)
>
> $ for((i=1;i<=10;i++)); do
> ~/exec/GROMACS/exec/gromacs-5.1.2/gpu_serial/bin/gmx mdrun -notunepme -dlb
> yes -npme 0 -cpt 60 -gpu_id 0123 -ntmpi 4 -ntomp 6 -rerun TEMP.xtc -s
> MD2.tpr -deffnm SAME >/dev/null 2>&1; echo "1 2 3 4 5 6 7 8 9 10" | gmx
> energy -f SAME.edr -o epot_SAME.xvg -xvg none >/dev/null 2>&1; tail -n 1
> epot_SAME.xvg; done
>
>  2000.000000  22256.701172  121366.734375  69411.359375  813.758789
> 12650.896484  -141897.171875  23329.945312  -932386.625000  4761.907227
> -819692.500000
>  2000.000000  22256.701172  121366.734375  69411.359375  813.758789
> 12650.896484  -141897.171875  23329.947266  -932386.750000  4761.907227
> -819692.625000
>  2000.000000  22256.701172  121366.734375  69411.359375  813.758789
> 12650.896484  -141897.171875  23329.945312  -932386.750000  4761.907227
> -819692.625000
>  2000.000000  22256.701172  121366.734375  69411.359375  813.758789
> 12650.896484  -141897.171875  23329.943359  -932386.750000  4761.907227
> -819692.625000
>  2000.000000  22256.701172  121366.734375  69411.359375  813.758789
> 12650.896484  -141897.171875  23329.945312  -932386.750000  4761.907227
> -819692.625000
>  2000.000000  22256.701172  121366.734375  69411.359375  813.758789
> 12650.896484  -141897.171875  23329.945312  -932386.750000  4761.907227
> -819692.687500
>  2000.000000  22256.701172  121366.734375  69411.359375  813.758789
> 12650.896484  -141897.171875  23329.945312  -932386.750000  4761.907227
> -819692.625000
>  2000.000000  22256.701172  121366.734375  69411.359375  813.758789
> 12650.896484  -141897.171875  23329.945312  -932386.687500  4761.907227
> -819692.562500
>  2000.000000  22256.701172  121366.734375  69411.359375  813.758789
> 12650.896484  -141897.171875  23329.945312  -932386.750000  4761.907227
> -819692.625000
>  2000.000000  22256.701172  121366.734375  69411.359375  813.758789
> 12650.896484  -141897.171875  23329.943359  -932386.625000  4761.907227
> -819692.500000
>
>
> ### [CPU TEST] Note the differences down column 9 (Coul. recip.). However,
> note that Coul. recip. changes are again not reflected in the last column,
> which is the total potential energy.
>

Yeah, they're smaller than the precision available in the final result.


> $ for((i=1;i<=10;i++)); do
> ~/exec/GROMACS/exec/gromacs-5.1.2/serial/bin/gmx mdrun -notunepme -dlb yes
> -npme 0 -cpt 60 -ntmpi 4 -ntomp 6 -rerun TEMP.xtc -s MD2.tpr -deffnm SAME
> >/dev/null 2>&1; echo "1 2 3 4 5 6 7 8 9 10" | gmx energy -f SAME.edr -o
> epot_SAME.xvg -xvg none >/dev/null 2>&1; tail -n 1 epot_SAME.xvg; done
>
>  2000.000000  22256.705078  121366.718750  69411.343750  813.758789
> 12650.895508  -141897.234375  23329.876953  -932379.437500  4761.906738
> -819685.437500
>  2000.000000  22256.705078  121366.718750  69411.343750  813.758789
> 12650.895508  -141897.234375  23329.876953  -932379.437500  4761.907227
> -819685.437500
>  2000.000000  22256.705078  121366.718750  69411.343750  813.758789
> 12650.895508  -141897.234375  23329.876953  -932379.437500  4761.907227
> -819685.437500
>  2000.000000  22256.705078  121366.718750  69411.343750  813.758789
> 12650.895508  -141897.234375  23329.876953  -932379.437500  4761.907227
> -819685.437500
>  2000.000000  22256.705078  121366.718750  69411.343750  813.758789
> 12650.895508  -141897.234375  23329.876953  -932379.437500  4761.906738
> -819685.437500
>  2000.000000  22256.705078  121366.718750  69411.343750  813.758789
> 12650.895508  -141897.234375  23329.876953  -932379.437500  4761.907227
> -819685.437500
>  2000.000000  22256.705078  121366.718750  69411.343750  813.758789
> 12650.895508  -141897.234375  23329.876953  -932379.437500  4761.907227
> -819685.437500
>  2000.000000  22256.705078  121366.718750  69411.343750  813.758789
> 12650.895508  -141897.234375  23329.876953  -932379.437500  4761.907227
> -819685.437500
>  2000.000000  22256.705078  121366.718750  69411.343750  813.758789
> 12650.895508  -141897.234375  23329.876953  -932379.437500  4761.907227
> -819685.437500
>  2000.000000  22256.705078  121366.718750  69411.343750  813.758789
> 12650.895508  -141897.234375  23329.876953  -932379.437500  4761.907227
> -819685.437500
>
>
> ### One way that seems to keep things identical is to run on CPUs only,
> but to define NPME=1 (see below) though it again is inconsistent with
> NPME=2 (not shown).
>

Sure, the order of the arithmetic is guaranteed to be different between
different -npme settings. Otherwise, you were probably just getting lucky
with the FFTW autotuning.


> $ for((i=1;i<=10;i++)); do
> ~/exec/GROMACS/exec/gromacs-5.1.2/serial/bin/gmx mdrun -notunepme -dlb yes
> -npme 1 -cpt 60 -ntmpi 4 -ntomp 6 -rerun TEMP.xtc -s MD2.tpr -deffnm SAME
> >/dev/null 2>&1; echo "1 2 3 4 5 6 7 8 9 10" | gmx energy -f SAME.edr -o
> epot_SAME.xvg -xvg none >/dev/null 2>&1; tail -n 1 epot_SAME.xvg; done
>
>  2000.000000  22256.707031  121366.617188  69411.320312  813.758850
> 12650.896484  -141897.250000  23329.882812  -932365.500000  4761.785156
> -819671.812500
>  2000.000000  22256.707031  121366.617188  69411.320312  813.758850
> 12650.896484  -141897.250000  23329.882812  -932365.500000  4761.785156
> -819671.812500
>  2000.000000  22256.707031  121366.617188  69411.320312  813.758850
> 12650.896484  -141897.250000  23329.882812  -932365.500000  4761.785156
> -819671.812500
>  2000.000000  22256.707031  121366.617188  69411.320312  813.758850
> 12650.896484  -141897.250000  23329.882812  -932365.500000  4761.785156
> -819671.812500
>  2000.000000  22256.707031  121366.617188  69411.320312  813.758850
> 12650.896484  -141897.250000  23329.882812  -932365.500000  4761.785156
> -819671.812500
>  2000.000000  22256.707031  121366.617188  69411.320312  813.758850
> 12650.896484  -141897.250000  23329.882812  -932365.500000  4761.785156
> -819671.812500
>  2000.000000  22256.707031  121366.617188  69411.320312  813.758850
> 12650.896484  -141897.250000  23329.882812  -932365.500000  4761.785156
> -819671.812500
>  2000.000000  22256.707031  121366.617188  69411.320312  813.758850
> 12650.896484  -141897.250000  23329.882812  -932365.500000  4761.785156
> -819671.812500
>  2000.000000  22256.707031  121366.617188  69411.320312  813.758850
> 12650.896484  -141897.250000  23329.882812  -932365.500000  4761.785156
> -819671.812500
>  2000.000000  22256.707031  121366.617188  69411.320312  813.758850
> 12650.896484  -141897.250000  23329.882812  -932365.500000  4761.785156
> -819671.812500
>
> So I guess that the solution to getting reproducible energies from mdrun
> -rerun is to avoid GPUs and to use NPME=1.
>

But reproducibility says nothing about any kind of accuracy or precision.


> However, I am quite surprised by the magnitude of the variance in the
> coulombic SR energies depending on the architecture (and these differences
> are reflected in the total potential energy):
>
> coul SR on GPUs: -932386
> coul SR on CPUs  with NPME=0: -932379
> coul SR on CPUs with NPME=1: -932365
>

It's normal to be surprised, but that's consistent with the normal range of
variation. Note that if you chose ewald_rtol = 10^-5, then you chose that
ratio of relative strength of the short- and long-range interactions at the
cutoff, so a bunch of weaker short-range interactions at longer range are
being arbitrarily omitted. That's an approximation, too, and its quality
depends on the homogeneity of your system.


> I certainly hope that a 21 kJ/mol difference in potential energy is not
> actually occurring during runs on CPU vs. GPU (or a 14 kJ/mol difference
> even staying on CPUs depending on whether NPME=1 or 0).
>

The difference is real. Why would you feel confident in output that was
textually identical? :-) It probably is, when run in double precision, but
that doesn't mean anything particular.

Another take on this is that room-temperature RT is 2.5 kJ/mol. The typical
size of thermal fluctuations will depend on the number of degrees of
freedom, but surely it's larger than 21 kJ/mol.

(The implementation of mdrun -rerun looks like a dog's breakfast, but I
believe that is unrelated to your observations here. See
http://redmine.gromacs.org/issues/1868)

Mark

Hopefully I've just missed something obvious here.
> I appreciate any insight.
>
> Thank you,
> Chris.
>
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-request at gromacs.org.
>


More information about the gromacs.org_gmx-users mailing list