[gmx-users] mdrun -rerun does not reproduce itself
Szilárd Páll
pall.szilard at gmail.com
Wed Apr 27 00:51:07 CEST 2016
On Tue, Apr 26, 2016 at 10:48 PM, Mark Abraham <mark.j.abraham at gmail.com>
wrote:
> Hi,
>
> Thanks (yet again) for your vigilance!
>
> On Tue, Apr 26, 2016 at 7:44 PM Christopher Neale <
> chris.neale at alum.utoronto.ca> wrote:
>
> > Dear Users:
> >
> > I find that running gromacs 5.1.2 mdrun -rerun many times gives different
> > energies for some components.
>
>
> Indeed. This is the intended behaviour.
>
> Running floating-point code on different hardware is essentially guaranteed
> to be non-reproducible, because the order of operations will be different.
> In particular, accumulating hundreds of force components happens in a
> different order, and that can cause different round-off effects. You can
> certainly rotate your box 90 degrees and get a different energy. You can
> minimize the size of the effect by running in double precision (which rules
> out GPUs and gives up about half the speed on CPUs). We could re-write the
> code to accumulate in fixed precision (as AMBER now does), which would give
> reproducibility, subject to whatever rounding happens in practice (but this
> is useless except for verification, or permitting certain kinds of
> optimizations). Or you can run with mdrun -reprod, which is intended to
> eliminate all the differences that are possible. (IIRC, that is not
> implemented on GPUs, either.) All these implementations compute different
> approximations to the same numerical result. The open question is what
> quality of approximation is acceptable in practice. Merely implementing
> reproducibility doesn't address the question at all. The answer is unclear
> - one is typically anyway accepting error in
> * whatever choice for the order of force and energy floating-point
> reductions the code uses in practice
> * how likely it is that particle pairs not in the short-range list might
> drift inside the short-range interaction buffer (but this can't happen in
> reruns)
> * how faithfully PME models the full periodic electrostatics (ewald_rtol,
> FFT grid size, and spline interpolation order matter here),
> * the approximate enforcement of holonomic bond constraints (typical both
> for SHAKE and LINCS), and
> * using a model physics with static point partial charges (etc.)
> so the answer has to be empirical - at what point do you start seeing
> relevant differences in observables, and how much additional computer time
> (= reduced sampling) are you prepared to pay to correct that? It does at
> first seem reasonable to require that total energy (or the appropriate
> conserved quantity for the integrator) is in fact conserved over the course
> of the integration, but one still has to decide what tolerance is
> acceptable, and I'm aware of no study that makes a defensible conclusion...
>
> On GPUs I find that its the SR interactions (both LJ and q) that are
> > inconsistent.
>
>
> I believe that's because the scheduling of short-ranged kernels (and thus
> the order of the accumulation) is up to the CUDA runtime (but maybe Szilard
> will correct me).
>
The reason is the use of atomic operations (on floating point values) in
force accumulation of the short-range GPU kernels.
There are well-known ways around it and deterministic accumulation has been
on our todo list, but given other higher priority features and the plans to
rework the kernel infrastructure, it may be some time before this happens
-- unless somebody is keen on pitching in!
Cheers,
--
Szilárd
>
> On CPU only, I find that its the Coul. recip. that is inconsistent.
>
>
> That is likely caused by FFTW's auto-tuning deciding on a different codelet
> because whatever machine activity happens alongside mdrun was higher or
> lower each run. This is suppressed by mdrun -reprod.
>
>
> > On CPU only with NPME=1 I find consistency but I am surprised still by
> how
> > much the coulombic SR depends on the approach (see differences in average
> > value between the 3 approaches). Note that the issue here is not that the
> > rerun energies differ from the runtime energies, but that mdrun rerun is
> > not entirely reproducible compared to itself.
>
>
> It's expected, normal, and intended. If there's a problem someone can show,
> then we can discuss it. :-)
>
> Also note that differences in the Coul. SR but not the LJ SR changes are
> > reflected in the last column, which is the total potential energy...
> > strange, and possibly off-topic to this post, but I note it in case some
> of
> > these differences might just be output related.
>
>
> Single-precision floating-point arithmetic is only accurate to about one
> part in 10^-7, so the round-off accumulated in the component (Coul. SR) of
> largest magnitude tends to dominate the round-off in the total.
>
>
> > There's also the difference between rows 5 and 6, where none of the
> > potential energy components changes, but the total does ch
> > ange.
> >
>
> I can't think of a reason for that one.
>
>
> >
> > ### [GPU TEST] Note the differences down column 8 (LJ SR) and column 9 (q
> > SR)
> >
> > $ for((i=1;i<=10;i++)); do
> > ~/exec/GROMACS/exec/gromacs-5.1.2/gpu_serial/bin/gmx mdrun -notunepme
> -dlb
> > yes -npme 0 -cpt 60 -gpu_id 0123 -ntmpi 4 -ntomp 6 -rerun TEMP.xtc -s
> > MD2.tpr -deffnm SAME >/dev/null 2>&1; echo "1 2 3 4 5 6 7 8 9 10" | gmx
> > energy -f SAME.edr -o epot_SAME.xvg -xvg none >/dev/null 2>&1; tail -n 1
> > epot_SAME.xvg; done
> >
> > 2000.000000 22256.701172 121366.734375 69411.359375 813.758789
> > 12650.896484 -141897.171875 23329.945312 -932386.625000 4761.907227
> > -819692.500000
> > 2000.000000 22256.701172 121366.734375 69411.359375 813.758789
> > 12650.896484 -141897.171875 23329.947266 -932386.750000 4761.907227
> > -819692.625000
> > 2000.000000 22256.701172 121366.734375 69411.359375 813.758789
> > 12650.896484 -141897.171875 23329.945312 -932386.750000 4761.907227
> > -819692.625000
> > 2000.000000 22256.701172 121366.734375 69411.359375 813.758789
> > 12650.896484 -141897.171875 23329.943359 -932386.750000 4761.907227
> > -819692.625000
> > 2000.000000 22256.701172 121366.734375 69411.359375 813.758789
> > 12650.896484 -141897.171875 23329.945312 -932386.750000 4761.907227
> > -819692.625000
> > 2000.000000 22256.701172 121366.734375 69411.359375 813.758789
> > 12650.896484 -141897.171875 23329.945312 -932386.750000 4761.907227
> > -819692.687500
> > 2000.000000 22256.701172 121366.734375 69411.359375 813.758789
> > 12650.896484 -141897.171875 23329.945312 -932386.750000 4761.907227
> > -819692.625000
> > 2000.000000 22256.701172 121366.734375 69411.359375 813.758789
> > 12650.896484 -141897.171875 23329.945312 -932386.687500 4761.907227
> > -819692.562500
> > 2000.000000 22256.701172 121366.734375 69411.359375 813.758789
> > 12650.896484 -141897.171875 23329.945312 -932386.750000 4761.907227
> > -819692.625000
> > 2000.000000 22256.701172 121366.734375 69411.359375 813.758789
> > 12650.896484 -141897.171875 23329.943359 -932386.625000 4761.907227
> > -819692.500000
> >
> >
> > ### [CPU TEST] Note the differences down column 9 (Coul. recip.).
> However,
> > note that Coul. recip. changes are again not reflected in the last
> column,
> > which is the total potential energy.
> >
>
> Yeah, they're smaller than the precision available in the final result.
>
>
> > $ for((i=1;i<=10;i++)); do
> > ~/exec/GROMACS/exec/gromacs-5.1.2/serial/bin/gmx mdrun -notunepme -dlb
> yes
> > -npme 0 -cpt 60 -ntmpi 4 -ntomp 6 -rerun TEMP.xtc -s MD2.tpr -deffnm SAME
> > >/dev/null 2>&1; echo "1 2 3 4 5 6 7 8 9 10" | gmx energy -f SAME.edr -o
> > epot_SAME.xvg -xvg none >/dev/null 2>&1; tail -n 1 epot_SAME.xvg; done
> >
> > 2000.000000 22256.705078 121366.718750 69411.343750 813.758789
> > 12650.895508 -141897.234375 23329.876953 -932379.437500 4761.906738
> > -819685.437500
> > 2000.000000 22256.705078 121366.718750 69411.343750 813.758789
> > 12650.895508 -141897.234375 23329.876953 -932379.437500 4761.907227
> > -819685.437500
> > 2000.000000 22256.705078 121366.718750 69411.343750 813.758789
> > 12650.895508 -141897.234375 23329.876953 -932379.437500 4761.907227
> > -819685.437500
> > 2000.000000 22256.705078 121366.718750 69411.343750 813.758789
> > 12650.895508 -141897.234375 23329.876953 -932379.437500 4761.907227
> > -819685.437500
> > 2000.000000 22256.705078 121366.718750 69411.343750 813.758789
> > 12650.895508 -141897.234375 23329.876953 -932379.437500 4761.906738
> > -819685.437500
> > 2000.000000 22256.705078 121366.718750 69411.343750 813.758789
> > 12650.895508 -141897.234375 23329.876953 -932379.437500 4761.907227
> > -819685.437500
> > 2000.000000 22256.705078 121366.718750 69411.343750 813.758789
> > 12650.895508 -141897.234375 23329.876953 -932379.437500 4761.907227
> > -819685.437500
> > 2000.000000 22256.705078 121366.718750 69411.343750 813.758789
> > 12650.895508 -141897.234375 23329.876953 -932379.437500 4761.907227
> > -819685.437500
> > 2000.000000 22256.705078 121366.718750 69411.343750 813.758789
> > 12650.895508 -141897.234375 23329.876953 -932379.437500 4761.907227
> > -819685.437500
> > 2000.000000 22256.705078 121366.718750 69411.343750 813.758789
> > 12650.895508 -141897.234375 23329.876953 -932379.437500 4761.907227
> > -819685.437500
> >
> >
> > ### One way that seems to keep things identical is to run on CPUs only,
> > but to define NPME=1 (see below) though it again is inconsistent with
> > NPME=2 (not shown).
> >
>
> Sure, the order of the arithmetic is guaranteed to be different between
> different -npme settings. Otherwise, you were probably just getting lucky
> with the FFTW autotuning.
>
>
> > $ for((i=1;i<=10;i++)); do
> > ~/exec/GROMACS/exec/gromacs-5.1.2/serial/bin/gmx mdrun -notunepme -dlb
> yes
> > -npme 1 -cpt 60 -ntmpi 4 -ntomp 6 -rerun TEMP.xtc -s MD2.tpr -deffnm SAME
> > >/dev/null 2>&1; echo "1 2 3 4 5 6 7 8 9 10" | gmx energy -f SAME.edr -o
> > epot_SAME.xvg -xvg none >/dev/null 2>&1; tail -n 1 epot_SAME.xvg; done
> >
> > 2000.000000 22256.707031 121366.617188 69411.320312 813.758850
> > 12650.896484 -141897.250000 23329.882812 -932365.500000 4761.785156
> > -819671.812500
> > 2000.000000 22256.707031 121366.617188 69411.320312 813.758850
> > 12650.896484 -141897.250000 23329.882812 -932365.500000 4761.785156
> > -819671.812500
> > 2000.000000 22256.707031 121366.617188 69411.320312 813.758850
> > 12650.896484 -141897.250000 23329.882812 -932365.500000 4761.785156
> > -819671.812500
> > 2000.000000 22256.707031 121366.617188 69411.320312 813.758850
> > 12650.896484 -141897.250000 23329.882812 -932365.500000 4761.785156
> > -819671.812500
> > 2000.000000 22256.707031 121366.617188 69411.320312 813.758850
> > 12650.896484 -141897.250000 23329.882812 -932365.500000 4761.785156
> > -819671.812500
> > 2000.000000 22256.707031 121366.617188 69411.320312 813.758850
> > 12650.896484 -141897.250000 23329.882812 -932365.500000 4761.785156
> > -819671.812500
> > 2000.000000 22256.707031 121366.617188 69411.320312 813.758850
> > 12650.896484 -141897.250000 23329.882812 -932365.500000 4761.785156
> > -819671.812500
> > 2000.000000 22256.707031 121366.617188 69411.320312 813.758850
> > 12650.896484 -141897.250000 23329.882812 -932365.500000 4761.785156
> > -819671.812500
> > 2000.000000 22256.707031 121366.617188 69411.320312 813.758850
> > 12650.896484 -141897.250000 23329.882812 -932365.500000 4761.785156
> > -819671.812500
> > 2000.000000 22256.707031 121366.617188 69411.320312 813.758850
> > 12650.896484 -141897.250000 23329.882812 -932365.500000 4761.785156
> > -819671.812500
> >
> > So I guess that the solution to getting reproducible energies from mdrun
> > -rerun is to avoid GPUs and to use NPME=1.
> >
>
> But reproducibility says nothing about any kind of accuracy or precision.
>
>
> > However, I am quite surprised by the magnitude of the variance in the
> > coulombic SR energies depending on the architecture (and these
> differences
> > are reflected in the total potential energy):
> >
> > coul SR on GPUs: -932386
> > coul SR on CPUs with NPME=0: -932379
> > coul SR on CPUs with NPME=1: -932365
> >
>
> It's normal to be surprised, but that's consistent with the normal range of
> variation. Note that if you chose ewald_rtol = 10^-5, then you chose that
> ratio of relative strength of the short- and long-range interactions at the
> cutoff, so a bunch of weaker short-range interactions at longer range are
> being arbitrarily omitted. That's an approximation, too, and its quality
> depends on the homogeneity of your system.
>
>
> > I certainly hope that a 21 kJ/mol difference in potential energy is not
> > actually occurring during runs on CPU vs. GPU (or a 14 kJ/mol difference
> > even staying on CPUs depending on whether NPME=1 or 0).
> >
>
> The difference is real. Why would you feel confident in output that was
> textually identical? :-) It probably is, when run in double precision, but
> that doesn't mean anything particular.
>
> Another take on this is that room-temperature RT is 2.5 kJ/mol. The typical
> size of thermal fluctuations will depend on the number of degrees of
> freedom, but surely it's larger than 21 kJ/mol.
>
> (The implementation of mdrun -rerun looks like a dog's breakfast, but I
> believe that is unrelated to your observations here. See
> http://redmine.gromacs.org/issues/1868)
>
> Mark
>
> Hopefully I've just missed something obvious here.
> > I appreciate any insight.
> >
> > Thank you,
> > Chris.
> >
> > --
> > Gromacs Users mailing list
> >
> > * Please search the archive at
> > http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> > posting!
> >
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >
> > * For (un)subscribe requests visit
> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > send a mail to gmx-users-request at gromacs.org.
> >
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-request at gromacs.org.
>
More information about the gromacs.org_gmx-users
mailing list