[gmx-users] Same starting structure but different trajectories
gmx3 at hotmail.com
Wed Feb 17 11:55:04 CET 2010
Gromacs, including the MPI version, is completely deterministic,
except for, if I recall correctly, two algorithms.
One is PME with FFTW as the FFT library. FFTW times several algorithms and
uses the fastest one, so the results depend on the exact load of the machine
during the timing phase.
The other is the dynamic load balancing.
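For what it is worth, the reason such timing-dependent choices change the numbers at all is that floating-point addition is not associative, as Erik explains in the quoted message below. Here is a minimal Python sketch (toy values, not GROMACS code; the function names are made up for illustration) that sums the same three partial results in every possible order:

```python
from itertools import permutations


def ordered_sum(values):
    """Add the values strictly left to right, as a naive reduction would."""
    total = 0.0
    for v in values:
        total += v
    return total


def distinct_sums(values):
    """Return the set of results obtained over all summation orders."""
    return {ordered_sum(order) for order in permutations(values)}


# Hypothetical partial results from three processes (illustrative only).
partial_results = [1e16, 1.0, -1e16]

for order in permutations(partial_results):
    print(order, "->", ordered_sum(order))

# The classic small example: the grouping alone changes the last bits.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))
```

The values here are deliberately extreme so the effect is visible. In a real simulation the differences are in the last bits only, but that is enough to start the divergence.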
mdrun -reprod will try to turn off all these non-deterministic algorithms,
at the cost of performance. This will give you reproducible simulations
(if you run exactly the same tpr file on exactly the same system with exactly
the same binary and the same command line).
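If you want to check that two such runs really are identical, one simple approach is to compare checksums of the output files. A rough Python sketch, assuming two runs were done with something like `mdrun -reprod -s topol.tpr -deffnm run1` (and likewise `run2`). The file names are hypothetical, and I am assuming the trajectory file holds no run-specific metadata such as timestamps:

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def runs_identical(file_a, file_b):
    """True if the two files are byte-for-byte identical."""
    return file_digest(file_a) == file_digest(file_b)


# Example use (hypothetical file names from two -reprod runs):
#   runs_identical("run1.trr", "run2.trr")
```

A byte-wise comparison is the strictest test possible. Log files will normally differ anyway because they record times and dates, so compare the trajectory or energy output instead.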
In most cases one is not interested in exact reproducibility,
but in certain cases, notably debugging, reproducibility can be important.
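To see why a last-bit difference matters at all, here is a small Python sketch of the chaotic amplification Erik describes in the quoted message below. It uses the logistic map x -> 4x(1-x) purely as a stand-in for a chaotic system (it is not an MD integrator, and the names are made up for illustration). Its Lyapunov exponent is ln 2, so a tiny initial difference roughly doubles at every step until it is as large as the signal itself:

```python
def logistic(x):
    """One step of the logistic map with r = 4 (a standard chaotic toy system)."""
    return 4.0 * x * (1.0 - x)


def separation_history(x0, delta, steps):
    """Iterate two copies started delta apart and record their distance."""
    a, b = x0, x0 + delta
    history = [abs(a - b)]
    for _ in range(steps):
        a, b = logistic(a), logistic(b)
        history.append(abs(a - b))
    return history


# Two "runs" whose starting points differ by about one part in 10^12.
history = separation_history(0.3, 1e-12, 200)

for step in range(0, 61, 10):
    print(f"step {step:3d}  separation {history[step]:.3e}")
```

The two sequences agree closely at first and are unrelated after a few dozen steps. The same happens to two MD trajectories, only over many more steps.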
> Date: Wed, 17 Feb 2010 11:42:39 +0100
> From: erikm at xray.bmc.uu.se
> To: gmx-users at gromacs.org
> Subject: Re: [gmx-users] Same starting structure but different trajectories
> The times when registers are pushed to the stack may happen at different
> points in the calculations in different runs, so the "bonus accuracy"
> may be lost at different times. As I mentioned, a hardware interrupt may
> (will) force such an event, and this may be due to some external
> trigger, such as a mouse click or a keystroke.
> Note that there are other sources of discrepancy than the one I
> mentioned (and I'm not sure if that's even a common thing to find in
> modern AMD or Intel cpus, it's just an example). If you run your
> simulation in parallel, then the order in which the processes finish
> certain tasks may affect the outcome of certain floating point
> additions. In general, for floats (A+B)+C != A+(B+C), so if you have
> three processes that compute e.g. the kinetic energy for parts of your
> system, and the values are summed up by the MPI library, then the order
> of the summation may depend on the order in which the processes finish.
> As for the numerical chaos: it is all about the long-term sensitivity to
> small differences in the initial conditions. An error present at time 0
> grows by roughly a factor exp(lambda*t) by time t, where lambda is the
> Lyapunov exponent. If lambda is larger than zero then your error will
> increase exponentially, however small the initial error, and the system
> is thus chaotic. For n-body problems, like the ones we have in MD, even
> the analytic solutions are often chaotic.
> In general, the hardware side of things is far more complicated
> than you might first think as a programmer.
> Hope that helps your understanding.
> Emanuel Peter skrev:
> > Dear Erik,
> > I still have some trouble understanding.
> > Could you give me a more detailed answer why I get two different
> > trajectories on the same machine? It should have the same registers and
> > numerical chaos should be exactly the same in that case.
> > Bests,
> > Emanuel
> >>>> Erik Marklund 02/17/10 10:05 AM >>>
> > Well, unfortunately it doesn't. Numerical differences are introduced
> > here and there for different reasons. On some machines the registers
> > have more bits than you'd normally fit into a float or double, and hence
> > give higher precision. If those registers are pushed to the stack, then
> > the trailing bits get lost. This can happen due to thread scheduling or
> > hardware interrupts. Such small differences between runs will over time
> > build up to arbitrarily large differences because of the (numerically)
> > chaotic nature of MD simulations. When a program is parallelized even
> > more sources of such differences arise, since it's e.g. not known
> > beforehand in which order certain floats/doubles are added together,
> > which in turn may have a small effect on the results.
> > /Erik
> > Emanuel Peter skrev:
> >> Hi Tsjerk,
> >> I ran my job on the same machine and used exactly the same input structure and
> >> input file.
> >> What does chaos mean in your opinion? A computer should do exactly the same
> >> every time you start with the same starting conditions on the same machine.
> >> I have to couple my ligand separately and not as part of my protein, so I can't
> >> couple just two groups named Protein and Non-Protein.
> >> Bests,
> >> Emanuel
> >>>>> Tsjerk Wassenaar 02/17/10 9:40 AM >>>
> >> Hi Emanuel,
> >> Anything small can cause trajectories to diverge, it's chaos. The
> >> machine on which it is run can make the difference. This is also covered
> >> in the list archives.
> >>> ;Temperature coupling
> >>> tcoupl = nose-hoover
> >>> tc-grps = protein NA+ CFP SOL
> >>> tau_t = 0.1 0.1 0.1 0.1
> >>> ref_t = 300 300 300 300
> >> This is bad practice. Where did you get that from? Not from any _good_
> >> tutorial, surely. Check
> >> http://www.gromacs.org/Documentation/Terminology/Thermostats to learn
> >> more.
> >> Cheers,
> >> Tsjerk
> Erik Marklund, PhD student
> Laboratory of Molecular Biophysics,
> Dept. of Cell and Molecular Biology, Uppsala University.
> Husargatan 3, Box 596, 75124 Uppsala, Sweden
> phone: +46 18 471 4537 fax: +46 18 511 755
> erikm at xray.bmc.uu.se http://xray.bmc.uu.se/molbiophys