[gmx-users] Re: protein unstable for parallel job while stable for serial one
Mark Abraham
Mark.Abraham at anu.edu.au
Tue Sep 19 17:12:10 CEST 2006
Akansha Saxena wrote:
> This is what I was doing. I was running exactly
> identical simulations on 1 processor and on 16
> processors.
> By identical I mean: same starting structure,
> velocities taken from the same *.trr file. The only
> difference was the number of nodes for the production
> run.
Well, this sounds sensible, so long as you didn't mistakenly set
gen_vel = yes, which would generate new random velocities instead of
using the ones from your *.trr file.
> But I give the same velocities and use exactly same
> starting structure for both simulations. Basically I
> use the same files for both cases. Only difference
> lying in the number of processors.
> I would think that with the same initial conditions the
> calculations should be identical for both cases.
Real-world floating-point computations are not algebraic computations.
You can divide a number n by x, add the result to itself x times, and
find that a test for equality against n fails for sufficiently
pathological n and x. The order in which a summation occurs when you have
a mixture of large and small numbers can also affect the result through
accumulated round-off errors. A parallel computation effectively does
this: each processor sums its own share of the terms, so the order of
summation changes with the number of processors. The fact that this
happens is not actually a problem, because the perturbation is not so
large that you are sampling a different ensemble. You just don't have
algebraic reproducibility.
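
Both effects are easy to see outside of any MD code. The following is
only a minimal sketch in Python, not anything GROMACS does internally.
The helper names (naive_sum, repeated_add, chunked_sum) and the input
values are made up for illustration, and the values are deliberately
pathological so the difference is visible at a glance. In a real
simulation the discrepancies start in the last few bits. Explicit loops
are used on purpose, since newer Python versions may compensate for
round-off inside the built-in sum().

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total


def repeated_add(n, x):
    """Divide n by x, then add the quotient to itself x times."""
    return naive_sum([n / x] * x)


def chunked_sum(values, n_chunks):
    """Sum contiguous chunks separately, then combine the partial sums.

    This imitates (hypothetically) a reduction split across n_chunks
    processors; n_chunks = 1 is the serial case.
    """
    size = math.ceil(len(values) / n_chunks)
    partials = [naive_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return naive_sum(partials)


# Effect 1: n / x added x times need not give back n.
print(repeated_add(1.0, 10) == 1.0)   # False
print(repeated_add(1.0, 10))          # 0.9999999999999999

# Effect 2: same numbers, different grouping, different answer.
# The exact sum of these four values is 2.0.
values = [1e16, 1.0, -1e16, 1.0]
print(chunked_sum(values, 1))         # 1.0 (one "processor")
print(chunked_sum(values, 2))         # 0.0 (two "processors")
```

Neither the serial nor the two-chunk result equals the exact sum, and
neither is "the wrong one". They are just two different roundings of
the same calculation.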
Mark