[gmx-users] Re: protein unstable for parallel job while stable for serial one

Mark Abraham Mark.Abraham at anu.edu.au
Tue Sep 19 17:12:10 CEST 2006

Akansha Saxena wrote:
> This is what I was doing. I was running exactly
> identical simulations on 1 processor and on 16
> processors. 
> By identical I mean: same starting structure,
> velocities taken from the same *.trr file. The only
> difference was the number of nodes for the production
> run. 

That sounds sensible, so long as you did not mistakenly set 
gen_vel = yes, which would generate fresh velocities instead of using 
the ones from your *.trr file.

> But I give the same velocities and use exactly the same
> starting structure for both simulations. Basically I
> use the same files for both cases. The only difference
> lies in the number of processors. 
> I would think that with the same initial conditions the
> calculations should be identical for both cases.   

Real-world floating-point computations are not algebraic computations. 
You can divide a number n by x, add x copies of the result together, 
and a test for equality against n will fail for sufficiently 
pathological n and x. The order in which a summation is carried out when 
you have a mixture of large and small numbers can also affect the result 
through accumulated round-off error. A parallel computation effectively 
does this: splitting the work over 16 processors changes the order in 
which the contributions are summed compared with the serial run. The 
fact that this happens is not actually a problem: the perturbation is 
not so large that you are sampling a different ensemble. You just don't 
have algebraic reproducibility.
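
To make the point concrete, here is a minimal Python sketch of the 
effect. It is only an illustration, not GROMACS code: the helper names 
naive_sum and chunked_sum are made up for this example, and the 
"parallel" sum simply splits a list into chunks, sums each chunk, and 
then combines the partial sums, which is roughly what a reduction over 
several processors does.

```python
import math


def naive_sum(values):
    """Left-to-right floating-point summation (hypothetical helper).

    Written as an explicit loop so that every addition is a plain
    IEEE 754 double-precision add in the order given.
    """
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_chunks):
    """Sum each chunk separately, then add the partial sums.

    This mimics (very loosely) how a parallel run combines
    contributions computed on different processors.
    """
    size = math.ceil(len(values) / n_chunks)
    partials = [naive_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return naive_sum(partials)


# Divide n = 1.0 by x = 10 and add the result up x times.
tenth = 1.0 / 10
print(naive_sum([tenth] * 10) == 1.0)   # False

# Mixture of large and small numbers: the exact sum is 2.0.
values = [1e16, 1.0, -1e16, 1.0]
serial = naive_sum(values)              # one "processor"
parallel = chunked_sum(values, 2)       # two "processors"
print(serial, parallel)                 # 1.0 0.0
```

Neither result equals the exact answer of 2.0, and the two orderings 
disagree with each other. The numbers here are deliberately extreme; in 
a real simulation the differences start out in the last few bits and 
only become visible because the trajectories diverge over time.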

