[gmx-users] Re: protein unstable for parallel job while stable for serial one
Tsjerk Wassenaar
tsjerkw at gmail.com
Tue Sep 19 23:30:51 CEST 2006
Hi Akansha (and Mark),
The approach taken is indeed sensible, and I fully support the comments
made by Mark. It's actually part of what I meant: changing the number of
processors gives you a difference between your simulations. However, it
should not give you a consistent difference. That is, if you perform,
say, 100 simulations on 1 processor and 100 on 16, the results from one
set should be statistically similar to those obtained from the other. In
this case, you should observe the conformational change approximately an
equal number of times. The same would hold if your system were fully
converged. In that case, the number of transitions (folding/unfolding,
conformational change) over a given time should be approximately equal,
with the same rates, durations, etc. In practice, you'll only be able to
reach that level of convergence for small systems.
It is possible that dividing the system over multiple processors
introduces some artefacts, as was in fact found by Jelger Risselada some
time ago (you can check the mailing list archive), but I think that bug
has been fixed. You might be able to detect such a consistent effect in
the results if you perform, say, five simulations on 1 processor and five
on 16 processors, using different starting velocities.
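One way to check for such a consistent effect is sketched below, assuming each run is simply classified as "showed the conformational change" or "did not". It applies a two-sided Fisher exact test to the two sets of runs. The function name and the counts in the example are hypothetical, chosen only for illustration, and this is not part of GROMACS.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    a, b: serial runs with / without the conformational change
    c, d: parallel runs with / without the conformational change
    Returns the probability of a table at least as unlikely as the
    observed one, given the fixed row and column totals.
    """
    r1, r2, k = a + b, c + d, a + c

    def weight(x):
        # Unnormalised hypergeometric probability of x "change" runs
        # falling in the serial set. Integers, so comparisons are exact.
        return comb(r1, x) * comb(r2, k - x)

    observed = weight(a)
    lo, hi = max(0, k - r2), min(k, r1)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(r1 + r2, k)


if __name__ == "__main__":
    # Hypothetical outcome: change seen in 1 of 5 serial runs
    # and in 4 of 5 parallel runs.
    p = fisher_exact_two_sided(1, 4, 4, 1)
    print(f"p = {p:.2f}")  # about 0.21
```

With only five runs per set, just the most extreme split (5 of 5 against 0 of 5) gives a small p-value, so a modest systematic effect would need more runs to show up.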
I hope this helps you a bit further.
Best regards,
Tsjerk
On 9/19/06, Mark Abraham <Mark.Abraham at anu.edu.au> wrote:
> Akansha Saxena wrote:
> > This is what I was doing. I was running exactly
> > identical simulations on 1 processor and on 16
> > processors.
> > By identical I mean - same starting structure,
> > velocities taken from the same *.trr file. The only
> > difference was the number of nodes for the production
> > run.
>
> Well this sounds sensible, so long as you weren't doing an erroneous
> gen_vel = yes.
>
> > But I give the same velocities and use exactly the same
> > starting structure for both simulations. Basically I
> > use the same files for both cases. The only difference
> > lies in the number of processors.
> > I would think that with the same initial conditions the
> > calculations should be identical for both cases.
>
> Real-world floating point computations are not algebraic computations.
> You can divide a number n by x, add x copies of the result together,
> and a test for equality against n will fail, for sufficiently
> pathological n and x. The order in which summation occurs when you have
> a mixture of large and small numbers can also affect the result through
> accumulated round-off errors. A parallel computation will effectively be
> doing this. The fact that this happens is not actually a problem: the
> perturbation is not so large that you are sampling a different ensemble.
> You just don't have algebraic reproducibility.
>
> Mark
>
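Mark's two points about round-off in the quoted message above can be reproduced in a few lines. The following is a minimal Python sketch, not GROMACS code; the function names are made up for illustration, and the chunked sum only imitates the way a parallel run combines partial sums from each processor. Explicit loops are used instead of the built-in sum(), because recent Python versions use compensated summation for floats, which would hide the effect.

```python
def naive_sum(values):
    """Add values left to right in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_chunks):
    """Sum each chunk separately, then add the partial sums.

    This mimics a reduction over n_chunks processors.
    """
    size = -(-len(values) // n_chunks)  # ceiling division
    partials = [naive_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return naive_sum(partials)


def divide_and_readd(n, x):
    """Divide n by x, then add x copies of the result together."""
    part = n / x
    total = 0.0
    for _ in range(x):
        total += part
    return total


if __name__ == "__main__":
    # Dividing and re-adding does not always give back n.
    print(divide_and_readd(1.0, 10) == 1.0)  # False

    # A mixture of large and small numbers: the exact sum is 2.0,
    # but neither summation order recovers it, and the two disagree.
    values = [1e16, 1.0, -1e16, 1.0]
    print(naive_sum(values))       # 1.0
    print(chunked_sum(values, 2))  # 0.0
```

The same inputs give different results depending only on how the additions are grouped, which is what happens to force and energy sums when the number of processors changes.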
--
Tsjerk A. Wassenaar, Ph.D.
Groningen Biomolecular Sciences and Biotechnology Institute (GBB)
Dept. of Biophysical Chemistry
University of Groningen
Nijenborgh 4
9747AG Groningen, The Netherlands
+31 50 363 4336