[gmx-users] replica restart from checkpoints

Berk Hess gmx3 at hotmail.com
Fri Feb 20 10:07:57 CET 2009


Hi,

I guess the -maxh procedure might actually be the problem in your case.
If all replicas stop correctly after -maxh, they will all be between the same exchange events,
so the restart should work.
The only issue I can see is that one (or more) replicas reach an exchange attempt step
early and wait for communication, while the others are late and get stopped by -maxh.
Have you checked that the simulation terminated properly?
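
The condition "all between the same exchange events" can be illustrated with a small sketch. This is not the mdrun implementation; it assumes the check amounts to integer-dividing each replica's checkpoint step by the exchange interval (the -replex value). The function names and the step numbers are hypothetical, and how mdrun treats a step that lies exactly on an exchange attempt is not stated here; the sketch assigns such a step to the following interval.

```python
def exchange_window(step, replex):
    """Index of the interval between two exchange attempts that contains `step`.

    Assumption: a step exactly on an exchange attempt counts as the start
    of the next interval.
    """
    return step // replex


def checkpoints_compatible(steps, replex):
    """True if every replica's checkpoint step lies in the same exchange interval."""
    windows = {exchange_window(s, replex) for s in steps}
    return len(windows) == 1


# Hypothetical checkpoint steps for four replicas, exchange attempt every 1000 steps.
print(checkpoints_compatible([41200, 41350, 41990, 41010], replex=1000))  # True
print(checkpoints_compatible([41200, 41350, 42000, 41010], replex=1000))  # False
```

In the second call one replica has already reached step 42000 while the others were stopped earlier, which is the situation described above.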

If this is the case, the only solution at present is not to use -maxh:
make tpr files with nsteps small enough for the run to finish in time, then use tpbconv
to extend the tpr files (without trajectory and energy files) and restart with mdrun -cpi.
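
A rough sketch of one cycle of that workaround follows. It only builds and prints the command lines; it does not run them. The file names (topol0.tpr, next0.tpr, state.cpt), the number of replicas, the extension length and the exchange interval are all placeholders. It also assumes that mdrun -multi appends the replica index to the file names it is given, and it leaves out the MPI launcher and the installation-specific binary name (for example mdrun_mpi).

```python
def restart_commands(n_replicas, extend_ps, replex,
                     prefix="topol", new_prefix="next", cpt="state"):
    """Build (but do not run) the command lines for one restart cycle."""
    cmds = []
    for i in range(n_replicas):
        # tpbconv -extend adds simulation time (in ps) to an existing tpr file;
        # only the tpr is passed, no trajectory or energy file.
        cmds.append(["tpbconv", "-s", f"{prefix}{i}.tpr",
                     "-extend", str(extend_ps),
                     "-o", f"{new_prefix}{i}.tpr"])
    # A single multi-replica mdrun that continues from the checkpoints.
    # Assumption: with -multi the replica index is appended to these names.
    cmds.append(["mdrun", "-multi", str(n_replicas), "-replex", str(replex),
                 "-s", f"{new_prefix}.tpr", "-cpi", f"{cpt}.cpt"])
    return cmds


# Placeholder values: 4 replicas, extend by 500 ps, exchange every 1000 steps.
for cmd in restart_commands(4, 500, 1000):
    print(" ".join(cmd))
```

Because every replica then stops at the same nsteps rather than at a wall-clock limit, all checkpoints end on the same step.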

Berk

From: massimiliano.bonomi at gmail.com
To: gmx-users at gromacs.org
Subject: Re: [gmx-users] replica restart from checkpoints
Date: Thu, 19 Feb 2009 22:47:23 +0100

Thanks for your reply...

> Which version are you using?
> In 4.0.3 I made things slightly better by allowing checkpoints
> to have different step numbers, as long as they fall between
> the same exchange attempt steps.

I'm using 4.0.3. I had the same problem with the earlier 4.0.x versions.

> This could still cause problems when the steps in the checkpoints
> differ very much. But if you use -maxh all simulations should finish
> close to each other.

Actually I'm using -maxh!

> (you can always go back to using tpbconv)

Unfortunately I have no trr files, just xtc files containing only the solute...

> Synchronizing the checkpoint writing is a bit complicated
> and will probably only be done in 4.1.

Is it not possible to define the writing stride in terms of MD steps?

Thanks again,
Massimiliano

> Berk

> From: massimiliano.bonomi at gmail.com
> To: gmx-users at gromacs.org
> Date: Thu, 19 Feb 2009 20:14:15 +0100
> Subject: [gmx-users] replica restart from checkpoints
> 
> Dear Gromacs Users,
> 
> I'm experiencing some problems when restarting a replica exchange run
> from previous checkpoint files.
> It often happens that the number of MD steps completed in the
> previous run is not the same for all replicas. If this is the case,
> the program stops.
> This may happen because checkpoints are written with a stride expressed
> in real (wall-clock) time (every 15 minutes), and replicas on different
> processors may have run for a different number of steps in the same
> amount of time.
> 
> Is it possible to specify the checkpoint writing stride in number of 
> steps instead of real time?
> 
> Regards,
> Massimiliano Bonomi
> _______________________________________________
> gmx-users mailing list gmx-users at gromacs.org
> http://www.gromacs.org/mailman/listinfo/gmx-users
> Please search the archive at http://www.gromacs.org/search before posting!
> Please don't post (un)subscribe requests to the list. Use the 
> www interface or send it to gmx-users-request at gromacs.org.
> Can't post? Read http://www.gromacs.org/mailing_lists/users.php
