[gmx-users] Checkpointing

Tue Feb 2 16:35:21 CET 2010

We have mdrun integrated into our distributed computing project. When
our users suspend or close the manager, it checkpoints, so when they
open it again mdrun continues where it left off. However, when users
reboot, it starts from the beginning. We are using this command line
to execute the work.

mdrun.exe (-v -x -c -o md.pdb -e -cpo md.next -cpi md.cpt -deffnm md)

I have a separate checkpoint for output, so that after this simulation
we can extend the workunit. Should we try using the -append option?

Checkpoints containing the complete state of the system are written at
regular intervals (option -cpt) to the file -cpo, unless option -cpt
is set to -1. A simulation can be continued by reading the full state
from file with option -cpi. This option is intelligent in the way that
if no checkpoint file is found, Gromacs just assumes a normal run and
starts from the first step of the tpr file.

With checkpointing you can also use the option -append to just
continue writing to the previous output files. This is not enabled by
default since it is potentially dangerous if you move files, but if
you just leave all your files in place and restart mdrun with exactly
the same command (with options -cpi and -append) the result will be
the same as from a single run. The contents will be binary identical
(unless you use dynamic load balancing), but for technical reasons
there might be some extra energy frames when using checkpointing
(necessary for restarts without appending).
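
A minimal sketch of the restart described above, assuming the
checkpoint is read from and written to the same default file (md.cpt
when -deffnm md is given) and that all output files stay in place. The
helper name restart_cmd is hypothetical; it only prints the command
line, and it uses only flags already mentioned in this message (-v,
-deffnm, -cpi, -append).

```shell
#!/bin/sh
# Hypothetical helper: print the mdrun command line used for both the
# first start and every later restart of a run.
# Assumption: checkpoints are written to the default <name>.cpt, so the
# file named by -cpi is the one mdrun keeps updating.
restart_cmd() {
    name="$1"    # run name passed to -deffnm, e.g. "md"
    printf 'mdrun -v -deffnm %s -cpi %s.cpt -append\n' "$name" "$name"
}

# Print the command for the run called "md".
restart_cmd md
```

Because -cpi falls back to a normal start when no checkpoint file is
found, the same line can be issued unchanged on first launch and after
a reboot.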

-- 
Jack

http://drugdiscoveryathome.com
http://hydrogenathome.org