[gmx-developers] Checkpoints & REMD
David van der Spoel
spoel at xray.bmc.uu.se
Wed Sep 7 09:29:35 CEST 2011
Hi,
I have been bitten by this problem before:
[neolith1:native/REMD] % ls -l *cpt
-rw-r--r-- 1 x_davva x_davva 635388 Sep 5 23:18 native10.cpt
-rw-r--r-- 1 x_davva x_davva 635388 Sep 5 23:18 native10_prev.cpt
-rw-r--r-- 1 x_davva x_davva 0 Sep 5 23:18 native11.cpt
-rw-r--r-- 1 x_davva x_davva 0 Sep 5 23:18 native11_prev.cpt
and now it has happened again, using gmx 4.5.1 (which I keep using for
consistency): both the checkpoint and its _prev backup for one replica
are empty. It seems that the checkpoint code is not REMD- or
multisim-aware, and hence the code that checks for the existence of
xxx_prev.cpt is not sufficient.
The problem seems to arise because my jobs are chained in the queueing
system, so a new job is started even if the previous job crashed. Hence
the problem might be prevented by adding extensive checks to the job
script that verify that the cpt files exist and are consistent.
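
A minimal sketch of such a check, to be run from the job script before
the next job in the chain restarts. The function name and the naming
scheme (prefix + replica number + ".cpt" or "_prev.cpt") are assumptions
based on the listing above, and "consistent" is reduced here to "present
and not empty"; it does not look inside the checkpoint files.

```python
from pathlib import Path


def find_bad_checkpoints(directory, prefix, replica_ids, check_prev=True):
    """Return (filename, reason) pairs for missing or empty checkpoints.

    Hypothetical helper: assumes files are named <prefix><id>.cpt and
    <prefix><id>_prev.cpt, as in the directory listing above.
    """
    suffixes = [".cpt"]
    if check_prev:
        # On a first run there may be no _prev file yet; pass
        # check_prev=False in that case.
        suffixes.append("_prev.cpt")

    problems = []
    for rid in replica_ids:
        for suffix in suffixes:
            path = Path(directory) / f"{prefix}{rid}{suffix}"
            if not path.is_file():
                problems.append((path.name, "missing"))
            elif path.stat().st_size == 0:
                problems.append((path.name, "empty"))
    return problems
```

The job script could call find_bad_checkpoints(".", "native", ids) with
the replica numbers of the run and refuse to restart if the returned
list is not empty.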
Nevertheless, it should be quite simple to introduce a multisim check in
the cpt code before the previous version is erased. As far as I can see,
the latest (release-4-5-patches) source code does not contain such a
check.
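
To illustrate the idea only (this is not the GROMACS code, and all names
are hypothetical): the previous checkpoints are touched only after every
replica has a complete new checkpoint. The sketch assumes each replica
first writes to a temporary file; in a real multisim run the first phase
would be a collective check across the simulations rather than a loop
in one process.

```python
import os
from pathlib import Path


def commit_checkpoints(replicas):
    """Rotate checkpoints for all replicas, or for none of them.

    replicas: list of (new_tmp, current, prev) paths, one per replica.
    Returns True if the rotation was done, False if nothing was touched.
    """
    # Phase 1: every replica must have a non-empty new checkpoint.
    for new_tmp, _current, _prev in replicas:
        p = Path(new_tmp)
        if not p.is_file() or p.stat().st_size == 0:
            return False  # leave all current and _prev files as they are

    # Phase 2: only now overwrite the previous versions.
    for new_tmp, current, prev in replicas:
        if Path(current).is_file():
            os.replace(current, prev)
        os.replace(new_tmp, current)
    return True
```

With this ordering, a crash of one replica while writing leaves the old
checkpoint pair of every replica intact, so a chained job can still
restart from a consistent state.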
Cheers,
--
David van der Spoel, Ph.D., Professor of Biology
Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone: +46184714205.
spoel at xray.bmc.uu.se http://folding.bmc.uu.se