[gmx-developers] Checkpoints & REMD
hess at cbr.su.se
Wed Sep 7 09:56:51 CEST 2011
Could you try the fix below?
diff --git a/src/kernel/mdrun.c b/src/kernel/mdrun.c
index 8878331..7b1b396 100644
@@ -595,6 +595,11 @@ int main(int argc,char *argv[])
sim_part = sim_part_fn + 1;
+ if (MULTISIM(cr))
+ check_multi_int(stdout,cr->ms,sim_part,"simulation part");
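For illustration, here is a minimal sketch of the kind of check the patch relies on: every simulation in a multi-simulation run must report the same integer (here the simulation part), otherwise the run should stop. This is not the GROMACS implementation of check_multi_int; the function name all_sims_agree and the plain array argument are assumptions made for the example. In mdrun the per-simulation values would first have to be gathered across the simulations.

```c
#include <stdio.h>

/* Hypothetical helper, NOT the GROMACS check_multi_int implementation.
 * val[i] holds the value reported by simulation i (assumed to have been
 * gathered already). Returns 1 if all nsim values agree, 0 otherwise.
 * On a mismatch, each simulation's value is written to log (if non-NULL). */
static int all_sims_agree(FILE *log, const int *val, int nsim, const char *name)
{
    int i;
    int ok = 1;

    /* Compare every simulation against simulation 0. */
    for (i = 1; i < nsim; i++)
    {
        if (val[i] != val[0])
        {
            ok = 0;
        }
    }

    if (!ok && log != NULL)
    {
        fprintf(log, "The %s differs between the simulations:\n", name);
        for (i = 0; i < nsim; i++)
        {
            fprintf(log, "  simulation %d: %s = %d\n", i, name, val[i]);
        }
    }

    return ok;
}
```

With the situation described in this thread, where some replicas read a checkpoint and others do not, the gathered simulation parts would differ and the check would fail instead of letting the run continue silently.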
On 09/07/2011 09:48 AM, Berk Hess wrote:
> The 0-size files are caused by a general checkpointing bug, or more
> precisely a bug in opening files in append mode,
> which has been fixed for 4.5.5. There was another fix in an
> intermediate version,
> but in the current release-4-5-patches it should be completely fixed.
> Or are you referring to the problem that mdrun reads checkpoints for
> some, but not all
> replicas and does not realize this?
> That should indeed be fixed.
> On 09/07/2011 09:29 AM, David van der Spoel wrote:
>> I have been bitten by this problem before:
>> [neolith1:native/REMD] % ls -l *cpt
>> -rw-r--r-- 1 x_davva x_davva 635388 Sep 5 23:18 native10.cpt
>> -rw-r--r-- 1 x_davva x_davva 635388 Sep 5 23:18 native10_prev.cpt
>> -rw-r--r-- 1 x_davva x_davva 0 Sep 5 23:18 native11.cpt
>> -rw-r--r-- 1 x_davva x_davva 0 Sep 5 23:18 native11_prev.cpt
>> and now it happened again, using gmx 4.5.1 (for consistency). It
>> seems like the checkpoint code is not REMD or multisim aware, and
>> hence the code to check for the existence of xxx_prev.cpt is not
>> It seems that this problem happens due to the fact that my jobs are
>> chained in the queueing system, and will restart a new job even if
>> the previous job crashed. Hence the problem might be prevented by
>> adding extensive checks in the script for existence of cpt files and
>> consistency of those.
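A pre-restart check of the kind suggested above could look roughly like the sketch below. It only tests that each replica's checkpoint exists and is not empty; it does not inspect the checkpoint contents. The helper names cpt_usable and count_usable_cpt are hypothetical, the <prefix><index>.cpt naming is taken from the directory listing above, and POSIX stat() is assumed to be available.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if path is a regular file with
 * non-zero size, 0 if it is missing, empty, or not a regular file. */
static int cpt_usable(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0;
}

/* Hypothetical helper: counts the replicas first..last (inclusive)
 * whose checkpoint file <prefix><index>.cpt is usable. */
static int count_usable_cpt(const char *prefix, int first, int last)
{
    char path[4096];
    int  i;
    int  n = 0;

    for (i = first; i <= last; i++)
    {
        snprintf(path, sizeof(path), "%s%d.cpt", prefix, i);
        if (cpt_usable(path))
        {
            n++;
        }
    }

    return n;
}
```

A chained job could then refuse to start unless the count is either zero (fresh start for all replicas) or equal to the number of replicas (all replicas continue), so that a 0-byte native11.cpt next to a valid native10.cpt is caught before mdrun is launched.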
>> Nevertheless it should be quite simple to introduce a multisim check
>> in the cpt code before the previous version is erased. Looking at the
>> latest (release-4-5-patches) source code this does not seem to be