[gmx-developers] Checkpoints & REMD
Berk Hess
hess at cbr.su.se
Wed Sep 7 09:56:51 CEST 2011
Hi,
Could you try the fix below?
Berk
diff --git a/src/kernel/mdrun.c b/src/kernel/mdrun.c
index 8878331..7b1b396 100644
--- a/src/kernel/mdrun.c
+++ b/src/kernel/mdrun.c
@@ -595,6 +595,11 @@ int main(int argc,char *argv[])
{
sim_part = sim_part_fn + 1;
}
+
+ if (MULTISIM(cr))
+ {
+ check_multi_int(stdout,cr->ms,sim_part,"simulation part");
+ }
}
else
{
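
For readers without the source at hand: the patch makes every replica compare its simulation part number with the others, so a restart where only some replicas read a checkpoint is caught. The body of check_multi_int is not shown in this thread, so the following is only a minimal standalone sketch of that kind of consistency check. The names all_replicas_agree and report_mismatch are hypothetical, and the per-replica values are passed in as a plain array rather than gathered between simulations.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not GROMACS code: returns 1 if every replica
 * reports the same value and 0 otherwise. In a real multi-simulation
 * run the values would be collected from all replicas first; here they
 * are simply passed in as an array. */
int all_replicas_agree(const int *values, int n_replicas)
{
    int i;

    assert(values != NULL && n_replicas > 0);
    for (i = 1; i < n_replicas; i++)
    {
        if (values[i] != values[0])
        {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical helper: lists the value each replica reported, so the
 * user can see which replica is out of step. */
void report_mismatch(FILE *out, const char *name,
                     const int *values, int n_replicas)
{
    int i;

    fprintf(out, "The %s differs between replicas:\n", name);
    for (i = 0; i < n_replicas; i++)
    {
        fprintf(out, "  replica %d: %d\n", i, values[i]);
    }
}
```

With simulation parts {3, 1, 3}, all_replicas_agree returns 0: the second replica did not read a checkpoint while the others did, which is the case the patch is meant to catch.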
On 09/07/2011 09:48 AM, Berk Hess wrote:
> Hi,
>
> The zero-size files are caused by a general checkpointing bug, or more
> precisely a bug in opening files in append mode,
> which has been fixed for 4.5.5. There was another fix in an
> intermediate version,
> but in the current release-4-5-patches it should be completely fixed.
>
> Or are you referring to the problem that mdrun reads checkpoints for
> some, but not all
> replicas and does not realize this?
> That should indeed be fixed.
>
> Berk
>
> On 09/07/2011 09:29 AM, David van der Spoel wrote:
>> Hi,
>>
>> I have been bitten by this problem before:
>>
>> [neolith1:native/REMD] % ls -l *cpt
>> -rw-r--r-- 1 x_davva x_davva 635388 Sep 5 23:18 native10.cpt
>> -rw-r--r-- 1 x_davva x_davva 635388 Sep 5 23:18 native10_prev.cpt
>> -rw-r--r-- 1 x_davva x_davva 0 Sep 5 23:18 native11.cpt
>> -rw-r--r-- 1 x_davva x_davva 0 Sep 5 23:18 native11_prev.cpt
>>
>> and now it happened again, using gmx 4.5.1 (for consistency). It
>> seems like the checkpoint code is not REMD or multisim aware, and
>> hence the code to check for the existence of xxx_prev.cpt is not
>> sufficient.
>>
>> The problem seems to occur because my jobs are chained in the
>> queueing system, so a new job is started even if the previous job
>> crashed. Hence the problem might be prevented by adding extensive
>> checks in the script that the cpt files exist and are consistent
>> with each other.
>>
>> Nevertheless it should be quite simple to introduce a multisim check
>> in the cpt code before the previous version is erased. Looking at the
>> latest (release-4-5-patches) source code this does not seem to be
>> present.
>>
>> Cheers,
>
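
The script-side check David mentions (every replica's cpt file must exist and must not be empty before a chained job restarts) could look roughly like the sketch below. This is an assumption about what such a check might do, not code from GROMACS or from his scripts. The helper names checkpoint_usable and all_checkpoints_usable are hypothetical, and the sketch relies on POSIX stat().

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if path names an existing, regular,
 * non-empty file and 0 otherwise. It does not validate the checkpoint
 * contents, only that the file is not missing or zero-sized. */
int checkpoint_usable(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
    {
        return 0; /* missing or not accessible */
    }
    return S_ISREG(st.st_mode) && st.st_size > 0;
}

/* Hypothetical helper: returns 1 only if every listed checkpoint is
 * usable, and names the first one that is not on stderr. */
int all_checkpoints_usable(const char *const *paths, int n_paths)
{
    int i;

    for (i = 0; i < n_paths; i++)
    {
        if (!checkpoint_usable(paths[i]))
        {
            fprintf(stderr, "Unusable checkpoint: %s\n", paths[i]);
            return 0;
        }
    }
    return 1;
}
```

For the listing above, such a check would accept native10.cpt (635388 bytes) and reject native11.cpt (0 bytes), so the chained job could stop instead of restarting with an inconsistent set of replicas.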