[gmx-users] Restarting a simulation when checkpoint files are corrupted
David Dotson
dldotson at asu.edu
Fri Mar 17 20:12:32 CET 2017
Excellent! Thanks Mark! Yeah, already had a discussion with the sysadmin of this particular system; it was definitely not something that should happen.
Cheers!
David
On 03/17/2017 10:11 AM, Mark Abraham wrote:
> Hi,
>
>
> On Fri, Mar 17, 2017 at 6:00 PM David Dotson <dldotson at asu.edu> wrote:
>
>> Greetings,
>>
>> I have a simulation that has been running for a long time, with many
>> trajectory segments (counting up to about 190). One of the segments ran on
>> a cluster that experienced a filesystem outage such that some of the files
>> for that run were corrupted, including its checkpoint files (both the *.cpt
>> and *_prev.cpt, somehow).
>
> Yeah, that's a recurring problem. mdrun tells the filesystem to flush to
> disk, but if the filesystem doesn't actually do so, all we can suggest is
> that you ask your sysadmins for more conservative settings.
>
>
>> Since I only have checkpoint files for the most recent run, I need to do
>> something like a manual restart.
>>
>> I'm aware I can do this with `gmx convert-tpr` using an existing TPR and
>> the most recent uncorrupted TRR and EDR files, but starting a run this way
>> restarts the output file numbering at `0001`. Is there some way to make
>> the numbering start from another value (like `0191`)?
>>
> Somehow it never got documented (I will fix that), but there is an .mdp
> field, simulation-part, that you can set (e.g. "simulation-part = 191") to
> take matters suitably into your own hands.
>
> Mark
>
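
As a rough illustration of the setting described in the reply above, the sketch below works out the next part number from existing output file names and prints the corresponding .mdp line. It is written in Python and assumes the outputs follow a `.partNNNN.` naming pattern (for example `md.part0190.trr`). The file names and the helper functions `next_part` and `mdp_line` are hypothetical and not part of GROMACS; only the `simulation-part` field name comes from the thread.

```python
import re

# Assumption: output files carry a ".partNNNN." suffix, e.g. "md.part0190.trr".
PART_RE = re.compile(r"\.part(\d{4})\.")


def next_part(filenames):
    """Return one more than the highest part number found, or 1 if none match."""
    parts = []
    for name in filenames:
        match = PART_RE.search(name)
        if match:
            parts.append(int(match.group(1)))
    return max(parts, default=0) + 1


def mdp_line(part):
    """Format the .mdp setting named in the reply above."""
    return f"simulation-part = {part}"


if __name__ == "__main__":
    # Hypothetical file names standing in for the uncorrupted segments.
    existing = ["md.part0189.trr", "md.part0190.trr", "md.part0190.edr"]
    print(mdp_line(next_part(existing)))
```

With the hypothetical names above, the script prints `simulation-part = 191`, which would then go into the .mdp file used to build the new TPR.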
>
>> If not, I think I can work around this by concatenating the existing parts
>> and moving forward with restarted counting, but being able to set the
>> starting value would be the cleanest solution for me in this particular
>> case, since it keeps things consistent with my other runs.
>>
>> Thanks!
>>
>> David
>>
>> --
>> David L. Dotson * david.dotson at asu.edu
>> Beckstein Lab
>> Center for Biological Physics
>> Arizona State University
>>
>> becksteinlab.physics.asu.edu