[gmx-developers] restarting from checkpoint--option to append to output files?

Daniel Larsson larsson at xray.bmc.uu.se
Tue Apr 22 22:16:30 CEST 2008


On Apr 22, 2008, at 22:01, Erik Lindahl wrote:
> Hi,
>
> On Apr 22, 2008, at 6:20 PM, Peter Kasson wrote:
>
>> It's great to have a checkpoint/resume feature in mdrun; one thing  
>> I notice is that the current code essentially treats a resume from  
>> checkpoint as a new run with an exact restart.  It would be nice  
>> to have an option to append to existing files so that one gets a  
>> single continuous trr, xtc, etc.  How complicated would this be?
>>
>> (One could imagine a relatively naive version that takes a -append  
>> flag and starts appending to output files if they exist when  
>> resuming from a checkpoint or a fancier version that stores hashes  
>> in the checkpoint file, verifies the hashes, and then appends only  
>> if the files check out as corresponding to the checkpoint).
>
> We discussed this a bit during the workshop - not sure if you were  
> there for that session.
>
> The big problem is that if something goes wrong (full disk,
> crashed run, bad gromacs binary, whatever) you screw up your entire
> trajectories and energy files, and then it's a mess to fix things,
> rather than just resubmitting with the correct settings.
>
> Another reason for not wanting it by default is that some parallel  
> file systems (some versions of GPFS?) simply don't support append  
> file operations.
>
> Still, I have to confess that I can't remember any good reasons for  
> not even having it as an option ;-)   I'll see if anybody else  
> voices any concerns here, otherwise we could add it as an optional  
> choice to mdrun (pretty nice in combination with the new max run  
> time option).
>
> Cheers,
>
> Erik
>
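
To make the "fancier version" Peter describes a bit more concrete, here is a minimal Python sketch of a hash check before appending. This is not how mdrun does or would do it; the record layout (path, size, SHA-256) and the function names are made up for illustration, and it assumes the output file is a plain byte stream that can safely be truncated back to the length it had when the checkpoint was written.

```python
import hashlib
import os


def file_digest(path, length=None):
    """Return the SHA-256 hex digest of the first `length` bytes of `path`
    (the whole file if `length` is None)."""
    h = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            size = 1 << 20 if remaining is None else min(1 << 20, remaining)
            chunk = f.read(size)
            if not chunk:
                break
            h.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return h.hexdigest()


def make_record(path):
    """What a checkpoint could store about one output file
    (hypothetical layout, not the real checkpoint format)."""
    size = os.path.getsize(path)
    return {"path": path, "size": size, "sha256": file_digest(path, size)}


def open_for_append(record):
    """Open the file for appending only if its first `size` bytes still
    match the hash stored at checkpoint time. Anything written after the
    checkpoint (e.g. a partial frame from a crash) is cut off first."""
    path, size = record["path"], record["size"]
    if not os.path.exists(path) or os.path.getsize(path) < size:
        raise ValueError(f"{path} is missing or shorter than at checkpoint")
    if file_digest(path, size) != record["sha256"]:
        raise ValueError(f"{path} does not match the checkpoint")
    f = open(path, "r+b")
    f.truncate(size)  # drop data written after the checkpoint
    f.seek(size)      # continue writing exactly where the checkpoint left off
    return f
```

Refusing to append on a mismatch, rather than silently starting a new file, keeps the failure visible, which seems to fit Erik's concern about ruining existing trajectories.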


Another thing that was also mentioned during the workshop, and that I
would love to have, is the ability to feed the analysis tools a set of
trajectory and/or energy files from a continued simulation without
having to concatenate them first.

There are two main reasons. First, it is always good to keep the
original files untouched, to minimize the risk of human error. Second,
long trajectories can result in very large concatenated files that are
awkward to handle.
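
As a rough illustration of what I mean, here is a small Python sketch that presents several segments as one continuous stream of frames. It is only a sketch: the name `chain_segments` is made up, each segment is assumed to be any iterable of (time, frame) pairs given in run order, and the actual reading of trr/xtc/edr files is left out entirely.

```python
def chain_segments(segments):
    """Yield (time, frame) pairs from several segments as one trajectory.

    `segments` is an iterable of iterables of (time, frame) pairs, in the
    order the runs were performed. A frame whose time is not later than
    the last frame yielded is skipped, which drops the frames written
    again after a restart.
    """
    last_time = None
    for segment in segments:
        for time, frame in segment:
            if last_time is not None and time <= last_time:
                continue  # overlap with the previous segment
            last_time = time
            yield time, frame


# Two segments of a continued run; the second repeats the frame at t = 2.0.
part1 = [(0.0, "a"), (1.0, "b"), (2.0, "c")]
part2 = [(2.0, "c again"), (3.0, "d")]

times = [t for t, _ in chain_segments([part1, part2])]
print(times)  # [0.0, 1.0, 2.0, 3.0]
```

Because the frames are produced lazily, nothing is copied or rewritten on disk and the original files stay as they are.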

---

Daniel Larsson
Molecular Biophysics group
Department of Cell and Molecular Biology
Uppsala University

+46-18-471 4006 (phone)
+46-18-511 755  (fax)
http://xray.bmc.uu.se/~larsson
larsson at xray.bmc.uu.se
