[gmx-users] Why does the -append option exist?

Mark Abraham Mark.Abraham at anu.edu.au
Sat Jun 4 03:24:16 CEST 2011


On 4/06/2011 8:26 AM, Dimitar Pachov wrote:
>
> At first, I thought the -append option of the mdrun command was great. 
> However, I don't think it is anymore and have actually started 
> questioning myself why it exists in the first place, and second, why 
> has it become the default option in the newest versions?

It exists because it used to be a pain to manage your simulation file 
numbering.

> It is useless unless you run your simulations in a 100% safe from any 
> unexpected problems (hardware, restarts, etc) mode, which is never the 
> case. It is beyond me how such an option can become the default and 
> how a statement like this:
>
> "By default the output will be appending to the existing output files. 
> The checkpoint file contains checksums of all output files, such that 
> *you will never loose data when some output files are modified, 
> corrupt or removed.*"
>
> can be claimed without testing ALL of the scenarios that can lead to 
> problems, that is, lost data.

The checkpoint file records the position of the output file pointers at 
the time of the checkpoint, along with an MD5 checksum. Upon restarting 
with -append, mdrun seeks to that file pointer position, verifies the 
checksum and issues a fatal error if this is not possible. So if 
checkpoint and other files are not altered or removed after a crash, 
then the method seems pretty safe to me.
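
As a rough illustration of that logic (a sketch only, not the actual 
mdrun code), the check could look like the Python below. The function 
names record_checkpoint and prepare_append are hypothetical, the 
RuntimeError stands in for mdrun's fatal error, and two further points 
are assumptions made for the sketch: that the checksum covers the whole 
file up to the recorded position, and that anything written after that 
position is discarded before appending.

```python
import hashlib
import os


def record_checkpoint(path):
    """Return (offset, md5_hex) describing an output file at checkpoint time.

    Assumption for this sketch: the checksum covers every byte up to offset.
    """
    offset = os.path.getsize(path)
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read(offset)).hexdigest()
    return offset, digest


def prepare_append(path, offset, digest):
    """Validate an output file against checkpoint data and open it for appending.

    Raises RuntimeError (a stand-in for a fatal error) if the file is missing,
    shorter than the recorded offset, or modified before that offset.
    """
    if not os.path.exists(path):
        raise RuntimeError(f"{path}: output file is missing")
    if os.path.getsize(path) < offset:
        raise RuntimeError(f"{path}: file is shorter than the checkpointed position")
    with open(path, "rb") as f:
        if hashlib.md5(f.read(offset)).hexdigest() != digest:
            raise RuntimeError(f"{path}: checksum mismatch, file was modified")
    f = open(path, "r+b")
    f.truncate(offset)  # assumption: drop data written after the checkpoint
    f.seek(offset)      # continue writing from the checkpointed position
    return f
```

With a check like this, a file that was truncated, edited or deleted 
after the checkpoint produces an error rather than a silent overwrite.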

The text quoted above says you are safe even if you remove files; that's 
an overstatement. However, I can't see how removing a non-checkpoint 
file could lead to loss of useful data from other non-checkpoint files.

> If one uses that option and the run is restarted and is again 
> restarted before reaching the point of attempting to write a file, 
> then things are lost,

If this is true, then it wants fixing, and fast, and will get it :-) 
However, it would be surprising for such a problem to exist and not have 
been reported by now. This feature has been in the code for a year, and 
only some minor issues have needed fixing since the 4.5 release.

You're saying the equivalent of the steps below can occur:
1. Simulation wanders along normally and writes a checkpoint at step 1003
2. Random crash happens at step 1106
3. An -append restart from the old .tpr and the recent .cpt file will 
restart from step 1003
4. Random crash happens at step 1059
5. Now a restart doesn't restart from step 1003, but some other step
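
For reference, here is a toy Python model of what I would expect in 
that scenario. It is purely illustrative and has nothing to do with the 
real mdrun internals: run_until_crash is a made-up helper, and the model 
assumes the checkpoint on disk only changes when a run reaches a step at 
which it writes a new one.

```python
def run_until_crash(start_step, crash_step, checkpoint_steps, last_checkpoint):
    """Simulate one run and return the newest checkpoint step on disk at the crash.

    checkpoint_steps is the set of steps at which this run writes a checkpoint;
    last_checkpoint is whatever checkpoint already existed before the run.
    """
    for step in range(start_step, crash_step + 1):
        if step in checkpoint_steps:
            last_checkpoint = step
    return last_checkpoint


# Run 1: checkpoint written at step 1003, crash at step 1106
first_restart = run_until_crash(0, 1106, {1003}, None)

# Run 2: restarts from 1003 and crashes at 1059 before any new checkpoint
second_restart = run_until_crash(first_restart, 1059, set(), first_restart)

print(first_restart, second_restart)
```

In this model both restarts begin at step 1003, because the second run 
never got far enough to replace the checkpoint.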

> and most importantly, the most important piece of data, that being the 
> trajectory file, could be completely lost! I don't know the code 
> behind the checkpointing & appending, but I can see how easy one can 
> overwrite 100ns trajectories, for example, and "obtain" the same 
> trajectories of size .... 0.

Without a concrete example in which user error can be ruled out, I 
can't see how that would be easy.

> Using the checkpoint capability & appending make sense when many 
> restarts are expected, but unfortunately it is exactly then when these 
> options completely fail! As a new user of Gromacs, I must say I am 
> disappointed, and would like to obtain an explanation of why the usage 
> of these options is clearly stated to be safe when it is not, and why 
> the append option is the default, and why at least a single warning 
> has not been posted anywhere in the docs & manuals?

I can understand and sympathize with your frustration if you've 
experienced the loss of a simulation. Do be careful when suggesting that 
others' actions are blame-worthy, however. The developers all act in 
good faith on a largely volunteer basis. Errors in coding do happen, and 
they do get attention as developers' time permits. However, developers' 
time rarely permits addressing "feature X doesn't work, why not?" in a 
productive way. Solving bugs can be hard, but it will be easier (and 
faster!) if the user who thinks a problem exists follows good procedure. 
See http://www.chiark.greenend.org.uk/~sgtatham/bugs.html 

Mark