[gmx-users] Why does the -append option exist?

Justin A. Lemkul jalemkul at vt.edu
Wed Jun 8 05:11:42 CEST 2011



Dimitar Pachov wrote:
> Hello,
> 
> Just a quick update after a few short tests we (my colleague and I) 
> quickly did. First, using 
> 
> "/You can emulate this yourself by calling "sleep 10s" before mdrun and 
> see if that's long enough to solve the latency issue in your case./"
> 
> doesn't work for a few reasons, mainly because it doesn't seem to be a 
> latency issue, but also because the load on a node is not affected by 
> "sleep".
> 
> However, you can reproduce the behavior I have observed pretty easily. 
> It seems to be related to the values of the pointers to the *xtc, *trr, 
> *edr, etc files written at the end of the checkpoint file after abrupt 
> crashes AND to the frequency of access (opening) to those files. How to 
> test:
>  
> 1. In your input *mdp file put a high frequency of saving coordinates 
> to, say, the *xtc (10, for example) and a low frequency for the *trr 
> file (10,000, for example).
> 2. Run GROMACS (mdrun -s run.tpr -v -cpi -deffnm run)
> 3. Abruptly kill the run shortly after that (say, after 10-100 steps).
> 4. You should have a few frames written in the *xtc file, and only 
> one (the first) in the *trr file. The *cpt file should have nonzero 
> values for "file_offset_low" for all of these files (the 
> pointers have been updated).
> 
> 5. Restart GROMACS (mdrun -s run.tpr -v -cpi -deffnm run). 
> 6. Abruptly kill the run shortly after that (say, after 10-100 steps). 
> Make sure the run is killed before the write interval for the *trr has 
> been reached. 
> 7. You should have a few additional frames written in the *xtc file, 
> while the *trr will still have only 1 frame (the first). The *cpt file 
> now has updated all pointer values "file_offset_low", BUT the pointer to 
> the *trr has acquired a value of 0. Obviously, we already know what will 
> happen if we restart again from this last *cpt file. 
> 
> 8. Restart GROMACS (mdrun -s run.tpr -v -cpi -deffnm run). 
> 9. Kill it. 
> 10. File *trr has size zero. 
> 
> 
> Therefore, if a run is killed before the files are accessed for writing 
> (depending on the chosen frequency), the file offset values reported in 
> the *cpt file don't seem to be updated accordingly, and hence a new 
> restart inevitably leads to overwritten output files.
>  
> Do you think this is fixable?
> 

Perhaps, but it will require some more details.  I cannot reproduce this 
problem, and I wonder if it is compiler- or platform-specific.  Can you please 
provide:

1. Compiler (and version) used to build Gromacs
2. Hardware details
3. Command used to configure Gromacs
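
While collecting those details, the overwrite described in steps 8-10 can be 
caught independently of GROMACS by comparing output file sizes before and 
after each restart. The following is only a minimal sketch, not part of 
GROMACS: the helper names snapshot_sizes and find_shrunk are hypothetical, 
and the file names in the usage note assume the "-deffnm run" naming from 
the commands above.

```python
import os


def snapshot_sizes(paths):
    """Return {path: size in bytes}; a missing file is recorded as -1."""
    return {p: os.path.getsize(p) if os.path.exists(p) else -1 for p in paths}


def find_shrunk(before, after):
    """Return the files that existed in `before` and are now smaller or gone.

    With appending, an output file should only ever grow across a restart,
    so any file listed here was truncated or overwritten.
    """
    return sorted(
        path
        for path, old_size in before.items()
        if old_size >= 0 and after.get(path, -1) < old_size
    )
```

Call snapshot_sizes(["run.xtc", "run.trr", "run.edr"]) before restarting 
mdrun and again after the run is killed, then pass both results to 
find_shrunk. A non-empty list means at least one file lost data.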

-Justin

-- 
========================================

Justin A. Lemkul
Ph.D. Candidate
ICTAS Doctoral Scholar
MILES-IGERT Trainee
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin

========================================


