[gmx-users] Why does the -append option exist?
Justin A. Lemkul
jalemkul at vt.edu
Wed Jun 8 05:11:42 CEST 2011
Dimitar Pachov wrote:
> Hello,
>
> Just a quick update after a few short tests that my colleague and I ran.
> First, using
>
> "/You can emulate this yourself by calling "sleep 10s" before mdrun and
> see if that's long enough to solve the latency issue in your case./"
>
> doesn't work for a few reasons, mainly because it doesn't seem to be a
> latency issue, but also because the load on a node is not affected by
> "sleep".
>
> However, you can reproduce the behavior I have observed pretty easily.
> It seems to be related to the values of the pointers to the *xtc, *trr,
> *edr, etc files written at the end of the checkpoint file after abrupt
> crashes AND to the frequency of access (opening) to those files. How to
> test:
>
> 1. In your input *mdp file, set a high output frequency for the *xtc
> (every 10 steps, for example) and a low frequency for the *trr file
> (every 10,000 steps, for example).
> 2. Run GROMACS (mdrun -s run.tpr -v -cpi -deffnm run)
> 3. Abruptly kill the run shortly after that (say, after 10-100 steps).
> 4. You should have a few frames written in the *xtc file, and only one
> (the first) in the *trr file. The *cpt file should have non-zero values
> of "file_offset_low" for all of these files (the pointers have been
> updated).
>
> 5. Restart GROMACS (mdrun -s run.tpr -v -cpi -deffnm run).
> 6. Abruptly kill the run shortly after that (say, after 10-100 steps).
> Make sure the write interval for the *trr has not been reached.
> 7. You should have a few additional frames written in the *xtc file,
> while the *trr will still have only 1 frame (the first). The *cpt file
> now has updated values of "file_offset_low" for all pointers, BUT the
> pointer to the *trr has acquired a value of 0. Obviously, we already know
> what will happen if we restart again from this last *cpt file.
>
> 8. Restart GROMACS (mdrun -s run.tpr -v -cpi -deffnm run).
> 9. Kill it.
> 10. File *trr has size zero.
>
>
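For anyone who wants to try this, the start/kill cycle in the steps above could be scripted roughly as follows. This is only a sketch: the function name kill_and_restart and the variables MDRUN and RUN_SECONDS are made up for illustration, and it assumes that "killing abruptly" means SIGKILL. The mdrun command line is the one quoted above. RUN_SECONDS will need adjusting so that each run gets about as far as the steps describe on your system.

```shell
#!/bin/sh
# Sketch: start mdrun, let it run briefly, kill it abruptly, and repeat.
# MDRUN, RUN_SECONDS and kill_and_restart are illustrative names (assumptions).
MDRUN=${MDRUN:-mdrun}            # mdrun binary to use
RUN_SECONDS=${RUN_SECONDS:-5}    # how long each run lives before it is killed

kill_and_restart() {
    prefix=$1    # -deffnm prefix, e.g. "run"
    cycles=$2    # number of start/kill cycles, e.g. 3
    i=1
    while [ "$i" -le "$cycles" ]; do
        # Same command line as in the steps above, run in the background.
        "$MDRUN" -s "$prefix.tpr" -v -cpi -deffnm "$prefix" &
        pid=$!
        sleep "$RUN_SECONDS"
        # Assumption: SIGKILL stands in for an abrupt crash; it cannot be
        # caught, so the process gets no chance to clean up.
        kill -9 "$pid" 2>/dev/null || true
        wait "$pid" 2>/dev/null || true
        # Show the output file sizes after each kill.
        ls -l "$prefix.xtc" "$prefix.trr" "$prefix.cpt" 2>/dev/null || true
        i=$((i + 1))
    done
    return 0
}

# Example: kill_and_restart run 3
```

Three cycles correspond to steps 2-3, 5-6 and 8-9 above; the ls output after the third kill is where a zero-size *trr would show up.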
> Therefore, if a run is killed before the files are accessed for writing
> (depending on the chosen frequency), the file offset values reported in
> the *cpt file don't seem to be updated accordingly, and hence a new
> restart inevitably leads to overwritten output files.
>
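One way to catch an overwritten file as soon as it happens is to record the output file sizes before a restart and compare them afterwards. The helpers below are a minimal sketch, not part of GROMACS: the function names and the "<bytes> <file>" snapshot format are assumptions, and the list of extensions is just the set of files mentioned above plus the log.

```shell
#!/bin/sh
# Sketch: detect output files that shrank across a restart.
# Function names and the snapshot format are illustrative (assumptions).

# Print "<bytes> <file>" for each existing output file with the given prefix.
snapshot_sizes() {
    for ext in xtc trr edr log; do
        f="$1.$ext"
        if [ -f "$f" ]; then
            printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
        fi
    done
}

# Read a snapshot on stdin and report files that are now smaller or missing.
# Returns 1 if any file shrank, 0 otherwise.
check_not_shrunk() {
    status=0
    while read -r old f; do
        if [ -f "$f" ]; then
            new=$(wc -c < "$f" | tr -d ' ')
        else
            new=0    # a missing file is treated as size zero
        fi
        if [ "$new" -lt "$old" ]; then
            echo "SHRUNK: $f ($old -> $new bytes)"
            status=1
        fi
    done
    return "$status"
}

# Example:
#   snapshot_sizes run > sizes.before
#   ... restart mdrun, then kill it ...
#   check_not_shrunk < sizes.before
```

With appending working as intended, the files should only grow between restarts, so any SHRUNK line points at the restart that truncated a file.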
> Do you think this is fixable?
>
Perhaps, but it will require some more details. I cannot reproduce this
problem, and I wonder if it is compiler- or platform-specific. Can you please
provide:
1. Compiler (and version) used to build Gromacs
2. Hardware details
3. Command used to configure Gromacs
-Justin
--
========================================
Justin A. Lemkul
Ph.D. Candidate
ICTAS Doctoral Scholar
MILES-IGERT Trainee
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
========================================