[gmx-users] floating point exception in .xtc file

Christopher Neale chris.neale at mail.utoronto.ca
Sat Jun 9 20:39:34 CEST 2012


Thank you Mark and Francesco,

I have repaired the trajectory and lost only 2 frames (40 ps out of 500 ns).

I used to automate a gmxcheck of each .xtc segment generated with mdrun -noappend and
rerun any segments that were corrupted. I may go back to that practice in the future.
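
A minimal sketch of that kind of per-segment check, in Python. It assumes that a damaged segment makes `gmxcheck -f` exit with a non-zero status (as the floating point exception in this thread would); a truncated but still readable file could pass. The function names are illustrative, not part of GROMACS.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List


def gmxcheck_ok(xtc: Path) -> bool:
    """Return True if `gmxcheck -f <xtc>` exits cleanly (assumed success test)."""
    result = subprocess.run(
        ["gmxcheck", "-f", str(xtc)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # A crash such as a floating point exception gives a non-zero return
    # code (negative on POSIX when the process is killed by a signal).
    return result.returncode == 0


def corrupted_segments(
    segments: Iterable[Path],
    check: Callable[[Path], bool] = gmxcheck_ok,
) -> List[Path]:
    """Return the segments that fail the check, in the order given."""
    return [segment for segment in segments if not check(segment)]
```

For example, `corrupted_segments(sorted(Path(".").glob("md.part*.xtc")))` would list the segments to rerun, assuming the segments follow a `md.partNNNN.xtc` naming pattern in the current directory.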

For completeness, I believe that the filesystem we are using is capable of locking files,
because once in a while after a cluster crash I am unable to restart simulations because
the .log file is "locked". In those cases, I do revert to -noappend for continuations, as
simply deleting the .log file also makes it impossible to continue the run.


Thank you,

Chris.


-- original message --

Hi Christopher,
you can try the program gmx_rescue, by Marc Baaden, to recover your
trajectory.

The address is below:
http://baaden.free.fr/soft/compchem.html

Francesco

2012/6/9 Mark Abraham <Mark.Abraham at anu.edu.au>

>  On 9/06/2012 7:27 AM, Christopher Neale wrote:
>
>  Dear Users:
>
>  I have a 500 ns trajectory of 65 GB that gives a floating point
> exception when I run it through gmxcheck or trjcat (generated and analyzed
> with gromacs 4.5.5). Has anybody encountered this? I ran mdrun with -append
> so this is the xtc resulting from months of simulation of a 1,000,000 atom
> system. If I run trjconv -f md.xtc -b 200000, where the floating point
> exception occurred around t=180000 ps in gmxcheck, then I can extract the
> readable frames and repair around the damaged section. Still, I'd rather
> not lose any data and I had thought that the new default -append option to
> mdrun checked for these types of problems at runtime.
>
>
> I've no idea what might happen when some file-system transient occurs
> mid-simulation, but if mdrun has managed to compute a checksum on an
> incomplete file and stored that in the checkpoint, then the append
> mechanism can be none the wiser. The check upon restart is that the
> checksum matches, not that the checksum is computed on a file whose
> properties would satisfy gmxcheck.
>
> Note also that some file systems do not support file locking, and this
> is known to cause issues (Redmine 924), but I don't know if this is related
> to your observation.
>
> Mark
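
Mark's point about the checksum can be illustrated with a toy model in Python. This is only a sketch: it assumes an MD5 over the whole file and fixed-size frames, neither of which is claimed to match what mdrun or the .xtc format actually does, and all names are hypothetical. It shows that a checksum recorded on an already damaged file will still match on restart, so a matching checksum says nothing about whether the file is readable.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum of the file contents (MD5 is an assumption for illustration)."""
    return hashlib.md5(data).hexdigest()


def restart_check_passes(file_bytes: bytes, stored_checksum: str) -> bool:
    """Toy restart test: does the file still match the stored checksum?"""
    return checksum(file_bytes) == stored_checksum


def looks_complete(file_bytes: bytes, frame_size: int) -> bool:
    """Hypothetical validity test: the file holds a whole number of frames."""
    return len(file_bytes) % frame_size == 0


FRAME = 8  # toy frame size in bytes

# Three whole frames followed by a partially written fourth frame.
damaged = b"x" * (3 * FRAME) + b"xxx"

# The checkpoint stores a checksum of the file as it is, damage included.
stored = checksum(damaged)
```

Here `restart_check_passes(damaged, stored)` is true even though `looks_complete(damaged, FRAME)` is false: the restart check only detects changes made after the checkpoint was written.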