[gmx-users] chiller failure leads to truncated .cpt and _prev.cpt files using gromacs 4.6.1

Berk Hess gmx3 at hotmail.com
Wed Mar 27 13:33:46 CET 2013


Hi,

Gromacs calls fsync for every checkpoint file written:

       fsync() transfers ("flushes") all modified in-core data of (i.e.,
       modified buffer cache pages for) the file referred to by the file
       descriptor fd to the disk device (or other permanent storage device)
       so that all changed information can be retrieved even after the
       system crashed or was rebooted.  This includes writing through or
       flushing a disk cache if present.  The call blocks until the device
       reports that the transfer has completed.  It also flushes metadata
       information associated with the file (see stat(2)).

If fsync fails, mdrun exits with a fatal error.
We have experience with unreliable AFS file systems, where mdrun could wait for hours on fsync and then fail;
for those cases we added an environment variable.
So either fsync is not supported on your system (highly unlikely),
or your file system returns 0, indicating the file was synced, when in fact it did not fully sync.
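
As a rough illustration of the behaviour described above (this is not the
actual GROMACS source; the function name and error message are made up for
this sketch):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Flush a freshly written checkpoint to permanent storage; abort the
     * run if the kernel reports that the data could not be committed. */
    static void flush_checkpoint_or_die(FILE *fp, const char *fn)
    {
        if (fflush(fp) != 0 || fsync(fileno(fp)) != 0)
        {
            perror(fn);
            fprintf(stderr, "Fatal error: could not fsync checkpoint file '%s'\n", fn);
            exit(1);
        }
        /* If fsync() returns 0 but the file system did not actually commit
         * the data, the truncation seen in this thread can still happen. */
    }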

Note that we first write a new, numbered checkpoint file and fsync it, then move the current checkpoint
to _prev (thereby losing the old _prev), and then move the numbered one to the current name.
So you should never end up with only corrupted files, unless fsync doesn't do what it's supposed to do.
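
In pseudo-C, that update sequence looks roughly like the following (the file
names and the step number are illustrative, not the actual naming scheme):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Sketch of the checkpoint update order described above: write a new,
     * numbered file, fsync it, then shuffle the names with rename(). */
    static void update_checkpoint(void)
    {
        FILE *fp = fopen("md3_step100000.cpt", "wb");   /* new numbered file */
        if (fp == NULL) { perror("fopen"); exit(1); }
        /* ... serialize the simulation state into fp ... */
        if (fflush(fp) != 0 || fsync(fileno(fp)) != 0 || fclose(fp) != 0)
        {
            perror("fsync"); exit(1);          /* fatal error, as noted above */
        }
        rename("md3.cpt", "md3_prev.cpt");          /* the old _prev is lost  */
        rename("md3_step100000.cpt", "md3.cpt");    /* becomes the current    */
    }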

Cheers,

Berk


----------------------------------------
> From: chris.neale at mail.utoronto.ca
> To: gmx-users at gromacs.org
> Date: Wed, 27 Mar 2013 03:13:57 +0000
> Subject: [gmx-users] chiller failure leads to truncated .cpt and _prev.cpt files using gromacs 4.6.1
>
> Dear Matthew:
>
> Thank you for noticing the file size. This is a very good lead.
> I had not noticed that this was special. Indeed, here is the complete listing for truncated/corrupt .cpt files:
>
> -rw-r----- 1 cneale cneale 1048576 Mar 26 18:53 md3.cpt
> -rw-r----- 1 cneale cneale 1048576 Mar 26 18:54 md3.cpt
> -rw-r----- 1 cneale cneale 1048576 Mar 26 18:54 md3.cpt
> -rw-r----- 1 cneale cneale 1048576 Mar 26 18:54 md3.cpt
> -rw-r----- 1 cneale cneale 1048576 Mar 26 18:50 md3.cpt
> -rw-r----- 1 cneale cneale 1048576 Mar 26 18:50 md3.cpt
> -rw-r----- 1 cneale cneale 1048576 Mar 26 18:50 md3.cpt
> -rw-r----- 1 cneale cneale 1048576 Mar 26 18:51 md3.cpt
> -rw-r----- 1 cneale cneale 1048576 Mar 26 18:51 md3.cpt
> -rw-r----- 1 cneale cneale 2097152 Mar 26 18:52 md3.cpt
> -rw-r----- 1 cneale cneale 2097152 Mar 26 18:52 md3.cpt
> -rw-r----- 1 cneale cneale 2097152 Mar 26 18:52 md3.cpt
> -rw-r----- 1 cneale cneale 2097152 Mar 26 18:52 md3.cpt
>
> I will contact my sysadmins and let them know about your suggestions.
>
> Nevertheless, I respectfully reject the idea that there is really nothing that can be done about this inside
> gromacs. About 6 years ago, I worked on a cluster with massive sporadic NFS delays. The only way to
> automate runs on that machine was, for example, to use sed to create a .mdp file from a template .mdp that
> had ;;;EOF as the last line, and then to poll the created .mdp file for ;;;EOF until it appeared before running
> grompp (at the time I was using mdrun -sort and desorting with an in-house script prior to domain
> decomposition, so I had to stop/start gromacs every couple of hours). This is not to say that such things are
> ideal, but I think gromacs would be all the better if it were able to avoid problems like this regardless of
> the cluster setup.
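>
> (A minimal C sketch of that sentinel-polling idea; the original workaround was shell-based, and the
> file name, sentinel string, and timeout below are only illustrative:)
>
>     #include <stdio.h>
>     #include <string.h>
>     #include <unistd.h>
>
>     /* Re-read the .mdp file until its last line is the ";;;EOF" sentinel,
>      * i.e. until the file has been completely written out. */
>     static int wait_for_sentinel(const char *fn, int max_seconds)
>     {
>         for (int i = 0; i < max_seconds; i++)
>         {
>             char line[1024], last[1024] = "";
>             FILE *fp = fopen(fn, "r");
>             if (fp != NULL)
>             {
>                 while (fgets(line, sizeof(line), fp) != NULL)
>                 {
>                     strcpy(last, line);     /* remember the last line read */
>                 }
>                 fclose(fp);
>                 if (strncmp(last, ";;;EOF", 6) == 0)
>                 {
>                     return 0;               /* complete; safe to run grompp */
>                 }
>             }
>             sleep(1);                       /* wait for the file system to catch up */
>         }
>         return -1;                          /* gave up waiting */
>     }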
>
> Please note that, over the years, I have seen this on 4 different clusters (albeit with different versions of
> gromacs), which is to say that it's not just one setup that is to blame.
>
> Matthew, please don't take my comments the wrong way. I deeply appreciate your help. I just want to put it
> out there that I believe gromacs would be better if it never overwrote good .cpt files with truncated/corrupt
> .cpt files, even if the cluster catches on fire or the earth's magnetic field reverses, etc.
> Also, I suspect that sysadmins don't have a lot of time to test their clusters for a graceful exit under
> chiller-failure conditions, so a super-careful regime of .cpt updates will always be useful.
>
> Thank you again for your help. I'll take it to my sysadmins, who are very good and may be able to remedy
> this on their cluster, but who knows what cluster I will be using in 5 years.
>
> Again, thank you for your assistance, it is very useful,
> Chris.
>
> -- original message --
>
>
> Dear Chris,
>
> While it's always possible that GROMACS can be improved (or debugged), this
> smells more like a system-level problem. The corrupt checkpoint files are
> precisely 1MiB or 2MiB, which suggests strongly either 1) GROMACS was in
> the middle of a buffer flush when it was killed (but the filesystem did
> everything right; it was just sent incomplete data), or 2) the filesystem
> itself wrote a truncated file (but GROMACS wrote it successfully, the data
> was buffered, and GROMACS went on its merry way).
>
> #1 could happen, for example, if GROMACS was killed with SIGKILL while
> copying .cpt to _prev.cpt -- if GROMACS even copies, rather than renames --
> its checkpoint files. #2 could happen in any number of ways, depending on
> precisely how your disks, filesystems, and network filesystems are all
> configured (for example, if a RAID array goes down hard with per-drive
> writeback caches enabled, or your NFS system is soft-mounted and either
> client or server goes down). With the sizes of the truncated checkpoint
> files being very convenient numbers, my money is on #2.
>
> Have you contacted your sysadmins to report this? They may be able to take
> some steps to try to prevent this, and (if this is indeed a system problem)
> doing so would provide all their users an increased measure of safety for
> their data.
>
> Cheers,
> MZ
>