[gmx-users] Build time/Build user mismatch, fatal error truncation of file *.xtc failed

Mark Abraham mark.j.abraham at gmail.com
Thu Jun 16 13:24:11 CEST 2016


Hi,

On Thu, Jun 16, 2016 at 12:24 PM Husen R <hus3nr at gmail.com> wrote:

> On Thu, Jun 16, 2016 at 4:01 PM, Mark Abraham <mark.j.abraham at gmail.com>
> wrote:
>
> > Hi,
> >
> > There's just nothing special about any node at run time.
> >
> > Your script looks like it is building GROMACS fresh each time - there's no
> > need to do that,
>
>
> Which part of my script?
>

I can't tell how your script is finding its GROMACS installations, but the
advisory message says precisely that your runs are finding different
installations...

  Build time mismatch,
    current program: Sel Apr  5 13:37:32 WIB 2016
    checkpoint file: Rab Apr  6 09:44:51 WIB 2016

  Build user mismatch,
    current program: pro at head-node [CMAKE]
    checkpoint file: pro at compute-node [CMAKE]

This reinforces my impression that the view of your file system available
at the start of the job script is varying with your choice of node subsets.
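
One way to take that variable out of the picture (a minimal sketch; the
install prefix /shared/apps/gromacs-5.1.2 is a placeholder for wherever your
shared build actually lives) is to source a single GROMACS installation
explicitly at the top of the job script, so every node resolves the same
gmx_mpi:

  # Pin one installation on the shared file system; adjust the prefix
  # to your actual install location.
  source /shared/apps/gromacs-5.1.2/bin/GMXRC
  mpirun gmx_mpi mdrun -cpi md_test.cpt -deffnm md_test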


> I always use this command to restart from a checkpoint file -->  "mpirun
> gmx_mpi mdrun -cpi [name].cpt -deffnm [name]".
> As far as I know, the -cpi option is used to supply the checkpoint file as an
> input file.
> What do I have to change in my script?
>

Nothing about that aspect. But clearly your first run and the restart
simulating loss of a node are finding different gmx_mpi binaries from their
respective environments. This is not itself a problem, but it's probably
not what you intend, and may be symptomatic of the same issue that leads to
md_test.xtc not being accessible.
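
If you want to verify what each run actually picks up, a couple of quick
checks near the top of the job script (assuming SLURM's srun is available in
the job environment) would show the resolved binary per node and whether the
output file is reachable; the version banner also reports the build
time/user fields compared in the mismatch note:

  # One task per allocated node: show which gmx_mpi each node resolves.
  srun --ntasks-per-node=1 bash -c 'echo "$(hostname): $(which gmx_mpi)"'
  # Confirm the trajectory is visible and writable from every node.
  srun --ntasks-per-node=1 ls -l md_test.xtc
  # The version banner reports build details for this binary.
  gmx_mpi --version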

Mark


>
> > but the fact that the node name is showing up in the check
> > that takes place when the checkpoint is read is not relevant to the
> > problem.
> >
> > Mark
> >
> > On Thu, Jun 16, 2016 at 9:46 AM Husen R <hus3nr at gmail.com> wrote:
> >
> > > On Thu, Jun 16, 2016 at 2:32 PM, Mark Abraham <mark.j.abraham at gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > On Thu, Jun 16, 2016 at 9:30 AM Husen R <hus3nr at gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Thank you for your reply !
> > > > >
> > > > > md_test.xtc exists and is writable.
> > > > >
> > > >
> > > > OK, but it needs to be seen that way from the set of compute nodes you are
> > > > using, and organizing that is up to you and your job scheduler, etc.
> > > >
> > > >
> > > > > I tried to restart from the checkpoint file by excluding a node other
> > > > > than compute-node, and it works.
> > > > >
> > > >
> > > > Go do that, then :-)
> > > >
> > >
> > > I'm building a simple system that can respond to node failure. If a failure
> > > occurs on node A, then the application has to be restarted and that node
> > > has to be excluded.
> > > This should apply to all nodes, including this 'compute-node'.
> > >
> > > >
> > > >
> > > > > Only '--exclude=compute-node' produces this error.
> > > > >
> > > >
> > > > Then there's something about that node that is special with respect to the
> > > > file system - there's nothing about any particular node that GROMACS cares
> > > > about.
> > > >
> > >
> > > > Mark
> > > >
> > > >
> > > > > Does this have the same issue as this thread?
> > > > > http://comments.gmane.org/gmane.science.biology.gromacs.user/40984
> > > > >
> > > > > regards,
> > > > >
> > > > > Husen
> > > > >
> > > > > On Thu, Jun 16, 2016 at 2:20 PM, Mark Abraham <mark.j.abraham at gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > The stuff about different nodes or numbers of nodes doesn't matter - it's
> > > > > > merely an advisory note from mdrun. mdrun failed when it tried to operate
> > > > > > upon md_test.xtc, so perhaps you need to consider whether the file exists,
> > > > > > is writable, etc.
> > > > > >
> > > > > > Mark
> > > > > >
> > > > > > On Thu, Jun 16, 2016 at 6:48 AM Husen R <hus3nr at gmail.com> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I got the following error message when I tried to restart a GROMACS
> > > > > > > simulation from a checkpoint file.
> > > > > > > I restarted the simulation using fewer nodes and processes, and I also
> > > > > > > excluded one node using the '--exclude=' option (in SLURM) for
> > > > > > > experimental purposes.
> > > > > > >
> > > > > > > I'm sure fewer nodes and processes are not the cause of this error, as I
> > > > > > > have already tested that.
> > > > > > > I have checked that the cause of this error is the '--exclude=' usage. I
> > > > > > > excluded one node, named 'compute-node', when restarting from the
> > > > > > > checkpoint (at the first run, I used all nodes, including 'compute-node').
> > > > > > >
> > > > > > >
> > > > > > > It seems that at the first run, the program used by the job script was
> > > > > > > built on compute-node. So, at restart, the build user mismatch appeared
> > > > > > > because compute-node was not found (excluded).
> > > > > > >
> > > > > > > Am I right? Is this behavior normal?
> > > > > > > Or is there a way to avoid this, so I can freely restart from a
> > > > > > > checkpoint using any nodes without limitation?
> > > > > > >
> > > > > > > Thank you in advance
> > > > > > >
> > > > > > > Regards,
> > > > > > >
> > > > > > >
> > > > > > > Husen
> > > > > > >
> > > > > > > ==========================restart script=================
> > > > > > > #!/bin/bash
> > > > > > > #SBATCH -J ayo
> > > > > > > #SBATCH -o md%j.out
> > > > > > > #SBATCH -A necis
> > > > > > > #SBATCH -N 2
> > > > > > > #SBATCH -n 16
> > > > > > > #SBATCH --exclude=compute-node
> > > > > > > #SBATCH --time=144:00:00
> > > > > > > #SBATCH --mail-user=hus3nr at gmail.com
> > > > > > > #SBATCH --mail-type=begin
> > > > > > > #SBATCH --mail-type=end
> > > > > > >
> > > > > > > mpirun gmx_mpi mdrun -cpi md_test.cpt -deffnm md_test
> > > > > > > =====================================================
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > ==========================output error========================
> > > > > > > Reading checkpoint file md_test.cpt generated: Wed Jun 15 16:30:44 2016
> > > > > > >
> > > > > > >
> > > > > > >   Build time mismatch,
> > > > > > >     current program: Sel Apr  5 13:37:32 WIB 2016
> > > > > > >     checkpoint file: Rab Apr  6 09:44:51 WIB 2016
> > > > > > >
> > > > > > >   Build user mismatch,
> > > > > > >     current program: pro at head-node [CMAKE]
> > > > > > >     checkpoint file: pro at compute-node [CMAKE]
> > > > > > >
> > > > > > >   #ranks mismatch,
> > > > > > >     current program: 16
> > > > > > >     checkpoint file: 24
> > > > > > >
> > > > > > >   #PME-ranks mismatch,
> > > > > > >     current program: -1
> > > > > > >     checkpoint file: 6
> > > > > > >
> > > > > > > GROMACS patchlevel, binary or parallel settings differ from previous run.
> > > > > > > Continuation is exact, but not guaranteed to be binary identical.
> > > > > > >
> > > > > > >
> > > > > > > -------------------------------------------------------
> > > > > > > Program gmx mdrun, VERSION 5.1.2
> > > > > > > Source code file:
> > > > > > > /home/pro/gromacs-5.1.2/src/gromacs/gmxlib/checkpoint.cpp, line: 2216
> > > > > > >
> > > > > > > Fatal error:
> > > > > > > Truncation of file md_test.xtc failed. Cannot do appending because of this
> > > > > > > failure.
> > > > > > > For more information and tips for troubleshooting, please check the GROMACS
> > > > > > > website at http://www.gromacs.org/Documentation/Errors
> > > > > > > -------------------------------------------------------
> > > > > > >
> > > > > > > ================================================================