[gmx-users] Build time/Build user mismatch, fatal error truncation of file *.xtc failed

Husen R hus3nr at gmail.com
Thu Jun 23 11:00:09 CEST 2016


Hi,

I'm wondering: if I use GROMACS in a cluster environment, do I have to
install GROMACS on every node (at /usr/local/gromacs on each node), or is
it enough to install it on one node only (for example, on the head node)?
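
If a single shared install is the way to go, I assume the build would be
done once, from one node, into a shared (e.g. NFS-exported) prefix, roughly
like this (the version and prefix are taken from the thread below and are
only illustrative):

  tar xzf gromacs-5.1.2.tar.gz && cd gromacs-5.1.2
  mkdir build && cd build
  cmake .. -DGMX_MPI=ON -DCMAKE_INSTALL_PREFIX=/mirror/source/gromacs
  make -j4 && make install
  # every job on every node then finds the same binaries via:
  source /mirror/source/gromacs/bin/GMXRC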

Regards,

Husen



On Thu, Jun 23, 2016 at 3:41 PM, Mark Abraham <mark.j.abraham at gmail.com>
wrote:

> Hi,
>
> The only explanation is that the file is not in fact properly accessible
> when rank 0 is placed on a node other than "compute-node", which means
> your organization of the file system / Slurm / etc. isn't sufficient for
> what you're doing.
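>
> A quick way to check (a generic sketch, not a GROMACS feature) is to run a
> file test on every node of the allocation from inside the job script:
>
>   srun bash -c 'echo "$(hostname): $(ls -l md_gmx.xtc 2>&1)"'
>
> Any node that reports the file as missing cannot host rank 0 for an
> appending restart.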
>
> Mark
>
> On Thu, Jun 23, 2016 at 10:15 AM Husen R <hus3nr at gmail.com> wrote:
>
> > Hi,
> >
> > I am still unable to find the cause of the fatal error.
> > Previously, GROMACS was installed separately on every node; that is why
> > the Build time mismatch and Build user mismatch messages appeared.
> > Those mismatches are now solved by installing GROMACS in a shared
> > directory.
> >
> > I also tried installing GROMACS on one node only (not in a shared
> > directory), but the error still appeared.
> >
> >
> > This is the error message I get when I exclude compute-node
> > ("--exclude=compute-node") from the node list in the Slurm sbatch script;
> > excluding any other node works fine.
> >
> >
> >
> >
> =========================================================================================
> > GROMACS:      gmx mdrun, VERSION 5.1.2
> > Executable:   /mirror/source/gromacs/bin/gmx_mpi
> > Data prefix:  /mirror/source/gromacs
> > Command line:
> >   gmx_mpi mdrun -cpi md_gmx.cpt -deffnm md_gmx
> >
> >
> > Running on 2 nodes with total 8 cores, 16 logical cores
> >   Cores per node:            4
> >   Logical cores per node:    8
> > Hardware detected on host head-node (the node of MPI rank 0):
> >   CPU info:
> >     Vendor: GenuineIntel
> >     Brand:  Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
> >     SIMD instructions most likely to fit this hardware: AVX_256
> >     SIMD instructions selected at GROMACS compile time: AVX_256
> >
> > Reading file md_gmx.tpr, VERSION 5.1.2 (single precision)
> > Changing nstlist from 10 to 20, rlist from 1 to 1.03
> >
> >
> > Reading checkpoint file md_gmx.cpt generated: Thu Jun 23 12:54:02 2016
> >
> >
> >   #ranks mismatch,
> >     current program: 16
> >     checkpoint file: 24
> >
> >   #PME-ranks mismatch,
> >     current program: -1
> >     checkpoint file: 6
> >
> > GROMACS patchlevel, binary or parallel settings differ from previous run.
> > Continuation is exact, but not guaranteed to be binary identical.
> >
> >
> > -------------------------------------------------------
> > Program gmx mdrun, VERSION 5.1.2
> > Source code file:
> > /home/necis/gromacsinstall/gromacs-5.1.2/src/gromacs/gmxlib/checkpoint.cpp,
> > line: 2216
> >
> > Fatal error:
> > Truncation of file md_gmx.xtc failed. Cannot do appending because of this
> > failure.
> > For more information and tips for troubleshooting, please check the
> > GROMACS website at http://www.gromacs.org/Documentation/Errors
> >
> >
> ============================================================================================================
> >
> > On Thu, Jun 16, 2016 at 6:23 PM, Mark Abraham <mark.j.abraham at gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > On Thu, Jun 16, 2016 at 12:24 PM Husen R <hus3nr at gmail.com> wrote:
> > >
> > > > On Thu, Jun 16, 2016 at 4:01 PM, Mark Abraham <
> > mark.j.abraham at gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > There's just nothing special about any node at run time.
> > > > >
> > > > > Your script looks like it is building GROMACS fresh each time -
> > > > > there's no need to do that,
> > > >
> > > >
> > > > Which part of my script?
> > > >
> > >
> > > I can't tell how your script is finding its GROMACS installations, but
> > > the advisory message says precisely that your runs are finding
> > > different installations...
> > >
> > >   Build time mismatch,
> > >     current program: Sel Apr  5 13:37:32 WIB 2016
> > >     checkpoint file: Rab Apr  6 09:44:51 WIB 2016
> > >
> > >   Build user mismatch,
> > >     current program: pro at head-node [CMAKE]
> > >     checkpoint file: pro at compute-node [CMAKE]
> > >
> > > This reinforces my impression that the view of your file system
> > > available at the start of the job script varies with your choice of
> > > node subsets.
> > >
> > >
> > > > I always use this command to restart from a checkpoint file:
> > > > "mpirun gmx_mpi mdrun -cpi [name].cpt -deffnm [name]".
> > > > As far as I know, the -cpi option names the checkpoint file to read
> > > > as input.
> > > > What do I have to change in my script?
> > > >
> > >
> > > Nothing about that aspect. But clearly your first run and the restart
> > > simulating loss of a node are finding different gmx_mpi binaries from
> > > their respective environments. This is not itself a problem, but it's
> > > probably not what you intend, and may be symptomatic of the same issue
> > > that leads to md_test.xtc not being accessible.
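> > >
> > > A quick sanity check (a generic sketch, not a GROMACS feature) is to
> > > ask every node in the allocation which binary it resolves, e.g. from
> > > inside the job script:
> > >
> > >   srun bash -c 'echo "$(hostname): $(command -v gmx_mpi)"'
> > >
> > > Every node should report the same shared installation path.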
> > >
> > > Mark
> > >
> > >
> > > >
> > > > > but the fact that the node name is showing up in the check that
> > > > > takes place when the checkpoint is read is not relevant to the
> > > > > problem.
> > > > >
> > > > > Mark
> > > > >
> > > > > On Thu, Jun 16, 2016 at 9:46 AM Husen R <hus3nr at gmail.com> wrote:
> > > > >
> > > > > > On Thu, Jun 16, 2016 at 2:32 PM, Mark Abraham <
> > > > mark.j.abraham at gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > On Thu, Jun 16, 2016 at 9:30 AM Husen R <hus3nr at gmail.com>
> > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Thank you for your reply !
> > > > > > > >
> > > > > > > > md_test.xtc exists and is writable.
> > > > > > > >
> > > > > > >
> > > > > > > OK, but it needs to be seen that way from the set of compute
> > > > > > > nodes you are using, and organizing that is up to you and your
> > > > > > > job scheduler, etc.
> > > > > > >
> > > > > > > > I tried restarting from the checkpoint file while excluding a
> > > > > > > > node other than compute-node, and it works.
> > > > > > > >
> > > > > > >
> > > > > > > Go do that, then :-)
> > > > > > >
> > > > > >
> > > > > > I'm building a simple system that can respond to node failure: if
> > > > > > a failure occurs on node A, the application has to be restarted
> > > > > > with that node excluded (a sketch of the resubmit step is below).
> > > > > > This should apply to every node, including 'compute-node'.
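> > > > > >
> > > > > > A minimal sketch of that step, assuming the failed node's name is
> > > > > > already known (script name and variable are illustrative):
> > > > > >
> > > > > >   FAILED_NODE=compute-node     # name of the node that failed
> > > > > >   sbatch --exclude="$FAILED_NODE" restart_md.sh
> > > > > >
> > > > > > where restart_md.sh is the restart script quoted further below;
> > > > > > sbatch command-line options take precedence over the #SBATCH
> > > > > > directives inside the script.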
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > Only '--exclude=compute-node' produces this error.
> > > > > > > >
> > > > > > >
> > > > > > > > Then there's something about that node that is special with
> > > > > > > > respect to the file system - there's nothing about any
> > > > > > > > particular node that GROMACS cares about.
> > > > > > >
> > > > > >
> > > > > > > Mark
> > > > > > >
> > > > > > >
> > > > > > > > Is this the same issue as in this thread?
> > > > > > > > http://comments.gmane.org/gmane.science.biology.gromacs.user/40984
> > > > > > > >
> > > > > > > > regards,
> > > > > > > >
> > > > > > > > Husen
> > > > > > > >
> > > > > > > > On Thu, Jun 16, 2016 at 2:20 PM, Mark Abraham <
> > > > > > mark.j.abraham at gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > The stuff about different nodes or numbers of nodes doesn't
> > > > > > > > > matter - it's merely an advisory note from mdrun. mdrun failed
> > > > > > > > > when it tried to operate upon md_test.xtc, so perhaps you need
> > > > > > > > > to consider whether the file exists, is writable, etc.
> > > > > > > > >
> > > > > > > > > Mark
> > > > > > > > >
> > > > > > > > > On Thu, Jun 16, 2016 at 6:48 AM Husen R <hus3nr at gmail.com>
> > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > I got the following error message when I tried to restart a
> > > > > > > > > > GROMACS simulation from a checkpoint file.
> > > > > > > > > > I restarted the simulation using fewer nodes and processes,
> > > > > > > > > > and I also excluded one node using the '--exclude=' option
> > > > > > > > > > (in Slurm) for experimental purposes.
> > > > > > > > > >
> > > > > > > > > > I'm sure that fewer nodes and processes are not the cause of
> > > > > > > > > > this error, as I have already tested that.
> > > > > > > > > > I have verified that the cause is the '--exclude=' usage: I
> > > > > > > > > > excluded one node named 'compute-node' when restarting from
> > > > > > > > > > the checkpoint (in the first run, I used all nodes,
> > > > > > > > > > including 'compute-node').
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > It seems that, at the first run, the program was built on
> > > > > > > > > > compute-node. So, at restart, the build user mismatch
> > > > > > > > > > appeared because compute-node was not available (excluded).
> > > > > > > > > >
> > > > > > > > > > Am I right? Is this behavior normal?
> > > > > > > > > > Is there a way to avoid this, so that I can freely restart
> > > > > > > > > > from a checkpoint using any nodes, without limitation?
> > > > > > > > > >
> > > > > > > > > > thank you in advance
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Husen
> > > > > > > > > >
> > > > > > > > > > ==========================restart script=================
> > > > > > > > > > #!/bin/bash
> > > > > > > > > > #SBATCH -J ayo
> > > > > > > > > > #SBATCH -o md%j.out
> > > > > > > > > > #SBATCH -A necis
> > > > > > > > > > #SBATCH -N 2
> > > > > > > > > > #SBATCH -n 16
> > > > > > > > > > #SBATCH --exclude=compute-node
> > > > > > > > > > #SBATCH --time=144:00:00
> > > > > > > > > > #SBATCH --mail-user=hus3nr at gmail.com
> > > > > > > > > > #SBATCH --mail-type=begin
> > > > > > > > > > #SBATCH --mail-type=end
> > > > > > > > > >
> > > > > > > > > > mpirun gmx_mpi mdrun -cpi md_test.cpt -deffnm md_test
> > > > > > > > > > =====================================================
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > ========================== output error ==========================
> > > > > > > > > > Reading checkpoint file md_test.cpt generated: Wed Jun 15 16:30:44 2016
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >   Build time mismatch,
> > > > > > > > > >     current program: Sel Apr  5 13:37:32 WIB 2016
> > > > > > > > > >     checkpoint file: Rab Apr  6 09:44:51 WIB 2016
> > > > > > > > > >
> > > > > > > > > >   Build user mismatch,
> > > > > > > > > >     current program: pro at head-node [CMAKE]
> > > > > > > > > >     checkpoint file: pro at compute-node [CMAKE]
> > > > > > > > > >
> > > > > > > > > >   #ranks mismatch,
> > > > > > > > > >     current program: 16
> > > > > > > > > >     checkpoint file: 24
> > > > > > > > > >
> > > > > > > > > >   #PME-ranks mismatch,
> > > > > > > > > >     current program: -1
> > > > > > > > > >     checkpoint file: 6
> > > > > > > > > >
> > > > > > > > > > GROMACS patchlevel, binary or parallel settings differ from previous run.
> > > > > > > > > > Continuation is exact, but not guaranteed to be binary identical.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > -------------------------------------------------------
> > > > > > > > > > Program gmx mdrun, VERSION 5.1.2
> > > > > > > > > > Source code file:
> > > > > > > > > > /home/pro/gromacs-5.1.2/src/gromacs/gmxlib/checkpoint.cpp,
> > > > > > > > > > line: 2216
> > > > > > > > > >
> > > > > > > > > > Fatal error:
> > > > > > > > > > Truncation of file md_test.xtc failed. Cannot do appending
> > > > > > > > > > because of this failure.
> > > > > > > > > > For more information and tips for troubleshooting, please
> > > > > > > > > > check the GROMACS website at
> > > > > > > > > > http://www.gromacs.org/Documentation/Errors
> > > > > > > > > > -------------------------------------------------------
> > > > > > > > > > ================================================================


More information about the gromacs.org_gmx-users mailing list