[gmx-users] Build time/Build user mismatch, fatal error truncation of file *.xtc failed
Husen R
hus3nr at gmail.com
Thu Jun 23 10:13:59 CEST 2016
Hi,

I am still unable to find out the cause of the fatal error.

Previously, GROMACS was installed separately on every node; that is why the
Build time mismatch and Build user mismatch notes appeared. Those two issues
are now solved by installing GROMACS in a shared directory.

I have also tried installing GROMACS on one node only (not in a shared
directory), but the error still appeared.
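To double-check that every allocated node resolves the same shared build, here
is a minimal check I could add to the sbatch script before the mpirun line.
This is only a sketch and assumes srun and md5sum are available on the nodes:

# Hypothetical sanity check: print, for each allocated node, which gmx_mpi
# it resolves and the checksum of that binary, so a per-node or stale
# install shows up immediately.
srun --ntasks-per-node=1 bash -c \
  'echo "== $(hostname)"; which gmx_mpi; md5sum "$(which gmx_mpi)"'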
Below is the error message I get when I exclude compute-node
("--exclude=compute-node") from the node list in the Slurm sbatch script;
excluding any other node works fine.
=========================================================================================
GROMACS: gmx mdrun, VERSION 5.1.2
Executable: /mirror/source/gromacs/bin/gmx_mpi
Data prefix: /mirror/source/gromacs
Command line:
gmx_mpi mdrun -cpi md_gmx.cpt -deffnm md_gmx
Running on 2 nodes with total 8 cores, 16 logical cores
Cores per node: 4
Logical cores per node: 8
Hardware detected on host head-node (the node of MPI rank 0):
CPU info:
Vendor: GenuineIntel
Brand: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
SIMD instructions most likely to fit this hardware: AVX_256
SIMD instructions selected at GROMACS compile time: AVX_256
Reading file md_gmx.tpr, VERSION 5.1.2 (single precision)
Changing nstlist from 10 to 20, rlist from 1 to 1.03
Reading checkpoint file md_gmx.cpt generated: Thu Jun 23 12:54:02 2016
#ranks mismatch,
current program: 16
checkpoint file: 24
#PME-ranks mismatch,
current program: -1
checkpoint file: 6
GROMACS patchlevel, binary or parallel settings differ from previous run.
Continuation is exact, but not guaranteed to be binary identical.
-------------------------------------------------------
Program gmx mdrun, VERSION 5.1.2
Source code file:
/home/necis/gromacsinstall/gromacs-5.1.2/src/gromacs/gmxlib/checkpoint.cpp,
line: 2216
Fatal error:
Truncation of file md_gmx.xtc failed. Cannot do appending because of this
failure.
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
============================================================================================================
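Since the fatal error is about truncating md_gmx.xtc before appending, I am
also thinking of adding a pre-flight check to the restart sbatch script, right
before the mpirun line, to confirm that every node left after --exclude can see
and write the run directory. Again only a sketch, assuming the run directory is
on the shared mount and srun is available (the .write_test filename is just
illustrative):

# Hypothetical pre-flight check: each remaining node lists the checkpoint
# and trajectory files, so a missing mount shows up as an ls error.
srun --ntasks-per-node=1 bash -c \
  'echo "== $(hostname)"; ls -l md_gmx.cpt md_gmx.xtc'
# Then verify each node can create and remove a file in this directory.
srun --ntasks-per-node=1 bash -c \
  'touch .write_test.$(hostname) && rm .write_test.$(hostname) && echo "$(hostname): writable"'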
On Thu, Jun 16, 2016 at 6:23 PM, Mark Abraham <mark.j.abraham at gmail.com>
wrote:
> Hi,
>
> On Thu, Jun 16, 2016 at 12:24 PM Husen R <hus3nr at gmail.com> wrote:
>
> > On Thu, Jun 16, 2016 at 4:01 PM, Mark Abraham <mark.j.abraham at gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > There's just nothing special about any node at run time.
> > >
> > > Your script looks like it is building GROMACS fresh each time - there's no
> > > need to do that,
> >
> >
> > Which part of my script?
> >
>
> I can't tell how your script is finding its GROMACS installations, but the
> advisory message says precisely that your runs are finding different
> installations...
>
> Build time mismatch,
> current program: Sel Apr 5 13:37:32 WIB 2016
> checkpoint file: Rab Apr 6 09:44:51 WIB 2016
>
> Build user mismatch,
> current program: pro at head-node [CMAKE]
> checkpoint file: pro at compute-node [CMAKE]
>
> This reinforces my impression that the view of your file system available
> at the start of the job script is varying with your choice of node subsets.
>
>
> > I always use this command to restart from a checkpoint file: "mpirun
> > gmx_mpi mdrun -cpi [name].cpt -deffnm [name]".
> > As far as I know, the -cpi option is used to point to the checkpoint file as
> > the input file.
> > What do I have to change in my script?
> >
>
> Nothing about that aspect. But clearly your first run and the restart
> simulating loss of a node are finding different gmx_mpi binaries from their
> respective environments. This is not itself a problem, but it's probably
> not what you intend, and may be symptomatic of the same issue that leads to
> md_test.xtc not being accessible.
>
> Mark
>
>
> >
> > > but the fact that the node name is showing up in the check
> > > that takes place when the checkpoint is read is not relevant to the
> > > problem.
> > >
> > > Mark
> > >
> > > On Thu, Jun 16, 2016 at 9:46 AM Husen R <hus3nr at gmail.com> wrote:
> > >
> > > > On Thu, Jun 16, 2016 at 2:32 PM, Mark Abraham <mark.j.abraham at gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > On Thu, Jun 16, 2016 at 9:30 AM Husen R <hus3nr at gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Thank you for your reply!
> > > > > >
> > > > > > md_test.xtc exists and is writable.
> > > > > >
> > > > >
> > > > > OK, but it needs to be seen that way from the set of compute nodes you
> > > > > are using, and organizing that is up to you and your job scheduler, etc.
> > > > >
> > > > >
> > > > > > I tried to restart from the checkpoint file while excluding a node
> > > > > > other than compute-node, and it works.
> > > > > >
> > > > >
> > > > > Go do that, then :-)
> > > > >
> > > >
> > > > I'm building a simple system that can respond to node failure. If a
> > > > failure occurs on node A, the application has to be restarted and that
> > > > node has to be excluded.
> > > > This should apply to every node, including this 'compute-node'.
> > > >
> > > > >
> > > > >
> > > > > > Only '--exclude=compute-node' produces this error.
> > > > > >
> > > > >
> > > > > Then there's something about that node that is special with respect to
> > > > > the file system - there's nothing about any particular node that GROMACS
> > > > > cares about.
> > > > >
> > > >
> > > > > Mark
> > > > >
> > > > >
> > > > > > Does this have the same issue as this thread?
> > > > > > http://comments.gmane.org/gmane.science.biology.gromacs.user/40984
> > > > > >
> > > > > > regards,
> > > > > >
> > > > > > Husen
> > > > > >
> > > > > > > On Thu, Jun 16, 2016 at 2:20 PM, Mark Abraham <mark.j.abraham at gmail.com>
> > > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > The stuff about different nodes or numbers of nodes doesn't matter -
> > > > > > > it's merely an advisory note from mdrun. mdrun failed when it tried
> > > > > > > to operate upon md_test.xtc, so perhaps you need to consider whether
> > > > > > > the file exists, is writable, etc.
> > > > > > >
> > > > > > > Mark
> > > > > > >
> > > > > > > On Thu, Jun 16, 2016 at 6:48 AM Husen R <hus3nr at gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I got the following error message when I tried to restart a GROMACS
> > > > > > > > simulation from a checkpoint file.
> > > > > > > > I restarted the simulation using fewer nodes and processes, and I
> > > > > > > > also excluded one node using the '--exclude=' option (in Slurm) for
> > > > > > > > experimental purposes.
> > > > > > > >
> > > > > > > > I'm sure that fewer nodes and processes are not the cause of this
> > > > > > > > error, as I have already tested that.
> > > > > > > > I have checked that the cause of this error is the '--exclude='
> > > > > > > > usage. I excluded one node named 'compute-node' when restarting from
> > > > > > > > the checkpoint (at the first run, I used all nodes, including
> > > > > > > > 'compute-node').
> > > > > > > >
> > > > > > > >
> > > > > > > > It seems that at the first run, the submitted job script was built
> > > > > > > > on compute-node. So, at restart, the build user mismatch appeared
> > > > > > > > because compute-node was not found (it was excluded).
> > > > > > > >
> > > > > > > > Am I right? Is this behavior normal?
> > > > > > > > Or is there a way to avoid this, so I can freely restart from a
> > > > > > > > checkpoint using any nodes, without limitation?
> > > > > > > >
> > > > > > > > Thank you in advance.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > >
> > > > > > > >
> > > > > > > > Husen
> > > > > > > >
> > > > > > > > ==========================restart script=================
> > > > > > > > #!/bin/bash
> > > > > > > > #SBATCH -J ayo
> > > > > > > > #SBATCH -o md%j.out
> > > > > > > > #SBATCH -A necis
> > > > > > > > #SBATCH -N 2
> > > > > > > > #SBATCH -n 16
> > > > > > > > #SBATCH --exclude=compute-node
> > > > > > > > #SBATCH --time=144:00:00
> > > > > > > > #SBATCH --mail-user=hus3nr at gmail.com
> > > > > > > > #SBATCH --mail-type=begin
> > > > > > > > #SBATCH --mail-type=end
> > > > > > > >
> > > > > > > > mpirun gmx_mpi mdrun -cpi md_test.cpt -deffnm md_test
> > > > > > > > =====================================================
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > ==================================output error========================
> > > > > > > > Reading checkpoint file md_test.cpt generated: Wed Jun 15 16:30:44 2016
> > > > > > > >
> > > > > > > >
> > > > > > > > Build time mismatch,
> > > > > > > > current program: Sel Apr 5 13:37:32 WIB 2016
> > > > > > > > checkpoint file: Rab Apr 6 09:44:51 WIB 2016
> > > > > > > >
> > > > > > > > Build user mismatch,
> > > > > > > > current program: pro at head-node [CMAKE]
> > > > > > > > checkpoint file: pro at compute-node [CMAKE]
> > > > > > > >
> > > > > > > > #ranks mismatch,
> > > > > > > > current program: 16
> > > > > > > > checkpoint file: 24
> > > > > > > >
> > > > > > > > #PME-ranks mismatch,
> > > > > > > > current program: -1
> > > > > > > > checkpoint file: 6
> > > > > > > >
> > > > > > > > GROMACS patchlevel, binary or parallel settings differ from previous
> > > > > > > > run.
> > > > > > > > Continuation is exact, but not guaranteed to be binary identical.
> > > > > > > >
> > > > > > > >
> > > > > > > > -------------------------------------------------------
> > > > > > > > Program gmx mdrun, VERSION 5.1.2
> > > > > > > > Source code file:
> > > > > > > > /home/pro/gromacs-5.1.2/src/gromacs/gmxlib/checkpoint.cpp, line: 2216
> > > > > > > >
> > > > > > > > Fatal error:
> > > > > > > > Truncation of file md_test.xtc failed. Cannot do appending because of
> > > > > > > > this failure.
> > > > > > > > For more information and tips for troubleshooting, please check the
> > > > > > > > GROMACS website at http://www.gromacs.org/Documentation/Errors
> > > > > > > > -------------------------------------------------------
> > > > > > > >
> > > > > > > > ================================================================