[gmx-users] Build time/Build user mismatch, fatal error truncation of file *.xtc failed

Husen R hus3nr at gmail.com
Thu Jun 16 07:54:32 CEST 2016


This is the rest of the error message.
Regards,

Husen




Halting parallel program gmx mdrun on rank 0 out of 16
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Fatal error in PMPI_Bcast: Unknown error class, error stack:
PMPI_Bcast(1635)......................: MPI_Bcast(buf=0xcd9ed8, count=4,
MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1477).................:
MPIR_Bcast(1501)......................:
MPIR_Bcast_intra(1272)................:
MPIR_SMP_Bcast(1104)..................:
MPIR_Bcast_binomial(256)..............:
MPIDU_Complete_posted_with_error(1189): Process failed
MPIR_SMP_Bcast(1111)..................:
MPIR_Bcast_binomial(327)..............: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1635)........: MPI_Bcast(buf=0x1858e78, count=4, MPI_BYTE,
root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1477)...:
MPIR_Bcast(1501)........:
MPIR_Bcast_intra(1272)..:
MPIR_SMP_Bcast(1111)....:
MPIR_Bcast_binomial(327): Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1635)........: MPI_Bcast(buf=0x24f7e78, count=4, MPI_BYTE,
root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1477)...:
MPIR_Bcast(1501)........:
MPIR_Bcast_intra(1272)..:
MPIR_SMP_Bcast(1111)....:
MPIR_Bcast_binomial(327): Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1635)........: MPI_Bcast(buf=0xb21e78, count=4, MPI_BYTE,
root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1477)...:
MPIR_Bcast(1501)........:
MPIR_Bcast_intra(1272)..:
MPIR_SMP_Bcast(1111)....:
MPIR_Bcast_binomial(327): Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1635)........: MPI_Bcast(buf=0x15fbe78, count=4, MPI_BYTE,
root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1477)...:
MPIR_Bcast(1501)........:
MPIR_Bcast_intra(1272)..:
MPIR_SMP_Bcast(1111)....:
MPIR_Bcast_binomial(327): Failure during collective

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 6983 RUNNING AT head-node
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

On Thu, Jun 16, 2016 at 11:48 AM, Husen R <hus3nr at gmail.com> wrote:

> Hi all,
>
> I got the following error message when I tried to restart a GROMACS
> simulation from a checkpoint file.
> I restarted the simulation using fewer nodes and processes, and I also
> excluded one node using the '--exclude=' option (in Slurm) for
> experimental purposes.
>
> I'm sure that using fewer nodes and processes is not the cause of this
> error, as I have already tested that.
> I have verified that the cause of this error is the use of '--exclude='.
> I excluded one node named 'compute-node' when restarting from the
> checkpoint (in the first run, I used all nodes, including 'compute-node').
>
>
> It seems that for the first run, the job submission script was built on
> compute-node. So, on restart, the build user mismatch appeared because
> compute-node was not found (excluded).
>
> Am I right? Is this behavior normal?
> Or is there a way to avoid this, so that I can freely restart from a
> checkpoint on any nodes without limitation?
>
> Thank you in advance.
>
> Regards,
>
>
> Husen
>
> ==========================restart script=================
> #!/bin/bash
> #SBATCH -J ayo
> #SBATCH -o md%j.out
> #SBATCH -A necis
> #SBATCH -N 2
> #SBATCH -n 16
> #SBATCH --exclude=compute-node
> #SBATCH --time=144:00:00
> #SBATCH --mail-user=hus3nr at gmail.com
> #SBATCH --mail-type=begin
> #SBATCH --mail-type=end
>
> mpirun gmx_mpi mdrun -cpi md_test.cpt -deffnm md_test
> =====================================================
>
>
>
>
> ==================================output error========================
> Reading checkpoint file md_test.cpt generated: Wed Jun 15 16:30:44 2016
>
>
>   Build time mismatch,
>     current program: Sel Apr  5 13:37:32 WIB 2016
>     checkpoint file: Rab Apr  6 09:44:51 WIB 2016
>
>   Build user mismatch,
>     current program: pro at head-node [CMAKE]
>     checkpoint file: pro at compute-node [CMAKE]
>
>   #ranks mismatch,
>     current program: 16
>     checkpoint file: 24
>
>   #PME-ranks mismatch,
>     current program: -1
>     checkpoint file: 6
>
> GROMACS patchlevel, binary or parallel settings differ from previous run.
> Continuation is exact, but not guaranteed to be binary identical.
>
>
> -------------------------------------------------------
> Program gmx mdrun, VERSION 5.1.2
> Source code file:
> /home/pro/gromacs-5.1.2/src/gromacs/gmxlib/checkpoint.cpp, line: 2216
>
> Fatal error:
> Truncation of file md_test.xtc failed. Cannot do appending because of this
> failure.
> For more information and tips for troubleshooting, please check the GROMACS
> website at http://www.gromacs.org/Documentation/Errors
> -------------------------------------------------------
> ================================================================
>
>
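
The fatal error in the output above is about appending to md_test.xtc, not
about the build mismatches, which are only reported as notes. A minimal
sketch of one possible workaround follows, assuming (this is a guess, not
something the error message confirms) that the output files are missing or
not writable from the node where rank 0 now runs. The helper name
pick_append_flag is hypothetical; -append and -noappend are existing mdrun
options.

```shell
#!/bin/bash
# Hypothetical helper: print the mdrun append flag to use for a restart.
# Assumption: appending only works if every previous output file exists
# and is writable from the current node. This is a guess at the cause of
# the truncation failure, not a confirmed diagnosis.
pick_append_flag() {
    local deffnm="$1"
    local ext f
    for ext in xtc edr log; do
        f="${deffnm}.${ext}"
        # A missing or read-only file means appending cannot work.
        if [ ! -f "$f" ] || [ ! -w "$f" ]; then
            echo "-noappend"
            return 0
        fi
    done
    echo "-append"
}
```

In the restart script this could be used as
"mpirun gmx_mpi mdrun -cpi md_test.cpt -deffnm md_test $(pick_append_flag md_test)".
With -noappend, mdrun writes new output files with a part number added to
the file name instead of truncating and appending to the old ones.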

