[gmx-users] REMD: One of the processes started by mpirun has exited with a nonzero exit code.

Mark Abraham Mark.Abraham at anu.edu.au
Sun Nov 13 15:31:40 CET 2011


On 14/11/2011 12:29 AM, 杜波 wrote:
> Dear teacher,
> When I run REMD, I get an error, but I do not know how to fix it.
> Thanks!
>
> modify it under the terms of the GNU General Public License
> as published by the Free Software Foundation; either version 2
> of the License, or (at your option) any later version.
>
> :-) /export/software/bin/mdrun_mpi_4.5.5 (-:
>
> Option Filename Type Description
> ------------------------------------------------------------
> -s pmma.tpr Input Run input file: tpr tpb tpa
> -o md.trr Output Full precision trajectory: trr trj cpt
> -x traj.xtc Output, Opt. Compressed trajectory (portable xdr format)
> -cpi state.cpt Input, Opt. Checkpoint file
> -cpo state.cpt Output, Opt. Checkpoint file
> -c after_md.gro Output Structure file: gro g96 pdb etc.
> -e ener.edr Output Energy file
> -g md.log Output Log file
> -dhdl dhdl.xvg Output, Opt. xvgr/xmgr file
> -field field.xvg Output, Opt. xvgr/xmgr file
> -table table.xvg Input, Opt. xvgr/xmgr file
> -tablep tablep.xvg Input, Opt. xvgr/xmgr file
> -tableb table.xvg Input, Opt. xvgr/xmgr file
> -rerun rerun.xtc Input, Opt. Trajectory: xtc trr trj gro g96 pdb cpt
> -tpi tpi.xvg Output, Opt. xvgr/xmgr file
> -tpid tpidist.xvg Output, Opt. xvgr/xmgr file
> -ei sam.edi Input, Opt. ED sampling input
> -eo sam.edo Output, Opt. ED sampling output
> -j wham.gct Input, Opt. General coupling stuff
> -jo bam.gct Output, Opt. General coupling stuff
> -ffout gct.xvg Output, Opt. xvgr/xmgr file
> -devout deviatie.xvg Output, Opt. xvgr/xmgr file
> -runav runaver.xvg Output, Opt. xvgr/xmgr file
> -px pullx.xvg Output, Opt. xvgr/xmgr file
> -pf pullf.xvg Output, Opt. xvgr/xmgr file
> -mtx nm.mtx Output, Opt. Hessian matrix
> -dn dipole.ndx Output, Opt. Index file
> -multidir ./0/ Input, Opt!, Mult.
> ./1/
> ./2/
> ./3/
> ./4/
> ./5/
> ./6/
> ./7/
> ./8/
> ./9/
> ./10/
> ./11/ Run directory
>
> Option Type Value Description
> ------------------------------------------------------
> -[no]h bool no Print help info and quit
> -[no]version bool no Print version info and quit
> -nice int 0 Set the nicelevel
> -deffnm string Set the default filename for all file options
> -xvg enum xmgrace xvg plot formatting: xmgrace, xmgr or none
> -[no]pd bool yes Use particle decompostion
> -dd vector 0 0 0 Domain decomposition grid, 0 is optimize
> -npme int -1 Number of separate nodes to be used for PME, -1
> is guess
> -ddorder enum interleave DD node order: interleave, pp_pme or cartesian
> -[no]ddcheck bool yes Check for all bonded interactions with DD
> -rdd real 0 The maximum distance for bonded interactions with
> DD (nm), 0 is determine from initial coordinates
> -rcon real 0 Maximum distance for P-LINCS (nm), 0 is estimate
> -dlb enum auto Dynamic load balancing (with DD): auto, no or yes
> -dds real 0.8 Minimum allowed dlb scaling of the DD cell size
> -gcom int -1 Global communication frequency
> -[no]v bool yes Be loud and noisy
> -[no]compact bool yes Write a compact log file
> -[no]seppot bool no Write separate V and dVdl terms for each
> interaction type and node to the log file(s)
> -pforce real -1 Print all forces larger than this (kJ/mol nm)
> -[no]reprod bool no Try to avoid optimizations that affect binary
> reproducibility
> -cpt real 15 Checkpoint interval (minutes)
> -[no]cpnum bool no Keep and number checkpoint files
> -[no]append bool yes Append to previous output files when continuing
> from checkpoint instead of adding the simulation
> part number to all file names
> -maxh real -1 Terminate after 0.99 times this time (hours)
> -multi int 0 Do multiple simulations in parallel
> -replex int 2000 Attempt replica exchange every # steps
> -reseed int -1 Seed for replica exchange, -1 is generate a seed
> -[no]ionize bool no Do a simulation including the effect of an X-Ray
> bombardment on your system
>
> NNODES=12, MYRANK=7, HOSTNAME=c0115
> NNODES=12, MYRANK=1, HOSTNAME=c0101
> NNODES=12, MYRANK=2, HOSTNAME=c0101
> NODEID=1 argc=26
> NNODES=12, MYRANK=5, HOSTNAME=c0113
> NNODES=12, MYRANK=6, HOSTNAME=c0113
> NODEID=5 argc=26
> NNODES=12, MYRANK=3, HOSTNAME=c0102
> NNODES=12, MYRANK=4, HOSTNAME=c0102
> NODEID=3 argc=26
> NODEID=2 argc=26
> NODEID=6 argc=26
> NODEID=4 argc=26
> NNODES=12, MYRANK=9, HOSTNAME=c0115
> NNODES=12, MYRANK=11, HOSTNAME=c0115
> NODEID=11 argc=26
> NODEID=7 argc=26
> NNODES=12, MYRANK=8, HOSTNAME=c0115
> NODEID=8 argc=26
> NNODES=12, MYRANK=10, HOSTNAME=c0115
> NODEID=9 argc=26
> NODEID=10 argc=26
>
> Back Off! I just backed up md.log to ./#md.log.1#
> Getting Loaded...
> Reading file pmma.tpr, VERSION 4.5.5 (single precision)
>
> Back Off! I just backed up md.log to ./#md.log.1#
>
> Back Off! I just backed up md.log to ./#md.log.1#
>
> Back Off! I just backed up md.log to ./#md.log.1#
>
> Back Off! I just backed up md.log to ./#md.log.1#
>
> Back Off! I just backed up md.log to ./#md.log.1#
>
> Back Off! I just backed up md.log to ./#md.log.1#
>
> Back Off! I just backed up md.log to ./#md.log.1#
> Getting Loaded...
> Getting Loaded...
> Loaded with Money
>
> Getting Loaded...
> Reading file pmma.tpr, VERSION 4.5.5 (single precision)
> Reading file pmma.tpr, VERSION 4.5.5 (single precision)
> Reading file pmma.tpr, VERSION 4.5.5 (single precision)
> Getting Loaded...
> Getting Loaded...
> Getting Loaded...
> Loaded with Money
>
> Loaded with Money
>
> Getting Loaded...
> Loaded with Money
>
> Reading file pmma.tpr, VERSION 4.5.5 (single precision)
>
> Back Off! I just backed up traj.xtc to ./#traj.xtc.1#
>
> Back Off! I just backed up traj.xtc to ./#traj.xtc.1#
>
> Back Off! I just backed up traj.xtc to ./#traj.xtc.1#
>
> Back Off! I just backed up ener.edr to ./#ener.edr.1#
> Reading file pmma.tpr, VERSION 4.5.5 (single precision)
>
> Back Off! I just backed up traj.xtc to ./#traj.xtc.1#
>
> Back Off! I just backed up ener.edr to ./#ener.edr.1#
>
> WARNING: This run will generate roughly 3150 Mb of data
>
>
> WARNING: This run will generate roughly 3150 Mb of data
>
>
> WARNING: This run will generate roughly 3150 Mb of data
>
> Loaded with Money
>
>
> Back Off! I just backed up ener.edr to ./#ener.edr.1#
> Reading file pmma.tpr, VERSION 4.5.5 (single precision)
> Loaded with Money
>
> Loaded with Money
>
>
> WARNING: This run will generate roughly 3150 Mb of data
>
> Reading file pmma.tpr, VERSION 4.5.5 (single precision)
> Loaded with Money
>
> MPI_Recv: process in local group is dead (rank 3, comm 6)
>
> WARNING: This run will generate roughly 3150 Mb of data
>
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 1498 failed on node n0 (192.168.1.251) with exit status 1.
> -----------------------------------------------------------------------------
> Rank (5, MPI_COMM_WORLD): Call stack within LAM:
> Rank (5, MPI_COMM_WORLD): - MPI_Recv()
> Rank (5, MPI_COMM_WORLD): - MPI_Bcast()
> Rank (5, MPI_COMM_WORLD): - MPI_Allgather()
> Rank (5, MPI_COMM_WORLD): - MPI_Allreduce()
> Rank (5, MPI_COMM_WORLD): - main()

1) 4.5.5 has a known bug with REMD, and this failure is consistent with
those LAM diagnostics. You should either apply the fix here
http://lists.gromacs.org/pipermail/gmx-developers/2011-October/005405.html
or revert to 4.5.4.
2) LAM is known to run into problems with GROMACS under some
circumstances. OpenMPI is recommended.
3) Otherwise, without the information that might be at the end of the
log file(s), we're guessing.
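
For point 3, here is a minimal sketch of one way to gather the end of each replica's log before posting. It is not part of GROMACS. It assumes each log is named md.log and sits in one of the run directories 0 to 11 passed to -multidir in the output above; the function names are made up for illustration.

```python
from pathlib import Path


def tail_lines(path, n=20):
    """Return the last n lines of a text file (empty list if it is missing)."""
    p = Path(path)
    if not p.is_file():
        return []
    # errors="replace" keeps going if the log contains odd bytes
    with p.open(errors="replace") as fh:
        return fh.read().splitlines()[-n:]


def collect_log_tails(base=".", n_replicas=12, log_name="md.log", n=20):
    """Map each replica directory name ("0", "1", ...) to the tail of its log."""
    return {
        str(i): tail_lines(Path(base) / str(i) / log_name, n)
        for i in range(n_replicas)
    }


if __name__ == "__main__":
    # Assumed layout: ./0/md.log ... ./11/md.log, as in the -multidir listing
    for replica, lines in collect_log_tails().items():
        print(f"==> {replica}/md.log <==")
        print("\n".join(lines) if lines else "(no log file found)")
```

Run it from the directory that contains the replica directories. The replica whose log ends with a fatal error message is usually the one worth quoting in full.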

Mark


