[gmx-users] mpi run on linux cluster of 16 nodes running lam

Y U Sasidhar sasidhar at chem.iitb.ac.in
Sun Apr 21 08:33:14 CEST 2002


I am getting the following error. Please advise me how to correct it.
I am really stuck with this. If all else fails, we will have to recompile
GROMACS as advised by Erik a few days ago.
I very much appreciate your time, and thank you.
==========================================================
LAM: LAM 6.5.6/MPI 2 C++/ROMIO - University of Notre Dame
OS: Linux
Cluster: 16 nodes (Pentium IV)
Shell: bash
GROMACS: using the RPM version
===========================================================
Script run:

lamboot
mpirun -v -c 16 -lamd -s n0 mdrun_mpi -np 16 -v -s full.tpr -e full.edr \
    -o full.trr -c after_full.gro -g full.log >& full.job &
more full.job
echo " finished the script "
==============================================================
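A note on the script above: because mpirun is started in the background with `&`, `more full.job` runs immediately and may show only the output written so far, and "finished the script" is printed before the run actually ends. If the intent is to see the complete output, the script has to wait for the background job first. A minimal sketch, using a hypothetical helper `run_logged` (not part of LAM or GROMACS) that sends stdout and stderr to a log file, waits for the job, and records its exit status:

```shell
#!/bin/bash
# Hypothetical helper (not part of LAM or GROMACS): run a command in the
# background with stdout and stderr sent to a log file, wait for it to
# finish, append its exit status to the log, and return that status.
run_logged() {
    local log=$1 pid status
    shift
    "$@" > "$log" 2>&1 &   # same effect as bash's ">& full.job &"
    pid=$!
    wait "$pid"            # block until the background job has exited
    status=$?
    echo "exit status: $status" >> "$log"
    return "$status"
}

# Usage mirroring the script above (commented out so the helper can be
# tried on a machine without LAM):
# lamboot
# run_logged full.job mpirun -v -c 16 -lamd -s n0 mdrun_mpi -np 16 -v \
#     -s full.tpr -e full.edr -o full.trr -c after_full.gro -g full.log
# more full.job
```

Dropping the `&` altogether would have the same effect; keeping it and calling `wait` only mirrors the structure of the original script.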
Error message from full.job:

27742 mdrun_mpi running on n0 (o)
1640 mdrun_mpi running on n1
1911 mdrun_mpi running on n2
--- --- --- --- --- --- ---
1895 mdrun_mpi running on n15
NNODES=16, MYRANK=0, HOSTNAME=cluster.aero.iitb.ac.in
NNODES=16, MYRANK=4, HOSTNAME=node04
NNODES=16, MYRANK=1, HOSTNAME=node01
NNODES=16, MYRANK=5, HOSTNAME=node05
NNODES=16, MYRANK=2, HOSTNAME=node02
NNODES=16, MYRANK=10, HOSTNAME=node10
NNODES=16, MYRANK=6, HOSTNAME=node06
NNODES=16, MYRANK=3, HOSTNAME=node03
NNODES=16, MYRANK=8, HOSTNAME=node08
NNODES=16, MYRANK=12, HOSTNAME=node12
NNODES=16, MYRANK=9, HOSTNAME=node09
NNODES=16, MYRANK=7, HOSTNAME=node07
NNODES=16, MYRANK=13, HOSTNAME=node13
NNODES=16, MYRANK=14, HOSTNAME=node14
NNODES=16, MYRANK=11, HOSTNAME=node11
NNODES=16, MYRANK=15, HOSTNAME=node15
NODEID=0 argc=14
NODEID=1 argc=14
NODEID=3 argc=14
NODEID=2 argc=14
NODEID=4 argc=14
NODEID=15 argc=14
NODEID=5 argc=14
NODEID=6 argc=14
NODEID=7 argc=14
NODEID=14 argc=14
NODEID=8 argc=14
NODEID=9 argc=14
NODEID=10 argc=14
NODEID=11 argc=14
NODEID=12 argc=14
NODEID=13 argc=14
                         
                              :-)  mdrun_mpi  (-:

Option     Filename  Type          Description
------------------------------------------------------------
  -s       full.tpr  Input         Generic run input: tpr tpb tpa
  -o       full.trr  Output        Full precision trajectory: trr trj
  -x       traj.xtc  Output, Opt.  Compressed trajectory (portable xdr format)
  -c after_full.gro  Output        Generic structure: gro g96 pdb
  -e       full.edr  Output        Generic energy: edr ene
  -g       full.log  Output        Log file
-dgdl      dgdl.xvg  Output, Opt.  xvgr/xmgr file
-table    table.xvg  Input, Opt.   xvgr/xmgr file
-rerun    rerun.xtc  Input, Opt.   Generic trajectory: xtc trr trj gro g96 pdb
 -ei        sam.edi  Input, Opt.   ED sampling input
 -eo        sam.edo  Output, Opt.  ED sampling output
  -j       wham.gct  Input, Opt.   General coupling stuff
 -jo        bam.gct  Input, Opt.   General coupling stuff
-ffout      gct.xvg  Output, Opt.  xvgr/xmgr file
-devout   deviatie.xvg  Output, Opt.  xvgr/xmgr file
-runav  runaver.xvg  Output, Opt.  xvgr/xmgr file
 -pi       pull.ppa  Input, Opt.   Pull parameters
 -po    pullout.ppa  Output, Opt.  Pull parameters
 -pd       pull.pdo  Output, Opt.  Pull data output
 -pn       pull.ndx  Input, Opt.   Index file
-mtx         nm.mtx  Output, Opt.  Hessian matrix

      Option   Type  Value  Description
------------------------------------------------------
      -[no]h   bool     no  Print help info and quit
      -[no]X   bool     no  Use dialog box GUI to edit command line options
       -nice    int     19  Set the nicelevel
     -deffnm string         Set the default filename for all file options
         -np    int     16  Number of nodes, must be the same as used for
                            grompp
      -[no]v   bool    yes  Be loud and noisy
-[no]compact   bool    yes  Write a compact log file
  -[no]multi   bool     no  Do multiple simulations in parallel (only with -np
                            > 1)
   -[no]glas   bool     no  Do glass simulation with special long range
                            corrections
  -[no]ionize   bool     no  Do a simulation including the effect of an X-Ray
                            bombardment on your system

Fatal error: Could not open full1.log
Error on node 1, will try to stop all the nodes

Back Off! I just backed up full2.log to ./#full2.log.2#

Back Off! I just backed up full3.log to ./#full3.log.2#

Back Off! I just backed up full4.log to ./#full4.log.2#

Back Off! I just backed up full5.log to ./#full5.log.2#

Back Off! I just backed up full6.log to ./#full6.log.2#

Back Off! I just backed up full7.log to ./#full7.log.2#

Back Off! I just backed up full8.log to ./#full8.log.2#

Back Off! I just backed up full9.log to ./#full9.log.2#

Back Off! I just backed up full10.log to ./#full10.log.2#

Back Off! I just backed up full11.log to ./#full11.log.2#

Back Off! I just backed up full12.log to ./#full12.log.2#

Back Off! I just backed up full13.log to ./#full13.log.2#

Back Off! I just backed up full14.log to ./#full14.log.2#

Back Off! I just backed up full15.log to ./#full15.log.2#
-----------------------------------------------------------------------------

One of the processes started by mpirun has exited with a nonzero exit
code.  This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 1911 failed on node n2 with exit status 1.
-----------------------------------------------------------------------------
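Since node 1 stopped because it could not open full1.log, one quick check is whether the per-node log files can be written in the directory mdrun_mpi uses on each node. The sketch below is a hypothetical helper, `check_node_logs`, that assumes log names of the form `<prefix><rank>.log` for ranks 1 and up (as in full1.log ... full15.log above; the name used by rank 0 is not assumed) and only tests permissions. It does not contact other nodes, so it would have to be run on each node itself (for example on node01), through whatever remote-shell access the cluster provides.

```shell
#!/bin/bash
# Hypothetical helper: report per-rank log files that this user probably
# cannot create or overwrite in a given directory. Ranks 1..nnodes-1 are
# checked; prints one line per problem and returns 1 if any were found.
check_node_logs() {
    local dir=$1 prefix=$2 nnodes=$3
    local rank=1 bad=0 f
    while [ "$rank" -lt "$nnodes" ]; do
        f="$dir/${prefix}${rank}.log"
        # Conservative check: the directory must exist and allow creating
        # (and renaming, for the #...# backups) files; an existing log file
        # must itself be writable.
        if [ ! -d "$dir" ] || [ ! -w "$dir" ] || [ ! -x "$dir" ] ||
           { [ -e "$f" ] && [ ! -w "$f" ]; }; then
            echo "cannot open for writing: $f"
            bad=1
        fi
        rank=$((rank + 1))
    done
    return "$bad"
}
```

For example, `check_node_logs "$PWD" full 16` on node01, run from the directory mdrun_mpi starts in on that node, would list full1.log if that directory is missing or not writable there.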




More information about the gromacs.org_gmx-users mailing list