[gmx-users] mpi run on linux cluster of 16 nodes running lam
Y U Sasidhar
sasidhar at chem.iitb.ac.in
Sun Apr 21 08:33:14 CEST 2002
I am getting the following error. Please advise me on how to correct it.
I am really stuck with this. If all else fails, we will have to recompile
GROMACS, as advised by Erik a few days ago.
I very much appreciate your time. Thank you.
==========================================================
LAM: LAM 6.5.6/MPI 2 C++/ROMIO - University of Notre Dame
OS: Linux
Cluster: 16 nodes (Pentium IV)
Shell: bash
GROMACS: RPM version
===========================================================
Script run:
lamboot
mpirun -v -c 16 -lamd -s n0 mdrun_mpi -np 16 -v -s full.tpr -e full.edr \
    -o full.trr -c after_full.gro -g full.log >& full.job &
more full.job
echo " finished the script "
==============================================================
Error message from full.job:
27742 mdrun_mpi running on n0 (o)
1640 mdrun_mpi running on n1
1911 mdrun_mpi running on n2
--- --- --- --- --- --- ---
1895 mdrun_mpi running on n15
NNODES=16, MYRANK=0, HOSTNAME=cluster.aero.iitb.ac.in
NNODES=16, MYRANK=4, HOSTNAME=node04
NNODES=16, MYRANK=1, HOSTNAME=node01
NNODES=16, MYRANK=5, HOSTNAME=node05
NNODES=16, MYRANK=2, HOSTNAME=node02
NNODES=16, MYRANK=10, HOSTNAME=node10
NNODES=16, MYRANK=6, HOSTNAME=node06
NNODES=16, MYRANK=3, HOSTNAME=node03
NNODES=16, MYRANK=8, HOSTNAME=node08
NNODES=16, MYRANK=12, HOSTNAME=node12
NNODES=16, MYRANK=9, HOSTNAME=node09
NNODES=16, MYRANK=7, HOSTNAME=node07
NNODES=16, MYRANK=13, HOSTNAME=node13
NNODES=16, MYRANK=14, HOSTNAME=node14
NNODES=16, MYRANK=11, HOSTNAME=node11
NNODES=16, MYRANK=15, HOSTNAME=node15
NODEID=0 argc=14
NODEID=1 argc=14
NODEID=3 argc=14
NODEID=2 argc=14
NODEID=4 argc=14
NODEID=15 argc=14
NODEID=5 argc=14
NODEID=6 argc=14
NODEID=7 argc=14
NODEID=14 argc=14
NODEID=8 argc=14
NODEID=9 argc=14
NODEID=10 argc=14
NODEID=11 argc=14
NODEID=12 argc=14
NODEID=13 argc=14
:-) mdrun_mpi (-:
Option   Filename        Type          Description
------------------------------------------------------------
-s       full.tpr        Input         Generic run input: tpr tpb tpa
-o       full.trr        Output        Full precision trajectory: trr trj
-x       traj.xtc        Output, Opt.  Compressed trajectory (portable xdr format)
-c       after_full.gro  Output        Generic structure: gro g96 pdb
-e       full.edr        Output        Generic energy: edr ene
-g       full.log        Output        Log file
-dgdl    dgdl.xvg        Output, Opt.  xvgr/xmgr file
-table   table.xvg       Input, Opt.   xvgr/xmgr file
-rerun   rerun.xtc       Input, Opt.   Generic trajectory: xtc trr trj gro g96 pdb
-ei      sam.edi         Input, Opt.   ED sampling input
-eo      sam.edo         Output, Opt.  ED sampling output
-j       wham.gct        Input, Opt.   General coupling stuff
-jo      bam.gct         Input, Opt.   General coupling stuff
-ffout   gct.xvg         Output, Opt.  xvgr/xmgr file
-devout  deviatie.xvg    Output, Opt.  xvgr/xmgr file
-runav   runaver.xvg     Output, Opt.  xvgr/xmgr file
-pi      pull.ppa        Input, Opt.   Pull parameters
-po      pullout.ppa     Output, Opt.  Pull parameters
-pd      pull.pdo        Output, Opt.  Pull data output
-pn      pull.ndx        Input, Opt.   Index file
-mtx     nm.mtx          Output, Opt.  Hessian matrix
Option        Type    Value  Description
------------------------------------------------------
-[no]h        bool    no     Print help info and quit
-[no]X        bool    no     Use dialog box GUI to edit command line options
-nice         int     19     Set the nicelevel
-deffnm       string         Set the default filename for all file options
-np           int     16     Number of nodes, must be the same as used for grompp
-[no]v        bool    yes    Be loud and noisy
-[no]compact  bool    yes    Write a compact log file
-[no]multi    bool    no     Do multiple simulations in parallel (only with -np > 1)
-[no]glas     bool    no     Do glass simulation with special long range corrections
-[no]ionize   bool    no     Do a simulation including the effect of an X-Ray bombardment on your system
Fatal error: Could not open full1.log
Error on node 1, will try to stop all the nodes
Back Off! I just backed up full2.log to ./#full2.log.2#
Back Off! I just backed up full3.log to ./#full3.log.2#
Back Off! I just backed up full4.log to ./#full4.log.2#
Back Off! I just backed up full5.log to ./#full5.log.2#
Back Off! I just backed up full6.log to ./#full6.log.2#
Back Off! I just backed up full7.log to ./#full7.log.2#
Back Off! I just backed up full8.log to ./#full8.log.2#
Back Off! I just backed up full9.log to ./#full9.log.2#
Back Off! I just backed up full10.log to ./#full10.log.2#
Back Off! I just backed up full11.log to ./#full11.log.2#
Back Off! I just backed up full12.log to ./#full12.log.2#
Back Off! I just backed up full13.log to ./#full13.log.2#
Back Off! I just backed up full14.log to ./#full14.log.2#
Back Off! I just backed up full15.log to ./#full15.log.2#
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
PID 1911 failed on node n2 with exit status 1.
-----------------------------------------------------------------------------
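The fatal error above is a failure to open full1.log on node 1, so one thing that could be checked on each node is whether the per-node log files can be opened in the job's working directory. The sketch below is only an illustration under stated assumptions: the helper names node_log_names and can_open_logs are hypothetical (they are not part of GROMACS or LAM), and the file-name pattern is taken from the messages above for nodes 1 through 15. It does not establish that file permissions are the actual cause.

```sh
#!/bin/sh
# Hypothetical helpers for illustration; not part of GROMACS or LAM.

# Print the per-node log names for a base name and a node count,
# following the pattern seen in the output above (full1.log ... full15.log).
node_log_names() {
    base=$1
    nnodes=$2
    i=1
    while [ "$i" -lt "$nnodes" ]; do
        printf '%s%d.log\n' "$base" "$i"
        i=$((i + 1))
    done
}

# Return 0 if every per-node log in directory $1 is either an existing
# writable file or could be created there; otherwise report the first
# problem on stderr and return 1. Nothing is created or modified.
can_open_logs() {
    dir=$1
    for name in $(node_log_names "$2" "$3"); do
        path=$dir/$name
        if [ -e "$path" ]; then
            if [ ! -w "$path" ]; then
                echo "cannot write $path on $(uname -n)" >&2
                return 1
            fi
        elif [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
            echo "cannot create $path on $(uname -n)" >&2
            return 1
        fi
    done
    return 0
}
```

For the run above, the check would be called from the working directory on each node as `can_open_logs . full 16`. A failure on one node but not on others would suggest that the directory is missing or not writable there.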