[gmx-users] SGI Altix - mdrun_mpi

Chris Neale chris.neale at utoronto.ca
Wed Nov 14 21:53:42 CET 2007


I think your host file may not be correct. In any event, it differs from
what I use. This is how I run LAM without a batch system:

ssh n1
./RunLam.sh &

###

$cat RunLam.sh
PATH=$PATH:/dir/to/lam/bin:.
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/dir/to/lam/lib
LAMRSH="ssh -x"
export LAMRSH PATH LD_LIBRARY_PATH
lamboot -v lamhosts
/dir/to/lam/mpirun N /this/dir/mdrun.sh
lamhalt
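
A small pre-flight check can rule out PATH problems before lamboot is attempted. This is only a sketch: require_cmds is a hypothetical helper name, not part of LAM, and it tests nothing beyond whether each command can be found on PATH.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of LAM): report every command
# from the argument list that cannot be found on the current PATH.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "not found on PATH: $cmd" >&2
            missing=1
        fi
    done
    # Returns 0 when everything was found, 1 otherwise.
    return "$missing"
}
```

In RunLam.sh it could be called right after the export line, for example as: require_cmds lamboot mpirun lamhalt || exit 1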

###

$cat lamhosts
n1
n2
n3
n4
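
To compare two boot schemas quickly (one host per line as above, or a single host with cpu=N as in the original message), the hosts and CPUs can be counted before booting. This is a sketch under assumptions: count_lam_cpus is a hypothetical helper, and a host line without cpu=N is assumed to count as one CPU.

```shell
#!/bin/sh
# Hypothetical helper: summarise a LAM boot schema file.
# Assumptions: one host per line, optional "cpu=N" field,
# and a host without cpu=N counts as a single CPU.
count_lam_cpus() {
    awk '
        /^[ \t]*(#|$)/ { next }          # skip comment and blank lines
        {
            n = 1                        # assumed default when cpu=N is absent
            for (i = 2; i <= NF; i++)
                if ($i ~ /^cpu=[0-9]+$/) { split($i, kv, "="); n = kv[2] }
            hosts++
            cpus += n
        }
        END { printf "%d hosts, %d cpus\n", hosts, cpus }
    ' "$1"
}
```

Running count_lam_cpus lamhosts on the file above prints "4 hosts, 4 cpus"; on a file containing only "localhost cpu=64" it prints "1 hosts, 64 cpus".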


Chris.
######

Original message:

Hi guys

Just installed gmx on a SGI Altix system (64-way):

--> compiling mdrun with MPI support, single-precision
export CPPFLAGS=-I/home/nferreira/bin/fftw-3.0.1/include
export LDFLAGS=-L/home/nferreira/bin/fftw-3.0.1/lib
./configure --prefix=/home/nferreira/bin/gromacs-3.3.2 --enable-mpi --program-suffix=_mpi
make mdrun
make install-mdrun

--> compiling all gromacs package programs without MPI, single-precision
make distclean
./configure --prefix=/home/nferreira/bin/gromacs-3.3.2
make
make install



It runs fine on single nodes, but I can't get gmx to run in parallel.
The jobs are submitted directly (see the sample script below), without 
any queuing system such as PBS.



# This is my submission script
#######################
lamboot -v hostfile
lamnodes
# Testing mpi
mpirun -np 2 hello
# GMX run
grompp -np 2  -f equil -po equil_out -c after_em -p topology -o equil
mpirun -np 2 mdrun_mpi -np 2 -deffnm equil -c after_equil
lamhalt



# This is the hostfile
###############
# My boot schema
localhost cpu=64


I tried several things (full paths for mpirun, mdrun_mpi, etc.), but I 
always get the same error. I also tested a hello program (from the 
LAM-MPI user guide) and it runs without problems. Below is the output of 
the submission script:

nferreira@behemoth:~/gmxbench> ./script

LAM 7.1.3/MPI 2 C++/ROMIO - Indiana University

n-1<16608> ssi:boot:base:linear: booting n0 (localhost)
n-1<16608> ssi:boot:base:linear: finished
n0      localhost:64:origin,this_node
Hello, world! I am 0 of 2
Hello, world! I am 1 of 2
NNODES=2, MYRANK=1, HOSTNAME=behemoth
NNODES=2, MYRANK=0, HOSTNAME=behemoth
NODEID=0 argc=8
NODEID=1 argc=8
                         :-)  G  R  O  M  A  C  S  (-:

               GRoups of Organic Molecules in ACtion for Science

                            :-)  VERSION 3.3.2  (-:
[ ... snipped ...]

Getting Loaded...
Reading file equil.tpr, VERSION 3.3.2 (single precision)
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code.  This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 16623 failed on node n0 (127.0.0.1) due to signal 11.
-----------------------------------------------------------------------------

LAM 7.1.3/MPI 2 C++/ROMIO - Indiana University
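
For reference, the signal number in the mpirun report can be translated into a name from the shell. The sketch below assumes a POSIX shell on Linux, where signal 11 is SIGSEGV (a segmentation fault); it identifies the kind of crash and says nothing about its cause.

```shell
#!/bin/sh
# Translate the signal number reported by mpirun into its name.
sig=11
name=$(kill -l "$sig")    # "kill -l N" prints the name of signal N
echo "signal $sig is SIG$name"
```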



Any ideas? Searching the mailing list, it seems that this is a recurring 
issue, but I was not able to find an answer.
Also, the machine admin is not proficient in MPI.

Cheers,
Nuno

P.S. Other programs are running fine on this machine using the same LAM-MPI.










