[gmx-users] Gromacs runs with SGE and LAM-MPI

Stéphane Teletchéa steletch at jouy.inra.fr
Wed Jan 11 11:07:04 CET 2006

I'm having difficulty running jobs on our cluster when they are 
submitted through SGE.

I'm using the benchmark systems as test jobs (to be sure the input 
parameters are not the problem).

My input script is as follows:


#$ -S /bin/bash
#$ -V
#$ -M steletch at jouy.inra.fr
#$ -m eas
#$ -cwd
#$ -o ~/bench/gromacs3.3/d.villin/s64LAM_8_noht.q-8/d.villin-s64LAM.out
#$ -e ~/bench/gromacs3.3/d.villin/s64LAM_8_noht.q-8/d.villin-s64LAM.err

~/Programmes/gromacs-3.3_s64LAM/bin/grompp \
     -f ~/Benchmark_Gromacs/d.villin/grompp.mdp \
     -p ~/Benchmark_Gromacs/d.villin/topol.top \
     -c ~/Benchmark_Gromacs/d.villin/conf.gro \
     -o ~/d.villin/s64LAM_8_noht.q-8/d.villin_s64LAM_8_noht.q.tpr \
     -po ~/d.villin/s64LAM_8_noht.q-8/d.villin_s64LAM_8_noht.q.mdp \
     -np 8

/usr/local/public/lam/bin/mpirun -np 8 ~/gromacs-3.3_s64LAM/bin/mdrun \
    -s ~/d.villin/s64LAM_8_noht.q-8/d.villin_s64LAM_8_noht.q.tpr \
    -o ~/d.villin/s64LAM_8_noht.q-8/d.villin_s64LAM_8_noht.q.trr \
    -c ~/d.villin/s64LAM_8_noht.q-8/d.villin_s64LAM_8_noht.q.gro \
    -g ~/d.villin/s64LAM_8_noht.q-8/d.villin_s64LAM_8_noht.q.log \
    -e ~/d.villin/s64LAM_8_noht.q-8/d.villin_s64LAM_8_noht.q.edr

If I run the commands interactively, the run starts and executes 
flawlessly (gromacs and LAM/MPI interact as expected). I only 
encounter problems when launching the jobs through SGE (which is 
mandatory since we share the cluster amongst users).
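
Since the same commands behave differently in the two cases, one way to 
narrow this down might be to print the environment from both an 
interactive shell and an SGE job and compare the output. The following 
is only a minimal sketch: dump_job_env is a hypothetical helper name 
(not part of my script above), and it assumes that mpirun, lamboot and 
lamnodes are the LAM binaries of interest and that JOB_ID, NSLOTS and 
PE_HOSTFILE are the relevant SGE variables.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not part of the original job script):
# print settings that may differ between an interactive shell and an SGE job.
dump_job_env() {
    echo "host=$(uname -n)"
    echo "path=$PATH"
    # SGE sets JOB_ID and NSLOTS inside a job, and PE_HOSTFILE when a
    # parallel environment was requested; otherwise they print as "unset".
    echo "job_id=${JOB_ID:-unset}"
    echo "nslots=${NSLOTS:-unset}"
    echo "pe_hostfile=${PE_HOSTFILE:-unset}"
    # Report which LAM binaries (if any) are visible on PATH.
    for tool in mpirun lamboot lamnodes; do
        echo "$tool=$(command -v "$tool" || echo 'not found')"
    done
}

dump_job_env
```

Running this once by hand and once at the top of the submitted script 
would show whether PATH, the slot count or the host list differ between 
the two cases.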

In the error log, I get:
The selected RPI failed to initialize during MPI_INIT.  This is a
fatal error; I must abort.

This occurred on host n57 (n2).
The PID of failed process was 2958 (MPI_COMM_WORLD rank: 2)
One of the processes started by mpirun has exited with a nonzero exit
code.  This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 2570 failed on node n0 ( with exit status 1.

We're working hard on it, but I thought some help from the list could 
point us in the right direction.

Thanks a lot in advance for your answers,

S. Téletchéa

More information:

System: Mandriva Linux LE2005, 64-bit edition
Gromacs version 3.3 (64-bit, single precision)
SGE version 6.0u6
              LAM/MPI: 7.1.1
               Prefix: /usr/local/public/lam-7.1.1
         Architecture: x86_64-unknown-linux-gnu
        Configured by: root
        Configured on: Mon Jan  9 15:33:44 CET 2006
       Configure host: adm3
       Memory manager: ptmalloc2
           C bindings: yes
         C++ bindings: yes
     Fortran bindings: yes
           C compiler: gcc
         C++ compiler: g++
     Fortran compiler: g77
      Fortran symbols: double_underscore
          C profiling: yes
        C++ profiling: yes
    Fortran profiling: yes
       C++ exceptions: no
       Thread support: yes
        ROMIO support: yes
         IMPI support: no
        Debug support: no
         Purify clean: no
             SSI boot: globus (API v1.1, Module v0.6)
             SSI boot: rsh (API v1.1, Module v1.1)
             SSI boot: slurm (API v1.1, Module v1.0)
             SSI coll: lam_basic (API v1.1, Module v7.1)
             SSI coll: shmem (API v1.1, Module v1.0)
             SSI coll: smp (API v1.1, Module v1.2)
              SSI rpi: crtcp (API v1.1, Module v1.1)
              SSI rpi: lamd (API v1.0, Module v7.1)
              SSI rpi: sysv (API v1.0, Module v7.1)
              SSI rpi: tcp (API v1.0, Module v7.1)
              SSI rpi: usysv (API v1.0, Module v7.1)
               SSI cr: self (API v1.0, Module v1.0)

Stéphane Téletchéa, PhD.
Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
INRA, Domaine de Vilvert                  Tél : (33) 134 652 121 / 3086
78352 Jouy-en-Josas cedex, France         Fax : (33) 134 652 901

