[gmx-users] Seeking advice on how to build Gromacs on Teragrid resources

J. Nathan Scott scottjn at chemistry.montana.edu
Thu Dec 9 23:14:20 CET 2010


Hello gmx users! I realize this may be a touch off topic, but I am
hoping that someone out there can offer some advice on how to build
Gromacs for parallel use on a Teragrid site. Our group is currently
using Abe on Teragrid, and unfortunately the latest version of Gromacs
compiled for public use on Abe is 4.0.2. Apparently installation of
4.5.3 is at least on the to-do list for Abe, but we would very much
like to use 4.5.3 now if we can get this issue figured out.

I have built a parallel version of mdrun against the Abe-installed
versions of FFTW3 and MVAPICH2 with the following commands:

setenv CPPFLAGS "-I/usr/apps/math/fftw/fftw-3.1.2/gcc/include/ -I/usr/apps/mpi/marmot_mvapich2_intel/include"
setenv LDFLAGS "-L/usr/apps/math/fftw/fftw-3.1.2/gcc/lib -L/usr/apps/mpi/marmot_mvapich2_intel/lib"
./configure --enable-mpi --enable-float --prefix=/u/ac/jnscott/gromacs --program-suffix=_mpi
make -j 8 mdrun && make install-mdrun
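
One route I have been considering, in case my mix of flags is the
problem, is to let configure find the MPI compiler wrapper itself
rather than only pointing CPPFLAGS/LDFLAGS at the MVAPICH2 install.
A rough sketch of what I mean is below; it assumes mpicc from the
intended MVAPICH2 module is first on my PATH, and the FFTW paths are
the same ones I used above. I have not actually tried this yet:

make distclean   # clear objects from the previous configuration, if any
setenv CC mpicc  # let configure pick up the MPI compiler wrapper
setenv CPPFLAGS "-I/usr/apps/math/fftw/fftw-3.1.2/gcc/include"
setenv LDFLAGS "-L/usr/apps/math/fftw/fftw-3.1.2/gcc/lib"
./configure --enable-mpi --enable-float --prefix=/u/ac/jnscott/gromacs --program-suffix=_mpi
make -j 8 mdrun && make install-mdrun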

My PBS script file looks like the following:

-------------------------------
#!/bin/csh
#PBS -l nodes=2:ppn=8
#PBS -V
#PBS -o pbs_nvt.out
#PBS -e pbs_nvt.err
#PBS -l walltime=2:00:00
#PBS -N gmx
cd /u/ac/jnscott/1stn/1stn_wt/oplsaa_spce
mvapich2-start-mpd
setenv NP `wc -l ${PBS_NODEFILE} | cut -d'/' -f1`
setenv MV2_SRQ_SIZE 4000
mpirun -np ${NP} mdrun_mpi -s nvt.tpr -o nvt.trr -x nvt.xtc -cpo nvt.cpt \
    -c nvt.gro -e nvt.edr -g nvt.log -dlb yes
-------------------------------
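
In case it is useful, I have also been thinking about submitting a
stripped-down job through the same mpd ring, just to confirm that the
MPI launch itself works across both nodes before pointing the finger
at mdrun_mpi. Something along these lines (same Abe environment as
above; I have not run this yet):

-------------------------------
#!/bin/csh
#PBS -l nodes=2:ppn=8
#PBS -l walltime=0:05:00
#PBS -N mpi_check
cd /u/ac/jnscott/1stn/1stn_wt/oplsaa_spce
mvapich2-start-mpd
# simpler way to count the ranks listed in the nodefile
setenv NP `wc -l < ${PBS_NODEFILE}`
# a plain command should still launch on every rank; I would expect 16 hostnames
mpirun -np ${NP} hostname
-------------------------------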

Unfortunately my runs always fail in the same manner: the log file
simply ends, as you can see below. It appears that Gromacs picks up
the correct number of MPI processes (16, i.e. 2 nodes x 8 cores from
the PBS script), but then something causes it to quit abruptly with
no error message.

-------------------------------
<snip>
Initializing Domain Decomposition on 16 nodes
Dynamic load balancing: yes
Will sort the charge groups at every domain (re)decomposition
Initial maximum inter charge-group distances:
    two-body bonded interactions: 0.526 nm, LJ-14, atoms 1735 1744
  multi-body bonded interactions: 0.526 nm, Ryckaert-Bell., atoms 1735 1744
Minimum cell size due to bonded interactions: 0.578 nm
Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.820 nm
Estimated maximum distance required for P-LINCS: 0.820 nm
This distance will limit the DD cell size, you can override this with -rcon
Guess for relative PME load: 0.27
Will use 10 particle-particle and 6 PME only nodes
This is a guess, check the performance at the end of the log file
Using 6 separate PME nodes
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 10 cells with a minimum initial size of 1.025 nm
The maximum allowed number of cells is: X 5 Y 5 Z 4
Domain decomposition grid 2 x 5 x 1, separate PME nodes 6
PME domain decomposition: 2 x 3 x 1
Interleaving PP and PME nodes
This is a particle-particle only node

Domain decomposition nodeid 0, coordinates 0 0 0

Using two step summing over 2 groups of on average 5.0 processes

Table routines are used for coulomb: TRUE
Table routines are used for vdw:     FALSE
Will do PME sum in reciprocal space.

<snip>

Will do ordinary reciprocal space Ewald sum.
Using a Gaussian width (1/beta) of 0.320163 nm for Ewald
Cut-off's:   NS: 1   Coulomb: 1   LJ: 1
Long Range LJ corr.: <C6> 3.3589e-04
System total charge: 0.000
Generated table with 1000 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 1000 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 1000 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 1000 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1000 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1000 data points for 1-4 LJ12.
Tabscale = 500 points/nm

Enabling SPC-like water optimization for 6952 molecules.

Configuring nonbonded kernels...
Configuring standard C nonbonded kernels...
Testing x86_64 SSE2 support... present.

Removing pbc first time

Initializing Parallel LINear Constraint Solver

<snip>
Linking all bonded interactions to atoms
There are 9778 inter charge-group exclusions,
will use an extra communication step for exclusion forces for PME

The maximum number of communication pulses is: X 1 Y 2
The minimum size for domain decomposition cells is 0.827 nm
The requested allowed shrink of DD cells (option -dds) is: 0.80
The allowed shrink of domain decomposition cells is: X 0.35 Y 0.73
The maximum allowed distance for charge groups involved in interactions is:
                 non-bonded interactions           1.000 nm
            two-body bonded interactions  (-rdd)   1.000 nm
          multi-body bonded interactions  (-rdd)   0.827 nm
  atoms separated by up to 5 constraints  (-rcon)  0.827 nm


Making 2D domain decomposition grid 2 x 5 x 1, home cell index 0 0 0

Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
  0:  rest

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- --- Thank You --- -------- --------
-----------------------------------------------------------

My PBS error file is not of much help either, I fear; an example of
such a file is pasted below:

-----------------------------------
stty: standard input: Invalid argument
stty: standard input: Invalid argument
NNODES=16, MYRANK=0, HOSTNAME=abe0828
NNODES=16, MYRANK=2, HOSTNAME=abe0828
NNODES=16, MYRANK=12, HOSTNAME=abe0828
NNODES=16, MYRANK=4, HOSTNAME=abe0828
NNODES=16, MYRANK=10, HOSTNAME=abe0828
NNODES=16, MYRANK=8, HOSTNAME=abe0828
NNODES=16, MYRANK=6, HOSTNAME=abe0828
NNODES=16, MYRANK=14, HOSTNAME=abe0828
NODEID=0 argc=17
NODEID=2 argc=17
NODEID=4 argc=17
NODEID=10 argc=17
NODEID=12 argc=17
NODEID=6 argc=17
NODEID=14 argc=17
NODEID=8 argc=17
NNODES=16, MYRANK=5, HOSTNAME=abe0825
NNODES=16, MYRANK=13, HOSTNAME=abe0825
                         :-)  G  R  O  M  A  C  S  (-:

NNODES=16, MYRANK=9, HOSTNAME=abe0825
NNODES=16, MYRANK=11, HOSTNAME=abe0825
                   Great Red Oystrich Makes All Chemists Sane

                            :-)  VERSION 4.5.3  (-:

<snip>
Back Off! I just backed up nvt.log to ./#nvt.log.2#
Reading file nvt.tpr, VERSION 4.5.3 (single precision)

Will use 10 particle-particle and 6 PME only nodes
This is a guess, check the performance at the end of the log file
Making 2D domain decomposition 2 x 5 x 1

Back Off! I just backed up nvt.edr to ./#nvt.edr.2#
----------------------------------------------

The non-Torque section of the PBS log file is below:

-----------------------------------------------
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
running mpdallexit on abe0828
LAUNCHED mpd on abe0828  via
RUNNING: mpd on abe0828
LAUNCHED mpd on abe0825  via  abe0828
RUNNING: mpd on abe0825
abe0828_43972 (10.1.67.66)
abe0825_37571 (10.1.67.63)
rank 1 in job 1  abe0828_43972   caused collective abort of all ranks
  exit status of rank 1: killed by signal 9
rank 0 in job 1  abe0828_43972   caused collective abort of all ranks
  exit status of rank 0: killed by signal 9
-------------------------------------------------

I should also note that both the .edr and .trr files are created in
the working directory, but both files are 0 bytes.
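
One check I have not done yet, which I mention in case it matters:
the build pulled headers and libraries from marmot_mvapich2_intel,
while the job script starts the mpd ring with mvapich2-start-mpd, so
I intend to verify that mdrun_mpi is really linked against the same
MVAPICH2 that mpirun launches. Roughly the following (the path is
just my configure prefix plus bin; if the MPI library was linked
statically, ldd will not show it):

# show which shared MPI library mdrun_mpi is linked against, if any
ldd /u/ac/jnscott/gromacs/bin/mdrun_mpi | grep -i mpi
# show which launcher is actually first in my PATH at job time
which mpirun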

As I said, I realize this question is perhaps a bit off the topic of
Gromacs proper, but I hope that someone can offer some tips or spot
an obvious problem with my approach that I have missed. I would
sincerely appreciate any help you can offer a novice.

Best Wishes,
-Nathan


----------
J. Nathan Scott, Ph.D.
Postdoctoral Fellow
Department of Chemistry and Biochemistry
Montana State University


