[gmx-users] gromacs performance
Carlos Rivas
crivas at infiniticg.com
Fri Mar 8 23:11:41 CET 2019
Hey guys,
Anybody running GROMACS on AWS?
I have a strong IT background, but zero understanding of GROMACS or OpenMPI (and even less of SGE on AWS). I'm just trying to help some PhD folks with their work.
When I run GROMACS using thread-MPI on a single, very large node on AWS, things run fairly fast.
However, when I switch from thread-MPI to OpenMPI, even though everything is detected properly, the performance is horrible.
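For comparison, the fast single-node runs use GROMACS' built-in thread-MPI, roughly like this (a sketch from memory; the rank/thread counts are my guesses, and I'm assuming a non-MPI gmx binary was built alongside gmx_mpi, since -ntmpi only works in thread-MPI builds):

#!/bin/bash
# Single-node thread-MPI run: no mpirun involved.
# -ntmpi = thread-MPI ranks, -ntomp = OpenMP threads per rank;
# 16 x 4 = 64 threads, matching one node's 64 logical cores.
/shared/gromacs/5.1.5/bin/gmx mdrun -ntmpi 16 -ntomp 4 -deffnm step6.0_minimization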
This is what I am submitting to SGE:
ubuntu@ip-10-10-5-81:/shared/charmm-gui/gromacs$ cat sge.sh
#!/bin/bash
#
# SGE directives: run from the submit directory, merge stderr into
# stdout, use bash as the shell, name the output files, and request
# 256 slots in the "mpi" parallel environment.
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -e out.err
#$ -o out.out
#$ -pe mpi 256
cd /shared/charmm-gui/gromacs
touch start.txt            # marker: job started
/bin/bash /shared/charmm-gui/gromacs/run_eq.bash
touch end.txt              # marker: job finished
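I haven't actually verified where those 256 ranks land or how they are bound; something like this (standard OpenMPI options, my own sanity check rather than part of the job) should show the layout:

#!/bin/bash
# Count how many ranks land on each host.
/usr/bin/mpirun --mca btl ^openib -np 256 hostname | sort | uniq -c
# Print each rank's core binding (OpenMPI writes it to stderr).
/usr/bin/mpirun --mca btl ^openib -np 256 --report-bindings true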
and this is my test script, provided by one of the doctors:
ubuntu@ip-10-10-5-81:/shared/charmm-gui/gromacs$ cat run_eq.bash
#!/bin/bash
export GMXMPI="/usr/bin/mpirun --mca btl ^openib /shared/gromacs/5.1.5/bin/gmx_mpi"
export MDRUN="mdrun -ntomp 2 -npme 32"
export GMX="/shared/gromacs/5.1.5/bin/gmx_mpi"

for comm in min eq; do
    if [ "$comm" == min ]; then
        echo "$comm"
        $GMX grompp -f step6.0_minimization.mdp -o step6.0_minimization.tpr -c step5_charmm2gmx.pdb -p topol.top
        $GMXMPI $MDRUN -deffnm step6.0_minimization
    fi
    if [ "$comm" == eq ]; then
        for step in $(seq 1 6); do
            echo "$step"
            if [ "$step" -eq 1 ]; then
                $GMX grompp -f step6.${step}_equilibration.mdp -o step6.${step}_equilibration.tpr -c step6.0_minimization.gro -r step5_charmm2gmx.pdb -n index.ndx -p topol.top
                $GMXMPI $MDRUN -deffnm step6.${step}_equilibration
            else
                old=$((step - 1))
                echo "$old"
                $GMX grompp -f step6.${step}_equilibration.mdp -o step6.${step}_equilibration.tpr -c step6.${old}_equilibration.gro -r step5_charmm2gmx.pdb -n index.ndx -p topol.top
                $GMXMPI $MDRUN -deffnm step6.${step}_equilibration
            fi
        done
    fi
done
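One thing that worries me when I do the math: -pe mpi 256 gives 256 MPI ranks, and -ntomp 2 makes that 512 threads on 4 nodes with only 256 logical (128 physical) cores, i.e. 2-4x oversubscribed. A variant I've been meaning to try, with my own guesses at the counts (one PP rank per GPU, no separate PME ranks, and #$ -pe mpi 32 in sge.sh to match), would be:

# 32 ranks x 8 OpenMP threads = 256 threads total, i.e. 64 per node,
# exactly matching each node's 64 logical cores; 8 ranks per node
# gives one PP rank per GPU.
export GMXMPI="/usr/bin/mpirun -np 32 -npernode 8 --mca btl ^openib /shared/gromacs/5.1.5/bin/gmx_mpi"
export MDRUN="mdrun -ntomp 8 -npme 0"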
During the run I see this in the output and get really excited, expecting blazing speeds, and yet it's much worse than a single node:
Command line:
gmx_mpi mdrun -ntomp 2 -npme 32 -deffnm step6.0_minimization
Back Off! I just backed up step6.0_minimization.log to ./#step6.0_minimization.log.6#
Running on 4 nodes with total 128 cores, 256 logical cores, 32 compatible GPUs
Cores per node: 32
Logical cores per node: 64
Compatible GPUs per node: 8
All nodes have identical type(s) of GPUs
Hardware detected on host ip-10-10-5-89 (the node of MPI rank 0):
CPU info:
Vendor: GenuineIntel
Brand: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
SIMD instructions most likely to fit this hardware: AVX2_256
SIMD instructions selected at GROMACS compile time: AVX2_256
GPU info:
Number of GPUs detected: 8
#0: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
#1: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
#2: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
#3: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
#4: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
#5: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
#6: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
#7: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
Reading file step6.0_minimization.tpr, VERSION 5.1.5 (single precision)
Using 256 MPI processes
Using 2 OpenMP threads per MPI process
On host ip-10-10-5-89 8 compatible GPUs are present, with IDs 0,1,2,3,4,5,6,7
On host ip-10-10-5-89 8 GPUs auto-selected for this run.
Mapping of GPU IDs to the 56 PP ranks in this node: 0,0,0,0,0,0,0,1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,4,5,5,5,5,5,5,5,6,6,6,6,6,6,6,7,7,7,7,7,7,7
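If I'm reading that mapping right, 7 PP ranks are sharing each V100 (32 of the 256 ranks went to PME, leaving 56 PP ranks per node for 8 GPUs), which I gather can hurt badly on the GPU side. With one PP rank per GPU the assignment could also be pinned explicitly, something like this (hypothetical invocation based on my reading of mdrun -h, not what we actually ran):

# One PP rank per GPU: digit i of -gpu_id gives the GPU used by
# PP rank i on each node.
mpirun -np 32 -npernode 8 /shared/gromacs/5.1.5/bin/gmx_mpi mdrun -ntomp 8 -npme 0 -gpu_id 01234567 -deffnm step6.0_minimization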
Any suggestions? I'd greatly appreciate the help.
Carlos J. Rivas
Senior AWS Solutions Architect - Migration Specialist