[gmx-users] gromacs performance

Wed Mar 13 17:21:50 CET 2019

Hi,

First off, please post full log files; they contain much more information
than the excerpts you pasted.

Secondly, for parallel multi-node runs this hardware is simply too GPU-dense
to achieve a good CPU-GPU load balance, and scaling will be hard in most
cases. The details depend on the input system and settings (information we
would see in the full log).
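
To put rough numbers on "GPU-dense", the figures in the quoted log and
sge.sh can be combined as in the sketch below. It only does arithmetic on
values copied from the quoted message; the even spread of ranks across
nodes is an assumption, and the variable names are made up for
illustration.

```shell
#!/bin/bash
# Figures copied from the quoted log output and sge.sh.
nodes=4
cores_per_node=32
logical_cores_per_node=64
gpus_per_node=8
mpi_ranks=256          # from "#$ -pe mpi 256" / "Using 256 MPI processes"
omp_threads=2          # from "-ntomp 2"
pp_ranks_per_node=56   # from "Mapping of GPU IDs to the 56 PP ranks in this node"

# Physical CPU cores available to feed each GPU.
cores_per_gpu=$(( cores_per_node / gpus_per_node ))
# Assumption: the MPI ranks are spread evenly over the nodes.
ranks_per_node=$(( mpi_ranks / nodes ))
# OpenMP threads started on each node versus hardware threads present.
threads_per_node=$(( ranks_per_node * omp_threads ))
# PP ranks that end up sharing one GPU.
pp_ranks_per_gpu=$(( pp_ranks_per_node / gpus_per_node ))

echo "cores per GPU:          ${cores_per_gpu}"
echo "ranks per node:         ${ranks_per_node}"
echo "threads per node:       ${threads_per_node} (logical cores: ${logical_cores_per_node})"
echo "PP ranks sharing a GPU: ${pp_ranks_per_gpu}"
```

With these inputs each GPU has only four physical cores behind it, seven
PP ranks share every GPU, and the requested thread count per node is twice
the number of logical cores.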

Lastly, running a decomposition with one rank per core is generally
inefficient with GPUs; typically 2-3 ranks per GPU are ideal (but in this
case the CPU-GPU load balance may be the stronger bottleneck).
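
As a starting point for experiments, a layout with a fixed number of ranks
per GPU could be derived as in the sketch below. It assumes 2 ranks per GPU
(the low end of the range above) and one OpenMP thread per physical core.
The launch line is only printed, not executed, and its Open MPI mapping
option, the bare gmx_mpi path, and the omission of -npme are illustrative
assumptions that would need checking against the actual cluster setup.

```shell
#!/bin/bash
# Hardware figures taken from the quoted log.
nodes=4
cores_per_node=32
gpus_per_node=8
# Assumption: 2 ranks per GPU, the low end of the suggested 2-3 range.
ranks_per_gpu=2

ranks_per_node=$(( gpus_per_node * ranks_per_gpu ))
total_ranks=$(( nodes * ranks_per_node ))
# Physical cores per rank, used here as the OpenMP thread count.
threads_per_rank=$(( cores_per_node / ranks_per_node ))

echo "ranks per node:   ${ranks_per_node}"
echo "total ranks:      ${total_ranks}"
echo "threads per rank: ${threads_per_rank}"
# Print (do not run) a candidate launch line to compare with the current one.
echo "mpirun -np ${total_ranks} --map-by ppr:${ranks_per_node}:node gmx_mpi mdrun -ntomp ${threads_per_rank} -deffnm step6.0_minimization"
```

Changing ranks_per_gpu to 3 does not divide the 32 cores per node evenly,
so the thread count would have to be adjusted by hand in that case.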

Cheers,
--
Szilárd

On Fri, Mar 8, 2019 at 11:12 PM Carlos Rivas <crivas at infiniticg.com> wrote:

> Hey guys,
> Anybody running GROMACS on AWS?
>
> I have a strong IT background, but zero understanding of GROMACS or
> OpenMPI (even less of using SGE on AWS).
> Just trying to help some PhD folks with their work.
>
> When I run gromacs using Thread-mpi on a single, very large node on AWS
> things work fairly fast.
> However, when I switch from thread-mpi to OpenMPI even though everything's
> detected properly, the performance is horrible.
>
> This is what I am submitting to sge:
>
> ubuntu at ip-10-10-5-81:/shared/charmm-gui/gromacs$ cat sge.sh
> #!/bin/bash
> #
> #$ -cwd
> #$ -j y
> #$ -S /bin/bash
> #$ -e out.err
> #$ -o out.out
> #$ -pe mpi 256
>
> cd /shared/charmm-gui/gromacs
> touch start.txt
> /bin/bash /shared/charmm-gui/gromacs/run_eq.bash
> touch end.txt
>
> and this is my test script, provided by one of the doctors:
>
> ubuntu at ip-10-10-5-81:/shared/charmm-gui/gromacs$ cat run_eq.bash
> #!/bin/bash
> export GMXMPI="/usr/bin/mpirun --mca btl ^openib /shared/gromacs/5.1.5/bin/gmx_mpi"
>
> export MDRUN="mdrun -ntomp 2 -npme 32"
>
> export GMX="/shared/gromacs/5.1.5/bin/gmx_mpi"
>
> for comm in min eq; do
> if [ $comm == min ]; then
>    echo ${comm}
>    $GMX grompp -f step6.0_minimization.mdp -o step6.0_minimization.tpr -c step5_charmm2gmx.pdb -p topol.top
>    $GMXMPI $MDRUN -deffnm step6.0_minimization
>
> fi
>
> if [ $comm == eq ]; then
>   for step in `seq 1 6`;do
>    echo $step
>    if [ $step -eq 1 ]; then
>       echo ${step}
>       $GMX grompp -f step6.${step}_equilibration.mdp -o step6.${step}_equilibration.tpr -c step6.0_minimization.gro -r step5_charmm2gmx.pdb -n index.ndx -p topol.top
>       $GMXMPI $MDRUN -deffnm step6.${step}_equilibration
>    fi
>    if [ $step -gt 1 ]; then
>       old=`expr $step - 1`
>       echo $old
>       $GMX grompp -f step6.${step}_equilibration.mdp -o step6.${step}_equilibration.tpr -c step6.${old}_equilibration.gro -r step5_charmm2gmx.pdb -n index.ndx -p topol.top
>       $GMXMPI $MDRUN -deffnm step6.${step}_equilibration
>    fi
>   done
> fi
> done
>
>
>
>
> In the output I see this, and I get really excited, expecting
> blazing speeds, and yet it's much worse than a single node:
>
> Command line:
>   gmx_mpi mdrun -ntomp 2 -npme 32 -deffnm step6.0_minimization
>
>
> Back Off! I just backed up step6.0_minimization.log to
> ./#step6.0_minimization.log.6#
>
> Running on 4 nodes with total 128 cores, 256 logical cores, 32 compatible GPUs
>   Cores per node:           32
>   Logical cores per node:   64
>   Compatible GPUs per node:  8
>   All nodes have identical type(s) of GPUs
> Hardware detected on host ip-10-10-5-89 (the node of MPI rank 0):
>   CPU info:
>     Vendor: GenuineIntel
>     Brand:  Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
>     SIMD instructions most likely to fit this hardware: AVX2_256
>     SIMD instructions selected at GROMACS compile time: AVX2_256
>   GPU info:
>     Number of GPUs detected: 8
>     #0: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
>     #1: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
>     #2: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
>     #3: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
>     #4: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
>     #5: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
>     #6: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
>     #7: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
>
> Reading file step6.0_minimization.tpr, VERSION 5.1.5 (single precision)
> Using 256 MPI processes
> Using 2 OpenMP threads per MPI process
>
> On host ip-10-10-5-89 8 compatible GPUs are present, with IDs 0,1,2,3,4,5,6,7
> On host ip-10-10-5-89 8 GPUs auto-selected for this run.
> Mapping of GPU IDs to the 56 PP ranks in this node:
> 0,0,0,0,0,0,0,1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,4,5,5,5,5,5,5,5,6,6,6,6,6,6,6,7,7,7,7,7,7,7
>
>
>
> Any suggestions? Greatly appreciate the help.
>
>
> Carlos J. Rivas
> Senior AWS Solutions Architect - Migration Specialist