[gmx-users] gromacs performance

Carlos Rivas crivas at infiniticg.com
Fri Mar 8 23:40:32 CET 2019


Benson,
When I was testing on a single machine, performance improved by leaps and bounds:

-- 2 hours on a c5.2xlarge
-- 68 minutes on a p2.xlarge
-- 18 minutes on a p3.2xlarge
-- 7 minutes on a p3dn.24xlarge

It's only when I switched to using clusters that things went downhill and I haven't been able to beat the above numbers by throwing more CPUs and GPUs at it.

CJ


-----Original Message-----
From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <gromacs.org_gmx-users-bounces at maillist.sys.kth.se> On Behalf Of Benson Muite
Sent: Friday, March 8, 2019 4:19 PM
To: gromacs.org_gmx-users at maillist.sys.kth.se
Subject: Re: [gmx-users] gromacs performance

You seem to be using a relatively large number of GPUs. You may want to check your input data (many systems will not scale that far, although ensemble runs are quite common). Perhaps measure the speedup in going from 1 to 2 to 4 GPUs on a single node first.
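For the single-node scaling check, something like the sketch below might help. It only prints the mdrun command lines rather than running them, and the benchmark name (bench.tpr), step count, and helper function are placeholders of mine, not from your setup; -ntmpi and -gpu_id are the standard thread-MPI mdrun options, with one PP rank per GPU. Compare the ns/day each run reports at the end of its log:

```shell
# Hypothetical single-node GPU scaling sketch: build the mdrun command
# for 1, 2, and 4 GPUs, one thread-MPI rank per GPU.
build_cmd() {
  ngpu=$1
  # Concatenate GPU ids into the string mdrun expects, e.g. "01" for 2 GPUs
  ids=$(printf '%s' $(seq 0 $((ngpu - 1))))
  echo "gmx mdrun -ntmpi ${ngpu} -gpu_id ${ids} -resethway -nsteps 10000 -deffnm bench"
}

for n in 1 2 4; do
  build_cmd "$n"
done
```

-resethway resets the performance counters halfway through so the reported ns/day excludes startup and load-balancing overhead.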

On 3/9/19 12:11 AM, Carlos Rivas wrote:
> Hey guys,
> Anybody running GROMACS on AWS?
>
> I have a strong IT background, but zero understanding of GROMACS or
> OpenMPI (even less of using SGE on AWS). I'm just trying to help some
> PhD folks with their work.
>
> When I run GROMACS using thread-MPI on a single, very large node on AWS, things work fairly fast.
> However, when I switch from thread-MPI to OpenMPI, the performance is horrible even though everything is detected properly.
>
> This is what I am submitting to sge:
>
> ubuntu at ip-10-10-5-81:/shared/charmm-gui/gromacs$ cat sge.sh
> #!/bin/bash
> #
> #$ -cwd
> #$ -j y
> #$ -S /bin/bash
> #$ -e out.err
> #$ -o out.out
> #$ -pe mpi 256
>
> cd /shared/charmm-gui/gromacs
> touch start.txt
> /bin/bash /shared/charmm-gui/gromacs/run_eq.bash
> touch end.txt
>
> and this is my test script , provided by one of the Doctors:
>
> ubuntu at ip-10-10-5-81:/shared/charmm-gui/gromacs$ cat run_eq.bash
> #!/bin/bash
>
> export GMXMPI="/usr/bin/mpirun --mca btl ^openib /shared/gromacs/5.1.5/bin/gmx_mpi"
>
> export MDRUN="mdrun -ntomp 2 -npme 32"
>
> export GMX="/shared/gromacs/5.1.5/bin/gmx_mpi"
>
> for comm in min eq; do
>   if [ $comm == min ]; then
>     echo ${comm}
>     $GMX grompp -f step6.0_minimization.mdp -o step6.0_minimization.tpr -c step5_charmm2gmx.pdb -p topol.top
>     $GMXMPI $MDRUN -deffnm step6.0_minimization
>   fi
>
>   if [ $comm == eq ]; then
>     for step in $(seq 1 6); do
>       echo $step
>       if [ $step -eq 1 ]; then
>         $GMX grompp -f step6.${step}_equilibration.mdp -o step6.${step}_equilibration.tpr -c step6.0_minimization.gro -r step5_charmm2gmx.pdb -n index.ndx -p topol.top
>         $GMXMPI $MDRUN -deffnm step6.${step}_equilibration
>       fi
>       if [ $step -gt 1 ]; then
>         old=$(expr $step - 1)
>         echo $old
>         $GMX grompp -f step6.${step}_equilibration.mdp -o step6.${step}_equilibration.tpr -c step6.${old}_equilibration.gro -r step5_charmm2gmx.pdb -n index.ndx -p topol.top
>         $GMXMPI $MDRUN -deffnm step6.${step}_equilibration
>       fi
>     done
>   fi
> done
>
>
>
>
> During the run I see the output below and get really excited, expecting blazing speeds, and yet it's much worse than a single node:
>
> Command line:
>    gmx_mpi mdrun -ntomp 2 -npme 32 -deffnm step6.0_minimization
>
>
> Back Off! I just backed up step6.0_minimization.log to ./#step6.0_minimization.log.6#
>
> Running on 4 nodes with total 128 cores, 256 logical cores, 32 compatible GPUs
>    Cores per node:           32
>    Logical cores per node:   64
>    Compatible GPUs per node:  8
>    All nodes have identical type(s) of GPUs
> Hardware detected on host ip-10-10-5-89 (the node of MPI rank 0):
>    CPU info:
>      Vendor: GenuineIntel
>      Brand:  Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
>      SIMD instructions most likely to fit this hardware: AVX2_256
>      SIMD instructions selected at GROMACS compile time: AVX2_256
>    GPU info:
>      Number of GPUs detected: 8
>      #0: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
>      #1: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
>      #2: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
>      #3: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
>      #4: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
>      #5: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
>      #6: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
>      #7: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
>
> Reading file step6.0_minimization.tpr, VERSION 5.1.5 (single precision)
> Using 256 MPI processes
> Using 2 OpenMP threads per MPI process
>
> On host ip-10-10-5-89 8 compatible GPUs are present, with IDs 0,1,2,3,4,5,6,7
> On host ip-10-10-5-89 8 GPUs auto-selected for this run.
> Mapping of GPU IDs to the 56 PP ranks in this node:
> 0,0,0,0,0,0,0,1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,4,5,5,5,5,5,5,5,6,6,6,6,6,6,6,7,7,7,7,7,7,7
>
>
>
> Any suggestions? Greatly appreciate the help.
>
>
> Carlos J. Rivas
> Senior AWS Solutions Architect - Migration Specialist
>
--
Gromacs Users mailing list

* Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-request at gromacs.org.
