[gmx-users] Performance, gpu

Mark Abraham mark.j.abraham at gmail.com
Wed Aug 28 17:50:00 CEST 2019


Hi,

Your command starts 88 ranks and dedicates 44 of them to PME, leaving
88-44=44 PP ranks. With -ntomp_pme 6 and -ntomp 4 you then give each PME rank
6 OpenMP threads and each PP rank 4, which is far more threads than the 88
physical cores in your 4 nodes, i.e. over-subscription. Changing the number
of PME-only ranks only changes how badly you over-subscribe.
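
Plugging in the numbers from your own aprun line (plain shell arithmetic,
just to make the mismatch explicit):

# threads requested: 44 PME ranks * 6 threads + 44 PP ranks * 4 threads
echo $(( 44*6 + 44*4 ))   # 440
# physical cores available: 4 nodes * 22 cores
echo $(( 4*22 ))          # 88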

I'd be starting my investigation with

aprun -n 8 gmx_mpi mdrun -npme 4

and let the defaults work out the rest: 11 OpenMP threads per rank and one
GPU per PP rank, which achieves full utilization without over-subscription.
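
Spelled out explicitly, that could look something like the following. This is
only a sketch: I'm assuming the usual Cray aprun options -N (ranks per node)
and -d (cores per rank) are available on your system, so adapt it to your
launcher and check the layout mdrun reports in the log afterwards.

export OMP_NUM_THREADS=11
aprun -n 8 -N 2 -d 11 gmx_mpi mdrun -deffnm out -npme 4 -nb gpu -ntomp 11 -ntomp_pme 11

With the default interleaved placement of PME ranks, each node should end up
with one PP rank driving its GPU and one PME rank, each running 11 OpenMP
threads, so all 88 cores are used exactly once.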

Mark

On Wed, 28 Aug 2019 at 17:31, Alex <alexanderwien2k at gmail.com> wrote:

> Dear all,
> Whatever "-npme" value (22, 44, 24, 48, ...) I use in the command below, I
> always get "WARNING: On rank 0: oversubscribing the available XXX logical
> CPU cores per node with 88 threads. This will cause considerable
> performance loss."
>
> aprun -n 88 gmx_mpi mdrun -deffnm out -s out.tpr -g out.log -v -dlb yes
> -gcom 1 -nb gpu -npme 44 -ntomp 4 -ntomp_pme 6 -tunepme yes
>
> Would you please help me choose a correct combination of -npme and ...
> to get better performance, according to the case.log file attached to my
> previous email?
> Regards,
> Alex
>
> On Sat, Aug 24, 2019 at 11:21 AM Mark Abraham <mark.j.abraham at gmail.com>
> wrote:
>
> > Hi,
> >
> > There's a thread oversubscription warning in your log file that you
> > should definitely have read and acted upon :-) I'd be running more like
> > one PP rank per gpu and 4 PME ranks, picking ntomp and ntomp_pme
> > according to what gives best performance (which could require
> > configuring your MPI invocation accordingly).
> >
> > Mark
> >
> > On Fri., 23 Aug. 2019, 21:00 Alex, <alexanderwien2k at gmail.com> wrote:
> >
> > > Dear Gromacs user,
> > > Using a machine with the configuration below and the command below, I
> > > tried to simulate a system with 479K atoms (mainly water) on CPU+GPU;
> > > the performance is around 1 ns per hour.
> > > Based on that information and the log file shared below, I would
> > > appreciate it if you could comment on the submission command, so that
> > > the GPU and CPU are used more effectively and the performance improves.
> > >
> > > %------------------------------------------------
> > > #PBS -l select=4:ncpus=22:mpiprocs=22:ngpus=1
> > > export OMP_NUM_THREADS=4
> > >
> > > aprun -n 88 gmx_mpi mdrun -deffnm out -s out.tpr -g out.log -v -dlb yes
> > > -gcom 1 -nb gpu -npme 44 -ntomp 4 -ntomp_pme 6 -tunepme yes
> > >
> > > Running on 4 nodes with total 88 cores, 176 logical cores, 4 compatible
> > > GPUs
> > >   Cores per node:           22
> > >   Logical cores per node:   44
> > >   Compatible GPUs per node:  1
> > >   All nodes have identical type(s) of GPUs
> > >
> > > %------------------------------------------------
> > > GROMACS version:    2018.1
> > > Precision:          single
> > > Memory model:       64 bit
> > > MPI library:        MPI
> > > OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
> > > GPU support:        CUDA
> > > SIMD instructions:  AVX2_256
> > > FFT library:        commercial-fftw-3.3.6-pl1-fma-sse2-avx-avx2-avx2_128
> > > RDTSCP usage:       enabled
> > > TNG support:        enabled
> > > Hwloc support:      hwloc-1.11.0
> > > Tracing support:    disabled
> > > Built on:           2018-09-12 20:34:33
> > > Built by:           xxxx
> > > Build OS/arch:      Linux 3.12.61-52.111-default x86_64
> > > Build CPU vendor:   Intel
> > > Build CPU brand:    Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> > > Build CPU family:   6   Model: 79   Stepping: 1
> > > Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle
> > > htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt
> > > pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> > > C compiler:         /opt/cray/pe/craype/2.5.13/bin/cc GNU 5.3.0
> > > C compiler flags:   -march=core-avx2 -O3 -DNDEBUG -funroll-all-loops
> > > -fexcess-precision=fast
> > > C++ compiler:       /opt/cray/pe/craype/2.5.13/bin/CC GNU 5.3.0
> > > C++ compiler flags: -march=core-avx2 -std=c++11 -O3 -DNDEBUG
> > > -funroll-all-loops -fexcess-precision=fast
> > > CUDA compiler:
> > > /opt/nvidia/cudatoolkit8.0/8.0.61_2.3.13_g32c34f9-2.1/bin/nvcc
> > > nvcc: NVIDIA (R) Cuda compiler driver; Copyright (c) 2005-2016 NVIDIA
> > > Corporation; Built on Tue_Jan_10_13:22:03_CST_2017; Cuda compilation
> > > tools, release 8.0, V8.0.61
> > > CUDA compiler flags:
> > > -gencode;arch=compute_60,code=sm_60;-use_fast_math;-Wno-deprecated-gpu-targets;
> > > -march=core-avx2;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
> > > CUDA driver:        9.20
> > > CUDA runtime:       8.0
> > > %-------------------------------------------------
> > > Log file:
> > > https://drive.google.com/open?id=1-myQ5rP85UWKb1262QDPa6kYhuzHPzLu
> > >
> > > Thank you,
> > > Alex

