[gmx-users] The problem of utilizing multiple GPU
Mark Abraham
mark.j.abraham at gmail.com
Sat Sep 7 16:52:21 CEST 2019
Hi,
I think Szilard meant more like
gmx mdrun -ntmpi 7 -npme 1 -nb gpu -pme gpu -bonded gpu -gputasks 0022446
for the assignment of the 7 GPU tasks.
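If I read the -gputasks string right, the seven digits are consumed in
task order on the node, so the six PP (nonbonded) ranks land pairwise on
GPUs 0, 2 and 4, and the single PME rank gets GPU 6:

rank 0 (PP)  -> GPU 0    rank 1 (PP)  -> GPU 0
rank 2 (PP)  -> GPU 2    rank 3 (PP)  -> GPU 2
rank 4 (PP)  -> GPU 4    rank 5 (PP)  -> GPU 4
rank 6 (PME) -> GPU 6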
Mark
On Fri., 6 Sep. 2019, 22:07 sunyeping, <sunyeping at aliyun.com> wrote:
> Hello Szilárd Páll
>
> Thank you for your reply. I tried your command:
>
> gmx mdrun -ntmpi 7 -npme 1 -nb gpu -pme gpu -bonded gpu -gpu_id 0,2,4,6
> -gputasks 001122334
>
> but got the following error information:
>
> Using 7 MPI threads
> Using 10 OpenMP threads per tMPI thread
>
> Program: gmx mdrun, version 2019.3
> Source file: src/gromacs/taskassignment/taskassignment.cpp (line 255)
> Function: std::vector<std::vector<gmx::GpuTaskMapping> >::value_type
> gmx::runTaskAssignment(const std::vector<int>&, const std::vector<int>&,
> const gmx_hw_info_t&, const gmx::MDLogger&, const t_commrec*, const
> gmx_multisim_t*, const gmx::PhysicalNodeCommunicator&, const
> std::vector<gmx::GpuTask>&, bool, PmeRunMode)
> MPI rank: 0 (out of 7)
>
> Inconsistency in user input:
> There were 7 GPU tasks found on node localhost.localdomain, but 4 GPUs were
> available. If the GPUs are equivalent, then it is usually best to have a
> number of tasks that is a multiple of the number of GPUs. You should
> reconsider your GPU task assignment, number of ranks, or your use of the
> -nb,
> -pme, and -npme options, perhaps after measuring the performance you can
> get.
>
> Could you tell me how to correct this?
>
> Best regards,
> Yeping
>
> ------------------------------------------------------------------
> Hi,
>
> You have 2x Xeon Gold 6150 which is 2x 18 = 36 cores; Intel CPUs
> support 2 threads/core (HyperThreading), hence the 72.
>
> https://ark.intel.com/content/www/us/en/ark/products/120490/intel-xeon-gold-6150-processor-24-75m-cache-2-70-ghz.html
>
> You will not be able to scale efficiently over 8 GPUs in a single
> simulation with the current code. While performance will likely
> improve in the next release, due to PCI bus and PME scaling
> limitations it is unlikely you will see much benefit beyond 4 GPUs
> even with GROMACS 2020.
>
> Try running on 3-4 GPUs with at least 2 ranks on each, and one
> separate PME rank. You might also want to use every second GPU rather
> than the first four to avoid overloading the PCI bus; e.g.
> gmx mdrun -ntmpi 7 -npme 1 -nb gpu -pme gpu -bonded gpu -gpu_id 0,2,4,6
> -gputasks 001122334
>
> Cheers,
> --
> Szilárd
>
> On Thu, Sep 5, 2019 at 1:12 AM 孙业平 <sunyeping at aliyun.com> wrote:
> ------------------------------------------------------------------
> From:孙业平 <sunyeping at aliyun.com>
> Sent At:2019 Sep. 5 (Thu.) 07:12
> To:gromacs <gmx-users at gromacs.org>; Mark Abraham <mark.j.abraham at gmail.com
> >
> Cc:gromacs.org_gmx-users <gromacs.org_gmx-users at maillist.sys.kth.se>
> Subject:Re: [gmx-users] The problem of utilizing multiple GPU
>
> Hello Mark Abraham,
>
> Thank you very much for your reply. I will definitely check the webinar
> and gromacs document. But now I am confused and expect a direct solution.
> The workstation should have 18 cores each with 4 hyperthreads. The output
> of "lscpu" reads:
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 72
> On-line CPU(s) list: 0-71
> Thread(s) per core: 2
> Core(s) per socket: 18
> Socket(s): 2
> NUMA node(s): 2
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 85
> Model name: Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
> Stepping: 4
> CPU MHz: 2701.000
> CPU max MHz: 2701.0000
> CPU min MHz: 1200.0000
> BogoMIPS: 5400.00
> Virtualization: VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 1024K
> L3 cache: 25344K
> NUMA node0 CPU(s): 0-17,36-53
> NUMA node1 CPU(s): 18-35,54-71
>
> Now I don't want to do multiple simulations and just want to run a single
> simulation. When assigning the simulation to only one GPU (gmx mdrun -v
> -gpu_id 0 -deffnm md), the simulation performance is 90 ns/day. However,
> when I don't assign the GPU but let all GPUs work by:
> gmx mdrun -v -deffnm md
> The simulation performance is only 2 ns/day.
>
> So what is the correct command to make full use of all GPUs and achieve
> the best performance (which I expect should be much higher than the 90
> ns/day with only one GPU)? Could you give me further suggestions and help?
>
> Best regards,
> Yeping
>
> ------------------------------------------------------------------
> From:Mark Abraham <mark.j.abraham at gmail.com>
> Sent At:2019 Sep. 4 (Wed.) 19:10
> To:gromacs <gmx-users at gromacs.org>; 孙业平 <sunyeping at aliyun.com>
> Cc:gromacs.org_gmx-users <gromacs.org_gmx-users at maillist.sys.kth.se>
> Subject:Re: [gmx-users] The problem of utilizing multiple GPU
>
> Hi,
>
>
> On Wed, 4 Sep 2019 at 12:54, sunyeping <sunyeping at aliyun.com> wrote:
> Dear everyone,
>
> I am trying to run a simulation on a workstation with 72 cores and 8
> GeForce 1080 GPUs.
>
> 72 cores, or just 36 cores each with two hyperthreads? (it matters because
> you might not want to share cores between simulations, which is what you'd
> get if you just assigned 9 hyperthreads per GPU and 1 GPU per simulation).
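>
> If you do end up running one simulation per GPU, pinning each run to
> its own set of hardware threads avoids that sharing. A sketch (the
> thread counts and offsets are illustrative and need adapting to what
> mdrun reports for your box):
>
> gmx mdrun -deffnm md0 -nt 18 -pin on -pinoffset 0 -pinstride 1 -gpu_id 0 &
> gmx mdrun -deffnm md1 -nt 18 -pin on -pinoffset 18 -pinstride 1 -gpu_id 1 &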
>
> When I do not assign a certain GPU with the command:
> gmx mdrun -v -deffnm md
> all GPUs are used, but the utilization of each GPU is extremely low
> (only 1-2 %), and the simulation will be finished after several months.
>
> Yep. Too many workers for not enough work means everyone spends more
> time coordinating than working. This is likely to improve in GROMACS 2020
> (beta out shortly).
>
> In contrast, when I assign the simulation task to only one GPU:
> gmx mdrun -v -gpu_id 0 -deffnm md
> the GPU utilization can reach 60-70%, and the simulation can be finished
> within a week. Even when I use only two GPUs:
>
> Utilization is only a proxy - what you actually want to measure is the
> rate of simulation, i.e. ns/day.
>
> gmx mdrun -v -gpu_id 0,2 -deffnm md
>
> the GPU utilizations are very low and the simulation is very slow.
>
> That could be for a variety of reasons, which you could diagnose by
> looking at the performance report at the end of the log file, and comparing
> different runs.
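>
> (One quick way to compare, assuming your logs are named like md.log:
> the ns/day figure is on the Performance line at the very end of each
> log, e.g.
>
> grep -B1 "Performance:" md.log
>
> and the Wait GPU rows in the timing table above it hint at where the
> time is going.)
>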
> I think I may be misusing the GPUs for the gromacs simulation. Could you
> tell me what is the correct way to use multiple GPUs?
>
> If you're happy running multiple simulations, then the easiest thing to do
> is to use the existing multi-simulation support to do
>
> mpirun -np 8 gmx_mpi mdrun -multidir dir0 dir1 dir2 ... dir7
>
> and let mdrun handle the details. Otherwise you have to get involved in
> assigning a subset of the CPU cores and GPUs to each job that both runs
> fast and does not conflict. See the documentation for GROMACS for the
> version you're running e.g.
> http://manual.gromacs.org/documentation/current/user-guide/mdrun-performance.html#running-mdrun-within-a-single-node
> .
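>
> (As a sketch of the multi-simulation setup, assuming you prepare one
> tpr per replica, and noting the md$i.tpr names here are hypothetical:
>
> for i in 0 1 2 3 4 5 6 7; do mkdir -p dir$i; cp md$i.tpr dir$i/topol.tpr; done
> mpirun -np 8 gmx_mpi mdrun -multidir dir0 dir1 dir2 dir3 dir4 dir5 dir6 dir7
>
> mdrun then looks for the same input name, by default topol.tpr, in
> every directory.)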
>
> You probably want to check out this webinar tomorrow
> https://bioexcel.eu/webinar-more-bang-for-your-buck-improved-use-of-gpu-nodes-for-gromacs-2018-2019-09-05/
> .
>
> Mark
> Best regards
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-request at gromacs.org.