[gmx-users] The problem of utilizing multiple GPU
Szilárd Páll
pall.szilard at gmail.com
Thu Sep 5 12:42:46 CEST 2019
Hi,
You have 2x Xeon Gold 6150 which is 2x 18 = 36 cores; Intel CPUs
support 2 threads/core (HyperThreading), hence the 72.
https://ark.intel.com/content/www/us/en/ark/products/120490/intel-xeon-gold-6150-processor-24-75m-cache-2-70-ghz.html
You will not be able to scale efficiently over 8 GPUs in a single
simulation with the current code. Performance will likely improve in
the next release, but due to PCI bus and PME scaling limitations, even
with GROMACS 2020 you are unlikely to see much benefit beyond 4 GPUs.
Try running on 3-4 GPUs with at least 2 ranks on each, and one
separate PME rank. You might also want to use every second GPU rather
than the first four to avoid overloading the PCI bus; e.g.
gmx mdrun -ntmpi 7 -npme 1 -nb gpu -pme gpu -bonded gpu -gputasks 0022446
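(Here the seven digits of -gputasks map the six PP ranks in pairs onto
GPUs 0, 2 and 4, and the single PME rank onto GPU 6; since -gputasks
assigns a device to every GPU task explicitly, no separate -gpu_id
list is needed.)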
Cheers,
--
Szilárd
On Thu, Sep 5, 2019 at 1:12 AM 孙业平 <sunyeping at aliyun.com> wrote:
>
> Hello Mark Abraham,
>
> Thank you very much for your reply. I will definitely check the webinar and the GROMACS documentation, but right now I am confused and am hoping for a direct solution. The workstation should have 18 cores, each with 4 hyperthreads. The output of "lscpu" reads:
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 72
> On-line CPU(s) list: 0-71
> Thread(s) per core: 2
> Core(s) per socket: 18
> Socket(s): 2
> NUMA node(s): 2
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 85
> Model name: Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
> Stepping: 4
> CPU MHz: 2701.000
> CPU max MHz: 2701.0000
> CPU min MHz: 1200.0000
> BogoMIPS: 5400.00
> Virtualization: VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 1024K
> L3 cache: 25344K
> NUMA node0 CPU(s): 0-17,36-53
> NUMA node1 CPU(s): 18-35,54-71
>
> Now I don't want to do multiple simulations; I just want to run a single simulation. When assigning the simulation to only one GPU (gmx mdrun -v -gpu_id 0 -deffnm md), the simulation performance is 90 ns/day. However, when I don't assign a GPU but let all GPUs work by:
> gmx mdrun -v -deffnm md
> The simulation performance is only 2 ns/day.
>
> So what is the correct command to make full use of all GPUs and achieve the best performance (which I expect should be much higher than the 90 ns/day with only one GPU)? Could you give me further suggestions and help?
>
> Best regards,
> Yeping
>
> ------------------------------------------------------------------
> From: Mark Abraham <mark.j.abraham at gmail.com>
> Sent At: 2019 Sep. 4 (Wed.) 19:10
> To: gromacs <gmx-users at gromacs.org>; 孙业平 <sunyeping at aliyun.com>
> Cc: gromacs.org_gmx-users <gromacs.org_gmx-users at maillist.sys.kth.se>
> Subject: Re: [gmx-users] The problem of utilizing multiple GPU
>
> Hi,
>
>
> On Wed, 4 Sep 2019 at 12:54, sunyeping <sunyeping at aliyun.com> wrote:
> Dear everyone,
>
> I am trying to do simulation with a workstation with 72 core and 8 geforce 1080 GPUs.
>
> 72 cores, or just 36 cores each with two hyperthreads? (It matters because you might not want to share cores between simulations, which is what you'd get if you just assigned 9 hyperthreads per GPU and 1 GPU per simulation.)
>
> When I do not assign a certain GPU with the command:
> gmx mdrun -v -deffnm md
> all GPUs are used, but the utilization of each GPU is extremely low (only 1-2%), and the simulation would take several months to finish.
>
> Yep. Too many workers for not enough work means everyone spends more time coordinating than working. This is likely to improve in GROMACS 2020 (beta out shortly).
>
> In contrast, when I assign the simulation task to only one GPU:
> gmx mdrun -v -gpu_id 0 -deffnm md
> the GPU utilization can reach 60-70%, and the simulation can be finished within a week. Even when I use only two GPUs:
>
> Utilization is only a proxy; what you actually want to measure is the rate of simulation, i.e. ns/day.
>
> gmx mdrun -v -gpu_id 0,2 -deffnm md
>
> the GPU utilizations are very low and the simulation is very slow.
>
> That could be for a variety of reasons, which you could diagnose by looking at the performance report at the end of the log file, and comparing different runs.
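> For example, a quick way to see that report (a generic shell sketch; "md.log" stands for whatever log file your run produced) is:
>
> tail -n 60 md.log
>
> which shows the cycle accounting table and the final "Performance:" line in ns/day for comparison between runs.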
> I think I may be misusing the GPUs for GROMACS simulation. Could you tell me the correct way to use multiple GPUs?
>
> If you're happy running multiple simulations, then the easiest thing to do is to use the existing multi-simulation support to do
>
> mpirun -np 8 gmx_mpi mdrun -multidir dir0 dir1 dir2 ... dir7
>
> and let mdrun handle the details. Otherwise you have to get involved in assigning a subset of the CPU cores and GPUs to each job that both runs fast and does not conflict. See the documentation for GROMACS for the version you're running e.g. http://manual.gromacs.org/documentation/current/user-guide/mdrun-performance.html#running-mdrun-within-a-single-node.
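> As a rough sketch of that manual route (assuming 8 concurrent jobs with 9 hardware threads and one GPU each, and a dir0..dir7 layout like the -multidir example above), job number i could be launched as:
>
> cd dir$i
> gmx mdrun -ntmpi 1 -ntomp 9 -pin on -pinstride 1 -pinoffset $((i*9)) -gpu_id $i -deffnm md &
>
> Each run is then pinned to its own block of 9 hardware threads and its own GPU, so the jobs do not conflict.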
>
> You probably want to check out this webinar tomorrow https://bioexcel.eu/webinar-more-bang-for-your-buck-improved-use-of-gpu-nodes-for-gromacs-2018-2019-09-05/.
>
> Mark
> Best regards