[gmx-users] The problem of utilizing multiple GPU

sunyeping sunyeping at aliyun.com
Fri Sep 6 22:12:11 CEST 2019


Hello Szilárd Páll

Thank you for your reply. I tried your command:

 gmx mdrun -ntmpi 7 -npme 1 -nb gpu -pme gpu -bonded gpu -gpu_id 0,2,4,6 -gputasks 001122334

but got the following error information:

Using 7 MPI threads
Using 10 OpenMP threads per tMPI thread

Program:     gmx mdrun, version 2019.3
Source file: src/gromacs/taskassignment/taskassignment.cpp (line 255)
Function:    std::vector<std::vector<gmx::GpuTaskMapping> >::value_type gmx::runTaskAssignment(const std::vector<int>&, const std::vector<int>&, const gmx_hw_info_t&, const gmx::MDLogger&, const t_commrec*, const gmx_multisim_t*, const gmx::PhysicalNodeCommunicator&, const std::vector<gmx::GpuTask>&, bool, PmeRunMode)
MPI rank:    0 (out of 7)

Inconsistency in user input:
There were 7 GPU tasks found on node localhost.localdomain, but 4 GPUs were
available. If the GPUs are equivalent, then it is usually best to have a
number of tasks that is a multiple of the number of GPUs. You should
reconsider your GPU task assignment, number of ranks, or your use of the -nb,
-pme, and -npme options, perhaps after measuring the performance you can get.

Could you tell me how to correct this?

Best regards,
Yeping

------------------------------------------------------------------
Hi,

You have 2x Xeon Gold 6150 which is 2x 18 = 36 cores; Intel CPUs
support 2 threads/core (HyperThreading), hence the 72.
https://ark.intel.com/content/www/us/en/ark/products/120490/intel-xeon-gold-6150-processor-24-75m-cache-2-70-ghz.html

You will not be able to scale efficiently over 8 GPUs in a single
simulation with the current code. Performance will likely improve in
the next release, but due to PCI bus and PME scaling limitations it
is unlikely you will see much benefit beyond 4 GPUs even with GROMACS
2020.

Try running on 3-4 GPUs with at least 2 ranks on each, and one
separate PME rank. You might also want to use every second GPU rather
than the first four to avoid overloading the PCI bus; e.g.
gmx mdrun -ntmpi 7 -npme 1 -nb gpu -pme gpu -bonded gpu -gpu_id 0,2,4,6 -gputasks 001122334
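
If mdrun rejects that because the number of GPU tasks is not a
multiple of the number of GPUs, a consistent variant (untested here)
is to use 8 ranks so the task string has exactly one digit per rank,
naming the devices directly:

gmx mdrun -ntmpi 8 -npme 1 -nb gpu -pme gpu -bonded gpu -gputasks 00224466

That puts two ranks on each of GPUs 0, 2, 4 and 6, with the last rank
handling PME.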

Cheers,
--
Szilárd

------------------------------------------------------------------
From: 孙业平 <sunyeping at aliyun.com>
Sent At: 2019 Sep. 5 (Thu.) 07:12
To: gromacs <gmx-users at gromacs.org>; Mark Abraham <mark.j.abraham at gmail.com>
Cc: gromacs.org_gmx-users <gromacs.org_gmx-users at maillist.sys.kth.se>
Subject: Re: [gmx-users] The problem of utilizing multiple GPU

Hello Mark Abraham,

Thank you very much for your reply. I will definitely check the webinar and the GROMACS documentation. But right now I am confused and hoping for a direct solution. The workstation should have 18 cores, each with 4 hyperthreads. The output of "lscpu" reads:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                72
On-line CPU(s) list:   0-71
Thread(s) per core:    2
Core(s) per socket:    18
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
Stepping:              4
CPU MHz:               2701.000
CPU max MHz:           2701.0000
CPU min MHz:           1200.0000
BogoMIPS:              5400.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              25344K
NUMA node0 CPU(s):     0-17,36-53
NUMA node1 CPU(s):     18-35,54-71

Now I don't want to run multiple simulations; I just want to run a single simulation. When I assign the simulation to only one GPU (gmx mdrun -v -gpu_id 0 -deffnm md), the performance is 90 ns/day. However, when I don't assign a GPU but let all GPUs work by:
       gmx mdrun -v -deffnm md
The simulation performance is only 2 ns/day.

So what is the correct command to make full use of all GPUs and achieve the best performance (which I expect should be much higher than the 90 ns/day with only one GPU)? Could you give me further suggestions and help?

Best regards,
Yeping

------------------------------------------------------------------
From: Mark Abraham <mark.j.abraham at gmail.com>
Sent At: 2019 Sep. 4 (Wed.) 19:10
To: gromacs <gmx-users at gromacs.org>; 孙业平 <sunyeping at aliyun.com>
Cc: gromacs.org_gmx-users <gromacs.org_gmx-users at maillist.sys.kth.se>
Subject: Re: [gmx-users] The problem of utilizing multiple GPU

Hi,


On Wed, 4 Sep 2019 at 12:54, sunyeping <sunyeping at aliyun.com> wrote:
> Dear everyone,
>
> I am trying to do a simulation on a workstation with 72 cores and 8 GeForce 1080 GPUs.

72 cores, or just 36 cores each with two hyperthreads? (it matters because you might not want to share cores between simulations, which is what you'd get if you just assigned 9 hyperthreads per GPU and 1 GPU per simulation).
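
For example, if you did run one simulation per GPU, a sketch of how
to keep the runs off each other's cores (the -deffnm names and the
9-thread split are assumptions to adjust for your setup) is to pin
each run to its own block of hardware threads:

gmx mdrun -deffnm md0 -gpu_id 0 -ntmpi 1 -ntomp 9 -pin on -pinoffset 0 -pinstride 1
gmx mdrun -deffnm md1 -gpu_id 1 -ntmpi 1 -ntomp 9 -pin on -pinoffset 9 -pinstride 1

and so on for the remaining GPUs, increasing -pinoffset by 9 each
time so that no two simulations share hardware threads.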

> When I do not assign a certain GPU with the command:
>       gmx mdrun -v -deffnm md
> all GPUs are used, but the utilization of each GPU is extremely low (only 1-2%), and the simulation would take several months to finish.

Yep. Too many workers for not enough work means everyone spends more time coordinating than working. This is likely to improve in GROMACS 2020 (beta out shortly).

> In contrast, when I assign the simulation task to only one GPU:
> gmx mdrun -v -gpu_id 0 -deffnm md
> the GPU utilization can reach 60-70%, and the simulation can be finished within a week. Even when I use only two GPUs:

Utilization is only a proxy - what you actually want to measure is the rate of simulation, i.e. ns/day.

>  gmx mdrun -v -gpu_id 0,2 -deffnm md
>
> the GPU utilizations are very low and the simulation is very slow.

That could be for a variety of reasons, which you could diagnose by looking at the performance report at the end of the log file and comparing different runs.

> I think I may be misusing the GPUs for GROMACS simulations. Could you tell me what is the correct way to use multiple GPUs?

If you're happy running multiple simulations, then the easiest thing to do is to use the existing multi-simulation support to do

mpirun -np 8 gmx_mpi mdrun -multidir dir0 dir1 dir2 ... dir7

and let mdrun handle the details. Otherwise you have to get involved in assigning to each job a subset of the CPU cores and GPUs such that the jobs run fast and do not conflict with each other. See the GROMACS documentation for the version you're running, e.g. http://manual.gromacs.org/documentation/current/user-guide/mdrun-performance.html#running-mdrun-within-a-single-node.
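
For instance, a minimal way to set up those directories (assuming you
want eight replicas of the same topol.tpr input; adjust the names to
taste) would be:

for i in 0 1 2 3 4 5 6 7; do mkdir -p dir$i && cp topol.tpr dir$i/; done

Each directory then holds one independent run, and -multidir starts
one simulation per directory, dividing the available cores and GPUs
among them.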

You probably want to check out this webinar tomorrow https://bioexcel.eu/webinar-more-bang-for-your-buck-improved-use-of-gpu-nodes-for-gromacs-2018-2019-09-05/.

Mark
> Best regards
-- 
Gromacs Users mailing list

* Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-request at gromacs.org.

