[gmx-users] The problem of utilizing multiple GPU
Szilárd Páll
pall.szilard at gmail.com
Thu Sep 5 12:42:46 CEST 2019
Hi,
You have 2x Xeon Gold 6150 which is 2x 18 = 36 cores; Intel CPUs
support 2 threads/core (HyperThreading), hence the 72.
https://ark.intel.com/content/www/us/en/ark/products/120490/intel-xeon-gold-6150-processor-24-75m-cache-2-70-ghz.html
You will not be able to scale efficiently over 8 GPUs in a single
simulation with the current code. Performance will likely improve in
the next release, but due to PCI bus and PME scaling limitations, even
with GROMACS 2020 you are unlikely to see much benefit beyond 4 GPUs.
Try running on 3-4 GPUs with at least 2 ranks on each, and one
separate PME rank. You might also want to use every second GPU rather
than the first four to avoid overloading the PCI bus; e.g.
gmx mdrun -ntmpi 7 -npme 1 -nb gpu -pme gpu -bonded gpu -gputasks 0022446
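(Here the seven digits of -gputasks map the six PP ranks in pairs onto
GPUs 0, 2 and 4, and the single PME rank onto GPU 6; since -gputasks
assigns a device to every GPU task explicitly, no separate -gpu_id
list is needed.)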
Cheers,
--
Szilárd
On Thu, Sep 5, 2019 at 1:12 AM 孙业平 <sunyeping at aliyun.com> wrote:
>
> Hello Mark Abraham,
>
> Thank you very much for your reply. I will definitely check the webinar and the GROMACS documentation, but right now I am confused and am hoping for a direct solution. The workstation should have 18 cores, each with 4 hyperthreads. The output of "lscpu" reads:
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 72
> On-line CPU(s) list: 0-71
> Thread(s) per core: 2
> Core(s) per socket: 18
> Socket(s): 2
> NUMA node(s): 2
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 85
> Model name: Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
> Stepping: 4
> CPU MHz: 2701.000
> CPU max MHz: 2701.0000
> CPU min MHz: 1200.0000
> BogoMIPS: 5400.00
> Virtualization: VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 1024K
> L3 cache: 25344K
> NUMA node0 CPU(s): 0-17,36-53
> NUMA node1 CPU(s): 18-35,54-71
>
> Now I don't want to do multiple simulations; I just want to run a single simulation. When assigning the simulation to only one GPU (gmx mdrun -v -gpu_id 0 -deffnm md), the simulation performance is 90 ns/day. However, when I don't assign a GPU but let all GPUs work by:
> gmx mdrun -v -deffnm md
> The simulation performance is only 2 ns/day.
>
> So what is the correct command to make full use of all GPUs and achieve the best performance (which I expect should be much higher than the 90 ns/day with only one GPU)? Could you give me further suggestions and help?
>
> Best regards,
> Yeping
>
> ------------------------------------------------------------------
> From: Mark Abraham <mark.j.abraham at gmail.com>
> Sent At: 2019 Sep. 4 (Wed.) 19:10
> To: gromacs <gmx-users at gromacs.org>; 孙业平 <sunyeping at aliyun.com>
> Cc: gromacs.org_gmx-users <gromacs.org_gmx-users at maillist.sys.kth.se>
> Subject: Re: [gmx-users] The problem of utilizing multiple GPU
>
> Hi,
>
>
> On Wed, 4 Sep 2019 at 12:54, sunyeping <sunyeping at aliyun.com> wrote:
> Dear everyone,
>
> I am trying to do simulation with a workstation with 72 core and 8 geforce 1080 GPUs.
>
> 72 cores, or just 36 cores each with two hyperthreads? (It matters because you might not want to share cores between simulations, which is what you'd get if you just assigned 9 hyperthreads per GPU and 1 GPU per simulation.)
>
> When I do not assign a certain GPU with the command:
> gmx mdrun -v -deffnm md
> all GPUs are used, but the utilization of each GPU is extremely low (only 1-2%), and the simulation would take several months to finish.
>
> Yep. Too many workers for not enough work means everyone spends more time coordinating than working. This is likely to improve in GROMACS 2020 (beta out shortly).
>
> In contrast, when I assign the simulation task to only one GPU:
> gmx mdrun -v -gpu_id 0 -deffnm md
> the GPU utilization can reach 60-70%, and the simulation can be finished within a week. Even when I use only two GPUs:
>
> Utilization is only a proxy; what you actually want to measure is the rate of simulation, i.e. ns/day.
>
> gmx mdrun -v -gpu_id 0,2 -deffnm md
>
> the GPU utilizations are very low and the simulation is very slow.
>
> That could be for a variety of reasons, which you could diagnose by looking at the performance report at the end of the log file, and comparing different runs.
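> For example, a quick way to see that report (a generic shell sketch; "md.log" stands for whatever log file your run produced) is:
>
> tail -n 60 md.log
>
> which shows the cycle accounting table and the final "Performance:" line in ns/day for comparison between runs.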
> I think I may be misusing the GPUs for GROMACS simulation. Could you tell me the correct way to use multiple GPUs?
>
> If you're happy running multiple simulations, then the easiest thing to do is to use the existing multi-simulation support to do
>
> mpirun -np 8 gmx_mpi mdrun -multidir dir0 dir1 dir2 ... dir7
>
> and let mdrun handle the details. Otherwise you have to get involved in assigning a subset of the CPU cores and GPUs to each job that both runs fast and does not conflict. See the documentation for GROMACS for the version you're running e.g. http://manual.gromacs.org/documentation/current/user-guide/mdrun-performance.html#running-mdrun-within-a-single-node.
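> As a rough sketch of that manual route (assuming 8 concurrent jobs with 9 hardware threads and one GPU each, and a dir0..dir7 layout like the -multidir example above), job number i could be launched as:
>
> cd dir$i
> gmx mdrun -ntmpi 1 -ntomp 9 -pin on -pinstride 1 -pinoffset $((i*9)) -gpu_id $i -deffnm md &
>
> Each run is then pinned to its own block of 9 hardware threads and its own GPU, so the jobs do not conflict.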
>
> You probably want to check out this webinar tomorrow https://bioexcel.eu/webinar-more-bang-for-your-buck-improved-use-of-gpu-nodes-for-gromacs-2018-2019-09-05/.
>
> Mark
> Best regards