孙业平 sunyeping at aliyun.com
Thu Sep 5 01:16:27 CEST 2019

Hello Mark Abraham,

Thank you very much for your reply. I will definitely check the webinar and the GROMACS documentation, but right now I am confused and was hoping for a direct solution. According to "lscpu", the workstation has two sockets, each with 18 cores and 2 hyperthreads per core (36 physical cores, 72 hardware threads). The output of "lscpu" reads:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                72
On-line CPU(s) list:   0-71
Thread(s) per core:    2
Core(s) per socket:    18
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
Stepping:              4
CPU MHz:               2701.000
CPU max MHz:           2701.0000
CPU min MHz:           1200.0000
BogoMIPS:              5400.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              25344K
NUMA node0 CPU(s):     0-17,36-53
NUMA node1 CPU(s):     18-35,54-71
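
The physical core count can be read off the three topology lines above. A minimal sketch, assuming lscpu-style text on standard input (the function name count_cores is made up for illustration):

```shell
#!/usr/bin/env bash
# Read lscpu-style text on stdin and print the number of physical cores
# and hardware threads. count_cores is a hypothetical helper name.
count_cores() {
  awk -F: '
    /^Thread\(s\) per core/ { tpc = $2 + 0 }   # hyperthreads per core
    /^Core\(s\) per socket/ { cps = $2 + 0 }   # cores per socket
    /^Socket\(s\)/          { soc = $2 + 0 }   # number of sockets
    END { printf "physical_cores=%d hardware_threads=%d\n", cps * soc, cps * soc * tpc }
  '
}
```

It would be used as "lscpu | count_cores". For the values listed above this gives 18 x 2 = 36 physical cores and 36 x 2 = 72 hardware threads.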

Now I don't want to run multiple simulations; I just want to run a single simulation. When I assign the simulation to only one GPU (gmx mdrun -v -gpu_id 0 -deffnm md), the performance is 90 ns/day. However, when I don't specify a GPU and let all GPUs work with:
       gmx mdrun -v -deffnm md
The simulation performance is only 2 ns/day.

So what is the correct command to make full use of all GPUs and achieve the best performance (which I expect to be much higher than the 90 ns/day I get with only one GPU)? Could you give me further suggestions and help?

Best regards,
On Wed, 4 Sep 2019 at 12:54, sunyeping <sunyeping at aliyun.com> wrote:
Dear everyone,

 I am trying to run a simulation on a workstation with 72 cores and 8 GeForce 1080 GPUs.

72 cores, or just 36 cores each with two hyperthreads? (it matters because you might not want to share cores between simulations, which is what you'd get if you just assigned 9 hyperthreads per GPU and 1 GPU per simulation).

 When I do not assign a specific GPU and run:
       gmx mdrun -v -deffnm md
 all GPUs are used, but the utilization of each GPU is extremely low (only 1-2%), and the simulation would take several months to finish.

Yep. Too many workers for not enough work means everyone spends more time coordinating than working. This is likely to improve in GROMACS 2020 (beta out shortly).
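
One way to see how many workers are too many is to time short runs with different numbers of ranks and GPUs. The sketch below only prints candidate commands and does not run them. It assumes 36 physical cores, GPUs numbered 0-7 and an input file called md.tpr; the output names bench_*gpu and the step count are made up for illustration, and none of these settings is claimed to be the best one.

```shell
#!/usr/bin/env bash
# Print (do not run) short benchmark commands using 1, 2, 4 and 8
# thread-MPI ranks, one GPU per rank.
# Assumptions: 36 physical cores, GPU ids 0-7, input file md.tpr.
print_sweep() {
  local cores=36 ranks ids g
  for ranks in 1 2 4 8; do
    ids=0
    for ((g = 1; g < ranks; g++)); do ids="${ids},${g}"; done   # e.g. 0,1,2,3
    echo "gmx mdrun -s md.tpr -deffnm bench_${ranks}gpu" \
         "-ntmpi ${ranks} -ntomp $((cores / ranks)) -pin on" \
         "-gpu_id ${ids} -nsteps 20000 -resethway"
  done
}
print_sweep
```

Each printed command writes its own log file, so the ns/day figures can be compared afterwards. The integer division means the 8-rank case uses 4 threads per rank, i.e. 32 of the 36 cores.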

 In contrast, when I assign the simulation task to only one GPU:
 gmx mdrun -v -gpu_id 0 -deffnm md
 the GPU utilization can reach 60-70%, and the simulation can be finished within a week. Even when I use only two GPUs:

Utilization is only a proxy - what you actually want to measure is the rate of simulation, i.e. ns/day.

  gmx mdrun -v -gpu_id 0,2 -deffnm md

 the GPU utilization is very low and the simulation is very slow.

That could be for a variety of reasons, which you could diagnose by looking at the performance report at the end of the log file, and comparing different runs.
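
A minimal sketch for comparing runs, assuming each log ends with the usual "Performance:" line whose first number is ns/day (the function name ns_per_day is made up for illustration):

```shell
#!/usr/bin/env bash
# Print "<log file> <ns/day>" for each mdrun log given as an argument.
# Relies on the "Performance:" line in the report at the end of the log,
# where the first number is ns/day and the second is hour/ns.
ns_per_day() {
  local log
  for log in "$@"; do
    awk -v name="$log" '/^Performance:/ { print name, $2 }' "$log"
  done
}
```

For example, "ns_per_day md.log" prints the figure for a run started with -deffnm md. A log from a run that did not finish has no such line and produces no output.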

 I think I may be misusing the GPUs for GROMACS simulations. Could you tell me the correct way to use multiple GPUs?

If you're happy running multiple simulations, then the easiest thing to do is to use the existing multi-simulation support to do

mpirun -np 8 gmx_mpi mdrun -multidir dir0 dir1 dir2 ... dir7

and let mdrun handle the details. Otherwise you have to get involved in assigning each job a subset of the CPU cores and GPUs, chosen so that each job runs fast and does not conflict with the others. See the GROMACS documentation for the version you're running, e.g. http://manual.gromacs.org/documentation/current/user-guide/mdrun-performance.html#running-mdrun-within-a-single-node.
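
As a rough illustration of that manual assignment, the sketch below prints (but does not run) one command per GPU, each pinned to its own block of hardware threads. It assumes 8 GPUs, 2 hyperthreads per core, and job directories dir0 to dir7 that each hold an md.tpr. Giving each job 8 threads (4 physical cores) is an illustrative choice that keeps jobs on separate cores and leaves 4 of the 36 cores idle; it is not a tuned setting.

```shell
#!/usr/bin/env bash
# Print (do not run) one mdrun command per GPU, each with its own
# non-overlapping block of hardware threads.
# Assumptions: 8 GPUs, 2 hyperthreads per core, job directories dir0..dir7.
print_jobs() {
  local gpus=8 threads_per_job=8 i   # 8 threads = 4 physical cores x 2 hyperthreads
  for ((i = 0; i < gpus; i++)); do
    echo "cd dir${i} && gmx mdrun -deffnm md -ntmpi 1 -ntomp ${threads_per_job}" \
         "-pin on -pinoffset $((i * threads_per_job)) -pinstride 1 -gpu_id ${i}"
  done
}
print_jobs
```

The offsets are multiples of 8 so that, with a stride of 1, no two jobs share a physical core. It is worth checking the log of a test run to confirm the threads were pinned as intended.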

You probably want to check out this webinar tomorrow https://bioexcel.eu/webinar-more-bang-for-your-buck-improved-use-of-gpu-nodes-for-gromacs-2018-2019-09-05/.

 Best regards
