[gmx-developers] Gromacs with GPU

Åke Sandgren ake.sandgren at hpc2n.umu.se
Fri Sep 22 13:10:33 CEST 2017


Hi!

I am seeing a possible performance enhancement when running
GROMACS on nodes with multiple GPU cards.
(And yes, I know this is perhaps a moot point since current GPU cards
don't have dual engines per card.)

System:
dual-socket 14-core Broadwell CPUs
2 K80 cards, one on each socket.

Gromacs built with hwloc support.

When running a dual-node (56-core) job (Slurm + cgroups) with

gmx_mpi mdrun -npme 4 -s ion_channel_bench00.tpr -resetstep 20000 -o
bench.trr -x bench.xtc -cpo bench.cpt -c bench.gro -e bench.edr -g
bench.log -ntomp 7 -pin on -dlb yes

GROMACS doesn't fully take the hwloc info into account. The job correctly
gets allocated on cores, but looking at nvidia-smi and hwloc-ps I can see
that the PP processes are using a suboptimal selection of GPU engines.
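
(For reference, the CPU affinity of each GPU engine can be checked directly
on the node with something like

nvidia-smi topo -m

which, if I read its output right on this box, should show engines 0/1
(bus IDs 0000:0D/0E) attached to socket 0 and engines 2/3 (0000:88/89)
to socket 1.)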

The PP processes are placed one on each CPU socket (judging by which
process IDs are using the GPUs and where hwloc-ps places those PIDs),
but they both use GPU engines from the same (first) K80 card.

It would be better if mdrun looked at the hwloc info and selected CUDA
devices 0,2 (or 1,3) instead of 0,1, so that each PP rank uses a GPU
engine attached to its own socket.
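
As a manual workaround (until mdrun can do this itself from the hwloc
info), spelling out the per-node mapping explicitly should work; if I'm
reading the -gpu_id option right, adding something like

gmx_mpi mdrun ... -gpu_id 02

to the command line above ought to give each of the two PP ranks per node
an engine on its local K80 (or -gpu_id 13 for the other pair of engines).
It would of course still be nicer if the automatic selection got this right.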


Any comments on that?

Attached nvidia-smi + hwloc-ps output

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: ake at hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
-------------- next part --------------
root@b-cn1302:~# nvidia-smi
Fri Sep 22 12:59:57 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:0D:00.0     Off |                    0 |
| N/A   39C    P0   127W / 149W |     75MiB / 11439MiB |     63%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000:0E:00.0     Off |                    0 |
| N/A   49C    P0   145W / 149W |     76MiB / 11439MiB |     64%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 0000:88:00.0     Off |                    0 |
| N/A   27C    P8    26W / 149W |      2MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 0000:89:00.0     Off |                    0 |
| N/A   32C    P8    29W / 149W |      2MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0    171227    C   gmx_mpi                                         73MiB |
|    1    171229    C   gmx_mpi                                         74MiB |
+-----------------------------------------------------------------------------+

hwloc-ps -t -l output:
171227	NUMANode:0		gmx_mpi
 171227	Core:0		
 171233	NUMANode:0		
 171235	NUMANode:0		
 171261	NUMANode:0		
 171268	Core:1		
 171269	Core:2		
 171270	Core:3		
 171271	Core:4		
 171272	Core:5		
 171273	Core:6		
 171280	NUMANode:0		
171228	Core:7 Core:8 Core:9 Core:10 Core:11 Core:12 Core:13 NUMANode:1		gmx_mpi
 171228	Core:7		
 171237	NUMANode:1		
 171238	NUMANode:1		
 171284	Core:8		
 171286	Core:9		
 171288	Core:10		
 171290	Core:11		
 171292	Core:12		
 171294	Core:13		
171229	NUMANode:0 Core:14 Core:15 Core:16 Core:17 Core:18 Core:19 Core:20		gmx_mpi
 171229	Core:14		
 171234	NUMANode:0		
 171236	NUMANode:0		
 171274	Core:15		
 171275	Core:16		
 171276	Core:17		
 171277	Core:18		
 171278	Core:19		
 171279	Core:20		
 171281	NUMANode:0		
 171282	NUMANode:0		
171230	NUMANode:1		gmx_mpi
 171230	Core:21		
 171239	NUMANode:1		
 171240	NUMANode:1		
 171283	Core:22		
 171285	Core:23		
 171287	Core:24		
 171289	Core:25		
 171291	Core:26		
 171293	Core:27		

