[gmx-developers] Gromacs with GPU
Åke Sandgren
ake.sandgren at hpc2n.umu.se
Fri Sep 22 13:10:33 CEST 2017
Hi!
I am seeing a possible performance enhancement opportunity when running
GROMACS on nodes with multiple GPU cards.
(And yes, I know this is perhaps a moot point since current GPU cards
don't have dual engines per card.)
System:
dual-socket 14-core Broadwell CPUs
2 K80 cards, one on each socket.
GROMACS built with hwloc support.
When running a dual-node (56-core) job (Slurm + cgroups),

gmx_mpi mdrun -npme 4 -s ion_channel_bench00.tpr -resetstep 20000 -o
bench.trr -x bench.xtc -cpo bench.cpt -c bench.gro -e bench.edr -g
bench.log -ntomp 7 -pin on -dlb yes

GROMACS doesn't fully take the hwloc info into account. The job correctly
gets allocated on cores, but looking at nvidia-smi and hwloc-ps I can see
that the PP processes are using a suboptimal selection of GPU engines.
The PP processes are placed one on each CPU socket (judging by which
process IDs are using the GPUs and where those PIDs sit according to
hwloc-ps), but they both use GPU engines from the same (first) K80 card.
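For reference, the PCI bus IDs in the attached nvidia-smi output already
hint at the pairing: devices 0/1 sit at 0000:0D/0E (the first card) and
devices 2/3 at 0000:88/89 (the second card). If the driver supports it,
something like

nvidia-smi topo -m

should show which GPUs share a socket with which cores; hwloc-ls gives
the same picture from the hwloc side.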
It would be better if mdrun looked at the hwloc info and selected CUDA
devices 0,2 (or 1,3) instead of 0,1, so that each PP rank gets an engine
on its own socket.
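(As a workaround I can of course force the mapping by hand; if I remember
the mdrun syntax right, something like

gmx_mpi mdrun -gpu_id 02 ... (rest of the options as above)

should assign GPU 0 to the first PP rank and GPU 2 to the second PP rank
on each node, i.e. one engine per K80 card. But the automatic selection
ought to be able to do this by itself.)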
Any comments on that?
Attached nvidia-smi + hwloc-ps output
--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: ake at hpc2n.umu.se Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
-------------- next part --------------
root@b-cn1302:~# nvidia-smi
Fri Sep 22 12:59:57 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:0D:00.0     Off |                    0 |
| N/A   39C    P0   127W / 149W |     75MiB / 11439MiB |     63%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000:0E:00.0     Off |                    0 |
| N/A   49C    P0   145W / 149W |     76MiB / 11439MiB |     64%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 0000:88:00.0     Off |                    0 |
| N/A   27C    P8    26W / 149W |      2MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 0000:89:00.0     Off |                    0 |
| N/A   32C    P8    29W / 149W |      2MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0    171227    C   gmx_mpi                                         73MiB |
|    1    171229    C   gmx_mpi                                         74MiB |
+-----------------------------------------------------------------------------+
hwloc-ps -t -l output:
171227 NUMANode:0 gmx_mpi
  171227 Core:0
  171233 NUMANode:0
  171235 NUMANode:0
  171261 NUMANode:0
  171268 Core:1
  171269 Core:2
  171270 Core:3
  171271 Core:4
  171272 Core:5
  171273 Core:6
  171280 NUMANode:0
171228 Core:7 Core:8 Core:9 Core:10 Core:11 Core:12 Core:13 NUMANode:1 gmx_mpi
  171228 Core:7
  171237 NUMANode:1
  171238 NUMANode:1
  171284 Core:8
  171286 Core:9
  171288 Core:10
  171290 Core:11
  171292 Core:12
  171294 Core:13
171229 NUMANode:0 Core:14 Core:15 Core:16 Core:17 Core:18 Core:19 Core:20 gmx_mpi
  171229 Core:14
  171234 NUMANode:0
  171236 NUMANode:0
  171274 Core:15
  171275 Core:16
  171276 Core:17
  171277 Core:18
  171278 Core:19
  171279 Core:20
  171281 NUMANode:0
  171282 NUMANode:0
171230 NUMANode:1 gmx_mpi
  171230 Core:21
  171239 NUMANode:1
  171240 NUMANode:1
  171283 Core:22
  171285 Core:23
  171287 Core:24
  171289 Core:25
  171291 Core:26
  171293 Core:27