[gmx-users] cpu/gpu utilization
Mahmood Naderan
nt_mahmood at yahoo.com
Fri Mar 2 11:01:23 CET 2018
The command is "gmx mdrun -nobackup -pme cpu -nb gpu -deffnm md_0_1" and the log says:
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 1 MPI rank, each using 16 OpenMP threads

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Neighbor search        1   16        501       0.972         55.965   0.8
 Launch GPU ops.        1   16      50001       2.141        123.301   1.7
 Force                  1   16      50001       4.019        231.486   3.1
 PME mesh               1   16      50001      40.695       2344.171  31.8
 Wait GPU NB local      1   16      50001      60.155       3465.079  47.0
 NB X/F buffer ops.     1   16      99501       7.342        422.902   5.7
 Write traj.            1   16         11       0.246         14.184   0.2
 Update                 1   16      50001       3.480        200.461   2.7
 Constraints            1   16      50001       5.831        335.878   4.6
 Rest                                           3.159        181.963   2.5
-----------------------------------------------------------------------------
 Total                                        128.039       7375.390 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME spread             1   16      50001      17.086        984.209  13.3
 PME gather             1   16      50001      12.534        722.007   9.8
 PME 3D-FFT             1   16     100002       9.956        573.512   7.8
 PME solve Elec         1   16      50001       0.779         44.859   0.6
-----------------------------------------------------------------------------

               Core t (s)   Wall t (s)        (%)
       Time:     2048.617      128.039     1600.0
                 (ns/day)    (hour/ns)
Performance:       67.481        0.356
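As a rough cross-check of the log above, the share of wall time per row can be recomputed from the "Wall time (s)" column. This is only a sketch: the dictionary and function names are made up for illustration, and it assumes the percentage column tracks wall time closely (the log derives it from cycle counts).

```python
# Wall times in seconds, copied from the first log above.
wall_s = {
    "Force": 4.019,
    "PME mesh": 40.695,
    "Wait GPU NB local": 60.155,
}
total_wall_s = 128.039  # "Total" row of the same log


def share_of_total(seconds, total):
    """Return a row's share of the total wall time in percent (one decimal)."""
    return round(100.0 * seconds / total, 1)


shares = {name: share_of_total(t, total_wall_s) for name, t in wall_s.items()}
for name, pct in shares.items():
    print(f"{name:20s} {pct:5.1f} %")
```

Under that assumption, the "Wait GPU NB local" row alone accounts for close to half of the run, i.e. time the CPU side spends waiting for the GPU non-bonded results.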
When the command is "", I see that the GPU is utilized at about 10%, and the log file says:
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 1 MPI rank, each using 16 OpenMP threads

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Neighbor search        1   16       1251       6.912        398.128   2.3
 Force                  1   16      50001     210.689      12135.653  70.4
 PME mesh               1   16      50001      46.869       2699.656  15.7
 NB X/F buffer ops.     1   16      98751      22.315       1285.360   7.5
 Write traj.            1   16         11       0.216         12.447   0.1
 Update                 1   16      50001       4.382        252.386   1.5
 Constraints            1   16      50001       6.035        347.601   2.0
 Rest                                           1.666         95.933   0.6
-----------------------------------------------------------------------------
 Total                                        299.083      17227.165 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME spread             1   16      50001      21.505       1238.693   7.2
 PME gather             1   16      50001      12.089        696.333   4.0
 PME 3D-FFT             1   16     100002      11.627        669.705   3.9
 PME solve Elec         1   16      50001       0.965         55.598   0.3
-----------------------------------------------------------------------------

               Core t (s)   Wall t (s)        (%)
       Time:     4785.326      299.083     1600.0
                 (ns/day)    (hour/ns)
Performance:       28.889        0.831
Using the GPU is still better than using the CPU alone. However, I see that while the GPU is in use, the CPU is also busy. So I suspect that the source code calls cudaDeviceSynchronize(), which would make the CPU spin in a busy loop while it waits.
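To put a number on the comparison, the two "Performance" and "Wall t (s)" figures from the logs can be divided directly. A minimal sketch; the variable names are illustrative only, and the values are copied from the two logs above.

```python
# Figures copied from the two logs: first run (-pme cpu -nb gpu) and second run.
first_ns_per_day = 67.481
second_ns_per_day = 28.889
first_wall_s = 128.039
second_wall_s = 299.083

# Throughput ratio and wall-time ratio should agree for the same number of steps.
speedup = first_ns_per_day / second_ns_per_day
wall_ratio = second_wall_s / first_wall_s

print(f"speedup from ns/day:    {speedup:.2f}x")
print(f"speedup from wall time: {wall_ratio:.2f}x")
```

Both ratios come out at roughly 2.3, so the first run is a bit more than twice as fast as the second.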
Regards,
Mahmood
On Friday, March 2, 2018, 11:37:11 AM GMT+3:30, Magnus Lundborg <magnus.lundborg at scilifelab.se> wrote:
Have you tried the mdrun options:
-pme cpu -nb gpu
-pme cpu -nb cpu
Cheers,
Magnus