[gmx-users] cpu/gpu utilization

Szilárd Páll pall.szilard at gmail.com
Fri Mar 2 12:54:53 CET 2018


Once again: full log files, please, not a partial cut-and-paste.

Also, you misread something, because your previous logs show:
-nb gpu -pme gpu:              56.4 ns/day
-nb gpu -pme gpu -pmefft cpu:  64.6 ns/day
-nb gpu -pme cpu:              67.5 ns/day

So both mixed-mode PME and PME on the CPU are faster, with the latter
slightly faster than the former (the corresponding command lines are sketched below).
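
For reference, those three numbers would come from command lines along these
lines (assuming the same tpr/-deffnm as in the run you quote below; only the
offload flags differ):

  gmx mdrun -nobackup -deffnm md_0_1 -nb gpu -pme gpu
  gmx mdrun -nobackup -deffnm md_0_1 -nb gpu -pme gpu -pmefft cpu
  gmx mdrun -nobackup -deffnm md_0_1 -nb gpu -pme cpu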

This is about as much as you can do, I think. Your GPU is simply too slow to
get more performance out of it, and the runs are GPU-bound. You might be able
to squeeze out a bit more with some build tweaks (compile mdrun with AVX2_256
SIMD, use a newer FFTW, use a newer gcc), but expect marginal gains.
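
If you want to try those tweaks, a configure line would look roughly like the
sketch below (the gcc-6 compiler name is just an example of "a newer gcc";
check the install guide of your GROMACS version for the exact GPU option):

  # rebuild with AVX2_256 SIMD and a bundled recent FFTW
  CC=gcc-6 CXX=g++-6 cmake .. \
      -DGMX_SIMD=AVX2_256 \
      -DGMX_BUILD_OWN_FFTW=ON \
      -DGMX_GPU=ON
  make -j 16 && make install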

Cheers,

--
Szilárd

On Fri, Mar 2, 2018 at 11:00 AM, Mahmood Naderan <nt_mahmood at yahoo.com>
wrote:

> Command is "gmx mdrun -nobackup -pme cpu -nb gpu -deffnm md_0_1" and the
> log says
>
>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>
> On 1 MPI rank, each using 16 OpenMP threads
>
>  Computing:          Num   Num      Call    Wall time         Giga-Cycles
>                      Ranks Threads  Count      (s)         total sum    %
> -----------------------------------------------------------------------------
>  Neighbor search        1   16        501       0.972         55.965   0.8
>  Launch GPU ops.        1   16      50001       2.141        123.301   1.7
>  Force                  1   16      50001       4.019        231.486   3.1
>  PME mesh               1   16      50001      40.695       2344.171  31.8
>  Wait GPU NB local      1   16      50001      60.155       3465.079  47.0
>  NB X/F buffer ops.     1   16      99501       7.342        422.902   5.7
>  Write traj.            1   16         11       0.246         14.184   0.2
>  Update                 1   16      50001       3.480        200.461   2.7
>  Constraints            1   16      50001       5.831        335.878   4.6
>  Rest                                           3.159        181.963   2.5
> -----------------------------------------------------------------------------
>  Total                                        128.039       7375.390 100.0
> -----------------------------------------------------------------------------
>  Breakdown of PME mesh computation
> -----------------------------------------------------------------------------
>  PME spread             1   16      50001      17.086        984.209  13.3
>  PME gather             1   16      50001      12.534        722.007   9.8
>  PME 3D-FFT             1   16     100002       9.956        573.512   7.8
>  PME solve Elec         1   16      50001       0.779         44.859   0.6
> -----------------------------------------------------------------------------
>
>                Core t (s)   Wall t (s)        (%)
>        Time:     2048.617      128.039     1600.0
>                  (ns/day)    (hour/ns)
> Performance:       67.481        0.356
>
>
>
>
>
>
> While the command is "", I see that the GPU is utilized at only about 10%,
> and the log file says:
>
>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>
> On 1 MPI rank, each using 16 OpenMP threads
>
>  Computing:          Num   Num      Call    Wall time         Giga-Cycles
>                      Ranks Threads  Count      (s)         total sum    %
> -----------------------------------------------------------------------------
>  Neighbor search        1   16       1251       6.912        398.128   2.3
>  Force                  1   16      50001     210.689      12135.653  70.4
>  PME mesh               1   16      50001      46.869       2699.656  15.7
>  NB X/F buffer ops.     1   16      98751      22.315       1285.360   7.5
>  Write traj.            1   16         11       0.216         12.447   0.1
>  Update                 1   16      50001       4.382        252.386   1.5
>  Constraints            1   16      50001       6.035        347.601   2.0
>  Rest                                           1.666         95.933   0.6
> -----------------------------------------------------------------------------
>  Total                                        299.083      17227.165 100.0
> -----------------------------------------------------------------------------
>  Breakdown of PME mesh computation
> -----------------------------------------------------------------------------
>  PME spread             1   16      50001      21.505       1238.693   7.2
>  PME gather             1   16      50001      12.089        696.333   4.0
>  PME 3D-FFT             1   16     100002      11.627        669.705   3.9
>  PME solve Elec         1   16      50001       0.965         55.598   0.3
> -----------------------------------------------------------------------------
>
>                Core t (s)   Wall t (s)        (%)
>        Time:     4785.326      299.083     1600.0
>                  (ns/day)    (hour/ns)
> Performance:       28.889        0.831
>
>
>
>
> Using the GPU is still better than using the CPU alone. However, I see that
> while the GPU is utilized, the CPU is also busy. So I was wondering whether
> the source code uses cudaDeviceSynchronize(), which would make the CPU enter
> a busy loop while waiting for the GPU.
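>
> To illustrate what I mean (a minimal standalone CUDA sketch, not GROMACS
> code): whether the host thread spins or sleeps while waiting depends on the
> device schedule flags.
>
>     /* Toy example: with the default scheduling the host usually spin-waits
>        in cudaDeviceSynchronize() (~100% load on one core);
>        cudaDeviceScheduleBlockingSync makes the thread sleep instead. */
>     #include <cuda_runtime.h>
>     #include <cstdio>
>
>     __global__ void spin_kernel(float *x, int n) {
>         int i = blockIdx.x * blockDim.x + threadIdx.x;
>         if (i < n) {
>             float v = x[i];
>             for (int k = 0; k < 1000000; ++k) v = v * 1.0000001f + 1e-7f;
>             x[i] = v;
>         }
>     }
>
>     int main() {
>         /* must be set before the CUDA context is created;
>            comment this out to see the busy-wait behaviour */
>         cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
>
>         const int n = 1 << 20;
>         float *d_x;
>         cudaMalloc(&d_x, n * sizeof(float));
>         spin_kernel<<<(n + 255) / 256, 256>>>(d_x, n);
>
>         cudaDeviceSynchronize();  /* host waits for the kernel here */
>
>         cudaFree(d_x);
>         printf("done\n");
>         return 0;
>     }
>
> So the CPU load I see during the wait might just be this kind of spin-wait
> rather than useful work.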
>
> Regards,
> Mahmood
>
>     On Friday, March 2, 2018, 11:37:11 AM GMT+3:30, Magnus Lundborg <
> magnus.lundborg at scilifelab.se> wrote:
>
>  Have you tried the mdrun options:
>
> -pme cpu -nb gpu
> -pme cpu -nb cpu
>
> Cheers,
>
> Magnus
>
>

