[gmx-users] Losing part of the available CPU time

Alexander Alexander alexanderwien2k at gmail.com
Sun Aug 14 15:24:29 CEST 2016

Dear GROMACS users,

My free energy calculation runs correctly, but according to my log file I
am losing about 56.5 % of the available CPU time, which is considerable.
The loss comes from load imbalance in the domain decomposition, and I do
not know how to improve it. The end of my log file is below; I would
appreciate any advice on how to avoid this.

   D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S

 av. #atoms communicated per step for force:  2 x 115357.4
 av. #atoms communicated per step for LINCS:  2 x 2389.1

 Average load imbalance: 285.9 %
 Part of the total run time spent waiting due to load imbalance: 56.5 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 2 % Y 2 % Z 2 %
 Average PME mesh/force load: 0.384
 Part of the total run time spent waiting due to PP/PME imbalance: 14.5 %

NOTE: 56.5 % of the available CPU time was lost due to load imbalance
      in the domain decomposition.

NOTE: 14.5 % performance was lost because the PME ranks
      had less work to do than the PP ranks.
      You might want to decrease the number of PME ranks
      or decrease the cut-off and the grid spacing.

     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 96 MPI ranks doing PP, and
on 32 MPI ranks doing PME

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
 Domain decomp.        96    1     175000     242.339      53508.472   0.5
 DD comm. load         96    1     174903       9.076       2003.907   0.0
 DD comm. bounds       96    1     174901      27.054       5973.491   0.1
 Send X to PME         96    1    7000001      44.342       9790.652   0.1
 Neighbor search       96    1     175001     251.994      55640.264   0.6
 Comm. coord.          96    1    6825000    1521.009     335838.747   3.4
 Force                 96    1    7000001    7001.990    1546039.264  15.5
 Wait + Comm. F        96    1    7000001   10761.296    2376093.759  23.8
 PME mesh *            32    1    7000001   11796.344     868210.788   8.7
 PME wait for PP *                          22135.752    1629191.096  16.3
 Wait + Recv. PME F    96    1    7000001     393.117      86800.265   0.9
 NB X/F buffer ops.    96    1   20650001     132.713      29302.991   0.3
 COM pull force        96    1    7000001     165.613      36567.368   0.4
 Write traj.           96    1       7037      55.020      12148.457   0.1
 Update                96    1   14000002     140.972      31126.607   0.3
 Constraints           96    1   14000002   12871.236    2841968.551  28.4
 Comm. energies        96    1     350001     261.976      57844.219   0.6
 Rest                                          52.349      11558.715   0.1
 Total                                      33932.096    9989607.639 100.0
(*) Note that with separate PME ranks, the walltime column actually sums to
    twice the total reported, but the cycle count total and % are correct.
 Breakdown of PME mesh computation
 PME redist. X/F       32    1   21000003    2334.608     171827.143   1.7
 PME spread/gather     32    1   28000004    3640.870     267967.972   2.7
 PME 3D-FFT            32    1   28000004    1587.105     116810.882   1.2
 PME 3D-FFT Comm.      32    1   56000008    4066.097     299264.666   3.0
 PME solve Elec        32    1   14000002     148.284      10913.728   0.1

               Core t (s)   Wall t (s)        (%)
       Time:  4341204.790    33932.096    12793.8
                 (ns/day)    (hour/ns)
Performance:       35.648        0.673
Finished mdrun on rank 0 Sat Aug 13 23:45:45 2016
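
In case it helps when comparing runs, the key figures above can be pulled
out of the log with a short script. This is only a minimal sketch: it
assumes the log lines have exactly the wording shown in this excerpt, and
the names LOG_EXCERPT, PATTERNS and summarize are my own, not part of
GROMACS.

```python
import re

# A few lines copied from the log excerpt above.
LOG_EXCERPT = """\
 Average load imbalance: 285.9 %
 Part of the total run time spent waiting due to load imbalance: 56.5 %
 Average PME mesh/force load: 0.384
 Part of the total run time spent waiting due to PP/PME imbalance: 14.5 %
On 96 MPI ranks doing PP, and
on 32 MPI ranks doing PME
"""

# Hypothetical field names mapped to regexes matching the wording above.
PATTERNS = {
    "load_imbalance_pct": r"Average load imbalance:\s*([\d.]+)",
    "dd_wait_pct": r"waiting due to load imbalance:\s*([\d.]+)",
    "pme_force_ratio": r"Average PME mesh/force load:\s*([\d.]+)",
    "pp_pme_wait_pct": r"waiting due to PP/PME imbalance:\s*([\d.]+)",
    "pp_ranks": r"On\s+(\d+)\s+MPI ranks doing PP",
    "pme_ranks": r"on\s+(\d+)\s+MPI ranks doing PME",
}


def summarize(log_text):
    """Return the figures found in log_text as a dict of floats."""
    summary = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, log_text)
        if match:
            summary[key] = float(match.group(1))
    return summary


summary = summarize(LOG_EXCERPT)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Running the same function on the full md.log text of each run would make
it easier to see whether a change in the rank split reduces the two
waiting percentages.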


