[gmx-users] Losing part of the available CPU time

Szilárd Páll pall.szilard at gmail.com
Mon Aug 15 14:52:30 CEST 2016


Hi,

Please post full logs; a trimmed file often omits the information
needed to diagnose your issue.

At first sight it seems that you simply have an imbalanced system. I
am not sure about the source of the imbalance, and without knowing
more about your system/setup and how it is decomposed, what I can
suggest is to try other decomposition schemes or simply less
decomposition (use more OpenMP threads per rank or fewer cores); see
the example commands below.
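
For example (these command lines are only a sketch: the rank/thread
counts, the -dd grid, the "md" file names and the gmx_mpi binary name
are placeholders for your setup, not tuned values):

  # Fewer MPI ranks with more OpenMP threads per rank (same 128 cores in total):
  mpirun -np 64 gmx_mpi mdrun -deffnm md -ntomp 2 -npme 16

  # Or keep 128 ranks but request an explicit 6x4x4 PP decomposition grid
  # (96 PP ranks plus 32 PME ranks, as in your current run):
  mpirun -np 128 gmx_mpi mdrun -deffnm md -dd 6 4 4 -npme 32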

Additionally, you also have a pretty bad PP-PME load balance, but
that is likely to improve once your PP performance gets better.
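
If the PME ranks stay underloaded (your PME mesh/force load is 0.384),
one knob worth trying is fewer dedicated PME ranks; the values below
are again only placeholders:

  # Fewer PME-only ranks, so each one has more work relative to the PP ranks:
  mpirun -np 128 gmx_mpi mdrun -deffnm md -npme 16

  # Or let gmx tune_pme benchmark a range of -npme values over short runs
  # (it may need the MPIRUN/MDRUN environment variables set for your MPI setup):
  gmx tune_pme -np 128 -s md.tpr -steps 2000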

Cheers,
--
Szilárd


On Sun, Aug 14, 2016 at 3:23 PM, Alexander Alexander
<alexanderwien2k at gmail.com> wrote:
> Dear gromacs user,
>
> My free energy calculation works well; however, I am losing around 56.5 %
> of the available CPU time as stated in my log file, which is really
> considerable. The problem is due to load imbalance in the domain
> decomposition, but I have no idea how to improve it. Below is the very end
> of my log file; I would appreciate it if you could help me avoid this.
>
>
>    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
>
>  av. #atoms communicated per step for force:  2 x 115357.4
>  av. #atoms communicated per step for LINCS:  2 x 2389.1
>
>  Average load imbalance: 285.9 %
>  Part of the total run time spent waiting due to load imbalance: 56.5 %
>  Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 2 % Y 2 % Z 2 %
>  Average PME mesh/force load: 0.384
>  Part of the total run time spent waiting due to PP/PME imbalance: 14.5 %
>
> NOTE: 56.5 % of the available CPU time was lost due to load imbalance
>       in the domain decomposition.
>
> NOTE: 14.5 % performance was lost because the PME ranks
>       had less work to do than the PP ranks.
>       You might want to decrease the number of PME ranks
>       or decrease the cut-off and the grid spacing.
>
>
>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>
> On 96 MPI ranks doing PP, and
> on 32 MPI ranks doing PME
>
>  Computing:          Num   Num      Call    Wall time         Giga-Cycles
>                      Ranks Threads  Count      (s)         total sum    %
> -----------------------------------------------------------------------------
>  Domain decomp.        96    1     175000     242.339      53508.472   0.5
>  DD comm. load         96    1     174903       9.076       2003.907   0.0
>  DD comm. bounds       96    1     174901      27.054       5973.491   0.1
>  Send X to PME         96    1    7000001      44.342       9790.652   0.1
>  Neighbor search       96    1     175001     251.994      55640.264   0.6
>  Comm. coord.          96    1    6825000    1521.009     335838.747   3.4
>  Force                 96    1    7000001    7001.990    1546039.264  15.5
>  Wait + Comm. F        96    1    7000001   10761.296    2376093.759  23.8
>  PME mesh *            32    1    7000001   11796.344     868210.788   8.7
>  PME wait for PP *                          22135.752    1629191.096  16.3
>  Wait + Recv. PME F    96    1    7000001     393.117      86800.265   0.9
>  NB X/F buffer ops.    96    1   20650001     132.713      29302.991   0.3
>  COM pull force        96    1    7000001     165.613      36567.368   0.4
>  Write traj.           96    1       7037      55.020      12148.457   0.1
>  Update                96    1   14000002     140.972      31126.607   0.3
>  Constraints           96    1   14000002   12871.236    2841968.551  28.4
>  Comm. energies        96    1     350001     261.976      57844.219   0.6
>  Rest                                          52.349      11558.715   0.1
> -----------------------------------------------------------------------------
>  Total                                      33932.096    9989607.639 100.0
> -----------------------------------------------------------------------------
> (*) Note that with separate PME ranks, the walltime column actually sums to
>     twice the total reported, but the cycle count total and % are correct.
> -----------------------------------------------------------------------------
>  Breakdown of PME mesh computation
> -----------------------------------------------------------------------------
>  PME redist. X/F       32    1   21000003    2334.608     171827.143   1.7
>  PME spread/gather     32    1   28000004    3640.870     267967.972   2.7
>  PME 3D-FFT            32    1   28000004    1587.105     116810.882   1.2
>  PME 3D-FFT Comm.      32    1   56000008    4066.097     299264.666   3.0
>  PME solve Elec        32    1   14000002     148.284      10913.728   0.1
> -----------------------------------------------------------------------------
>
>                Core t (s)   Wall t (s)        (%)
>        Time:  4341204.790    33932.096    12793.8
>                          9h25:32
>                  (ns/day)    (hour/ns)
> Performance:       35.648        0.673
> Finished mdrun on rank 0 Sat Aug 13 23:45:45 2016
>
> Thanks,
> Regards,
> Alex