[gmx-users] Losing part of the available CPU time

Alexander Alexander alexanderwien2k at gmail.com
Mon Aug 15 16:01:53 CEST 2016


Hi Szilárd,

Thanks for your response; please find below a link containing the required
files, including the full log files.

https://drive.google.com/file/d/0B_CbyhnbKqQDc2FaeWxITWxqdDg/view?usp=sharing

Thanks,
Cheers,
Alex

On Mon, Aug 15, 2016 at 2:52 PM, Szilárd Páll <pall.szilard at gmail.com>
wrote:

> Hi,
>
> Please post full logs; excerpts of the file often omit the information
> needed to diagnose your issues.
>
> At first sight it seems that you simply have an imbalanced system. I am
> not sure about the source of the imbalance, and without knowing more about
> your system/setup and how it is decomposed, what I can suggest is to
> try other decomposition schemes, or simply less decomposition (use more
> OpenMP threads per rank, or fewer cores), e.g. as sketched below.
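>
> For example, just as an illustration (assuming a GROMACS 5.x MPI build
> launched with mpirun on the same 128 cores; "md" is a placeholder name
> and the numbers are only a starting point), you could compare runs like:
>
>   # fewer MPI ranks, more OpenMP threads per rank
>   mpirun -np 32 gmx_mpi mdrun -ntomp 4 -npme 8 -deffnm md
>
>   # or prescribe the PP domain decomposition grid explicitly
>   mpirun -np 128 gmx_mpi mdrun -dd 6 4 4 -npme 32 -deffnm md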
>
> Additionally, you also have a pretty bad PP-PME load balance, but
> that is likely to improve once your PP performance improves; see the
> example below.
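>
> For instance (again only a sketch; the best split depends on your box,
> cut-offs and grid, and "topol.tpr" / "md" are placeholder names):
>
>   # give PME fewer ranks so PP and PME finish at about the same time
>   mpirun -np 128 gmx_mpi mdrun -npme 16 -deffnm md
>
>   # or let gmx tune_pme scan the number of PME ranks for you
>   gmx tune_pme -np 128 -s topol.tpr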
>
> Cheers,
> --
> Szilárd
>
>
> On Sun, Aug 14, 2016 at 3:23 PM, Alexander Alexander
> <alexanderwien2k at gmail.com> wrote:
> > Dear gromacs user,
> >
> > My free energy calculation runs well; however, I am losing around 56.5 %
> > of the available CPU time, as stated in my log file, which is really
> > considerable. The problem is due to load imbalance and domain
> > decomposition, but I have no idea how to improve it. Below is the very end
> > of my log file, and I would appreciate it if you could help me avoid this.
> >
> >
> >    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
> >
> >  av. #atoms communicated per step for force:  2 x 115357.4
> >  av. #atoms communicated per step for LINCS:  2 x 2389.1
> >
> >  Average load imbalance: 285.9 %
> >  Part of the total run time spent waiting due to load imbalance: 56.5 %
> >  Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 2 % Y 2 % Z 2 %
> >  Average PME mesh/force load: 0.384
> >  Part of the total run time spent waiting due to PP/PME imbalance: 14.5 %
> >
> > NOTE: 56.5 % of the available CPU time was lost due to load imbalance
> >       in the domain decomposition.
> >
> > NOTE: 14.5 % performance was lost because the PME ranks
> >       had less work to do than the PP ranks.
> >       You might want to decrease the number of PME ranks
> >       or decrease the cut-off and the grid spacing.
> >
> >
> >      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> >
> > On 96 MPI ranks doing PP, and
> > on 32 MPI ranks doing PME
> >
> >  Computing:          Num   Num      Call    Wall time         Giga-Cycles
> >                      Ranks Threads  Count      (s)         total sum    %
> > -----------------------------------------------------------------------------
> >  Domain decomp.        96    1     175000     242.339      53508.472   0.5
> >  DD comm. load         96    1     174903       9.076       2003.907   0.0
> >  DD comm. bounds       96    1     174901      27.054       5973.491   0.1
> >  Send X to PME         96    1    7000001      44.342       9790.652   0.1
> >  Neighbor search       96    1     175001     251.994      55640.264   0.6
> >  Comm. coord.          96    1    6825000    1521.009     335838.747   3.4
> >  Force                 96    1    7000001    7001.990    1546039.264  15.5
> >  Wait + Comm. F        96    1    7000001   10761.296    2376093.759  23.8
> >  PME mesh *            32    1    7000001   11796.344     868210.788   8.7
> >  PME wait for PP *                          22135.752    1629191.096  16.3
> >  Wait + Recv. PME F    96    1    7000001     393.117      86800.265   0.9
> >  NB X/F buffer ops.    96    1   20650001     132.713      29302.991   0.3
> >  COM pull force        96    1    7000001     165.613      36567.368   0.4
> >  Write traj.           96    1       7037      55.020      12148.457   0.1
> >  Update                96    1   14000002     140.972      31126.607   0.3
> >  Constraints           96    1   14000002   12871.236    2841968.551  28.4
> >  Comm. energies        96    1     350001     261.976      57844.219   0.6
> >  Rest                                          52.349      11558.715   0.1
> > -----------------------------------------------------------------------------
> >  Total                                      33932.096    9989607.639 100.0
> > -----------------------------------------------------------------------------
> > (*) Note that with separate PME ranks, the walltime column actually sums to
> >     twice the total reported, but the cycle count total and % are correct.
> > -----------------------------------------------------------------------------
> >  Breakdown of PME mesh computation
> > -----------------------------------------------------------------------------
> >  PME redist. X/F       32    1   21000003    2334.608     171827.143   1.7
> >  PME spread/gather     32    1   28000004    3640.870     267967.972   2.7
> >  PME 3D-FFT            32    1   28000004    1587.105     116810.882   1.2
> >  PME 3D-FFT Comm.      32    1   56000008    4066.097     299264.666   3.0
> >  PME solve Elec        32    1   14000002     148.284      10913.728   0.1
> > -----------------------------------------------------------------------------
> >
> >                Core t (s)   Wall t (s)        (%)
> >        Time:  4341204.790    33932.096    12793.8
> >                          9h25:32
> >                  (ns/day)    (hour/ns)
> > Performance:       35.648        0.673
> > Finished mdrun on rank 0 Sat Aug 13 23:45:45 2016
> >
> > Thanks,
> > Regards,
> > Alex

