[gmx-users] Losing part of the available CPU time
Alexander Alexander
alexanderwien2k at gmail.com
Mon Aug 15 16:01:53 CEST 2016
Hi Szilárd,
Thanks for your response; please find below a link to the required .log
files.
https://drive.google.com/file/d/0B_CbyhnbKqQDc2FaeWxITWxqdDg/view?usp=sharing
Thanks,
Cheers,
Alex
On Mon, Aug 15, 2016 at 2:52 PM, Szilárd Páll <pall.szilard at gmail.com>
wrote:
> Hi,
>
> Please post full logs; what gets cut out of the file is often exactly the
> information needed to diagnose your issue.
>
> At first sight it seems that you simply have an imbalanced system. I am
> not sure about the source of the imbalance, and without knowing more
> about your system/setup and how it is decomposed, what I can suggest is
> to try other decomposition schemes, or simply less decomposition (use
> more threads or fewer cores).
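A minimal sketch of what "less decomposition, more threads" could look like
for this run (128 cores split as 96 PP + 32 PME single-threaded ranks in the
posted log); the binary name gmx_mpi, the tpr name md, and the exact
rank/thread counts are illustrative assumptions, not values taken from the
thread:

    # Halve the number of MPI ranks and give each rank two OpenMP threads,
    # so every domain is larger and has less room for imbalance:
    mpirun -np 64 gmx_mpi mdrun -deffnm md -ntomp 2 -npme 16 -dlb yes

    # Alternatively, request an explicit PP domain grid (here 6x4x2 = 48 PP
    # ranks, matching -npme 16 out of 64 total) instead of mdrun's choice:
    mpirun -np 64 gmx_mpi mdrun -deffnm md -ntomp 2 -npme 16 -dd 6 4 2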
>
> Additionally, you also have a pretty bad PP-PME load balance, but that
> is likely to improve once you get your PP performance sorted out.
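The PP-PME imbalance can also be attacked directly; a hedged sketch (the
rank counts below are illustrative, not tuned for this system):

    # With an average PME mesh/force load of 0.384, the 32 dedicated PME
    # ranks sit idle much of the time; fewer PME ranks gives those cores
    # back to PP work:
    mpirun -np 128 gmx_mpi mdrun -deffnm md -npme 16

    # -npme -1 restores mdrun's own estimate of the split; mdrun's built-in
    # PP-PME tuning (-tunepme, on by default) additionally rebalances at run
    # time by scaling the Coulomb cut-off and the PME grid together.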
>
> Cheers,
> --
> Szilárd
>
>
> On Sun, Aug 14, 2016 at 3:23 PM, Alexander Alexander
> <alexanderwien2k at gmail.com> wrote:
> > Dear gromacs user,
> >
> > My free energy calculation works well; however, I am losing around
> > 56.5 % of the available CPU time, as stated in my log file, which is
> > really considerable. The problem is due to load imbalance and domain
> > decomposition, but I have no idea how to improve it. Below is the very
> > end of my log file, and I would appreciate any help in avoiding this.
> >
> >
> > D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
> >
> > av. #atoms communicated per step for force: 2 x 115357.4
> > av. #atoms communicated per step for LINCS: 2 x 2389.1
> >
> > Average load imbalance: 285.9 %
> > Part of the total run time spent waiting due to load imbalance: 56.5 %
> > Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 2 % Y 2 % Z 2 %
> > Average PME mesh/force load: 0.384
> > Part of the total run time spent waiting due to PP/PME imbalance: 14.5 %
> >
> > NOTE: 56.5 % of the available CPU time was lost due to load imbalance
> > in the domain decomposition.
> >
> > NOTE: 14.5 % performance was lost because the PME ranks
> > had less work to do than the PP ranks.
> > You might want to decrease the number of PME ranks
> > or decrease the cut-off and the grid spacing.
> >
> >
> > R E A L C Y C L E A N D T I M E A C C O U N T I N G
> >
> > On 96 MPI ranks doing PP, and
> > on 32 MPI ranks doing PME
> >
> > Computing:           Num   Num      Call    Wall time     Giga-Cycles
> >                      Ranks Threads  Count      (s)       total sum    %
> > -----------------------------------------------------------------------------
> > Domain decomp.        96    1     175000      242.339     53508.472   0.5
> > DD comm. load         96    1     174903        9.076      2003.907   0.0
> > DD comm. bounds       96    1     174901       27.054      5973.491   0.1
> > Send X to PME         96    1    7000001       44.342      9790.652   0.1
> > Neighbor search       96    1     175001      251.994     55640.264   0.6
> > Comm. coord.          96    1    6825000     1521.009    335838.747   3.4
> > Force                 96    1    7000001     7001.990   1546039.264  15.5
> > Wait + Comm. F        96    1    7000001    10761.296   2376093.759  23.8
> > PME mesh *            32    1    7000001    11796.344    868210.788   8.7
> > PME wait for PP *                           22135.752   1629191.096  16.3
> > Wait + Recv. PME F    96    1    7000001      393.117     86800.265   0.9
> > NB X/F buffer ops.    96    1   20650001      132.713     29302.991   0.3
> > COM pull force        96    1    7000001      165.613     36567.368   0.4
> > Write traj.           96    1       7037       55.020     12148.457   0.1
> > Update                96    1   14000002      140.972     31126.607   0.3
> > Constraints           96    1   14000002    12871.236   2841968.551  28.4
> > Comm. energies        96    1     350001      261.976     57844.219   0.6
> > Rest                                            52.349     11558.715   0.1
> > -----------------------------------------------------------------------------
> > Total                                        33932.096   9989607.639 100.0
> > -----------------------------------------------------------------------------
> > (*) Note that with separate PME ranks, the walltime column actually sums to
> >     twice the total reported, but the cycle count total and % are correct.
> > -----------------------------------------------------------------------------
> > Breakdown of PME mesh computation
> > -----------------------------------------------------------------------------
> > PME redist. X/F       32    1   21000003     2334.608    171827.143   1.7
> > PME spread/gather     32    1   28000004     3640.870    267967.972   2.7
> > PME 3D-FFT            32    1   28000004     1587.105    116810.882   1.2
> > PME 3D-FFT Comm.      32    1   56000008     4066.097    299264.666   3.0
> > PME solve Elec        32    1   14000002      148.284     10913.728   0.1
> > -----------------------------------------------------------------------------
> >
> >                Core t (s)   Wall t (s)        (%)
> >        Time:  4341204.790    33932.096    12793.8
> >                          9h25:32
> >                  (ns/day)    (hour/ns)
> > Performance:       35.648        0.673
> > Finished mdrun on rank 0 Sat Aug 13 23:45:45 2016
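The second NOTE in the log above (decrease the number of PME ranks, or
decrease the cut-off and the grid spacing) maps onto two .mdp parameters. A
minimal sketch with purely illustrative numbers, since the actual cut-offs
of this system are not shown in the excerpt:

    ; before (illustrative values, not taken from the posted run)
    ; rcoulomb        = 1.2
    ; fourierspacing  = 0.144

    ; after: both scaled down by the same factor (~0.83), which keeps the
    ; PME accuracy roughly unchanged while shifting work from the overloaded
    ; short-range (PP) side to the under-loaded PME side
    rcoulomb          = 1.0
    fourierspacing    = 0.12
    ; whether the force field tolerates a shorter cut-off, and whether rvdw
    ; needs to follow rcoulomb for the chosen cut-off scheme, has to be
    ; checked separately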
> >
> > Thanks,
> > Regards,
> > Alex