[gmx-users] Losing part of the available CPU time

Szilárd Páll pall.szilard at gmail.com
Tue Aug 16 14:42:29 CEST 2016


Shouldn't have taken 16 hours to assess that (and a >5x slowdown is
suspicious, btw). You'd be better off taking the time to do a few
short benchmark runs first to find the best settings, and then use
those for your production runs.
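
For example, a pair of short runs along these lines is usually enough
(a minimal sketch, assuming your input is topol.tpr; adjust file
names, rank counts and -ntomp to your node):

  mpirun -np 128 gmx_mpi mdrun -s topol.tpr -npme 32 -ntomp 1 \
      -nsteps 5000 -resethway -noconfout -g bench_np128.log
  mpirun -np 32 gmx_mpi mdrun -s topol.tpr -npme 8 -ntomp 4 \
      -nsteps 5000 -resethway -noconfout -g bench_np32.log

-nsteps overrides the step count in the .tpr, -resethway resets the
timers halfway through the run so startup and load balancing don't
skew the numbers, and -noconfout skips writing the final coordinates;
then just compare the ns/day at the end of each bench_*.log.
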
--
Szilárd


On Tue, Aug 16, 2016 at 10:16 AM, Alexander Alexander
<alexanderwien2k at gmail.com> wrote:
> Hi Szilárd,
>
> I tried more OpenMP threads (-ntomp 4); however, the performance
> dropped drastically: an NVT simulation that took just 3 hours to
> finish with "-ntomp 1" now takes more than 16 hours!
>
> Cheers,
> Alex
>
> On Mon, Aug 15, 2016 at 6:34 PM, Szilárd Páll <pall.szilard at gmail.com>
> wrote:
>
>> Hi,
>>
>> Although I don't know exactly what system you are simulating, one
>> thing is clear: you're pushing the parallelization limit with
>> - 200 atoms/core
>> - likely "concentrated" free energy interactions.
>> The former alone will make the run very sensitive to load imbalance,
>> and the latter makes the imbalance even worse, as the very expensive
>> free energy interactions likely all fall within a few domains
>> (unless your 8 perturbed atoms are scattered).
>>
>> There is not much you can do except what I previously suggested
>> (try more OpenMP threads, e.g. 2-4, or simply use fewer cores); a
>> sketch follows below. If you have the option, hardware with fewer
>> but faster cores (and perhaps a GPU) will also be much more suitable
>> than this 128-core AMD node.
>>
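>> A minimal sketch of what I mean (topol.tpr is a placeholder; keep
>> the rank x thread product within the node's 128 cores):
>>
>>   mpirun -np 64 gmx_mpi mdrun -s topol.tpr -ntomp 2 -pin on
>>   mpirun -np 32 gmx_mpi mdrun -s topol.tpr -ntomp 4 -pin on
>>   # or simply use half the node:
>>   mpirun -np 64 gmx_mpi mdrun -s topol.tpr -ntomp 1 -pin on
>>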
>> Cheers,
>> --
>> Szilárd
>>
>>
>> On Mon, Aug 15, 2016 at 4:01 PM, Alexander Alexander
>> <alexanderwien2k at gmail.com> wrote:
>> > Hi Szilárd,
>> >
>> > Thanks for your response; please find below a link to the required
>> > log files.
>> >
>> > https://drive.google.com/file/d/0B_CbyhnbKqQDc2FaeWxITWxqdDg/view?usp=sharing
>> >
>> > Thanks,
>> > Cheers,
>> > Alex
>> >
>> > On Mon, Aug 15, 2016 at 2:52 PM, Szilárd Páll <pall.szilard at gmail.com>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> Please post full logs; what you cut out of the file often omits
>> >> information needed to diagnose your issue.
>> >>
>> >> At first sight it seems that you simply have an imbalanced system.
>> >> I'm not sure about the source of the imbalance, and without knowing
>> >> more about your system/setup and how it is decomposed, what I can
>> >> suggest is to try other decomposition schemes, or simply less
>> >> decomposition (use more threads or fewer cores); see the sketch
>> >> below.
>> >>
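>> >> A sketch of such a scan (topol.tpr is a placeholder; with -dd the
>> >> X*Y*Z product must equal the number of PP ranks, here 96):
>> >>
>> >>   mpirun -np 128 gmx_mpi mdrun -s topol.tpr -npme 32 -dd 6 4 4
>> >>   mpirun -np 128 gmx_mpi mdrun -s topol.tpr -npme 32 -dd 8 4 3
>> >>   # less decomposition: fewer ranks, 2 OpenMP threads each
>> >>   mpirun -np 64 gmx_mpi mdrun -s topol.tpr -npme 16 -ntomp 2
>> >>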
>> >> Additionally, you also have a pretty bad PP-PME load balance, but
>> >> that's likely to improve once you get your PP performance sorted
>> >> out; see the tune_pme sketch below.
>> >>
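>> >> gmx tune_pme can do the PME-rank scan for you (a sketch, assuming
>> >> an MPI build; tune_pme picks up the MPI launcher from the MPIRUN
>> >> environment variable):
>> >>
>> >>   export MPIRUN=mpirun
>> >>   gmx tune_pme -np 128 -s topol.tpr -mdrun 'gmx_mpi mdrun' -steps 2000
>> >>
>> >> It benchmarks a range of -npme values and writes a summary
>> >> (perf.out by default) from which you can pick the best PP/PME
>> >> split.
>> >>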
>> >> Cheers,
>> >> --
>> >> Szilárd
>> >>
>> >>
>> >> On Sun, Aug 14, 2016 at 3:23 PM, Alexander Alexander
>> >> <alexanderwien2k at gmail.com> wrote:
>> >> > Dear GROMACS users,
>> >> >
>> >> > My free energy calculation works well; however, I am losing around
>> >> > 56.5 % of the available CPU time, as stated in my log file, which
>> >> > is really considerable. The problem is due to load imbalance in the
>> >> > domain decomposition, but I have no idea how to improve it. Below
>> >> > is the very end of my log file; I would appreciate it if you could
>> >> > help me avoid this.
>> >> >
>> >> >
>> >> >    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
>> >> >
>> >> >  av. #atoms communicated per step for force:  2 x 115357.4
>> >> >  av. #atoms communicated per step for LINCS:  2 x 2389.1
>> >> >
>> >> >  Average load imbalance: 285.9 %
>> >> >  Part of the total run time spent waiting due to load imbalance: 56.5 %
>> >> >  Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 2 % Y 2 % Z 2 %
>> >> >  Average PME mesh/force load: 0.384
>> >> >  Part of the total run time spent waiting due to PP/PME imbalance: 14.5 %
>> >> >
>> >> > NOTE: 56.5 % of the available CPU time was lost due to load imbalance
>> >> >       in the domain decomposition.
>> >> >
>> >> > NOTE: 14.5 % performance was lost because the PME ranks
>> >> >       had less work to do than the PP ranks.
>> >> >       You might want to decrease the number of PME ranks
>> >> >       or decrease the cut-off and the grid spacing.
>> >> >
>> >> >
>> >> >      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>> >> >
>> >> > On 96 MPI ranks doing PP, and
>> >> > on 32 MPI ranks doing PME
>> >> >
>> >> >  Computing:          Num   Num      Call    Wall time         Giga-Cycles
>> >> >                      Ranks Threads  Count      (s)         total sum    %
>> >> > -----------------------------------------------------------------------------
>> >> >  Domain decomp.        96    1     175000     242.339      53508.472   0.5
>> >> >  DD comm. load         96    1     174903       9.076       2003.907   0.0
>> >> >  DD comm. bounds       96    1     174901      27.054       5973.491   0.1
>> >> >  Send X to PME         96    1    7000001      44.342       9790.652   0.1
>> >> >  Neighbor search       96    1     175001     251.994      55640.264   0.6
>> >> >  Comm. coord.          96    1    6825000    1521.009     335838.747   3.4
>> >> >  Force                 96    1    7000001    7001.990    1546039.264  15.5
>> >> >  Wait + Comm. F        96    1    7000001   10761.296    2376093.759  23.8
>> >> >  PME mesh *            32    1    7000001   11796.344     868210.788   8.7
>> >> >  PME wait for PP *                          22135.752    1629191.096  16.3
>> >> >  Wait + Recv. PME F    96    1    7000001     393.117      86800.265   0.9
>> >> >  NB X/F buffer ops.    96    1   20650001     132.713      29302.991   0.3
>> >> >  COM pull force        96    1    7000001     165.613      36567.368   0.4
>> >> >  Write traj.           96    1       7037      55.020      12148.457   0.1
>> >> >  Update                96    1   14000002     140.972      31126.607   0.3
>> >> >  Constraints           96    1   14000002   12871.236    2841968.551  28.4
>> >> >  Comm. energies        96    1     350001     261.976      57844.219   0.6
>> >> >  Rest                                          52.349      11558.715   0.1
>> >> > -----------------------------------------------------------------------------
>> >> >  Total                                      33932.096    9989607.639 100.0
>> >> > -----------------------------------------------------------------------------
>> >> > (*) Note that with separate PME ranks, the walltime column actually sums to
>> >> >     twice the total reported, but the cycle count total and % are correct.
>> >> > -----------------------------------------------------------------------------
>> >> >  Breakdown of PME mesh computation
>> >> > -----------------------------------------------------------------------------
>> >> >  PME redist. X/F       32    1   21000003    2334.608     171827.143   1.7
>> >> >  PME spread/gather     32    1   28000004    3640.870     267967.972   2.7
>> >> >  PME 3D-FFT            32    1   28000004    1587.105     116810.882   1.2
>> >> >  PME 3D-FFT Comm.      32    1   56000008    4066.097     299264.666   3.0
>> >> >  PME solve Elec        32    1   14000002     148.284      10913.728   0.1
>> >> > -----------------------------------------------------------------------------
>> >> >
>> >> >                Core t (s)   Wall t (s)        (%)
>> >> >        Time:  4341204.790    33932.096    12793.8
>> >> >                          9h25:32
>> >> >                  (ns/day)    (hour/ns)
>> >> > Performance:       35.648        0.673
>> >> > Finished mdrun on rank 0 Sat Aug 13 23:45:45 2016
>> >> >
>> >> > Thanks,
>> >> > Regards,
>> >> > Alex