[gmx-users] Computational load of Constraints/COM pull force

Kenny Goossens goossens_kenny at hotmail.com
Thu Oct 11 09:43:15 CEST 2018


Hi Szilárd,


Thank you for the clarification; from your last response I had assumed that whenever the COM pull force computation time increased, the load imbalance would always increase along with it. This makes me wonder, though: how is the load imbalance parameter in the log file calculated, if it does not include this kind of imbalance?


To answer your remarks:

- I did run tests on 5 nodes as well in the previous setup, and on 12 nodes in the current setup. In both cases, a direct comparison showed a performance improvement of over 50%. If those logs would be of any use to you, I'd be glad to upload them as well.

- In my previous set of tests, I allowed the PME grid to be set by -tunepme and kept it at that value for these tests. I'm not sure how to properly adjust the PME grid manually; is there any rule of thumb for this? I've tried looking around, but haven't really found anything useful on this topic. The only thing I encountered (in the manual) is that the accuracy remains the same when the grid is scaled together with the short-range cutoff, but I'm not sure against what reference this scaling is taken (I've put my current understanding in the small sketch below this list).

- My bad, here is the 'dlb on' log file: https://ufile.io/p7nx8
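
To check whether I understand that scaling correctly, here is how I would compute the new grid spacing. The reference values below (the default 1.0 nm cutoff combined with the default 0.12 nm Fourier spacing) are my own assumption, so please correct me if that reference is wrong:

    # Sketch of the PME grid scaling as I understand it from the manual.
    # Assumption: the reference point is the default 1.0 nm cutoff
    # together with the default 0.12 nm fourier-spacing.
    reference_cutoff = 1.0    # nm
    reference_spacing = 0.12  # nm
    my_cutoff = 1.4           # nm, the cutoff used in my runs
    # Keep accuracy roughly constant by scaling the spacing with the cutoff:
    scaled_spacing = reference_spacing * (my_cutoff / reference_cutoff)
    print(round(scaled_spacing, 3))  # 0.168 nm, i.e. a coarser grid than the default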

Thank you!


Kind regards,


Kenneth



________________________________
Van: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <gromacs.org_gmx-users-bounces at maillist.sys.kth.se> namens Szilárd Páll <pall.szilard at gmail.com>
Verzonden: dinsdag 9 oktober 2018 17:54
Aan: Discussion list for GROMACS users
Onderwerp: Re: [gmx-users] Computational load of Constraints/COM pull force

Hi,

I believe the answers to your questions were already in my previous mail,
but perhaps not clear enough, so let me try to clarify.

What you see as an increase in the time assigned to "COM pull force" in the
timings table is in fact a sign of imbalance. "COM pull force" communicates
across ranks, and when DLB shifts load around, it creates imbalance in the
computation that precedes the pulling. As a result, some of the ranks
participating in the communication arrive early at the pull communication
while others arrive late, and the former have to wait for the latter (that
waiting time gets booked under "COM pull force").

I would not recommend ever forcing "-dlb on", and as you saw even "-dlb
auto" can cause issues in some cases (we're working on improving that).
Whenever you see messages like this:
"Turning off dynamic load balancing, because it is degrading performance."
in the log, that's a sign that DLB cannot equalize the load and should
probably be kept off (or that there are transient disturbances on the cluster
that lead to computational imbalance that cannot be balanced out).

Some remarks:
- your new logs are not from the same number of nodes/cores as the previous
ones, so I'd say it's hard to judge the improvements;
- you're using somewhat strange settings: a 1.4 nm cutoff is very long, but
you don't scale the PME grid along with it, so your spacing is ~40% finer than
needed (which means unnecessarily high PME load);
- BTW, you shared two "-dlb no" runs and one with "-dlb auto".

Cheers,
--
Szilárd


On Tue, Oct 9, 2018 at 4:02 PM Kenny Goossens <goossens_kenny at hotmail.com>
wrote:

> Hi,
>
>
> In the meantime, I have done several tests to fine-tune the domain
> decomposition manually, with great results. However, I noticed an aspect
> that seemed really interesting to me (although I might be overlooking
> something really obvious). Whenever I change the -dlb option to "no" or
> "auto", the allocation of computing power seems very reasonable. However,
> when I force dlb to be on for the entire run, I get a seemingly
> unreasonable spike in the computing time spent on the COM pulling force, while
> the load imbalance decreases relative to the other runs. Again, here are
> three log files using the exact same input and command line options, aside
> from the dlb flag.
>
>
> dlb off: https://ufile.io/7bjcq
>
> dlb auto: https://ufile.io/4qlc6
>
> dlb on: https://ufile.io/n9rme
>
>
> I hope this makes some sense to someone!
>
>
> Kind regards,
>
>
> Kenneth
>
> ________________________________
> Van: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <
> gromacs.org_gmx-users-bounces at maillist.sys.kth.se> namens Szilárd Páll <
> pall.szilard at gmail.com>
> Verzonden: vrijdag 28 september 2018 23:43
> Aan: Discussion list for GROMACS users
> Onderwerp: Re: [gmx-users] Computational load of Constraints/COM pull force
>
> Hi,
>
> The issue you are running into seems to be caused by the significant load
> imbalance in the simulation that sometimes throws the load balancing off --
> it's something I've seen before (and I thought we had solved it). The
> system is "tall" and most likely has a significant inhomogeneity along Z.
> mdrun decomposes more along Z (also because it tries hard to match the PP
> and PME grids) which amplifies the imbalance: there will be lots of
> solvent-only domains off-center with far lower computational cost than the
> middle domains.
>
> The technical reason for the performance discrepancy is however an artifact
> of the load balancer not managing to back off and stop balancing to avoid
> performance deterioration. We have tried to prevent such cases by stopping
> DLB if we measure performance degradation which you can see happening when
> the following status messages are emitted in the log:
> "Turning off dynamic load balancing, because it is degrading performance."
>
> You can also see in the "Dynamic load balancer report" that in one of your
> umbrellaN logs DLB is left off, while in the other it is turned on at the
> end of the run.
>
> Note that the large amount of time logged under "COM pull force" (as well
> as the PP X/F communication) in the run which reports to have balancing on
> is in fact also caused by load imbalance: ranks are waiting for each other
> to complete the communication.
>
> Here are a few tips to mitigate the issue:
> - Try to tune the decomposition / launch config to reduce the load
> imbalance (see the illustrative launch line after this list):
>   * Use more threads per rank, e.g. with 2-3 OpenMP threads/rank you'll
> need 2-3x fewer domains on the same amount of hardware;
>   * Consider tweaking the decomposition grids, e.g. manually specify -dd
> (and possibly -npme)
> - Do not force DLB on (in one of your previous logs I saw that); if the
> issue persists, do turn DLB off manually.
> - Consider setting affinity using mdrun (-pin on); in your current runs you
> let the Intel runtime do it (see the log note "Non-default thread affinity
> set probably by the OpenMP library, disabling internal thread affinity")
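>
> Purely as an illustration, a launch could look something like the line
> below; the rank/thread counts and the -dd / -npme split are made up and
> need to be adapted to your hardware and system (this also assumes an
> MPI-enabled gmx_mpi binary):
>
>   # e.g. 2 nodes x 24 cores: 16 MPI ranks x 3 OpenMP threads each,
>   # 12 PP ranks decomposed as 3x2x2 plus 4 separate PME ranks
>   mpirun -np 16 gmx_mpi mdrun -deffnm umbrella -ntomp 3 -dd 3 2 2 \
>          -npme 4 -dlb auto -pin on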
>
> Last, we should try to see if we can tune DLB so it does not miss cases like
> yours; could you please file an issue on redmine.gromacs.org with your two
> umbrella logs? If you can also share the system, that would help with reproducing.
>
> Cheers,
> --
> Szilárd
>
>
> On Fri, Sep 28, 2018 at 5:45 PM Kenny Goossens <goossens_kenny at hotmail.com
> >
> wrote:
>
> > Hi,
> >
> >
> > For completed runs, I refer to umbrella2.log/umbrella3.log. The ligand in
> > these frames is only displaced by 0.1 nm, so I don't see how this could
> > meaningfully impact the computational cost.
> >
> > Regarding the DD issues, these almost always happen after I parallelize
> > the simulation over more than one node. I should mention that I am
> > simulating an enzyme in a lipid bilayer, so this is most likely the
> > cause. The load balancing issues are also present regardless of the
> > performance.
> >
> >
> > Kind regards,
> >
> >
> > Kenneth
> >
> > ________________________________
> > Van: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <
> > gromacs.org_gmx-users-bounces at maillist.sys.kth.se> namens Mark Abraham <
> > mark.j.abraham at gmail.com>
> > Verzonden: vrijdag 28 september 2018 16:30
> > Aan: gmx-users at gromacs.org
> > Onderwerp: Re: [gmx-users] Computational load of Constraints/COM pull
> force
> >
> > Hi,
> >
> > That looks like the shape/size of the simulation box has changed such
> > that the domain decomposition load balancing is having a harder time in
> > one case than in the other. Do you have some runs that completed
> > normally? The performance report at the bottom of the log file is missing
> > from the files you shared.
> >
> > Mark
> >
> > On Fri, Sep 28, 2018 at 3:50 PM Kenny Goossens <
> goossens_kenny at hotmail.com
> > >
> > wrote:
> >
> > > Hi Mark
> > >
> > >
> > > Here are two (intended as) equivalent runs:
> > >
> > > https://uploadfiles.io/wtlu1
> > >
> > > https://ufile.io/gmf03
> > >
> > > For the disparity of 20 ns/day to 90 ns/day, the runs were actually
> > > subsequent umbrella sampling frames with the same input commands:
> > >
> > > https://ufile.io/zfae3
> > >
> > > https://ufile.io/sjq4o
> > >
> > > Kind regards,
> > >
> > > Kenneth
> > >
> > > ________________________________
> > > Van: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <
> > > gromacs.org_gmx-users-bounces at maillist.sys.kth.se> namens Mark
> Abraham <
> > > mark.j.abraham at gmail.com>
> > > Verzonden: vrijdag 28 september 2018 15:08
> > > Aan: gmx-users at gromacs.org
> > > CC: gromacs.org_gmx-users at maillist.sys.kth.se
> > > Onderwerp: Re: [gmx-users] Computational load of Constraints/COM pull
> > force
> > >
> > > Hi,
> > >
> > > Are you able to upload the log files from two intended-as-equivalent
> > > runs to a file-sharing service and share the link? We can comment
> > > better with that data.
> > >
> > > Mark
> > >
> > > On Fri, Sep 28, 2018 at 2:46 PM Kenny Goossens <
> > goossens_kenny at hotmail.com
> > > >
> > > wrote:
> > >
> > > > Dear all,
> > > >
> > > >
> > > > I am performing umbrella sampling simulations on GROMACS 2018.3, and
> > > > because I need to sample every frame for a long time, I am trying to
> > > > optimize the settings I'm using to get the maximum performance from my
> > > > cluster. However, whenever I try running benchmarks, I notice that the
> > > > relative amount of computing time it takes to calculate constraints
> > > > and the COM pull force varies wildly (within a range of 2-30%). As you
> > > > can imagine, this has a dramatic impact on the performance, as my
> > > > performance for identical runs can fluctuate between 20 and 90 ns per
> > > > day. I'm not sure if this is a general problem, or if it is caused by
> > > > something I'm doing wrong. Is anyone able to help me out with this?
> > > > Thank you!
> > > >
> > > >
> > > > Kind regards,
> > > >
> > > >
> > > > Kenneth
> > > >
--
Gromacs Users mailing list

* Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-request at gromacs.org.


More information about the gromacs.org_gmx-users mailing list