[gmx-users] Computational load of Constraints/COM pull force

Kenny Goossens goossens_kenny at hotmail.com
Tue Oct 9 16:01:33 CEST 2018


Hi,


In the meantime, I have done several tests to fine-tune the domain decomposition manually, with great results. However, I noticed something that seems really interesting to me (although I might be overlooking something obvious). When I set the -dlb option to "no" or "auto", the allocation of computing time looks very reasonable. However, when I force DLB on for the entire run, I get a seemingly unreasonable spike in the computing time spent on the COM pull force, even though the load imbalance decreases relative to the other runs. Again, here are three log files using the exact same input and command-line options, differing only in the -dlb flag.


dlb off: https://ufile.io/7bjcq

dlb auto: https://ufile.io/4qlc6

dlb on: https://ufile.io/n9rme
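
For clarity, the three runs above were launched identically except for the
-dlb value, i.e. along these lines (the -deffnm name is just a placeholder,
not the exact file name I used):

    gmx mdrun -deffnm umbrella -dlb no
    gmx mdrun -deffnm umbrella -dlb auto
    gmx mdrun -deffnm umbrella -dlb yes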


I hope this makes some sense to someone!


Kind regards,


Kenneth

________________________________
From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Szilárd Páll <pall.szilard at gmail.com>
Sent: Friday, 28 September 2018 23:43
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] Computational load of Constraints/COM pull force

Hi,

The issue you are running into seems to be caused by the significant load
imbalance in the simulation that sometimes throws the load balancing off --
it's something I've seen before (and I thought that we solved it). The
system is "tall" and most likely has a significant inhomogeneity along Z.
mdrun decomposes more along Z (also because it tries hard to match the PP
and PME grids) which amplifies the imbalance: there will be lots of
solvent-only domains off-center with far lower computational cost than the
middle domains.

The technical reason for the performance discrepancy is, however, an artifact
of the load balancer not managing to back off and stop balancing to avoid
performance deterioration. We have tried to prevent such cases by stopping
DLB when we measure performance degradation, which you can see happening when
the following status message is emitted in the log:
"Turning off dynamic load balancing, because it is degrading performance."

You can also see in the "Dynamic load balancer report" that in one of your
umbrellaN logs DLB is left off, while in the other it is turned on at the
end of the run.
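
As a quick way to check this in your own logs (assuming they are named
umbrella*.log; adjust the pattern to your file names), you can grep for the
DLB messages and the balancer report, e.g.:

    grep -iE "dynamic load balancing|load balancer" umbrella*.log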

Note that the large amount of time logged under "COM pull force" (as well
as the PP X/F communication) in the run which reports to have balancing on
is in fact also caused by load imbalance: ranks are waiting for each other
to complete the communication.

Here are a few tips to mitigate the issue:
- Try to tune the decomposition / launch config to reduce the load
imbalance:
  * Use more threads per rank, e.g. with 2-3 OpenMP threads/rank you'll
need 2-3x fewer domains on the same amount of hardware;
  * Consider tweaking the decomposition grids, e.g. manually specify -dd
(and possibly -npme)
- Do not force DLB on (in one of your previous logs I saw that); if the
issue persists, do turn DLB off manually.
- Consider setting thread affinity with mdrun (-pin on); in your current runs
you let the Intel runtime do it (see the log note "Non-default thread affinity
set probably by the OpenMP library, disabling internal thread affinity").
A combined example command line is sketched below.
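
To make the above concrete, a launch along these lines would combine those
suggestions (the rank, thread and grid counts are purely illustrative and
have to match your hardware and system; they are not taken from your logs):

    mpirun -np 40 gmx_mpi mdrun -deffnm umbrella -ntomp 3 \
        -dd 4 4 2 -npme 8 -dlb auto -pin on

This gives 32 PP ranks on a 4x4x2 domain grid (i.e. fewer domains along z
than mdrun might otherwise pick), 8 separate PME ranks, 3 OpenMP threads per
rank, DLB left at its default, and mdrun-managed thread affinity.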

Lastly, we should see whether we can tune DLB so that it does not miss cases
like yours; could you please file an issue on redmine.gromacs.org with your
two umbrella logs? If you can share the system, that would help with
reproducing it.

Cheers,
--
Szilárd


On Fri, Sep 28, 2018 at 5:45 PM Kenny Goossens <goossens_kenny at hotmail.com>
wrote:

> Hi,
>
>
> For completed runs, I refer to umbrella2.log/umbrella3.log. The ligand
> positions in these frames are only 0.1 nm apart, so I don't see how this
> could meaningfully impact the computational cost.
>
> Regarding the DD issues, these almost always happen after I parallelize
> the simulation over more than one node. I should mention that I am
> simulating an enzyme in a lipid bilayer, so this is most likely the cause.
> The load balancing issues are also present regardless of the performance.
>
>
> Kind regards,
>
>
> Kenneth
>
> ________________________________
> From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <
> gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Mark Abraham <
> mark.j.abraham at gmail.com>
> Sent: Friday, 28 September 2018 16:30
> To: gmx-users at gromacs.org
> Subject: Re: [gmx-users] Computational load of Constraints/COM pull force
>
> Hi,
>
> That looks like the shape/size of the simulation box has changed such that
> the domain decomposition load balancing has a harder time in some runs than
> in others.
> Do you have some runs that completed normally? The performance report from
> the bottom of the log file is missing from the files you shared.
>
> Mark
>
> On Fri, Sep 28, 2018 at 3:50 PM Kenny Goossens <goossens_kenny at hotmail.com
> >
> wrote:
>
> > Hi Mark
> >
> >
> > Here are two (intended as) equivalent runs:
> >
> > https://uploadfiles.io/wtlu1
> >
> > https://ufile.io/gmf03
> >
> >
> > For the disparity between 20 ns/day and 90 ns/day, the runs were actually
> > subsequent umbrella sampling frames with the same input commands:
> >
> > https://ufile.io/zfae3
> >
> > https://ufile.io/sjq4o
> >
> >
> > Kind regards,
> >
> >
> > Kenneth
> >
> > ________________________________
> > From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <
> > gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Mark Abraham <
> > mark.j.abraham at gmail.com>
> > Sent: Friday, 28 September 2018 15:08
> > To: gmx-users at gromacs.org
> > CC: gromacs.org_gmx-users at maillist.sys.kth.se
> > Subject: Re: [gmx-users] Computational load of Constraints/COM pull force
> >
> > Hi,
> >
> > Are you able to upload the log files from two intended-as-equivalent runs
> > to a file-sharing service and share the link? We can comment better with
> > that data.
> >
> > Mark
> >
> > On Fri, Sep 28, 2018 at 2:46 PM Kenny Goossens <
> goossens_kenny at hotmail.com
> > >
> > wrote:
> >
> > > Dear all,
> > >
> > >
> > > I am performing umbrella sampling simulations with GROMACS 2018.3, and
> > > because I need to sample every frame for a long time, I am trying to
> > > optimize my settings to get the maximum performance out of my cluster.
> > > However, whenever I run benchmarks, I notice that the relative amount of
> > > computing time spent on constraints and the COM pull force varies wildly
> > > (within a range of 2-30%). As you can imagine, this has a dramatic impact
> > > on performance, as my throughput for identical runs can fluctuate between
> > > 20 and 90 ns per day. I'm not sure whether this is a general problem, or
> > > whether it is caused by something I'm doing wrong. Is anyone able to help
> > > me out with this? Thank you!
> > >
> > >
> > > Kind regards,
> > >
> > >
> > > Kenneth
> > >
--
Gromacs Users mailing list

* Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-request at gromacs.org.

