[gmx-users] High load imbalance: 31.8%

Szilárd Páll pall.szilard at gmail.com
Thu Aug 20 19:25:38 CEST 2015


Hi Anthony,


Good choice, your admins should be able to help. When talking to them, do
emphasize that you need your job to be placed as compactly as possible, on
the closest possible set of cores, with the tightest possible affinity
settings.
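
For example, if your cluster happens to use SLURM (just an assumption on my
part; your scheduler, launcher and binary names may well differ), a compact,
pinned placement could look roughly like this:

  # hypothetical sketch: pack all 8 single-threaded MPI ranks onto one node
  # and let the scheduler bind each rank to its own core
  #SBATCH --nodes=1
  #SBATCH --ntasks=8
  #SBATCH --cpus-per-task=1
  srun --cpu_bind=cores gmx_mpi mdrun -deffnm umb_3_umb

The -deffnm value is only a placeholder taken from your log file name; your
admins will know the site-specific equivalent of the binding options.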

Cheers,

--
Szilárd

On Thu, Aug 20, 2015 at 6:12 PM, Nash, Anthony <a.nash at ucl.ac.uk> wrote:

> Hi Szilárd
>
> Thanks for all of that advice. I’m going to have to take a lot of this up
> with the Cluster Service Staff. This is a new cluster service I won a
> grant for, so it is not my usual platform, which typically yields an
> imbalance of somewhere around 0.8% to 2%.
>
> Thanks again
> Anthony
>
>
>
> On 20/08/2015 16:52, "Szilárd Páll" <pall.szilard at gmail.com> wrote:
>
> >Hi,
> >
> >You're not pinning threads, and it seems that you're running on a large SMP
> >machine! Assuming the 512 hardware threads reported (line 91) are real,
> >that's a 32-socket SMP machine, perhaps an SGI UV? In any case, Xeon E5-4xxx
> >is typically deployed in 4-8 socket installations, so your 8 threads will be
> >floating around across a number of CPUs, which ruins your performance and
> >likely contributes to the large and varying load imbalance.
> >
> >My advice:
> >- don't ignore the notes/warnings issued by mdrun (line 366; they should
> >appear on the standard output too); we put quite some thought into spamming
> >users only when it is relevant :)
> >- pin mdrun and/or its threads, either with "-pin on" (plus -pinoffset if
> >needed) or with whatever tools your admins provide/recommend; a quick
> >sketch follows below
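> >
> >As a rough illustration only (the launcher and binary names are whatever
> >your installation provides, and the offset/stride values depend on your
> >node's core numbering):
> >
> >  # pin the 8 single-threaded ranks to consecutive cores
> >  mpirun -np 8 gmx_mpi mdrun -ntomp 1 -pin on
> >  # if another job already occupies the first cores of the node,
> >  # shift the pinning past it
> >  mpirun -np 8 gmx_mpi mdrun -ntomp 1 -pin on -pinoffset 8 -pinstride 1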
> >
> >[Extras: consider using FFTW; even with the Intel compilers it is often
> >faster than MKL for our small FFTs, and the GNU compilers are often faster
> >than the Intel ones too.]
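> >
> >A build-time sketch, assuming a standard out-of-source CMake build (the
> >paths and parallel-build width are placeholders):
> >
> >  # use the GNU compilers and let GROMACS download and build its own FFTW
> >  CC=gcc CXX=g++ cmake .. -DGMX_FFT_LIBRARY=fftw3 -DGMX_BUILD_OWN_FFTW=ON
> >  make -j 8 && make install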
> >
> >Fixing the above issues should not only reduce the imbalance but most
> >likely also gain you quite a bit of simulation performance! Let us know if
> >it works.
> >
> >Cheers,
> >
> >--
> >Szilárd
> >
> >On Thu, Aug 20, 2015 at 5:08 PM, Nash, Anthony <a.nash at ucl.ac.uk> wrote:
> >
> >> Hi Mark,
> >>
> >> Many thanks for looking into this.
> >>
> >> One of the log files (the job hasn’t finished running) is here:
> >> https://www.dropbox.com/s/zwrro54yni2uxtn/umb_3_umb.log?dl=0
> >>
> >> The system is a soluble collagenase in water with a collagen substrate
> >>and
> >> two zinc co-factors. There are 287562 atoms in the system.
> >>
> >> Please let me know if you need to know anything else. Thanks!
> >>
> >> Anthony
> >>
> >>
> >>
> >>
> >>
> >> On 20/08/2015 11:39, "Mark Abraham" <mark.j.abraham at gmail.com> wrote:
> >>
> >> >Hi,
> >> >
> >> >In cases like this, it's good to describe what's in your simulation and
> >> >to share the full .log file on a file-sharing service, so we can see both
> >> >what mdrun reports early on and what it reports at the end.
> >> >
> >> >Mark
> >> >
> >> >On Thu, Aug 20, 2015 at 8:22 AM Nash, Anthony <a.nash at ucl.ac.uk>
> wrote:
> >> >
> >> >> Hi all,
> >> >>
> >> >> I appear to have a very high load imbalance on some of my runs: values
> >> >> range from approx. 7% up to 31.8%, with a reported vol min/aver of
> >> >> around 0.6 (I haven't found one under half yet).
> >> >>
> >> >> When I look through the .log file at the start of the run I see:
> >> >>
> >> >> Initializing Domain Decomposition on 8 ranks
> >> >> Dynamic load balancing: auto
> >> >> Will sort the charge groups at every domain (re)decomposition
> >> >> Initial maximum inter charge-group distances:
> >> >>     two-body bonded interactions: 0.514 nm, LJ-14, atoms 3116 3123
> >> >>   multi-body bonded interactions: 0.429 nm, Proper Dih., atoms 3116 3123
> >> >> Minimum cell size due to bonded interactions: 0.472 nm
> >> >> Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.862 nm
> >> >> Estimated maximum distance required for P-LINCS: 0.862 nm
> >> >> This distance will limit the DD cell size, you can override this with -rcon
> >> >> Using 0 separate PME ranks, as there are too few total ranks for efficient splitting
> >> >> Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
> >> >> Optimizing the DD grid for 8 cells with a minimum initial size of 1.077 nm
> >> >> The maximum allowed number of cells is: X 12 Y 12 Z 12
> >> >> Domain decomposition grid 4 x 2 x 1, separate PME ranks 0
> >> >> PME domain decomposition: 4 x 2 x 1
> >> >> Domain decomposition rank 0, coordinates 0 0 0
> >> >> Using 8 MPI processes
> >> >> Using 1 OpenMP thread per MPI process
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> Having had a quick look through the documentation, I see that I should
> >> >> consider using the Verlet cut-off scheme (which I am) and adjusting the
> >> >> number of PME ranks, the cut-off and the PME grid spacing. Would this
> >> >> simply be a case of throwing more cores at the simulation, or must I
> >> >> also play around with the P-LINCS parameters?
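> >> >>
> >> >> For example, would something along the lines of gmx tune_pme be the
> >> >> right tool here (just a guess on my part; the options below are only
> >> >> placeholders)?
> >> >>
> >> >>   # benchmark a few PME-rank / cut-off / grid combinations for this run
> >> >>   gmx tune_pme -np 8 -s umb_3_umb.tpr -ntpr 3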
> >> >>
> >> >> Thanks
> >> >> Anthony