[gmx-users] High load imbalance: 31.8%

Szilárd Páll pall.szilard at gmail.com
Thu Aug 20 17:52:33 CEST 2015


Hi,

You're not pinning threads, and it seems that you're running on a large SMP
machine! Assuming the 512 threads reported (line 91 of your log) are
correct, that's a 32-socket SMP machine, perhaps an SGI UV? In any case,
Xeon E5-4xxx is typically deployed in 4-8 socket installations, so your 8
threads will be floating around across a number of CPUs, which ruins your
performance - and likely contributes to the large and varying load
imbalance.
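
[Aside: if you're not sure what hardware the job actually landed on, on a
Linux node something like

    lscpu | grep -E '^CPU\(s\)|Socket\(s\)|Core\(s\)|Thread\(s\)'

prints the CPU, socket, core, and thread counts - just a sketch, of course;
your cluster documentation may name the hardware directly.]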

My advice:
- don't ignore the notes/warnings issued by mdrun (line 366 of your log;
they should appear on standard output too); we put quite some thought into
spamming users only when relevant :)
- pin mdrun and/or its threads, either with "-pin on" (and -pinoffset if
needed) or with whatever tools your admins provide/recommend; see the
sketch below
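
For example, a minimal sketch, assuming an MPI build where the binary is
called mdrun_mpi and mpirun is your launcher (substitute your site's names,
and an offset matching where your job is placed):

    # 8 MPI ranks x 1 OpenMP thread each, threads locked to cores,
    # starting from the first core of the node
    mpirun -np 8 mdrun_mpi -ntomp 1 -pin on -pinoffset 0 -deffnm umb_3_umb

-pinoffset mainly matters when several jobs share a node: give each job a
distinct offset so they don't end up pinned to the same cores.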

[Extras: consider using FFTW - even with the Intel compilers it's often
faster for our small FFTs than MKL; and the GNU compiler is often faster
than the Intel one too.]
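
If you do rebuild, a minimal sketch of the relevant CMake options
(GMX_BUILD_OWN_FFTW tells the build to download and compile FFTW for you;
the GNU compiler names are my assumption, adjust to taste):

    cmake .. -DGMX_FFT_LIBRARY=fftw3 \
             -DGMX_BUILD_OWN_FFTW=ON \
             -DCMAKE_C_COMPILER=gcc \
             -DCMAKE_CXX_COMPILER=g++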

Fixing the above issues should not only reduce the imbalance, but most
likely also gain you quite a bit of simulation performance! Let us know if
it worked.

Cheers,

--
Szilárd

On Thu, Aug 20, 2015 at 5:08 PM, Nash, Anthony <a.nash at ucl.ac.uk> wrote:

> Hi Mark,
>
> Many thanks for looking into this.
>
> One of the log files (the job hasn’t finished running) is here:
> https://www.dropbox.com/s/zwrro54yni2uxtn/umb_3_umb.log?dl=0
>
> The system is a soluble collagenase in water with a collagen substrate and
> two zinc co-factors. There are 287562 atoms in the system.
>
> Please let me know if you need to know anything else. Thanks!
>
> Anthony
>
>
>
>
>
> On 20/08/2015 11:39, "Mark Abraham" <mark.j.abraham at gmail.com> wrote:
>
> >Hi,
> >
> >In cases like this, it's good to describe what's in your simulation, and
> >share the full .log file on a file-sharing service, so we can see both the
> >things mdrun reports early and late.
> >
> >Mark
> >
> >On Thu, Aug 20, 2015 at 8:22 AM Nash, Anthony <a.nash at ucl.ac.uk> wrote:
> >
> >> Hi all,
> >>
> >> I appear to have a very high load imbalance on some of my runs: values
> >> starting from approx. 7% up to 31.8%, with a reported vol min/aver of
> >> around 0.6 (I haven't found one under half yet).
> >>
> >> When I look through the .log file at the start of the run I see:
> >>
> >> Initializing Domain Decomposition on 8 ranks
> >> Dynamic load balancing: auto
> >> Will sort the charge groups at every domain (re)decomposition
> >> Initial maximum inter charge-group distances:
> >>     two-body bonded interactions: 0.514 nm, LJ-14, atoms 3116 3123
> >>   multi-body bonded interactions: 0.429 nm, Proper Dih., atoms 3116 3123
> >> Minimum cell size due to bonded interactions: 0.472 nm
> >> Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.862 nm
> >> Estimated maximum distance required for P-LINCS: 0.862 nm
> >> This distance will limit the DD cell size, you can override this with -rcon
> >> Using 0 separate PME ranks, as there are too few total ranks for efficient splitting
> >> Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
> >> Optimizing the DD grid for 8 cells with a minimum initial size of 1.077 nm
> >> The maximum allowed number of cells is: X 12 Y 12 Z 12
> >> Domain decomposition grid 4 x 2 x 1, separate PME ranks 0
> >> PME domain decomposition: 4 x 2 x 1
> >> Domain decomposition rank 0, coordinates 0 0 0
> >> Using 8 MPI processes
> >> Using 1 OpenMP thread per MPI process
> >>
> >> Having a quick look through the documentation, I see that I should
> >> consider using the Verlet cut-off scheme (which I am) and adjusting the
> >> number of PME nodes, the cut-off, and the PME grid spacing. Would this
> >> simply be a case of throwing more cores at the simulation, or must I
> >> play around with the P-LINCS parameters?
> >>
> >> Thanks
> >> Anthony
> >>

