[gmx-users] High load imbalance: 31.8%

Szilárd Páll pall.szilard at gmail.com
Thu Aug 20 17:56:23 CEST 2015


On Thu, Aug 20, 2015 at 5:52 PM, Szilárd Páll <pall.szilard at gmail.com>
wrote:

> Hi,
>
> You're not pinning threads, and it seems that you're running on a large SMP
> machine! Assuming the 512 threads reported (line 91) is correct, that's a
> 32-socket SMP machine, perhaps an SGI UV? In any case, Xeon E5-4xxx is
> typically deployed in 4-8 socket installations,
>

Correction: I confused the E5-46xx with the E7 series. These are 2-4
socket, it seems. In any case, the reported 512 threads still suggest a
large SMP machine.


> so your 8 threads will be floating around on a number of CPUs, which ruins
> your performance and likely contributes to the large and varying load
> imbalance.
>
> My advice:
> - don't ignore notes/warnings issued by mdrun (line 366; it should be on the
> standard output too); we put quite some thought into spamming users only when
> relevant :)
> - pin mdrun and/or its threads, either with "-pin on" (and -pinoffset if
> needed) or with whatever tools your admins provide/recommend; a rough sketch
> follows below
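
For example, with an MPI build of mdrun (here assumed to be called mdrun_mpi;
topol.tpr is a placeholder name), pinning might look something like:

  mpirun -np 8 mdrun_mpi -pin on -pinoffset 0 -s topol.tpr

or, with a thread-MPI build:

  gmx mdrun -ntmpi 8 -ntomp 1 -pin on -pinoffset 0 -s topol.tpr

The right -pinoffset depends on what else is running on the node; if several
pinned jobs share a node, each needs its own non-overlapping offset.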
>
> [Extras: consider using FFTW; even with the Intel compilers it's often
> faster for our small FFTs than MKL. The GNU compiler instead of the Intel
> compiler is often faster too.]
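
A minimal configure sketch along those lines (the build directory layout is
assumed, not taken from your setup) could be:

  cmake .. -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
        -DGMX_FFT_LIBRARY=fftw3 -DGMX_BUILD_OWN_FFTW=ON

GMX_BUILD_OWN_FFTW=ON simply downloads and builds a suitable FFTW as part of
the GROMACS build if the cluster doesn't already provide one.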
>
> Fixing the above issues should not only reduce the imbalance, but most likely
> also gain you quite a bit of simulation performance! Let us know if it
> worked.
>
> Cheers,
>
> --
> Szilárd
>
> On Thu, Aug 20, 2015 at 5:08 PM, Nash, Anthony <a.nash at ucl.ac.uk> wrote:
>
>> Hi Mark,
>>
>> Many thanks for looking into this.
>>
>> One of the log files (the job hasn’t finished running) is here:
>> https://www.dropbox.com/s/zwrro54yni2uxtn/umb_3_umb.log?dl=0
>>
>> The system is a soluble collagenase in water with a collagen substrate and
>> two zinc co-factors. There are 287562 atoms in the system.
>>
>> Please let me know if you need to know anything else. Thanks!
>>
>> Anthony
>>
>>
>>
>>
>>
>> On 20/08/2015 11:39, "Mark Abraham" <mark.j.abraham at gmail.com> wrote:
>>
>> >Hi,
>> >
>> >In cases like this, it's good to describe what's in your simulation, and
>> >share the full .log file on a file-sharing service, so we can see both the
>> >things mdrun reports early and late.
>> >
>> >Mark
>> >
>> >On Thu, Aug 20, 2015 at 8:22 AM Nash, Anthony <a.nash at ucl.ac.uk> wrote:
>> >
>> >> Hi all,
>> >>
>> >> I appear to have a very high load imbalance on some of my runs. Values
>> >> range from approx. 7% up to 31.8%, with a reported vol min/aver of around
>> >> 0.6 (I haven't found one under half yet).
>> >>
>> >> When I look through the .log file at the start of the run I see:
>> >>
>> >> Initializing Domain Decomposition on 8 ranks
>> >> Dynamic load balancing: auto
>> >> Will sort the charge groups at every domain (re)decomposition
>> >> Initial maximum inter charge-group distances:
>> >>     two-body bonded interactions: 0.514 nm, LJ-14, atoms 3116 3123
>> >>   multi-body bonded interactions: 0.429 nm, Proper Dih., atoms 3116 3123
>> >> Minimum cell size due to bonded interactions: 0.472 nm
>> >> Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.862 nm
>> >> Estimated maximum distance required for P-LINCS: 0.862 nm
>> >> This distance will limit the DD cell size, you can override this with -rcon
>> >> Using 0 separate PME ranks, as there are too few total ranks for efficient splitting
>> >> Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
>> >> Optimizing the DD grid for 8 cells with a minimum initial size of 1.077 nm
>> >> The maximum allowed number of cells is: X 12 Y 12 Z 12
>> >> Domain decomposition grid 4 x 2 x 1, separate PME ranks 0
>> >> PME domain decomposition: 4 x 2 x 1
>> >> Domain decomposition rank 0, coordinates 0 0 0
>> >> Using 8 MPI processes
>> >> Using 1 OpenMP thread per MPI process
>> >>
>> >>
>> >>
>> >>
>> >> Having a quick look through the documentation, I see that I should
>> >> consider using the Verlet cut-off scheme (which I am) and adjusting the
>> >> number of PME ranks, the cut-off, and the PME grid spacing. Would this
>> >> simply be a case of throwing more cores at the simulation, or must I play
>> >> around with the P-LINCS parameters?
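
Once the pinning is sorted, separate PME ranks and the cut-off/grid balance
can also be tried directly from the command line; a rough sketch with
placeholder rank counts and file names would be:

  mpirun -np 16 mdrun_mpi -npme 4 -s topol.tpr
  gmx tune_pme -np 16 -s topol.tpr

At only 8 ranks, mdrun's choice of 0 separate PME ranks (see the log excerpt
above) is expected, so this is mainly worth revisiting once the pinning is in
place and more ranks are used.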
>> >>
>> >> Thanks
>> >> Anthony
>> >>