[gmx-developers] Re: load balancing

hessb at mpip-mainz.mpg.de
Fri Feb 13 18:41:07 CET 2009


Hi,

For all systems I have tried, the load imbalance quickly drops
below 2%, and in most cases even below 1%, but I have never used more
than 1000 cores.
However, I vaguely remember one user who had load imbalance problems.

The load balancing is optimized simply by counting the CPU cycles
spent between the coordinate communication and the force communication
(the cycles are given by the function dd_force_load in domdec.c).
The cell boundaries are then scaled with an underrelaxation factor of 0.5.
You can simply print these cycle counts just before the call to
set_dd_cell_sizes_dlb.
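
As a rough illustration of the idea (not the actual code in domdec.c), the
sketch below rebalances cells along one dimension. It assumes the cost is
uniform within each cell, so a cell's ideal width is its current width scaled
by mean cycles over its own cycles, and it then moves only half of the way
there. The names rebalance_cells and imbalance are hypothetical.

```python
def rebalance_cells(widths, cycles, relax=0.5):
    """One underrelaxed balancing step for cells along a single dimension.

    widths: current cell widths; cycles: measured cycle count per cell.
    Illustrative only: assumes cost is spread uniformly inside each cell.
    """
    if len(widths) != len(cycles):
        raise ValueError("widths and cycles must have the same length")
    total = sum(widths)
    mean_cycles = sum(cycles) / len(cycles)
    # Width each cell would need to reach the mean load:
    # slow cells shrink, fast cells grow.
    ideal = [w * mean_cycles / c for w, c in zip(widths, cycles)]
    # Underrelaxation: move only a fraction `relax` of the way to the ideal.
    moved = [w + relax * (i - w) for w, i in zip(widths, ideal)]
    # Rescale so the cells still fill the same total length.
    scale = total / sum(moved)
    return [m * scale for m in moved]


def imbalance(cycles):
    """Relative load imbalance: slowest cell compared with the mean."""
    mean_cycles = sum(cycles) / len(cycles)
    return max(cycles) / mean_cycles - 1.0
```

Calling rebalance_cells([1.0, 1.0], [100, 300]) widens the first cell and
narrows the second while keeping the total length at 2.0. Moving only part of
the way damps oscillations when the cycle counts are noisy from step to step.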

If you have more than 6000 cores, it could happen that by coincidence
nearly always one of these cores (a different one each time) is doing
something else (system stuff). This would kill the performance of a program
such as Gromacs, which has extremely short step times and needs synchronized
communication.
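
To put a number on this argument, here is a small sketch under two loud
assumptions: each core is independently busy with system work during a given
step with probability p_busy, and a single busy core delays the whole
synchronized step. The function name and the value of p_busy are hypothetical,
not measurements.

```python
def prob_step_delayed(n_cores, p_busy):
    """Probability that at least one of n_cores is busy during a step.

    Assumes cores are disturbed independently, each with probability p_busy.
    """
    if not 0.0 <= p_busy <= 1.0:
        raise ValueError("p_busy must be a probability")
    # Complement of "no core is disturbed in this step".
    return 1.0 - (1.0 - p_busy) ** n_cores


if __name__ == "__main__":
    p_busy = 1e-4  # hypothetical per-core, per-step disturbance probability
    for n_cores in (1000, 6000):
        print(n_cores, prob_step_delayed(n_cores, p_busy))
```

With this made-up p_busy, roughly 10% of steps are delayed on 1000 cores and
roughly 45% on 6000 cores, and the delayed core is usually a different one
each time, which a per-node load balancer cannot correct.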

Berk

> Hi Berk,
>
> is there a way to get more information about the load balancing? Did you
> use some (profiling) tools to optimize it? I would like to see why the load
> balancing is not able to balance the load very efficiently for very high
> core numbers (>6000 without PME). I would like to see information such as:
> - which nodes are taking longer
> - which subroutines are to blame for that
> - is the load imbalance stable over time (or is it different nodes in
> each step that have too much load)
>
> I tried craypat and tau but so far was not really able to get very
> meaningful results.
>
> Roland
>
> --
> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov




