[gmx-users] Poor load balancing

Carsten Kutzner ckutzne at gwdg.de
Tue Feb 16 16:58:05 CET 2010


Deniz,

For calculations with PME you might want to use the g_tune_pme
tool, which helps to find the optimal settings for a given number of
cores. For Gromacs 4.0.x you can download it from

http://www.mpibpc.mpg.de/home/grubmueller/projects/MethodAdvancements/Gromacs/

You will find installation instructions at the top of the g_tune_pme.c
file.
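
As a rough sketch (the option names may differ slightly between
g_tune_pme versions, and topol.tpr stands for your own run input
file, so please check g_tune_pme -h first), a test on 16 cores
could look like

  g_tune_pme -np 16 -s topol.tpr -steps 5000

The tool repeats short mdrun benchmarks with different numbers of
PME-only nodes (and, if requested, with a scaled Coulomb cut-off and
Fourier grid) and reports the fastest combination.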

Carsten


On Feb 16, 2010, at 1:41 PM, Deniz KARASU wrote:

> Carsten, thank you for your response. 
> 
> I ran the same benchmark on 8 nodes and on 16 nodes, this time using PME instead of a plain cut-off. To optimize, I varied the cut-off and the Fourier spacing. I wonder whether these results are acceptable or whether more optimization is needed.
> 
> Thanks.
> 
> Deniz
> 
> ====================================================
> 
> 8 nodes, cut-off = 0.9 nm, fourier_spacing = 0.12
> 
>  Average load imbalance: 4.0 %
>  Part of the total run time spent waiting due to load imbalance: 1.4 %
>  Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %
>  Average PME mesh/force load: 1.758
>  Part of the total run time spent waiting due to PP/PME imbalance: 15.7 %
> 
> NOTE: 15.7 % performance was lost because the PME nodes
>       had more work to do than the PP nodes.
>       You might want to increase the number of PME nodes
>       or increase the cut-off and the grid spacing.
> 
> 
>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> 
>  Computing:         Nodes     Number     G-Cycles    Seconds     %
> -----------------------------------------------------------------------
>  Domain decomp.         4       1001       36.253       15.5     1.4
>  Vsite constr.          4       5001        3.237        1.4     0.1
>  Send X to PME          4       5001       10.365        4.4     0.4
>  Comm. coord.           4       5001       15.193        6.5     0.6
>  Neighbor search        4       1001      279.944      120.0    10.8
>  Force                  4       5001      451.185      193.5    17.4
>  Wait + Comm. F         4       5001       63.147       27.1     2.4
>  PME mesh               4       5001      940.073      403.1    36.3
>  Wait + Comm. X/F       4       5001      356.494      152.9    13.7
>  Wait + Recv. PME F     4       5001      345.820      148.3    13.3
>  Vsite spread           4      10002        6.568        2.8     0.3
>  Write traj.            4          1        0.350        0.2     0.0
>  Update                 4       5001       20.525        8.8     0.8
>  Constraints            4       5001       42.245       18.1     1.6
>  Comm. energies         4       5001        3.377        1.4     0.1
>  Rest                   4                  18.393        7.9     0.7
> -----------------------------------------------------------------------
>  Total                  8                2593.170     1112.0   100.0
> -----------------------------------------------------------------------
> 
>     Parallel run - timing based on wallclock.
> 
>                NODE (s)   Real (s)      (%)
>        Time:    139.000    139.000    100.0
>                        2:19
>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
> Performance:    127.854      9.458     12.434      1.930
> Finished mdrun on node 0 Mon Feb 15 17:34:48 2010
> 
> ====================================================
> 8 nodes, cut-off = 1.0 nm, fourier_spacing = 0.13
> 
>  Average load imbalance: 3.4 %
>  Part of the total run time spent waiting due to load imbalance: 1.7 %
>  Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %
>  Average PME mesh/force load: 1.129
>  Part of the total run time spent waiting due to PP/PME imbalance: 3.7 %
> 
> 
>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> 
>  Computing:         Nodes     Number     G-Cycles    Seconds     %
> -----------------------------------------------------------------------
>  Domain decomp.         4       1001       35.777       15.3     1.5
>  Vsite constr.          4       5001        2.620        1.1     0.1
>  Send X to PME          4       5001       10.182        4.4     0.4
>  Comm. coord.           4       5001       15.727        6.7     0.7
>  Neighbor search        4       1001      275.561      117.9    11.8
>  Force                  4       5001      576.720      246.7    24.7
>  Wait + Comm. F         4       5001       69.631       29.8     3.0
>  PME mesh               4       5001      752.485      321.8    32.2
>  Wait + Comm. X/F       4       5001      416.550      178.2    17.8
>  Wait + Recv. PME F     4       5001       91.857       39.3     3.9
>  Vsite spread           4      10002        6.456        2.8     0.3
>  Write traj.            4          1        0.426        0.2     0.0
>  Update                 4       5001       20.577        8.8     0.9
>  Constraints            4       5001       41.959       17.9     1.8
>  Comm. energies         4       5001        2.967        1.3     0.1
>  Rest                   4                  18.612        8.0     0.8
> -----------------------------------------------------------------------
>  Total                  8                2338.108     1000.0   100.0
> -----------------------------------------------------------------------
> 
>     Parallel run - timing based on wallclock.
> 
>                NODE (s)   Real (s)      (%)
>        Time:    125.000    125.000    100.0
>                        2:05
>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
> Performance:    190.198     11.789     13.827      1.736
> Finished mdrun on node 0 Mon Feb 15 22:10:46 2010
> 
> ====================================================
> 8 nodes, cut-off = 1.1 nm, fourier_spacing = 0.135
> 
>  Average load imbalance: 0.7 %
>  Part of the total run time spent waiting due to load imbalance: 0.4 %
>  Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %
>  Average PME mesh/force load: 0.872
>  Part of the total run time spent waiting due to PP/PME imbalance: 4.2 %
> 
> 
>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> 
>  Computing:         Nodes     Number     G-Cycles    Seconds     %
> -----------------------------------------------------------------------
>  Domain decomp.         4       1001       30.117       12.9     1.3
>  Vsite constr.          4       5001        1.739        0.7     0.1
>  Send X to PME          4       5001        9.944        4.3     0.4
>  Comm. coord.           4       5001       16.964        7.3     0.7
>  Neighbor search        4       1001      269.553      115.8    11.4
>  Force                  4       5001      708.179      304.2    29.9
>  Wait + Comm. F         4       5001       50.572       21.7     2.1
>  PME mesh               4       5001      671.310      288.3    28.4
>  Wait + Comm. X/F       4       5001      511.451      219.7    21.6
>  Wait + Recv. PME F     4       5001       10.333        4.4     0.4
>  Vsite spread           4      10002        4.222        1.8     0.2
>  Write traj.            4          1        0.348        0.1     0.0
>  Update                 4       5001       19.821        8.5     0.8
>  Constraints            4       5001       39.736       17.1     1.7
>  Comm. energies         4       5001        3.181        1.4     0.1
>  Rest                   4                  18.084        7.8     0.8
> -----------------------------------------------------------------------
>  Total                  8                2365.556     1016.0   100.0
> -----------------------------------------------------------------------
> 
>     Parallel run - timing based on wallclock.
> 
>                NODE (s)   Real (s)      (%)
>        Time:    127.000    127.000    100.0
>                        2:07
>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
> Performance:    244.853     13.855     13.609      1.764
> Finished mdrun on node 0 Mon Feb 15 22:24:07 2010
> 
> ====================================================
> 16 nodes, cut-off = 1.1 nm, fourier_spacing = 0.135
> 
>  Average load imbalance: 7.0 %
>  Part of the total run time spent waiting due to load imbalance: 3.5 %
>  Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 0 %
>  Average PME mesh/force load: 0.872
>  Part of the total run time spent waiting due to PP/PME imbalance: 4.2 %
> 
> 
>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> 
>  Computing:         Nodes     Number     G-Cycles    Seconds     %
> -----------------------------------------------------------------------
>  Domain decomp.         8       1001       55.569       23.8     1.9
>  Vsite constr.          8       5001        3.334        1.4     0.1
>  Send X to PME          8       5001       24.192       10.4     0.8
>  Comm. coord.           8       5001       49.191       21.1     1.7
>  Neighbor search        8       1001      300.578      128.8    10.3
>  Force                  8       5001      734.497      314.9    25.2
>  Wait + Comm. F         8       5001      166.258       71.3     5.7
>  PME mesh               8       5001      809.589      347.1    27.8
>  Wait + Comm. X/F       8       5001      640.310      274.5    22.0
>  Wait + Recv. PME F     8       5001       12.332        5.3     0.4
>  Vsite spread           8      10002       11.558        5.0     0.4
>  Write traj.            8          1        0.685        0.3     0.0
>  Update                 8       5001       18.789        8.1     0.6
>  Constraints            8       5001       47.320       20.3     1.6
>  Comm. energies         8       5001       12.562        5.4     0.4
>  Rest                   8                  24.538       10.5     0.8
> -----------------------------------------------------------------------
>  Total                 16                2911.302     1248.0   100.0
> -----------------------------------------------------------------------
> 
>     Parallel run - timing based on wallclock.
> 
>                NODE (s)   Real (s)      (%)
>        Time:     78.000     78.000    100.0
>                        1:18
>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
> Performance:    398.725     22.539     22.158      1.083
> Finished mdrun on node 0 Mon Feb 15 22:54:31 2010
> 
> 
> 
> 
> On Mon, Feb 15, 2010 at 5:36 PM, Carsten Kutzner <ckutzne at gwdg.de> wrote:
> Hi,
> 
> 18 seconds of real time is a bit short for such a test. You should run
> for at least several minutes. The performance you can expect depends
> a lot on the interconnect you are using. You will definitely need a
> really low-latency interconnect if you have fewer than 1000 atoms
> per core.
> 
> Carsten
> 
> 
> On Feb 15, 2010, at 3:13 PM, Deniz KARASU wrote:
> 
> > Hi All,
> >
> > I'm trying to run the d.lzm Gromacs benchmark on a 64-node machine, but the dynamic load balancing performance is very low.
> >
> > Any suggestion will be of great help.
> >
> > Thanks.
> >
> > Deniz KARASU
> >
> 
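
As an aside, the two knobs mentioned in the NOTE of your first log
(the number of PME nodes and the cut-off/grid spacing) can also be
set by hand. A minimal sketch, assuming a 16-process run and
placeholder file names; depending on the setup, rlist and rvdw may
need to be adjusted along with rcoulomb:

  ; in the .mdp file, e.g. the values of your best 8-node run
  rcoulomb        = 1.1
  fourierspacing  = 0.135

  # dedicate some of the processes (here 6 of 16) to the PME mesh;
  # the MPI launcher and the mdrun binary name depend on your
  # installation
  mpirun -np 16 mdrun_mpi -npme 6 -s topol.tpr

Increasing the cut-off shifts work from the PME mesh to the
real-space part, which is what brought the PME mesh/force load from
1.758 down to 0.872 in your runs.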


--
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics
Am Fassberg 11, 37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
http://www.mpibpc.mpg.de/home/grubmueller/ihp/ckutzne



