[gmx-users] Re: load imbalance
lina
zhao0139 at ntu.edu.sg
Wed Apr 7 12:11:43 CEST 2010
> The first time I did not notice that 16 cpus are twice as slow as 8.
> Are you really sure you did not mix things up?
> The other way around the timings would make perfect sense.
> If not, there is a problem with your 16 cpu simulation.
>
> What load imbalance is reported for the 8 cpu run?
>
> Berk
>
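Sorry, I have been so sleepy recently that I mixed up the 8-CPU and
16-CPU runs in my last email. There is no doubt that the 16-CPU run is
the faster one. The 8-CPU run seems to have no problems. The end of the
16-CPU md.log comes first, followed by the 8-CPU one.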
D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
av. #atoms communicated per step for force: 2 x 70598.6
av. #atoms communicated per step for LINCS: 2 x 952.6
Average load imbalance: 1500.0 %
Part of the total run time spent waiting due to load imbalance: 187.5 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 0 %
NOTE: 187.5 % performance was lost due to load imbalance
in the domain decomposition.
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
Computing:          Nodes     Number     G-Cycles    Seconds       %
-----------------------------------------------------------------------
Domain decomp.         16    1000000   121878.350    36533.7     4.4
Comm. coord.           16    5000001   247646.114    74233.2     8.9
Neighbor search        16    1000001  1002091.741   300382.2    35.9
Force                  16    5000001   919401.942   275595.5    32.9
Wait + Comm. F         16    5000001   299888.017    89893.0    10.7
Write traj.            16       2001      249.523       74.8     0.0
Update                 16    5000001    20258.672     6072.6     0.7
Constraints            16    5000001   105721.492    31690.6     3.8
Comm. energies         16    5000001    51653.353    15483.4     1.9
Rest                   16                22395.593     6713.2     0.8
-----------------------------------------------------------------------
Total                  16              2791184.797   836672.0   100.0
-----------------------------------------------------------------------
Parallel run - timing based on wallclock.
               NODE (s)   Real (s)      (%)
       Time:  52292.000  52292.000    100.0
                       14h31:32
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    523.244     19.720     16.523      1.453
Finished mdrun on node 0 Tue Apr 6 05:09:47 2010
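
The accounting table above can be cross-checked against the wallclock line. This is a minimal sketch, assuming the Seconds column is wall time summed over all nodes; the numbers fit that reading, but it is an inference from the log, not something the log states. All variable names are hypothetical.

```python
# Values copied from the 16-CPU md.log above.
nodes = 16
wall_seconds = 52292.000        # "Time:" line, Real (s)
total_node_seconds = 836672.0   # "Total" row, Seconds column

# Assumption: the Seconds column is wall time summed over all nodes,
# so dividing by the node count should give back the wallclock time.
per_node_seconds = total_node_seconds / nodes
print(per_node_seconds, per_node_seconds == wall_seconds)

# The human-readable duration 14h31:32 should be the same number of seconds.
hms_seconds = 14 * 3600 + 31 * 60 + 32
print(hms_seconds)

# The % column (one entry per row, Total excluded) should sum to about 100.
percent = [4.4, 8.9, 35.9, 32.9, 10.7, 0.0, 0.7, 3.8, 1.9, 0.8]
percent_sum = round(sum(percent), 1)
print(percent_sum)
```

If the three checks agree, the table is at least internally consistent, which helps when deciding whether an odd-looking figure is a reporting problem or a real timing.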

(below is the 8-CPU md.log)

D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
av. #atoms communicated per step for force: 2 x 46783.6
av. #atoms communicated per step for LINCS: 2 x 664.8
Average load imbalance: 9.8 %
Part of the total run time spent waiting due to load imbalance: 3.5 %
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
Computing:          Nodes     Number     G-Cycles    Seconds       %
-----------------------------------------------------------------------
Domain decomp.          8    1000000    69839.474    20934.6     2.7
Comm. coord.            8    5000001   162704.625    48771.3     6.3
Neighbor search         8    1000001   982919.098   294633.3    38.2
Force                   8    5000001   917463.617   275012.8    35.6
Wait + Comm. F          8    5000001   273548.449    81997.1    10.6
Update                  8    5000001    20465.372     6134.6     0.8
Constraints             8    5000001    96414.222    28900.5     3.7
Comm. energies          8    5000001    28884.158     8658.1     1.1
Rest                    8            9223372036.855 2764736802.2 358286.2
-----------------------------------------------------------------------
Total                   8              2574303.046   771656.0   100.0
-----------------------------------------------------------------------
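
The Rest row in this table does not look like a measured timing: its G-Cycles value, 9223372036.855, coincides with 2^63 / 10^9, which would be consistent with a 64-bit counter overflow. That is an inference from the number, not something the log states. A rough sketch of recovering the residual from the Total row, assuming the Total row and the other rows are correct (the names are hypothetical):

```python
# G-Cycles copied from the 8-CPU table above, Rest and Total excluded.
g_cycles = {
    "Domain decomp.": 69839.474,
    "Comm. coord.": 162704.625,
    "Neighbor search": 982919.098,
    "Force": 917463.617,
    "Wait + Comm. F": 273548.449,
    "Update": 20465.372,
    "Constraints": 96414.222,
    "Comm. energies": 28884.158,
}
total_g_cycles = 2574303.046  # "Total" row

# Whatever the listed rows do not cover. This table has no "Write traj."
# row, so any trajectory-writing time would be part of this residual too.
residual = total_g_cycles - sum(g_cycles.values())
print(round(residual, 3))

# The value printed in the Rest row, compared with 2**63 giga-cycles.
printed_rest = 9223372036.855
overflow_like = 2**63 / 1e9
print(abs(printed_rest - overflow_like) < 1e-3)
```

The residual comes out at roughly 22064 G-Cycles, under 1 % of the total and close to the 22395.593 reported as Rest in the 16-CPU run.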
Parallel run - timing based on wallclock.
               NODE (s)   Real (s)      (%)
       Time:  96457.000  96457.000    100.0
                       1d02h47:37
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    283.696     10.701      8.957      2.679
Finished mdrun on node 0 Mon Apr 5 01:36:18 2010
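
To compare the two runs directly, the Performance and Time lines can be turned into a speedup and a parallel efficiency. A small sketch using only the figures printed in the two logs; the variable names are hypothetical, and the efficiency is taken relative to the 8-CPU run, not to a single CPU.

```python
# Figures copied from the two md.log excerpts above.
ns_per_day = {8: 8.957, 16: 16.523}
wall_seconds = {8: 96457.0, 16: 52292.0}

# Speedup of the 16-CPU run over the 8-CPU run, computed two ways.
speedup_perf = ns_per_day[16] / ns_per_day[8]
speedup_wall = wall_seconds[8] / wall_seconds[16]

# Parallel efficiency relative to the 8-CPU run: ideal speedup is 16/8 = 2.
efficiency = speedup_perf / (16 / 8)

print(f"speedup (ns/day):    {speedup_perf:.2f}")   # 1.84
print(f"speedup (wallclock): {speedup_wall:.2f}")   # 1.84
print(f"efficiency vs 8 CPU: {efficiency:.2f}")     # 0.92
```

Both routes give the same speedup, so the two logs agree that doubling the CPU count made this run about 1.84 times faster.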
Thanks and regards,
lina