[gmx-users] Losing part of the available CPU time
Alexander Alexander
alexanderwien2k at gmail.com
Sun Aug 14 15:24:29 CEST 2016
Dear gromacs users,
My free energy calculation runs fine, but I am losing around 56.5 % of the
available CPU time, as stated in my log file, which is really considerable.
The problem appears to be load imbalance in the domain decomposition, but I
have no idea how to improve it. Below is the very end of my log file; I
would be grateful if you could help me avoid this.
D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
av. #atoms communicated per step for force: 2 x 115357.4
av. #atoms communicated per step for LINCS: 2 x 2389.1
Average load imbalance: 285.9 %
Part of the total run time spent waiting due to load imbalance: 56.5 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds:
X 2 %  Y 2 %  Z 2 %
Average PME mesh/force load: 0.384
Part of the total run time spent waiting due to PP/PME imbalance: 14.5 %
NOTE: 56.5 % of the available CPU time was lost due to load imbalance
in the domain decomposition.
NOTE: 14.5 % performance was lost because the PME ranks
had less work to do than the PP ranks.
You might want to decrease the number of PME ranks
or decrease the cut-off and the grid spacing.
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
On 96 MPI ranks doing PP, and
on 32 MPI ranks doing PME
 Computing:          Num    Num      Call     Wall time    Giga-Cycles
                     Ranks  Threads  Count      (s)        total sum      %
-----------------------------------------------------------------------------
 Domain decomp.        96      1     175000      242.339     53508.472   0.5
 DD comm. load         96      1     174903        9.076      2003.907   0.0
 DD comm. bounds       96      1     174901       27.054      5973.491   0.1
 Send X to PME         96      1    7000001       44.342      9790.652   0.1
 Neighbor search       96      1     175001      251.994     55640.264   0.6
 Comm. coord.          96      1    6825000     1521.009    335838.747   3.4
 Force                 96      1    7000001     7001.990   1546039.264  15.5
 Wait + Comm. F        96      1    7000001    10761.296   2376093.759  23.8
 PME mesh *            32      1    7000001    11796.344    868210.788   8.7
 PME wait for PP *                             22135.752   1629191.096  16.3
 Wait + Recv. PME F    96      1    7000001      393.117     86800.265   0.9
 NB X/F buffer ops.    96      1   20650001      132.713     29302.991   0.3
 COM pull force        96      1    7000001      165.613     36567.368   0.4
 Write traj.           96      1       7037       55.020     12148.457   0.1
 Update                96      1   14000002      140.972     31126.607   0.3
 Constraints           96      1   14000002    12871.236   2841968.551  28.4
 Comm. energies        96      1     350001      261.976     57844.219   0.6
 Rest                                             52.349     11558.715   0.1
-----------------------------------------------------------------------------
 Total                                         33932.096   9989607.639 100.0
-----------------------------------------------------------------------------
(*) Note that with separate PME ranks, the walltime column actually sums to
twice the total reported, but the cycle count total and % are correct.
-----------------------------------------------------------------------------
Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME redist. X/F       32      1   21000003     2334.608    171827.143   1.7
 PME spread/gather     32      1   28000004     3640.870    267967.972   2.7
 PME 3D-FFT            32      1   28000004     1587.105    116810.882   1.2
 PME 3D-FFT Comm.      32      1   56000008     4066.097    299264.666   3.0
 PME solve Elec        32      1   14000002      148.284     10913.728   0.1
-----------------------------------------------------------------------------
               Core t (s)   Wall t (s)        (%)
       Time:  4341204.790    33932.096    12793.8
                         9h25:32
                 (ns/day)    (hour/ns)
Performance:       35.648        0.673
Finished mdrun on rank 0 Sat Aug 13 23:45:45 2016
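Following the log's note about decreasing the number of PME ranks, I guess a
re-run would look something like the sketch below. The -npme value, the total
rank count split, and the file name are only placeholders I made up, not
values I know to be right:

```shell
# Hypothetical re-run with fewer dedicated PME ranks: the log shows the
# 32 PME ranks waiting on the 96 PP ranks (PME mesh/force load 0.384),
# so shifting ranks from PME to PP might help.
# "topol.tpr", "-npme 16", and the launcher line are placeholders.
mpirun -np 128 gmx_mpi mdrun -deffnm topol -npme 16
```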
Thanks,
Regards,
Alex