[gmx-users] Time accounting and performance
Alex
alexanderwien2k at gmail.com
Thu Jul 26 20:10:53 CEST 2018
Dear all,
I use 128 MPI ranks on 4 nodes to run GROMACS on a fairly large system
containing around 850,000 atoms.
#PBS -l select=4:ncpus=32:mpiprocs=32
-n 128 gmx_mpi mdrun -ntomp 1 -deffnm eql1 -s eql1.tpr -rdd 1.5 -dds 0.9999 -npme 8 -ntomp_pme 1 -g eql1.log
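For comparison, I am also considering a hybrid MPI/OpenMP layout on the same 4 nodes, roughly along these lines (only a sketch; I am assuming mpirun as the launcher, that 2 OpenMP threads per rank fit the 32-core nodes, and the eql1_hybrid output names are made up):

  # 56 PP ranks + 8 PME ranks, 2 OpenMP threads each = 128 cores on 4 x 32-core nodes
  #PBS -l select=4:ncpus=32:mpiprocs=16
  mpirun -n 64 gmx_mpi mdrun -ntomp 2 -npme 8 -ntomp_pme 2 -deffnm eql1_hybrid \
      -s eql1.tpr -rdd 1.5 -dds 0.9999 -g eql1_hybrid.log

I do not know whether fewer, fatter ranks would help here, which is part of my question.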
Below are the performance statistics reported at the end of the .log file of a
1 ns NVT simulation.
Based on this information, could you please let me know how I can improve the
performance? (I can even increase the number of nodes further.)
More than 88% of the simulation time goes to computing the Force; I wonder
whether this is normal.
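Before committing to longer runs I can also do short benchmarks that vary the number of separate PME ranks; a rough sketch of what I have in mind is below (mpirun is again assumed as the launcher, the bench_npme*.log names are made up, and -nsteps/-resethway/-noconfout are only there to keep the test runs short and the timers clean):

  # try a few PME rank counts with short runs and compare the reported ns/day
  for NPME in 8 16 24 32; do
      mpirun -n 128 gmx_mpi mdrun -ntomp 1 -s eql1.tpr -npme $NPME \
          -nsteps 20000 -resethway -noconfout -g bench_npme${NPME}.log
  done
  # each .log ends with a "Performance:" line (ns/day and hour/ns)
  grep -H "Performance:" bench_npme*.log

I understand gmx tune_pme can automate this kind of scan as well.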
D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
av. #atoms communicated per step for force: 2 x 1445686.6
av. #atoms communicated per step for LINCS: 2 x 51538.9
Dynamic load balancing report:
DLB was turned on during the run due to measured imbalance.
Average load imbalance: 1.1%.
The balanceable part of the MD step is 95%, load imbalance is computed from this.
Part of the total run time spent waiting due to load imbalance: 1.0%.
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 0 % Z 0 %
Average PME mesh/force load: 0.993
Part of the total run time spent waiting due to PP/PME imbalance: 0.0 %
R E A L C Y C L E A N D T I M E A C C O U N T I N G
On 120 MPI ranks doing PP, and
on 8 MPI ranks doing PME
 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Domain decomp.       120    1      10000      176.939      48835.235   0.2
 DD comm. load        120    1       9983        0.275         75.859   0.0
 DD comm. bounds      120    1       9901        2.485        685.808   0.0
 Send X to PME        120    1    1000001      245.752      67827.508   0.3
 Neighbor search      120    1      10001      633.762     174918.551   0.8
 Comm. coord.         120    1     990000      553.168     152674.581   0.7
 Force                120    1    1000001    67250.584   18561182.443  88.4
 Wait + Comm. F       120    1    1000001     1086.662     299919.155   1.4
 PME mesh *             8    1    1000001    68149.287    1253948.313   6.0
 PME wait for PP *                             3157.562      58099.206   0.3
 Wait + Recv. PME F   120    1    1000001      215.074      59360.407   0.3
 NB X/F buffer ops.   120    1    2980001      354.502      97842.785   0.5
 Write traj.          120    1        180        4.645       1281.970   0.0
 Update               120    1    1000001      149.694      41315.622   0.2
 Constraints          120    1    1000001      475.301     131183.120   0.6
 Comm. energies       120    1     100001      104.565      28860.099   0.1
 Rest                                            53.445      14750.800   0.1
-----------------------------------------------------------------------------
 Total                                        71306.853   20992761.540 100.0
-----------------------------------------------------------------------------
(*) Note that with separate PME ranks, the walltime column actually sums to
twice the total reported, but the cycle count total and % are correct.
-----------------------------------------------------------------------------
Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME redist. X/F        8    1    2000002     3671.428      67554.352   0.3
 PME spread             8    1    1000001    19926.223     366642.922   1.7
 PME gather             8    1    1000001    14964.352     275344.401   1.3
 PME 3D-FFT             8    1    2000002    22473.744     413517.373   2.0
 PME 3D-FFT Comm.       8    1    2000002     4542.257      83577.624   0.4
 PME solve Elec         8    1    1000001     2558.913      47084.047   0.2
-----------------------------------------------------------------------------
               Core t (s)   Wall t (s)        (%)
       Time:  9127277.097    71306.853    12800.0
                         19h48:26
                 (ns/day)    (hour/ns)
Performance:        1.212       19.807
Thank you.
Regards,
Alex