[gmx-users] Time accounting and performance

Alex alexanderwien2k at gmail.com
Thu Jul 26 20:10:53 CEST 2018

Dear all,

I am running GROMACS with 128 MPI ranks on 4 nodes for a fairly large
system containing around 850000 atoms.

#PBS -l select=4:ncpus=32:mpiprocs=32
 -n 128 gmx_mpi mdrun -ntomp 1 -deffnm eql1 -s eql1.tpr -rdd 1.5 -dds 0.9999 -npme 8 -ntomp_pme 1 -g eql1.log
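
For reference, the rank layout implied by the job script and command line
above can be worked out as follows. This is only a rough sketch: the variable
names are mine, the atom count is the approximate figure quoted above, and the
even split of atoms over the PP ranks is an assumption (the domain
decomposition does not distribute atoms exactly evenly).

```python
# Rank layout implied by the PBS request and the mdrun options (sketch).
nodes = 4               # from "select=4"
ranks_per_node = 32     # from "mpiprocs=32"
npme = 8                # from "-npme 8"
n_atoms = 850_000       # approximate system size quoted above

total_ranks = nodes * ranks_per_node
pp_ranks = total_ranks - npme
# Assumption: atoms are spread evenly over the PP ranks.
atoms_per_pp_rank = n_atoms / pp_ranks

print(f"total MPI ranks: {total_ranks}")
print(f"PP ranks: {pp_ranks}, PME ranks: {npme}")
print(f"approx. atoms per PP rank: {atoms_per_pp_rank:.0f}")
```

This agrees with the 120 PP ranks and 8 PME ranks reported in the log.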

Below are the performance statistics printed at the end of the .log file
for a 1 ns NVT simulation.
Based on this information, could you please let me know how I can improve
the performance? (I can also increase the number of nodes further.)
More than 88% of the simulation time is spent computing the Force; I wonder
if this is normal?

    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S

 av. #atoms communicated per step for force:  2 x 1445686.6
 av. #atoms communicated per step for LINCS:  2 x 51538.9

 Dynamic load balancing report:
 DLB was turned on during the run due to measured imbalance.
 Average load imbalance: 1.1%.
 The balanceable part of the MD step is 95%, load imbalance is computed from this.
 Part of the total run time spent waiting due to load imbalance: 1.0%.
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 0 % Z 0 %
 Average PME mesh/force load: 0.993
 Part of the total run time spent waiting due to PP/PME imbalance: 0.0 %

     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 120 MPI ranks doing PP, and
on 8 MPI ranks doing PME

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
 Domain decomp.       120    1      10000     176.939      48835.235   0.2
 DD comm. load        120    1       9983       0.275         75.859   0.0
 DD comm. bounds      120    1       9901       2.485        685.808   0.0
 Send X to PME        120    1    1000001     245.752      67827.508   0.3
 Neighbor search      120    1      10001     633.762     174918.551   0.8
 Comm. coord.         120    1     990000     553.168     152674.581   0.7
 Force                120    1    1000001   67250.584   18561182.443  88.4
 Wait + Comm. F       120    1    1000001    1086.662     299919.155   1.4
 PME mesh *             8    1    1000001   68149.287    1253948.313   6.0
 PME wait for PP *                           3157.562      58099.206   0.3
 Wait + Recv. PME F   120    1    1000001     215.074      59360.407   0.3
 NB X/F buffer ops.   120    1    2980001     354.502      97842.785   0.5
 Write traj.          120    1        180       4.645       1281.970   0.0
 Update               120    1    1000001     149.694      41315.622   0.2
 Constraints          120    1    1000001     475.301     131183.120   0.6
 Comm. energies       120    1     100001     104.565      28860.099   0.1
 Rest                                          53.445      14750.800   0.1
 Total                                      71306.853   20992761.540 100.0
(*) Note that with separate PME ranks, the walltime column actually sums to
    twice the total reported, but the cycle count total and % are correct.
 Breakdown of PME mesh computation
 PME redist. X/F        8    1    2000002    3671.428      67554.352   0.3
 PME spread             8    1    1000001   19926.223     366642.922   1.7
 PME gather             8    1    1000001   14964.352     275344.401   1.3
 PME 3D-FFT             8    1    2000002   22473.744     413517.373   2.0
 PME 3D-FFT Comm.       8    1    2000002    4542.257      83577.624   0.4
 PME solve Elec         8    1    1000001    2558.913      47084.047   0.2
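
The % column in the accounting table appears to follow from the wall times
weighted by the number of ranks each row ran on. Below is a minimal sketch of
that arithmetic, using only numbers copied from the table. The weighting rule
is my own reading of the table, not something stated in the log, and the
function name is made up for illustration.

```python
# Share of total cycles, assuming:
#   share = (ranks * wall time of row) / (all ranks * total wall time)
total_ranks = 128
total_wall = 71306.853           # "Total" row, in seconds

rows = {
    # name: (ranks, wall time in s), copied from the table
    "Force": (120, 67250.584),
    "PME mesh": (8, 68149.287),
    "Neighbor search": (120, 633.762),
}

def cycle_share(ranks, wall):
    """Percentage of all rank-seconds spent in one row (hypothetical helper)."""
    return 100.0 * ranks * wall / (total_ranks * total_wall)

shares = {name: cycle_share(*rw) for name, rw in rows.items()}
for name, pct in shares.items():
    print(f"{name:16s} {pct:5.1f} %")

# Fraction of a PP rank's own wall time that is spent in Force
force_wall_fraction = 100.0 * rows["Force"][1] / total_wall
print(f"Force, per PP rank: {force_wall_fraction:.1f} % of wall time")
```

Under this reading, the 88.4% for Force mostly reflects that 120 of the 128
ranks are PP ranks, each of which spends about 94% of its wall time in the
Force routine.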

               Core t (s)   Wall t (s)        (%)
       Time:  9127277.097    71306.853    12800.0
                 (ns/day)    (hour/ns)
Performance:        1.212       19.807
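
The performance figures follow directly from the wall time, given that the
run covered 1 ns. Here is a short sketch of the conversion; the variable names
are my own, and the core time is taken as wall time multiplied by the number
of ranks (one OpenMP thread per rank).

```python
simulated_ns = 1.0        # length of the NVT run, as stated above
wall_s = 71306.853        # "Wall t (s)" from the log
total_ranks = 128         # 128 MPI ranks, 1 OpenMP thread each

ns_per_day = simulated_ns * 86400.0 / wall_s
hours_per_ns = wall_s / 3600.0 / simulated_ns
core_s = wall_s * total_ranks   # differs slightly from the log due to rounding

print(f"{ns_per_day:.3f} ns/day, {hours_per_ns:.3f} hour/ns")
print(f"core time: {core_s:.1f} s")
```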

Thank you.
