[gmx-users] Time accounting and performance
Mark Abraham
mark.j.abraham at gmail.com
Thu Jul 26 22:57:03 CEST 2018
Hi,
The various kinds of balance look great, and it is entirely normal to spend
all the wall time either computing forces or waiting for them to be used
elsewhere. That said, you are more than a factor of ten away from the scaling
limit. Without GPUs, it has been clear for many years that you can plan on
using around 500 atoms or fewer per x86 core, though the best actually
achievable depends on the simulation and the network.
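As a rough worked check (a minimal sketch in Python; the atom and rank counts
come from the quoted run below, and ~500 atoms/core is just the rule of thumb
mentioned above):

    # Scaling headroom estimate for the quoted run.
    atoms = 850_000                     # system size reported below
    pp_ranks = 120                      # 128 MPI ranks minus the 8 separate PME ranks
    atoms_per_rank = atoms / pp_ranks   # ~7083 atoms per PP core
    headroom = atoms_per_rank / 500     # ~14x the ~500 atoms/core guideline
    print(atoms_per_rank, headroom)

So, loosely speaking, there is roughly an order of magnitude of PP cores left
before that guideline is reached, network permitting.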
Mark
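As a further quick consistency check on the quoted log below (a sketch in
Python; the wall time and rank count are copied from the totals at the end of
the log):

    wall_t = 71306.853        # total wall time in seconds for the 1 ns run
    ranks = 128               # 120 PP + 8 PME MPI ranks
    print(86400.0 / wall_t)   # ~1.21 ns/day, matching the reported 1.212
    print(wall_t / 3600.0)    # ~19.81 hours/ns, matching the reported 19.807
    print(wall_t * ranks)     # ~9.13e6 core-seconds, close to the reported Core t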
On Thu, Jul 26, 2018, 20:11 Alex <alexanderwien2k at gmail.com> wrote:
> Dear all,
>
> I use 128 MPI ranks on 4 nodes to run GROMACS on a pretty large system
> containing around 850,000 atoms.
>
> #PBS -l select=4:ncpus=32:mpiprocs=32
> -n 128 gmx_mpi mdrun -ntomp 1 -deffnm eql1 -s eql1.tpr -rdd 1.5 -dds 0.9999 -npme 8 -ntomp_pme 1 -g eql1.log
>
> Below are the performance statistics from the end of the .log file of a
> 1 ns NVT simulation.
> Based on the information below, would you please let me know how I can
> improve the performance? (I can even increase the number of nodes further.)
> More than 88% of the simulation time is spent computing forces; I wonder
> whether this is normal.
>
>
> D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
>
> av. #atoms communicated per step for force: 2 x 1445686.6
> av. #atoms communicated per step for LINCS: 2 x 51538.9
>
>
> Dynamic load balancing report:
> DLB was turned on during the run due to measured imbalance.
> Average load imbalance: 1.1%.
> The balanceable part of the MD step is 95%, load imbalance is computed from this.
> Part of the total run time spent waiting due to load imbalance: 1.0%.
> Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 0 % Z 0 %
> Average PME mesh/force load: 0.993
> Part of the total run time spent waiting due to PP/PME imbalance: 0.0 %
>
>
> R E A L C Y C L E A N D T I M E A C C O U N T I N G
>
> On 120 MPI ranks doing PP, and
> on 8 MPI ranks doing PME
>
> Computing: Num Num Call Wall time Giga-Cycles
> Ranks Threads Count (s) total sum %
>
> -----------------------------------------------------------------------------
> Domain decomp. 120 1 10000 176.939 48835.235 0.2
> DD comm. load 120 1 9983 0.275 75.859 0.0
> DD comm. bounds 120 1 9901 2.485 685.808 0.0
> Send X to PME 120 1 1000001 245.752 67827.508 0.3
> Neighbor search 120 1 10001 633.762 174918.551 0.8
> Comm. coord. 120 1 990000 553.168 152674.581 0.7
> Force 120 1 1000001 67250.584 18561182.443 88.4
> Wait + Comm. F 120 1 1000001 1086.662 299919.155 1.4
> PME mesh * 8 1 1000001 68149.287 1253948.313 6.0
> PME wait for PP * 3157.562 58099.206 0.3
> Wait + Recv. PME F 120 1 1000001 215.074 59360.407 0.3
> NB X/F buffer ops. 120 1 2980001 354.502 97842.785 0.5
> Write traj. 120 1 180 4.645 1281.970 0.0
> Update 120 1 1000001 149.694 41315.622 0.2
> Constraints 120 1 1000001 475.301 131183.120 0.6
> Comm. energies 120 1 100001 104.565 28860.099 0.1
> Rest 53.445 14750.800 0.1
>
> -----------------------------------------------------------------------------
> Total 71306.853 20992761.540 100.0
>
> -----------------------------------------------------------------------------
> (*) Note that with separate PME ranks, the walltime column actually sums to
> twice the total reported, but the cycle count total and % are correct.
>
> -----------------------------------------------------------------------------
> Breakdown of PME mesh computation
>
> -----------------------------------------------------------------------------
> PME redist. X/F 8 1 2000002 3671.428 67554.352 0.3
> PME spread 8 1 1000001 19926.223 366642.922 1.7
> PME gather 8 1 1000001 14964.352 275344.401 1.3
> PME 3D-FFT 8 1 2000002 22473.744 413517.373 2.0
> PME 3D-FFT Comm. 8 1 2000002 4542.257 83577.624 0.4
> PME solve Elec 8 1 1000001 2558.913 47084.047 0.2
>
> -----------------------------------------------------------------------------
>
> Core t (s) Wall t (s) (%)
> Time: 9127277.097 71306.853 12800.0
> 19h48:26
> (ns/day) (hour/ns)
> Performance: 1.212 19.807
>
> Thank you.
> Regards,
> Alex