[gmx-users] Time accounting and performance
Mark Abraham
mark.j.abraham at gmail.com
Thu Jul 26 22:57:03 CEST 2018
Hi,
The various kinds of balance look great, and it is entirely normal to spend
all the wall time either computing forces or waiting for them to be used
elsewhere. That said, you are more than a factor of ten away from the scaling
limit. Without GPUs, it has been clear for many years that you can plan on
using around 500 atoms or fewer per x86 core, though the best actually
achievable depends on the simulation and the network.
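As a rough worked check (a minimal sketch in Python; the atom and rank counts
come from the quoted run below, and ~500 atoms/core is just the rule of thumb
mentioned above):

    # Scaling headroom estimate for the quoted run.
    atoms = 850_000                     # system size reported below
    pp_ranks = 120                      # 128 MPI ranks minus the 8 separate PME ranks
    atoms_per_rank = atoms / pp_ranks   # ~7083 atoms per PP core
    headroom = atoms_per_rank / 500     # ~14x the ~500 atoms/core guideline
    print(atoms_per_rank, headroom)

So, loosely speaking, there is roughly an order of magnitude of PP cores left
before that guideline is reached, network permitting.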
Mark
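As a further quick consistency check on the quoted log below (a sketch in
Python; the wall time and rank count are copied from the totals at the end of
the log):

    wall_t = 71306.853        # total wall time in seconds for the 1 ns run
    ranks = 128               # 120 PP + 8 PME MPI ranks
    print(86400.0 / wall_t)   # ~1.21 ns/day, matching the reported 1.212
    print(wall_t / 3600.0)    # ~19.81 hours/ns, matching the reported 19.807
    print(wall_t * ranks)     # ~9.13e6 core-seconds, close to the reported Core t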
On Thu, Jul 26, 2018, 20:11 Alex <alexanderwien2k at gmail.com> wrote:
> Dear all,
>
> I use 128 MPI ranks on 4 nodes to run GROMACS on a pretty large system
> containing around 850,000 atoms.
>
> #PBS -l select=4:ncpus=32:mpiprocs=32
> -n 128 gmx_mpi mdrun -ntomp 1 -deffnm eql1 -s eql1.tpr -rdd 1.5 -dds 0.9999 -npme 8 -ntomp_pme 1 -g eql1.log
>
> Below are the performance statistics from the end of the .log file of a
> 1 ns NVT simulation.
> Based on the information below, would you please let me know how I can
> improve the performance? (I can even increase the number of nodes further.)
> More than 88% of the simulation time is spent computing forces; I wonder
> whether this is normal.
>
>
> D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
>
> av. #atoms communicated per step for force: 2 x 1445686.6
> av. #atoms communicated per step for LINCS: 2 x 51538.9
>
>
> Dynamic load balancing report:
> DLB was turned on during the run due to measured imbalance.
> Average load imbalance: 1.1%.
> The balanceable part of the MD step is 95%, load imbalance is computed from this.
> Part of the total run time spent waiting due to load imbalance: 1.0%.
> Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 0 % Z 0 %
> Average PME mesh/force load: 0.993
> Part of the total run time spent waiting due to PP/PME imbalance: 0.0 %
>
>
> R E A L C Y C L E A N D T I M E A C C O U N T I N G
>
> On 120 MPI ranks doing PP, and
> on 8 MPI ranks doing PME
>
> Computing: Num Num Call Wall time Giga-Cycles
> Ranks Threads Count (s) total sum %
>
> -----------------------------------------------------------------------------
> Domain decomp. 120 1 10000 176.939 48835.235 0.2
> DD comm. load 120 1 9983 0.275 75.859 0.0
> DD comm. bounds 120 1 9901 2.485 685.808 0.0
> Send X to PME 120 1 1000001 245.752 67827.508 0.3
> Neighbor search 120 1 10001 633.762 174918.551 0.8
> Comm. coord. 120 1 990000 553.168 152674.581 0.7
> Force 120 1 1000001 67250.584 18561182.443 88.4
> Wait + Comm. F 120 1 1000001 1086.662 299919.155 1.4
> PME mesh * 8 1 1000001 68149.287 1253948.313 6.0
> PME wait for PP * 3157.562 58099.206 0.3
> Wait + Recv. PME F 120 1 1000001 215.074 59360.407 0.3
> NB X/F buffer ops. 120 1 2980001 354.502 97842.785 0.5
> Write traj. 120 1 180 4.645 1281.970 0.0
> Update 120 1 1000001 149.694 41315.622 0.2
> Constraints 120 1 1000001 475.301 131183.120 0.6
> Comm. energies 120 1 100001 104.565 28860.099 0.1
> Rest 53.445 14750.800 0.1
>
> -----------------------------------------------------------------------------
> Total 71306.853 20992761.540 100.0
>
> -----------------------------------------------------------------------------
> (*) Note that with separate PME ranks, the walltime column actually sums to
> twice the total reported, but the cycle count total and % are correct.
>
> -----------------------------------------------------------------------------
> Breakdown of PME mesh computation
>
> -----------------------------------------------------------------------------
> PME redist. X/F 8 1 2000002 3671.428 67554.352 0.3
> PME spread 8 1 1000001 19926.223 366642.922 1.7
> PME gather 8 1 1000001 14964.352 275344.401 1.3
> PME 3D-FFT 8 1 2000002 22473.744 413517.373 2.0
> PME 3D-FFT Comm. 8 1 2000002 4542.257 83577.624 0.4
> PME solve Elec 8 1 1000001 2558.913 47084.047 0.2
>
> -----------------------------------------------------------------------------
>
> Core t (s) Wall t (s) (%)
> Time: 9127277.097 71306.853 12800.0
> 19h48:26
> (ns/day) (hour/ns)
> Performance: 1.212 19.807
>
> Thank you.
> Regards,
> Alex