[gmx-users] Too much PME mesh wall time.
Mark Abraham
mark.j.abraham at gmail.com
Mon Aug 25 13:38:25 CEST 2014
On Sun, Aug 24, 2014 at 2:19 AM, Yunlong Liu <yliu120 at jh.edu> wrote:
> Hi gromacs users,
>
> I ran into a problem with too much PME mesh time in my simulation; my time
> accounting is below. I am running the simulation on 2 nodes, each with 16
> CPU cores and 1 NVIDIA Tesla K20m GPU.
>
> My mdrun command is:
>
>   ibrun /work/03002/yliu120/gromacs-5/bin/mdrun_mpi -pin on -ntomp 8 \
>       -dlb no -deffnm pi3k-wt-charm-4 -gpu_id 00
>
> I manually turned off dynamic load balancing (-dlb no) because the simulation
> crashes when it is enabled. I have reported this to both mailing lists and
> talked to Roland.
>
Hmm. This shouldn't happen. Can you please open an issue at
http://redmine.gromacs.org/ and upload enough info for us to replicate it?
> R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>
> On 4 MPI ranks, each using 8 OpenMP threads
>
> Computing:             Num    Num      Call     Wall time     Giga-Cycles
>                        Ranks  Threads  Count       (s)        total sum     %
> ------------------------------------------------------------------------------
> Domain decomp.           4      8      150000     1592.099    137554.334    2.2
> DD comm. load            4      8         751        0.057         4.947    0.0
> Neighbor search          4      8      150001      665.072     57460.919    0.9
> Launch GPU ops.          4      8    15000002      967.023     83548.916    1.3
> Comm. coord.             4      8     7350000     2488.263    214981.185    3.5
> Force                    4      8     7500001     7037.401    608018.042    9.8
> Wait + Comm. F           4      8     7500001     3931.222    339650.132    5.5
> * PME mesh               4      8     7500001    40799.937   3525036.971   56.7 *
> Wait GPU nonlocal        4      8     7500001     1985.151    171513.300    2.8
> Wait GPU local           4      8     7500001       68.365      5906.612    0.1
> NB X/F buffer ops.       4      8    29700002     1229.406    106218.328    1.7
> Write traj.              4      8         830       28.245      2440.304    0.0
> Update                   4      8     7500001     2479.611    214233.669    3.4
> Constraints              4      8     7500001     7041.030    608331.635    9.8
> Comm. energies           4      8      150001       14.250      1231.154    0.0
> Rest                                               1601.588    138374.139    2.2
> ------------------------------------------------------------------------------
> Total                                             71928.719   6214504.588  100.0
> ------------------------------------------------------------------------------
> Breakdown of PME mesh computation
> ------------------------------------------------------------------------------
> PME redist. X/F          4      8    15000002     8362.454    722500.151   11.6
> PME spread/gather        4      8    15000002    14836.350   1281832.463   20.6
> PME 3D-FFT               4      8    15000002     8985.776    776353.949   12.5
> PME 3D-FFT Comm.         4      8    15000002     7547.935    652127.220   10.5
> PME solve Elec           4      8     7500001     1025.249     88579.550    1.4
> ------------------------------------------------------------------------------
>
>
> First, I would like to know whether this is a big problem, and second, how I
> can improve my performance.
>
"Too much" mesh time is not really possible. With the GPU doing the
short-ranged work, the only work for the CPU to do is the bondeds (in
"Force" above) and long-range (PME mesh). Those ought to dominate the run
time, and roughly in that ratio for a typical biomolecular system.
> Does it mean that my GPU is running too fast and the CPU is waiting?
>
It looks balanced: if the GPU had too much work, the Wait GPU times would be
appreciable. What did the PP-PME load balancing at the start of the run look
like?
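
For example, one quick way to pull the tuning report out of the log (a rough
sketch, not from the original post: it assumes the log file is named after
your -deffnm, and the exact wording of the PME tuning lines varies between
versions, so the patterns may need adjusting):

  grep -i -e "pme grid" -e "load balancing" pi3k-wt-charm-4.log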
> BTW, what does "Wait GPU nonlocal" refer to?
>
When using DD and GPUs, the short-ranged work on each PP rank is decomposed
into a set whose resulting forces are needed by other ("non-local") PP ranks,
and the rest ("local"). The non-local work is done first, so that once the
PME mesh work is done, the PP<->PP MPI communication can be overlapped with
the local short-ranged GPU work. The 0.1% of time spent in "Wait GPU local"
indicates that the communication took longer than the remaining local GPU
work, perhaps because there was not much of the latter or it had already
completed. Unfortunately, it is not always possible to get timing information
from CUDA without slowing down the run. What actually happens is strongly
dependent on the hardware and the simulation system.
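
To make that ordering concrete, here is a rough shell illustration (not
GROMACS code; the sleeps and echoes are hypothetical stand-ins for the GPU
kernels, the CPU PME mesh work and the MPI communication, chosen only to show
why the non-local forces are waited on before the force communication and the
local ones after it):

  # GPU kernels run asynchronously, like background jobs:
  ( sleep 1; echo "GPU: non-local nonbonded forces done" ) & pid_nonlocal=$!
  ( sleep 2; echo "GPU: local nonbonded forces done" )     & pid_local=$!
  # Meanwhile the CPU does the long-range work:
  echo "CPU: PME mesh (spread, 3D-FFT, solve, gather)"
  wait $pid_nonlocal    # accounted as "Wait GPU nonlocal"
  echo "CPU: PP<->PP MPI communication of non-local forces"
  wait $pid_local       # accounted as "Wait GPU local"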
Mark
> Thank you.
>
> Yunlong
>
> --
>
> ========================================
> Yunlong Liu, PhD Candidate
> Computational Biology and Biophysics
> Department of Biophysics and Biophysical Chemistry
> School of Medicine, The Johns Hopkins University
> Email: yliu120 at jhmi.edu
> Address: 725 N Wolfe St, WBSB RM 601, 21205
> ========================================
>