[gmx-users] Too much PME mesh wall time.

Yunlong Liu yliu120 at jh.edu
Sun Aug 24 02:19:47 CEST 2014


Hi GROMACS users,

I have run into a problem with too much PME mesh time in my simulation; the
time accounting is shown below. I am running the simulation on 2 nodes,
each with 16 CPU cores and 1 NVIDIA Tesla K20m GPU.

My mdrun command is:

  ibrun /work/03002/yliu120/gromacs-5/bin/mdrun_mpi -pin on -ntomp 8 \
      -dlb no -deffnm pi3k-wt-charm-4 -gpu_id 00

I manually turned off dynamic load balancing (-dlb no) because the
simulation crashes when it is enabled; I have already reported that issue
to both mailing lists and discussed it with Roland.
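One thing I have not tried yet is dedicating some ranks to PME with
mdrun's -npme flag, so that the mesh work no longer competes with the
GPU-accelerated nonbonded ranks. A rough, untested sketch (the rank split
and the -gpu_id mapping are my guesses for this 2-node layout):

  # Untested: of the 2 ranks per node, 1 would do only PME on the CPU
  # and the other would run the short-ranged nonbondeds on the K20m.
  # With a single PP rank per node, -gpu_id maps GPU 0 to that rank.
  ibrun /work/03002/yliu120/gromacs-5/bin/mdrun_mpi -pin on -ntomp 8 \
      -dlb no -npme 2 -deffnm pi3k-wt-charm-4 -gpu_id 0

I do not know whether ibrun would actually place one PME rank on each
node, so this might also need an explicit rank placement.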

      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 4 MPI ranks, each using 8 OpenMP threads

  Computing:           Num   Num      Call    Wall time         Giga-Cycles
                       Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
  Domain decomp.          4    8     150000    1592.099     137554.334   2.2
  DD comm. load           4    8        751       0.057          4.947   0.0
  Neighbor search         4    8     150001     665.072      57460.919   0.9
  Launch GPU ops.         4    8   15000002     967.023      83548.916   1.3
  Comm. coord.            4    8    7350000    2488.263     214981.185   3.5
  Force                   4    8    7500001    7037.401     608018.042   9.8
  Wait + Comm. F          4    8    7500001    3931.222     339650.132   5.5
* PME mesh                4    8    7500001   40799.937    3525036.971  56.7 *
  Wait GPU nonlocal       4    8    7500001    1985.151     171513.300   2.8
  Wait GPU local          4    8    7500001      68.365       5906.612   0.1
  NB X/F buffer ops.      4    8   29700002    1229.406     106218.328   1.7
  Write traj.             4    8        830      28.245       2440.304   0.0
  Update                  4    8    7500001    2479.611     214233.669   3.4
  Constraints             4    8    7500001    7041.030     608331.635   9.8
  Comm. energies          4    8     150001      14.250       1231.154   0.0
  Rest                                          1601.588     138374.139   2.2
-----------------------------------------------------------------------------
  Total                                        71928.719    6214504.588 100.0
-----------------------------------------------------------------------------
  Breakdown of PME mesh computation
-----------------------------------------------------------------------------
  PME redist. X/F         4    8   15000002    8362.454     722500.151  11.6
  PME spread/gather       4    8   15000002   14836.350    1281832.463  20.6
  PME 3D-FFT              4    8   15000002    8985.776     776353.949  12.5
  PME 3D-FFT Comm.        4    8   15000002    7547.935     652127.220  10.5
  PME solve Elec          4    8    7500001    1025.249      88579.550   1.4
-----------------------------------------------------------------------------

First, I would like to know whether this is a serious problem, and second,
how I can improve my performance.
Does this mean that my GPU is running fast while the CPU is left waiting?
Also, what does "Wait GPU nonlocal" refer to?
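For what it is worth, the only tuning idea I have found in the
documentation so far is to shift load from the CPU's PME mesh onto the GPU
by scaling rcoulomb and fourierspacing by the same factor, which should
keep the Ewald accuracy roughly constant while shrinking the grid. A
sketch with example numbers only (assuming a starting point of rcoulomb =
1.0 nm and fourierspacing = 0.12 nm, which is not necessarily my actual
.mdp):

  ; Example only: scale the cutoff and the PME grid spacing together
  ; by the same factor (1.2 here) so the Ewald error stays similar.
  coulombtype     = PME
  rcoulomb        = 1.2     ; larger cutoff  -> more GPU nonbonded work
  fourierspacing  = 0.144   ; coarser grid   -> less CPU PME mesh work
  rvdw            = 1.2     ; the Verlet scheme requires rvdw = rcoulomb

If I understand correctly, mdrun's automated PME tuning does this same
rebalancing at run time anyway, so I am not sure how much is left to gain
by hand.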

Thank you.
Yunlong

-- 

========================================
Yunlong Liu, PhD Candidate
Computational Biology and Biophysics
Department of Biophysics and Biophysical Chemistry
School of Medicine, The Johns Hopkins University
Email: yliu120 at jhmi.edu
Address: 725 N Wolfe St, WBSB RM 601, 21205
========================================


