[gmx-users] Simulation getting slower and ultimately crashing

soumadwip ghosh soumadwipghosh at gmail.com
Wed Apr 27 14:01:45 CEST 2016


Hi,
   I think I forgot to provide the md.log file in the previous mail. So,
here it is:

"md_tric.log" 1513L, 76430C



 av. #atoms communicated per step for force:  2 x 552.4
 av. #atoms communicated per step for LINCS:  2 x 0.0

 Average load imbalance: 2044.6 %
 Part of the total run time spent waiting due to load imbalance: 5.0 %
 Average PME mesh/force load: 0.523
 Part of the total run time spent waiting due to PP/PME imbalance: 6.2 %

NOTE: 5.0 % performance was lost due to load imbalance
      in the domain decomposition.
      You might want to use dynamic load balancing (option -dlb.)

NOTE: 6.2 % performance was lost because the PME nodes
      had less work to do than the PP nodes.
      You might want to decrease the number of PME nodes
      or decrease the cut-off and the grid spacing.


     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
 Domain decomp.        60         54      820.937      410.9    22.8
 DD comm. load         60          5        0.390        0.2     0.0
 Send X to PME         60        271       12.558        6.3     0.3
 Comm. coord.          60        271        7.312        3.7     0.2
 Neighbor search       60         55     1313.368      657.4    36.5
 Force                 60        271        9.671        4.8     0.3
 Wait + Comm. F        60        271      223.680      112.0     6.2
 PME mesh              20        271       90.758       45.4     2.5
 Wait + Comm. X/F      20                 809.835      405.4    22.5
 Wait + Recv. PME F    60        271      114.992       57.6     3.2
 Write traj.           60          2        6.676        3.3     0.2
 Update                60        271        1.902        1.0     0.1
 Constraints           60        271        2.164        1.1     0.1
 Comm. energies        60         55      185.755       93.0     5.2
 Rest                  60                   2.625        1.3     0.1
-----------------------------------------------------------------------
 Total                 80                3602.623     1803.3   100.0
-----------------------------------------------------------------------
-----------------------------------------------------------------------
 PME redist. X/F       20        542       59.722       29.9     1.7
 PME spread/gather     20        542       21.280       10.7     0.6
 PME 3D-FFT            20        542        9.134        4.6     0.3
 PME solve             20        271        0.504        0.3     0.0
-----------------------------------------------------------------------

NOTE: 5 % of the run time was spent communicating energies,
      you might want to use the -gcom option of mdrun


        Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:     22.542     22.542    100.0
               (Mnbf/s)   (MFlops)   (ns/day)  (hour/ns)
Performance:      3.685    578.251      2.077     11.553
Finished mdrun on node 0 Wed Apr 27 05:16:18 2016

I am guessing there are some issues with the PME calculations and the
number of nodes used for a smaller system like mine. In that case, what
would be the correct combination of options such as -npme or -nt? Should I
use the -dlb option? I would love to hear from the experts.
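For what it's worth, the notes in the log already point at the knobs to try. A rough sketch of invocations one might experiment with, assuming a GROMACS 4.x-style MPI setup like the one in this log (the process and step counts below are placeholders, not recommendations for this specific system):

```shell
# Hedged examples only -- adjust -np / -npme to your own hardware.

# Enable dynamic load balancing and try fewer dedicated PME ranks,
# since the log shows the PME nodes waiting on the PP nodes:
mpirun -np 80 mdrun -deffnm md_tric -dlb yes -npme 16

# Sum energies less frequently to cut the "Comm. energies" cost
# (here: every 100 steps instead of every nstcalcenergy steps):
mpirun -np 80 mdrun -deffnm md_tric -dlb yes -gcom 100

# Or let g_tune_pme search for a good PME rank count automatically:
g_tune_pme -np 80 -s md_tric.tpr
```

For a small system, it may also simply be over-parallelized; fewer total ranks (or a thread-MPI run with -nt on a single node) can outperform a large MPI run dominated by communication.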

Soumadwip Ghosh
Senior Research Fellow
IITB
India

