[gmx-users] Fwd: Simulation getting slower and ultimately crashing

Thu Apr 28 10:39:30 CEST 2016

---------- Forwarded message ----------
From: soumadwip ghosh <soumadwipghosh at gmail.com>
Date: Wed, Apr 27, 2016 at 2:52 PM
Subject: Simulation getting slower and ultimately crashing
To: "gromacs.org_gmx-users" <gromacs.org_gmx-users at maillist.sys.kth.se>

Hi,
     I am simulating a nucleic acid in the presence of a 15X15
single-walled carbon nanotube using the CHARMM 27 force field and the TIP3P
water model. In my previously published work, I ran the simulation in a
cubic box and everything worked fine. Now that I have switched to a
triclinic box, the simulation runs smoothly through the NPT step (7 ns),
but it crashes during the production run with the following error:

"inconsistent dd boundary staggering limits gromacs"

It points to improper equilibration of the system and to the system
blowing up. What might be the reason for such instability? The simulation
also gets slower and slower before crashing. I am using GROMACS 4.5.6 on
CentOS.

I am linking my files below for reference. The one thing I have done
differently (apart from changing the box type) is to rotate the hybrid
system (nucleic acid + CNT) about the y-axis using editconf before defining
the box type. Apart from that, every step performed (and the parameter
files) is identical. I created the SWCNT topology using g_x2top with
CHARMM 27 parameters. Any help will be appreciated.

cnt.itp file: https://drive.google.com/open?id=0B7SBnQ5YXQSLZ0k3QnJPaXBmbnc

the hybrid.pdb file (containing the NA+CNT):
https://drive.google.com/open?id=0B7SBnQ5YXQSLTEpibVByTEg4N0E

the md.mdp file:
https://drive.google.com/open?id=0B7SBnQ5YXQSLMmd1di1jbHpubFE

Here is an excerpt from the md.log file (md_tric.log):

 av. #atoms communicated per step for force:  2 x 552.4
 av. #atoms communicated per step for LINCS:  2 x 0.0

 Average load imbalance: 2044.6 %
 Part of the total run time spent waiting due to load imbalance: 5.0 %
 Average PME mesh/force load: 0.523
 Part of the total run time spent waiting due to PP/PME imbalance: 6.2 %

NOTE: 5.0 % performance was lost due to load imbalance
      in the domain decomposition.
      You might want to use dynamic load balancing (option -dlb.)

NOTE: 6.2 % performance was lost because the PME nodes
      had less work to do than the PP nodes.
      You might want to decrease the number of PME nodes
      or decrease the cut-off and the grid spacing.

     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
 Domain decomp.        60         54      820.937      410.9    22.8
 DD comm. load         60          5        0.390        0.2     0.0
 Send X to PME         60        271       12.558        6.3     0.3
 Comm. coord.          60        271        7.312        3.7     0.2
 Neighbor search       60         55     1313.368      657.4    36.5
 Force                 60        271        9.671        4.8     0.3
 Wait + Comm. F        60        271      223.680      112.0     6.2
 PME mesh              20        271       90.758       45.4     2.5
 Wait + Comm. X/F      20                 809.835      405.4    22.5
 Wait + Recv. PME F    60        271      114.992       57.6     3.2
 Write traj.           60          2        6.676        3.3     0.2
 Update                60        271        1.902        1.0     0.1
 Constraints           60        271        2.164        1.1     0.1
 Comm. energies        60         55      185.755       93.0     5.2
 Rest                  60                   2.625        1.3     0.1
-----------------------------------------------------------------------
 Total                 80                3602.623     1803.3   100.0
-----------------------------------------------------------------------
-----------------------------------------------------------------------
 PME redist. X/F       20        542       59.722       29.9     1.7
 PME spread/gather     20        542       21.280       10.7     0.6
 PME 3D-FFT            20        542        9.134        4.6     0.3
 PME solve             20        271        0.504        0.3     0.0
-----------------------------------------------------------------------

NOTE: 5 % of the run time was spent communicating energies,
      you might want to use the -gcom option of mdrun

        Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:     22.542     22.542    100.0
               (Mnbf/s)   (MFlops)   (ns/day)  (hour/ns)
Performance:      3.685    578.251      2.077     11.553
Finished mdrun on node 0 Wed Apr 27 05:16:18 2016
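
As a rough reading aid for the timing table above, the short Python sketch
below adds up the "%" column by category. The rows are copied from the log;
the grouping into "repartitioning", "waiting" and "compute" is only an
assumption made for illustration and is not a GROMACS convention. It also
checks that the total seconds divided by the 80 nodes comes out close to the
reported wall-clock time of 22.542 s.

```python
# Rows copied from the cycle accounting table: (name, seconds, percent).
ROWS = [
    ("Domain decomp.", 410.9, 22.8),
    ("DD comm. load", 0.2, 0.0),
    ("Send X to PME", 6.3, 0.3),
    ("Comm. coord.", 3.7, 0.2),
    ("Neighbor search", 657.4, 36.5),
    ("Force", 4.8, 0.3),
    ("Wait + Comm. F", 112.0, 6.2),
    ("PME mesh", 45.4, 2.5),
    ("Wait + Comm. X/F", 405.4, 22.5),
    ("Wait + Recv. PME F", 57.6, 3.2),
    ("Write traj.", 3.3, 0.2),
    ("Update", 1.0, 0.1),
    ("Constraints", 1.1, 0.1),
    ("Comm. energies", 93.0, 5.2),
    ("Rest", 1.3, 0.1),
]
TOTAL_SECONDS = 1803.3  # "Total" row, summed over all nodes
TOTAL_NODES = 80        # 60 PP nodes + 20 PME nodes


def percent_of(names):
    """Sum the '%' column over the rows whose name is in `names`."""
    return sum(pct for name, _, pct in ROWS if name in names)


# Hypothetical grouping, chosen only to make the table easier to read.
repartition = percent_of({"Domain decomp.", "Neighbor search"})
waiting = percent_of(
    {"Wait + Comm. F", "Wait + Comm. X/F", "Wait + Recv. PME F"}
)
compute = percent_of({"Force", "PME mesh", "Update", "Constraints"})

# The table is summed over nodes, so dividing by the node count should
# give roughly the wall-clock time per node.
per_node_seconds = TOTAL_SECONDS / TOTAL_NODES

print(f"DD + neighbor search: {repartition:.1f} %")
print(f"Waiting on comm.:     {waiting:.1f} %")
print(f"Force/PME/update:     {compute:.1f} %")
print(f"Seconds per node:     {per_node_seconds:.2f}")
```

Under this grouping, most of the accounted time falls into domain
decomposition, neighbor search and communication waits, while the force and
PME work itself is a small share.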

I am guessing there are some issues with the PME calculations and with the
number of nodes used for a small system like mine. In that case, what
would be the correct combination of options such as -npme or -nt? Should I
use the -dlb option? I would love to hear from the experts.

Thanks in advance

Soumadwip Ghosh
Senior Research Scholar
IITB
India