[gmx-users] Re: Persistent crashes due to unsettled waters

Mon Nov 3 00:48:28 CET 2008

Aaron Fafarman wrote:
> Thanks for the tip about the "blowing-up" page on the wiki--I already
> checked there and found little help because my crashes are coming so
> long into the runs (sometimes as late as 1 million steps in). This is
> also why I can't debug with nstxout = 1: the output files would get
> too big. I'm also reluctant to lower the time-step from 2 fs to say 1
> fs, because I'm already resource-limited in these simulations, and
> slowing them down would halve the amount of sampling I could do.

If you structure your calculation in batches of a few hundred 
picoseconds or so, then when something goes wrong, you've got a restart 
close beforehand. You can tweak that to get your debugging info for the 
part that failed. GROMACS 4.0's checkpointing facility might let you 
avoid doing lots of work here, but I haven't played with it yet.
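
A rough sketch of the batching arithmetic, in Python. The dt and nsteps values come from the quoted mdp file; the 200 ps batch length, the crash step and the helper names plan_batches and batch_containing are assumptions for illustration only.

```python
# Sketch only: split a long run into restartable batches.
# dt and nsteps come from the quoted mdp file; the 200 ps batch
# length is an assumed value for "a few hundred picoseconds".

def plan_batches(total_steps, dt_ps, batch_ps):
    """Return a list of (batch_index, first_step, n_steps) tuples."""
    steps_per_batch = round(batch_ps / dt_ps)
    batches = []
    first = 0
    index = 0
    while first < total_steps:
        n = min(steps_per_batch, total_steps - first)
        batches.append((index, first, n))
        first += n
        index += 1
    return batches


def batch_containing(batches, crash_step):
    """Find the batch to rerun with dense output after a crash."""
    for index, first, n in batches:
        if first <= crash_step < first + n:
            return index, first, n
    raise ValueError("crash step lies outside the planned run")


batches = plan_batches(total_steps=2_000_000, dt_ps=0.002, batch_ps=200.0)
print(len(batches), "batches of", batches[0][2], "steps")
# Hypothetical crash about 1 million steps in, as described in the thread.
print(batch_containing(batches, crash_step=1_000_500))
```

Only the batch that contains the crash then needs to be repeated with dense output, starting from the restart written at the end of the previous batch.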

Coupled with the above, nstxtcout = 1 with xtc-grps set to the whole system 
would be less resource-demanding than nstxout = 1. You could decide to throw away your 
"debugging" xtc after each successful batch, possibly after a trjconv to 
extract the bits of information you normally wanted to keep in your xtc 
files.
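
As a sketch of what the "debugging" settings for one batch might look like, the Python below rewrites two values in a parameter set and estimates the raw coordinate volume. The parameter names are those in the quoted mdp file; the 100000-step batch, the atom count of 27000, the 3x compression ratio and the helper names are assumptions, not measured values.

```python
# Sketch only: derive a "debugging" parameter set for rerunning one batch
# with a compressed frame written every step. The batch length, atom count
# and compression ratio used below are assumptions for illustration.

def override_mdp(mdp_text, overrides):
    """Return mdp text with the given 'key = value' settings replaced or appended."""
    remaining = dict(overrides)
    out = []
    for line in mdp_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if "=" in line and key in remaining:
            out.append(f"{key:<15} = {remaining.pop(key)}")
        else:
            out.append(line)
    # Settings that were not present in the input are appended at the end.
    for key, value in remaining.items():
        out.append(f"{key:<15} = {value}")
    return "\n".join(out) + "\n"


def uncompressed_gb(n_atoms, n_frames, bytes_per_coord=4):
    """Size of raw single-precision coordinates in GB, ignoring headers."""
    return n_atoms * 3 * bytes_per_coord * n_frames / 1e9


production = """integrator      = md
nsteps          = 2000000
dt              = 0.002
nstxout         = 1000
nstxtcout       = 100
nstenergy       = 100
"""

debug = override_mdp(production, {"nsteps": 100000, "nstxtcout": 1})
print(debug)

raw = uncompressed_gb(n_atoms=27000, n_frames=100000)
print(f"raw coordinates: {raw:.1f} GB; at an assumed 3x compression: {raw / 3:.1f} GB")
```

Even compressed, one frame per step adds up quickly, which is why this only makes sense for the single batch that failed.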

> Any other thoughts on this?
> 
> The mdp file for these crashing runs is shown below:
> 
> integrator      = md
> nsteps          = 2000000
> dt              = 0.002
> nstlist         = 10
> nstcomm         = 1
> rlist           = 1.0
> coulombtype     = pme
> rcoulomb        = 1.0
> vdw-type        = cut-off
> rvdw            = 1.0
> tcoupl          = Nose-Hoover
> tc-grps         = protein non-protein
> tau-t           = 0.5 0.5
> ref-t           = 298 298
> nstxout         = 1000
> nstxtcout       = 100
> nstenergy       = 100

Seems fine.

>> Message: 7
>> Date: Fri, 31 Oct 2008 20:43:22 -0400
>> From: "Justin A. Lemkul" <jalemkul at vt.edu>
>> Subject: Re: [gmx-users] Persistent crashes due to unsettled waters
>> To: Discussion list for GROMACS users <gmx-users at gromacs.org>
>> Message-ID: <490BA62A.8000007 at vt.edu>
>>
>> Aaron Fafarman wrote:
>>> Dear GMX-community,
>>>
>>> I'm running four related, equilibrium-dynamics simulations of a small
>>> protein in explicit water, each with a different non-natural
>>> amino-acid in the sequence. One of the four non-natural amino-acid
>>> containing proteins can be simulated for long times, up to 4 ns
>>> without incident, but the other three invariably crash. On repeated
>>> reinitialization (changing the box size, changing the duration of
>>> position restrained refinement) they still keep crashing at any time
>>> between 100's of picoseconds and 2 ns. Usually an error such as
>>> "3009.270 ps: Water molecule starting at atom 26883 can not be
>>> settled." is printed to the log file and the pdb of the step after the
>>> crash has nonsense numbers for the coordinates of the offending water
>>> molecule. It's always a different water molecule, always a
>>> high-numbered atom (>20000). The crashes happen on both a
>>> self-compiled linux build and on a self-compiled Mac OS 10.4 build,
>>> both GMX 3.3.1.  A simulation of just the parameterized amino acids in
>>> a box of water never crashes over the course of a 4 ns simulation so I
>>> don't suspect my amino-acid parameters.

One hypothesis that is consistent with all this, and that isn't tested 
above, is that you've got some kind of erroneous long-range interaction 
that might only manifest when a certain pair of protein atoms (or atom 
types) get inside a cut-off. How such a thing could arise might depend on how the force 
field does the parameter lookups. Imagine an exponent sign error on an 
LJ parameter on a pair of atom types unlikely to come into LJ contact... 
The nstxout/nstxtcout approach Justin and I suggest might allow you to 
see that "kick" happen before the nearby atoms get kicked again on the 
next time-step.
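
To illustrate the kind of typo meant here, a small Python sketch compares the LJ force for a correct C12 and for the same number with its exponent sign flipped. The C6/C12 magnitudes are assumed for illustration and are not taken from any particular force field; only rvdw = 1.0 comes from the quoted mdp file.

```python
# Sketch only: how an exponent sign typo in one LJ parameter turns a weak
# interaction into a huge force as soon as a pair comes inside the cut-off.
# The C6/C12 values are illustrative magnitudes, not taken from the
# poster's force field.

def lj_force(r_nm, c6, c12):
    """Radial LJ force for V = c12/r^12 - c6/r^6 (positive = repulsive)."""
    return 12.0 * c12 / r_nm**13 - 6.0 * c6 / r_nm**7


C6 = 2.6e-3          # kJ mol^-1 nm^6, assumed
C12_GOOD = 2.6e-6    # kJ mol^-1 nm^12, assumed
C12_TYPO = 2.6e+6    # same mantissa, exponent sign flipped

for r in (1.05, 0.95, 0.90):
    inside = r < 1.0  # rvdw = 1.0 with a plain cut-off, as in the quoted mdp
    good = lj_force(r, C6, C12_GOOD) if inside else 0.0
    bad = lj_force(r, C6, C12_TYPO) if inside else 0.0
    print(f"r = {r:.2f} nm  good = {good:.3e}  typo = {bad:.3e}  kJ mol^-1 nm^-1")
```

With the typo, nothing happens until the pair first crosses the cut-off, and then the force jumps by many orders of magnitude in a single step, which would fit a crash that appears only late in a run.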

Mark