[gmx-users] System volume "jumps" on exact continuations

Thu Jun 1 22:32:55 CEST 2017

> Hi,
>
> Thanks for the report.
>
> On Wed, May 31, 2017 at 4:56 PM Elizabeth Ploetz <ploetz at ksu.edu> wrote:
>
>> Dear GMX-USERS,
>>
>> We are observing abrupt, discontinuous “jumps” in our simulation box
>> volume for different chunks of trajectory when performing exact
>> continuations of standard, explicit solvent, MD simulations in the NPT
>> ensemble. Note that these volume jumps do not appear to occur after every
>> single continuation (there seems to be some element of randomness);
>> however, they occur at some point in almost all of our different
>> simulations. There are no visually apparent problems with the trajectories
>> upon viewing them in e.g., Pymol. We are not able to visualize anything
>> unusual, such as “bubbles” for cases where the system volume jumps to
>> larger values.
>>
>> Here is what we have tested:
>>
>>
>>   *   Temperature/Pressure Coupling Methods: Nosé-Hoover/Parrinello-Rahman
>> and Berendsen/Berendsen are both affected
>>   *   Gromacs Versions: 4.6, 4.6.1, 5.0, and 2016 are all affected
>>   *   Machines/Architectures: three different clusters (we don’t know of a
>> machine where it does not occur) are all affected:
>>      *   the lab’s cluster where we installed Gromacs ourselves
>>      *   the university cluster where Gromacs was installed by the
>> university IT staff
>>      *   the Hansen cluster at Purdue, which we access only through a GUI
>> at the DiaGrid portal
>>   *   Systems: Pure water systems as well as solvated single solute
>> systems where the solute is a single protein, LJ sphere, or
>> buckminsterfullerene are all affected
>>   *   Force fields: versions of AMBER, CHARMM, GROMOS, and OPLS force
>> fields are all affected
>>   *   Box shape: cubic boxes and rhombic dodecahedrons are both affected
>>   *   Continuation method: Restarts of killed simulations (killed on
>> purpose to increase the numbers of nodes as resources became available --
>> we don't do this anymore, but this was how we first saw the problem) and
>> exact continuations of completed simulations are both affected
>>   *   Number of Nodes: We were almost convinced that it happened whenever
>> we ran on >1 node (our usual situation), but not if we ran on 1 node only.
>> We did some tests on one node on the lab’s cluster with and without MPI, to
>> see whether or not it was MPI. System volume jumps were not (yet?) observed
>> regardless of whether or not MPI was used. However, on the university
>> cluster, we did tests of 24 processors using "-pe mpi-fill 24". Usually the
>> 24 processors were spread across >1 node, but sometimes they completely
>> filled one node. There was one instance where there was a system volume
>> jump when the 24 processors changed from being distributed across >1 node
>> to being on 1 node only. However, MPI was used in all of those simulations.
>> So, we still have not proved that it is not MPI that is a problem.
>> Unfortunately, this test result is still murky. Perhaps we should not have
>> even mentioned it.
>>   *   Cut-off Scheme: We have done significantly fewer simulations with
>> the Verlet cut-off scheme; however, so far the system volume jumps have not
>> occurred with Verlet. We are continuing these tests.
>>
>
> The simplest explanation is that your system has a phase transition that
> happens eventually :-) We don't know what's in your system, so it's hard to
> suggest what is going wrong.

As noted in the original post, it happens for pure water simulations, as well as single solutes solvated in water. The solutes can be anything from a Lennard-Jones sphere to buckminsterfullerene to a protein. There are no phase transitions or anything weird  that we can observe, as noted above.

>> Here is how we do exact continuations:
>> gmx_mpi convert-tpr -s previous.tpr -extend 20000 -o next.tpr
>> mpiexec gmx_mpi mdrun -v -deffnm next -cpi previous.cpt -noappend >& myjob
>>

>Fine

>> Here<
>>  https://docs.google.com/document/d/1NUi3H7agEFMeEq5qxTNXWopaFXqIjUbHpLUqifVRVys/edit?usp=sharing>
>> is an example (CHARMM22*-based) MDP file (only used for the initial
>> production run).
>>

> That is not a correct way to model with CHARMM (in particular those cutoffs
> and cutoff treatments). This was tested extensively by CHARMM force field
> developers, and their recommendations are at
> http://www.gromacs.org/Documentation/Terminology/Force_Fields/CHARMM.

These values were specifically chosen to match the DE Shaw CHARMM22* force field, according to  the following reference:
S. Piana, K. Lindorff-Larsen, D.E. Shaw
Atomic-level description of ubiquitin folding
Proc. Natl. Acad. Sci. U. S. A., 110 (15) (2013), pp. 5915–5920
Page 5916: "Nonbonded interactions were truncated to 10.5 Å..."
As noted in my original message, we have observed the same jumps with other force fields and we try to implement the force field as the force fields' authors did.

> You're also generating velocities at the start of your production run,
> which is not a great idea (but I don't think it is related to your
> problem). Do an exact restart from the end of your equlibration, e.g. gmx
> grompp ... -t equil_state.cpt -o production. Your LINCS settings are
> unnecessarily strict - use the recommendations in the mdp documentation.

Sorry, I know it is correct not to generate velocities. We have tested both with and without generating velocities. It does not seem to affect this. When going from an equilibration to a production (different T/p coupling schemes) run, I use
gmx grompp -f new.mdp -c old.tpr -o new.tpr -t old.cpt
gmx mdrun -s new.tpr [do not pass checkpoint]

>> We have performed the Gromacs regression tests (with one processor, >1
>> processor but only 1 node, and with more processors than will fit on one
>> node) on the two machines that we have command line access to (lab and
>> university clusters). Some of the regression tests reset the number of
>> processors to 8 and thus are running on only one node. But, for what it is
>> worth, all tests passed.
>>
>> A linked Photobucket image<
>>  http://i1243.photobucket.com/albums/gg545/ploetz/box-volume_zps0foiwzs9.pnghttp://i1243.photobucket.com/albums/gg545/ploetz/box-volume_zps0foiwzs9.png
>>
>> shows the same system ran for five 1-ns chunks with exact continuations on
>> 2 machines: the university's or the lab's cluster. In Row 1, note that a
>> particular system box volume is obtained on the university cluster with 1
>> node and MPI. No system volume jumps are noted. However, the average volume
>> is not the same when ran on the lab’s cluster with 3 nodes, and a system
>> volume jump occurs at 2ns (2nd Row). The system volume does at least
>> roughly match that of the university cluster when ran on the lab's cluster
>> with 1 node, both without (3rd Row) or with (4th Row) MPI.
>>

> Note that the second simulation already had a different volume before it
> started. That suggests it is already a different system (compare the tpr
> files with gmx check -s1 -s2 to see any silly errors), or has already had a
> phase transition during your equilibration.

The exact same TPR file was used for all of the runs starting at 0 ns.

>> Some may view these as small volume jumps, but they correspond to the
>> volume of many water molecules (1 water has a volume of ~0.03 nm3). Often
>> the volume jumps are larger than those shown in the linked figure (e.g., ~7
>> nm3).
>>
>> When a jump occurs, the system stays at that new volume; it is not a
>> "blip" that then re-adjusts to the original volume. These volume jumps seem
>> suggestive of changes to the simulation settings or FF parameters occurring
>> during the continuations.
>>

> Or that e.g. you have a membrane that goes in or out of a gel-like state
> under the conditions you are using. Or look at RMSD or radius of gyration
> time series of your protein... it's hard to help when we don't know what
> your system has in it.

We are not running membrane systems, only pure water or pure water with a single solute. While our protein systems of course have slight conformational changes, the volume changes for protein unfolding are both experimentally and in simulations on the order  of only a few water molecules. The volume jumps we are observing here are on the order of hundreds of waters, and occur discontinuously and specifically at the breaks in trajectories. It is some sort of artifact from breaking the simulations into multiple  chunks instead of having one long run.

>> We would greatly appreciate any advice. Uncertainty of the system volume
>> prevents us from trusting any subsequent analysis.
>>

> It looks intrinsic to what you are doing, but perhaps could be an artefact
> of choosing arbitrary non-bonded settings for CHARMM.

I am curious if no one else sees this in standard protein simulations or pure water simulations?

> Mark

>> Thanks,
>>
>> Elizabeth