[gmx-users] System volume "jumps" on exact continuations
ploetz at ksu.edu
Wed May 31 16:56:36 CEST 2017
We are observing abrupt, discontinuous “jumps” in our simulation box volume for different chunks of trajectory when performing exact continuations of standard, explicit solvent, MD simulations in the NPT ensemble. Note that these volume jumps do not appear to occur after every single continuation (there seems to be some element of randomness); however, they occur at some point in almost all of our different simulations. There are no visually apparent problems with the trajectories upon viewing them in e.g., Pymol. We are not able to visualize anything unusual, such as “bubbles” for cases where the system volume jumps to larger values.
Here is what we have tested:
* Temperature/Pressure Coupling Methods: Nosé-Hoover/Parrinello-Rahman and Berendsen/Berendsen are both affected
* Gromacs Versions: 4.6, 4.6.1, 5.0, and 2016 are all affected
* Machines/Architectures: three different clusters (we don’t know of a machine where it does not occur) are all affected:
* the lab’s cluster where we installed Gromacs ourselves
* the university cluster where Gromacs was installed by the university IT staff
* the Hansen cluster at Purdue, which we access only through a GUI at the DiaGrid portal
* Systems: Pure water systems as well as solvated single solute systems where the solute is a single protein, LJ sphere, or buckminsterfullerene are all affected
* Force fields: versions of AMBER, CHARMM, GROMOS, and OPLS force fields are all affected
* Box shape: cubic boxes and rhombic dodecahedrons are both affected
* Continuation method: Restarts of killed simulations (killed on purpose to increase the numbers of nodes as resources became available -- we don't do this anymore, but this was how we first saw the problem) and exact continuations of completed simulations are both affected
* Number of Nodes: We were almost convinced that it happened whenever we ran on >1 node (our usual situation), but not if we ran on 1 node only. We did some tests on one node on the lab’s cluster with and without MPI, to see whether or not it was MPI. System volume jumps were not (yet?) observed regardless of whether or not MPI was used. However, on the university cluster, we did tests of 24 processors using "-pe mpi-fill 24". Usually the 24 processors were spread across >1 node, but sometimes they completely filled one node. There was one instance where there was a system volume jump when the 24 processors changed from being distributed across >1 node to being on 1 node only. However, MPI was used in all of those simulations. So, we still have not proved that it is not MPI that is a problem. Unfortunately, this test result is still murky. Perhaps we should not have even mentioned it.
* Cut-off Scheme: We have done significantly fewer simulations with the Verlet cut-off scheme; however, so far the system volume jumps have not occurred with Verlet. We are continuing these tests.
Here is how we do exact continuations:
gmx_mpi convert-tpr -s previous.tpr -extend 20000 -o next.tpr
mpiexec gmx_mpi mdrun -v -deffnm next -cpi previous.cpt -noappend >& myjob
Here<https://docs.google.com/document/d/1NUi3H7agEFMeEq5qxTNXWopaFXqIjUbHpLUqifVRVys/edit?usp=sharing> is an example (CHARMM22*-based) MDP file (only used for the initial production run).
We have performed the Gromacs regression tests (with one processor, >1 processor but only 1 node, and with more processors than will fit on one node) on the two machines that we have command line access to (lab and university clusters). Some of the regression tests reset the number of processors to 8 and thus are running on only one node. But, for what it is worth, all tests passed.
A linked Photobucket image<http://i1243.photobucket.com/albums/gg545/ploetz/box-volume_zps0foiwzs9.png> shows the same system ran for five 1-ns chunks with exact continuations on 2 machines: the university's or the lab's cluster. In Row 1, note that a particular system box volume is obtained on the university cluster with 1 node and MPI. No system volume jumps are noted. However, the average volume is not the same when ran on the lab’s cluster with 3 nodes, and a system volume jump occurs at 2ns (2nd Row). The system volume does at least roughly match that of the university cluster when ran on the lab's cluster with 1 node, both without (3rd Row) or with (4th Row) MPI.
Some may view these as small volume jumps, but they correspond to the volume of many water molecules (1 water has a volume of ~0.03 nm3). Often the volume jumps are larger than those shown in the linked figure (e.g., ~7 nm3).
When a jump occurs, the system stays at that new volume; it is not a "blip" that then re-adjusts to the original volume. These volume jumps seem suggestive of changes to the simulation settings or FF parameters occurring during the continuations.
We would greatly appreciate any advice. Uncertainty of the system volume prevents us from trusting any subsequent analysis.
More information about the gromacs.org_gmx-users