[gmx-users] issue in replica exchange

Floris Buelens floris_buelens at yahoo.com
Tue May 7 16:03:20 CEST 2013


These symptoms sound a lot like a bug I reported back in 2010: http://redmine.gromacs.org/issues/433 - closed with only a short comment from Berk that it was fixed for 4.5. With my fix as detailed in the redmine report I've been happily doing NPT replica exchange with my custom code forked from 4.0.5 - I've never tested any later versions.

The key point there was that the problem only occurred with one CPU per replica. I'm afraid I haven't gone through this whole thread: have you tested both single and multiple CPUs per replica?






----- Original Message -----
From: Mark Abraham <mark.j.abraham at gmail.com>
To: Discussion list for GROMACS users <gmx-users at gromacs.org>
Cc: 
Sent: Thursday, 2 May 2013, 11:53
Subject: Re: [gmx-users] issue in replica exchange

On Wed, May 1, 2013 at 10:24 PM, XAvier Periole <x.periole at rug.nl> wrote:

>
> Ok here is my current status on that REMD issue.
>
> For info: I use
> Temperature: v-rescale, tau_t = 2.0 ps
> Pressure: berendsen, tau_p = 5.0 ps,
> time step: dt = 0.002 - 0.020 ps (2 - 20 fs),
> COM removal on for bilayer/water separately
>
> The symptoms: explosion of the system after 2-5 steps following the swap,
> first sign is a huge jump in LJ interactions and pressure. This jump seems
> to be absorbed by the box size and temperature when possible … see example
> I provided earlier. A large VCM (velocity centre of mass?) is often
> associated with the crash. But also pressure scaling more than 1% ...
>
> 1- the problem mentioned above remains in gmx-4.5.7 and it might actually
> have gotten worse. I was able to run a 500 ns simulation with gmx405 using a
> setup similar to the one for gmx457. The following points were observed with gmx457.
> 2- it persists with a time step of 2 fs. Actually all tests performed in
> the following used dt=2fs.
> 3- if I perform an exchange that explodes within mdrun by hand (outside the
> GROMACS REMD code, by taking the gro file and adjusting the temperature in
> the mdp) it all goes fine.
> 4- the issue gets much worse when the consecutive replicas differ
> (different protein conformations, box sizes, etc.) … explosion at first
> exchange.
> 5- the use of Parrinello-Rahman does not help
> 6- cancelling the centre of mass removal does not remove the problem.
> 7- switching to NVT ensemble does not help but makes it worse (crash in 2
> steps). All exchanges accepted at first attempt crash with the message
> "Large VCM(group SOL): -0.0XXX , -0.XXX, -0.16XXX, Temp-cm:6.55XXX"
> 8- using a unique conformation (the same) for all replicas in the NVT REMD
> simulation after equilibration in the same NVT ensemble (for 1 ns) removes
> the problem.
> 9- taking the equilibrated NVT conformations, equilibrating them in an NPT
> ensemble (1 ns) and then allowing exchanges brings the problem back …
> one exchange is not properly done at the second trial, while the first ones
> were fine. Well if errors were made that was with reasonable
> 10- note also that the coarse grain I use is extremely forgiving, meaning
> you can perform really nasty transformations and run it further after
> simple minimisation … so even abrupt changes in temperatures should be fine
> and relax quickly.
> 11- when looking at the conformations themselves nothing appears to have
> jumped over or nothing funky.
>
> At this point I am not sure what to think and what to do next. There is
> definitely something not going right during the exchanges.
>

OK, thanks for the effort. That all agrees with my suspicion that the full
state is not being exchanged.

> Has anyone been able to run a REMD simulation in an NPT ensemble without
> crashes? I would imagine someone has and something particular to my system
> is making it go wrong but I am really wondering what it could be. My
> feeling is that something related to the box size or pressure is not going
> across but it might be something completely different, when the consecutive
> systems differ reasonably.
>

I've never tried, but an experiment with a water box might be instructive.

> However that would suggest that the way the exchanges are made is
> severely wrong in some cases.
>
> Any help to resolve the problem would be greatly appreciated.
>

There is an outstanding REMD issue on redmine that could be related (
http://redmine.gromacs.org/issues/1191). I'd suggest you open a new issue
there, upload a minimal set of .tprs that can reproduce the problem and
anything you can think of that might help investigate. For something I'm
doing, I'd like to be sure the full T-coupling state is being exchanged,
and we may as well kill all the bugs at once.

Mark


> XAvier.
>
> On Apr 26, 2013, at 9:21 AM, Mark Abraham <mark.j.abraham at gmail.com>
> wrote:
>
> > On Thu, Apr 25, 2013 at 11:05 PM, XAvier Periole <x.periole at rug.nl>
> wrote:
> >
> >>
> >> Thanks for the answer. I'll check gmx4.5.7 and report back.
> >>
> >> I am not sure what you mean by "GROMACS swaps the coordinates, not the
> >> ensemble data". Are the couplings to P and T not exchanged with it?
> >
> >
> > The code in src/kernel/repl_ex.c:
> >
> > static void exchange_state(const gmx_multisim_t *ms, int b, t_state *state)
> > {
> >    /* When t_state changes, this code should be updated. */
> >    int ngtc, nnhpres;
> >    ngtc    = state->ngtc * state->nhchainlength;
> >    nnhpres = state->nnhpres* state->nhchainlength;
> >    exchange_rvecs(ms, b, state->box, DIM);
> >    exchange_rvecs(ms, b, state->box_rel, DIM);
> >    exchange_rvecs(ms, b, state->boxv, DIM);
> >    exchange_reals(ms, b, &(state->veta), 1);
> >    exchange_reals(ms, b, &(state->vol0), 1);
> >    exchange_rvecs(ms, b, state->svir_prev, DIM);
> >    exchange_rvecs(ms, b, state->fvir_prev, DIM);
> >    exchange_rvecs(ms, b, state->pres_prev, DIM);
> >    exchange_doubles(ms, b, state->nosehoover_xi, ngtc);
> >    exchange_doubles(ms, b, state->nosehoover_vxi, ngtc);
> >    exchange_doubles(ms, b, state->nhpres_xi, nnhpres);
> >    exchange_doubles(ms, b, state->nhpres_vxi, nnhpres);
> >    exchange_doubles(ms, b, state->therm_integral, state->ngtc);
> >    exchange_rvecs(ms, b, state->x, state->natoms);
> >    exchange_rvecs(ms, b, state->v, state->natoms);
> >    exchange_rvecs(ms, b, state->sd_X, state->natoms);
> > }
> >
> > I mis-stated last night - there *is* exchange of ensemble data, but it is
> > incomplete. In particular, state->ekinstate is not exchanged. Probably it
> > is incomplete because the 9-year-old comment about t_state changing is in a
> > location that nobody changing t_state will see. And serializing a complex C
> > data structure over MPI is tedious at best. But that is not really an
> > excuse for the non-modularity GROMACS has for many of its key data
> > structures. We are working on various workflow and actual code structure
> > improvements to fix/prevent issues like this, but the proliferation of
> > algorithms that ought to be inter-operable makes the job pretty hard.
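
A minimal sketch of why a nested member is more tedious to exchange than the flat arrays handled in exchange_state() above. Everything here is an assumption made for illustration: toy_ekinstate and its fields are simplified stand-ins and not the GROMACS ekinstate_t layout, and the MPI transfer is replaced by an in-memory buffer swap so the code compiles without MPI.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a nested state member. */
typedef struct {
    int     ngtc;       /* number of T-coupling groups */
    double *ekin_half;  /* one half-step kinetic energy per group (heap) */
    double  dekin_dl;   /* one extra scalar that must travel with it */
} toy_ekinstate;

/* Number of doubles needed to serialize one toy_ekinstate. */
static size_t toy_packed_len(const toy_ekinstate *e)
{
    assert(e->ngtc >= 0);
    return (size_t)e->ngtc + 1;
}

/* Flatten into a contiguous buffer: a struct holding a pointer cannot be
 * sent as raw bytes, because the pointer is meaningless on the other rank. */
static void toy_pack(const toy_ekinstate *e, double *buf)
{
    memcpy(buf, e->ekin_half, (size_t)e->ngtc * sizeof(double));
    buf[e->ngtc] = e->dekin_dl;
}

/* Inverse of toy_pack; the destination arrays must already be allocated. */
static void toy_unpack(toy_ekinstate *e, const double *buf)
{
    memcpy(e->ekin_half, buf, (size_t)e->ngtc * sizeof(double));
    e->dekin_dl = buf[e->ngtc];
}

/* Swap two states through flat buffers. In a real multi-rank run the two
 * buffers would be sent to the exchange partner instead of swapped locally.
 * Returns 0 on success, -1 on size mismatch or allocation failure. */
static int toy_exchange(toy_ekinstate *a, toy_ekinstate *b)
{
    size_t  n;
    double *bufa, *bufb;

    if (a->ngtc != b->ngtc) {
        return -1;
    }
    n    = toy_packed_len(a);
    bufa = malloc(n * sizeof(double));
    bufb = malloc(n * sizeof(double));
    if (bufa == NULL || bufb == NULL) {
        free(bufa);
        free(bufb);
        return -1;
    }
    toy_pack(a, bufa);
    toy_pack(b, bufb);
    toy_unpack(a, bufb);
    toy_unpack(b, bufa);
    free(bufa);
    free(bufb);
    return 0;
}
```

Calling toy_exchange(&a, &b) on two states with the same ngtc swaps both the per-group array and the scalar. The point of the sketch is that every field added to the struct later also has to be added to the length, pack and unpack helpers, which is easy to miss when they live in a different file.
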
> >
> > Other codes seem to exchange the ensemble label data (e.g. reference
> > temperatures for T-coupling) because they write trajectories that are
> > continuous with respect to atomic coordinates. I plan to move REMD in
> > GROMACS to this approach, because it scales better, but it will not happen
> > any time soon.
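
The two exchange strategies contrasted above can be sketched as follows. This is only an illustration under stated assumptions: toy_replica and both functions are hypothetical names, the returned counts simply stand for how many doubles each side would have to send, and nothing here is taken from the GROMACS sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical replica record; field names are illustrative only. */
typedef struct {
    double  ref_t;  /* thermostat reference temperature (K) */
    size_t  n;      /* number of doubles in x (3 * natoms) */
    double *x;      /* flattened coordinates */
} toy_replica;

/* Coordinate exchange: every coordinate crosses between the replicas, so
 * the cost grows with system size. Returns the number of doubles moved
 * per side. */
static size_t toy_swap_coords(toy_replica *a, toy_replica *b)
{
    size_t i;

    assert(a->n == b->n);
    for (i = 0; i < a->n; i++) {
        double tmp = a->x[i];
        a->x[i] = b->x[i];
        b->x[i] = tmp;
    }
    return a->n;
}

/* Label exchange: only the reference temperature crosses. Coordinates stay
 * where they are, so each replica's trajectory remains continuous in the
 * atomic coordinates. Velocity rescaling by sqrt(T_new / T_old), which is
 * commonly applied at this point, is omitted to keep the sketch short. */
static size_t toy_swap_labels(toy_replica *a, toy_replica *b)
{
    double tmp = a->ref_t;

    a->ref_t = b->ref_t;
    b->ref_t = tmp;
    return 1;
}
```

With label exchange the amount of data moved is independent of the number of atoms, which is one way to read the remark that this approach scales better. The price is that the per-temperature ensembles have to be reassembled from the trajectories afterwards.
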
> >
> >> That would explain what I see, but let's see what 4.5.7 has to say first.
> >>
> >
> > Great. It may be that there were other issues in 4.5.3 that exacerbated any
> > REMD problem.
> >
> > Mark
> >
> > Tks.
> >>
> >> On Apr 25, 2013, at 22:40, Mark Abraham <mark.j.abraham at gmail.com>
> wrote:
> >>
> >>> Thanks for the good report. There have been some known issues about the
> >>> timing of coupling stages with respect to various intervals between GROMACS
> >>> events for some algorithms. There are a lot of fixed problems in 4.5.7 that
> >>> are not specific to REMD, but I have a few lingering doubts about whether
> >>> we should be exchanging (scaled) coupling values along with the
> >>> coordinates. (Unlike most REMD implementations, GROMACS swaps the
> >>> coordinates, not the ensemble data.) If you can reproduce those kinds of
> >>> symptoms in 4.5.7 (whether or not they then crash) then it looks like
> >>> there may be a problem with the REMD implementation that is perhaps evident
> >>> only with the kind of large time step Martini takes?
> >>>
> >>> Mark
> >>>
> >>>
> >>> On Thu, Apr 25, 2013 at 1:28 PM, XAvier Periole <x.periole at rug.nl>
> >> wrote:
> >>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> I have been recently using the REMD code in gmx-407 and gmx-453 and got a
> >>>> few systems crashing for unclear reasons so far. The main tests I made are
> >>>> using gmx407 but it is all reproducible with gmx453. The crashing was also
> >>>> reproduced (not necessarily at the same time point) on several
> >>>> architectures.
> >>>>
> >>>> The system is made of a pair of proteins in a membrane patch, for which
> >>>> the relative orientation is controlled by non-native bonds/angles/dihedrals
> >>>> to perform umbrella sampling. I use the MARTINI force field but that
> >>>> might not be relevant here.
> >>>>
> >>>> The crashes follow exchanges that do not seem to occur the
> >>>> correct way and are preceded by pressure scaling warnings … indicative of a
> >>>> strong destabilisation of the system and eventual explosion. Some
> >>>> information seems to be exchanged inaccurately.
> >>>>
> >>>> Trying to nail down the problem I got stuck and maybe someone can help.
> >>>> I placed a pdf file showing plots of bonded/nonbonded energies,
> >>>> temperatures, box size etc … around an exchange that does not lead to a
> >>>> crash (here: md.chem.rug.nl/~periole/remd-issue.pdf). I plotted stuff
> >>>> every step with the temperature colour coded as indicated in the first
> >>>> figure.
> >>>>
> >>>> From the figure it appears that in the step right after the exchange there is
> >>>> a huge jump in potential energy coming from its LJ(SR) part. Although
> >>>> there are some small discontinuities in the progression of the bond and
> >>>> angle energies around the exchange, they seem to be fine. The temperature and box
> >>>> size seem to respond to it a few steps later, while the pressure seems to be
> >>>> affected right away, potentially because the Epot affects the virial and
> >>>> thus the pressure.
> >>>>
> >>>> The other potential clue is that the jumps reduce with the strength of the
> >>>> pressure coupling. A tau_p of 1 or 2 ps (Berendsen) will lead to a crash while
> >>>> 5, 10 or 20 ps won't. Inspection of the time evolution of the Epot, box …
> >>>> indicates that the magnitude of the jumps is reduced and the system can
> >>>> handle the problem.
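
The dependence on tau_p is what the Berendsen box-scaling formula in the GROMACS manual would predict: the per-step deviation of the scaling factor from 1 is inversely proportional to tau_p. A minimal sketch, assuming isotropic coupling. The function names are hypothetical, and the 1 % threshold mirrors the "pressure scaling more than 1%" warning mentioned in this thread.

```c
#include <assert.h>

/* Berendsen scaling factor for the box lengths, isotropic case:
 *   mu = 1 - (nstpcouple * dt / (3 * tau_p)) * beta * (p_ref - p)
 * dt and tau_p in ps, compressibility beta in 1/bar, pressures in bar. */
static double berendsen_mu(double dt, int nstpcouple, double tau_p,
                           double beta, double p_ref, double p)
{
    assert(tau_p > 0.0);
    return 1.0 - (nstpcouple * dt / (3.0 * tau_p)) * beta * (p_ref - p);
}

/* Nonzero when a single coupling step would rescale the box lengths by
 * more than 1 %. */
static int scaling_exceeds_1pct(double mu)
{
    double dev = mu - 1.0;

    if (dev < 0.0) {
        dev = -dev;
    }
    return dev > 0.01;
}
```

As an illustration with made-up numbers (dt = 0.02 ps, beta = 3e-4 1/bar, a pressure spike 10000 bar above the reference), tau_p = 1 ps gives a scaling of about 2 % in one step, while tau_p = 5 ps gives about 0.4 %. A weaker coupling therefore spreads the response to the same spike over more steps, which fits the observation that the longer tau_p values survive the exchange.
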
> >>>>
> >>>> One additional piece of information since I first posted the problem (the
> >>>> post was delayed by the file first attached, now given as a link): the
> >>>> problem is accentuated when the replicas differ in conformation. I am
> >>>> looking at the actual differences as you read this email.
> >>>>
> >>>> That is as far as I could go. Any suggestion is welcome.
> >>>>
> >>>> XAvier.
> >>>> MD-Group / Univ. of Groningen
> >>>> The Netherlands
> >>>> --
> >>>> gmx-users mailing list    gmx-users at gromacs.org
> >>>> http://lists.gromacs.org/mailman/listinfo/gmx-users
> >>>> * Please search the archive at
> >>>> http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> >>>> * Please don't post (un)subscribe requests to the list. Use the
> >>>> www interface or send it to gmx-users-request at gromacs.org.
> >>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists


