[gmx-users] issue in replica exchange

XAvier Periole x.periole at rug.nl
Thu May 2 12:58:00 CEST 2013


I saw that redmine report, which could be related but it seems to happen only for runs done outside the domain and particle decompositions.

I'll file a Redmine issue.

Is there anything I could do to help speed up the fix?

On May 2, 2013, at 11:53 AM, Mark Abraham <mark.j.abraham at gmail.com> wrote:

> On Wed, May 1, 2013 at 10:24 PM, XAvier Periole <x.periole at rug.nl> wrote:
> 
>> 
>> Ok here is my current status on that REMD issue.
>> 
>> For info: I use
>> Temperature: v-rescale, tau_t = 2.0 ps
>> Pressure: berendsen, tau_p = 5.0 ps,
>> time step: dt = 0.002 - 0.020 ps (2 - 20 fs),
>> COM removal on for bilayer/water separately
>> 
>> The symptoms: explosion of the system after 2-5 steps following the swap,
>> first sign is a huge jump in LJ interactions and pressure. This jump seems
>> to be absorbed by the box size and temperature when possible … see example
>> I provided earlier. A large VCM (velocity centre of mass?) is often
>> associated with the crash. But also pressure scaling more than 1% ...
>> 
>> 1- the problem mentioned above remains in gmx-4.5.7 and it may actually
>> have got worse. I was able to run a 500 ns simulation with gmx405 using a
>> setup similar to the one for gmx457. The following points were observed
>> with gmx457.
>> 2- it persists with a time step of 2 fs. In fact, all the tests described
>> below used dt = 2 fs.
>> 3- if I perform by hand an exchange that explodes within mdrun (outside
>> the GROMACS REMD code, by taking the gro file and adjusting the
>> temperature in the mdp), it all goes fine.
>> 4- the issue gets much worse when consecutive replicas differ
>> (different protein conformations, box size etc.) … explosion at the first
>> exchange.
>> 5- the use of Parrinello-Rahman does not help.
>> 6- cancelling the centre of mass removal does not remove the problem.
>> 7- switching to the NVT ensemble does not help but makes it worse (crash in 2
>> steps). All exchanges accepted at the first attempt crash with the message
>> "Large VCM(group SOL): -0.0XXX , -0.XXX, -0.16XXX, Temp-cm:6.55XXX"
>> 8- using a unique conformation (the same) for all replicas in the NVT REMD
>> simulation after equilibration in the same NVT ensemble (for 1 ns) removes
>> the problem.
>> 9- taking the equilibrated NVT conformations, equilibrating them in an NPT
>> ensemble (1 ns) and then switching the exchanges on brings the problem back …
>> one exchange is not properly done at the second trial, while the first ones
>> were fine. Well if errors were made that was with reasonable
>> 10- note also that the coarse grain I use is extremely forgiving, meaning
>> you can perform really nasty transformations and run it further after
>> simple minimisation … so even abrupt changes in temperatures should be fine
>> and relax quickly.
>> 11- when looking at the conformations themselves, nothing appears to have
>> jumped over and nothing looks odd.
>> 
>> At this point I am not sure what to think and what to do next. There is
>> definitely something not going right during the exchanges.
>> 
> 
> OK, thanks for the effort. That all agrees with my suspicion that the full
> state is not being exchanged.
> 
> Has anyone been able to run a REMD simulation in an NPT ensemble without
>> crashes? I would imagine someone has and that something particular to my system
>> is making it go wrong, but I am really wondering what it could be. My
>> feeling is that something related to the box size or pressure is not getting
>> across, but it might be something completely different, when the consecutive
>> systems differ reasonably.
>> 
> 
> I've never tried, but an experiment with a water box might be instructive.
> 
> However that would suggest that the way the exchanges are made is
>> severely wrong in some cases.
>> 
>> Any help to resolve the problem would be greatly appreciated.
>> 
> 
> There is an outstanding REMD issue on redmine that could be related (
> http://redmine.gromacs.org/issues/1191). I'd suggest you open a new issue
> there, upload a minimal set of .tprs that can reproduce the problem and
> anything you can think of that might help investigate. For something I'm
> doing, I'd like to be sure the full T-coupling state is being exchanged,
> and we may as well kill all the bugs at once.
> 
> Mark
> 
> 
> XAvier.
>> 
>> On Apr 26, 2013, at 9:21 AM, Mark Abraham <mark.j.abraham at gmail.com> wrote:
>> 
>>> On Thu, Apr 25, 2013 at 11:05 PM, XAvier Periole <x.periole at rug.nl> wrote:
>>> 
>>>> 
>>>> Thanks for the answer. I'll check gmx4.5.7 and report back.
>>>> 
>>>> I am not sure what you mean by GROMACS swaps the coordinates not the
>>>> ensemble data. Are the couplings to P and T not exchanged along with them?
>>> 
>>> 
>>> The code in src/kernel/repl_ex.c:
>>> 
>>> static void exchange_state(const gmx_multisim_t *ms, int b, t_state *state)
>>> {
>>>   /* When t_state changes, this code should be updated. */
>>>   int ngtc, nnhpres;
>>>   ngtc    = state->ngtc * state->nhchainlength;
>>>   nnhpres = state->nnhpres* state->nhchainlength;
>>>   exchange_rvecs(ms, b, state->box, DIM);
>>>   exchange_rvecs(ms, b, state->box_rel, DIM);
>>>   exchange_rvecs(ms, b, state->boxv, DIM);
>>>   exchange_reals(ms, b, &(state->veta), 1);
>>>   exchange_reals(ms, b, &(state->vol0), 1);
>>>   exchange_rvecs(ms, b, state->svir_prev, DIM);
>>>   exchange_rvecs(ms, b, state->fvir_prev, DIM);
>>>   exchange_rvecs(ms, b, state->pres_prev, DIM);
>>>   exchange_doubles(ms, b, state->nosehoover_xi, ngtc);
>>>   exchange_doubles(ms, b, state->nosehoover_vxi, ngtc);
>>>   exchange_doubles(ms, b, state->nhpres_xi, nnhpres);
>>>   exchange_doubles(ms, b, state->nhpres_vxi, nnhpres);
>>>   exchange_doubles(ms, b, state->therm_integral, state->ngtc);
>>>   exchange_rvecs(ms, b, state->x, state->natoms);
>>>   exchange_rvecs(ms, b, state->v, state->natoms);
>>>   exchange_rvecs(ms, b, state->sd_X, state->natoms);
>>> }
>>> 
>>> I mis-stated last night - there *is* exchange of ensemble data, but it is
>>> incomplete. In particular, state->ekinstate is not exchanged. Probably it
>>> is incomplete because the 9-year-old comment about t_state changing is in a
>>> location that nobody changing t_state will see. And serializing a complex C
>>> data structure over MPI is tedious at best. But that is not really an
>>> excuse for the non-modularity GROMACS has for many of its key data
>>> structures. We are working on various workflow and actual code structure
>>> improvements to fix/prevent issues like this, but the proliferation of
>>> algorithms that ought to be inter-operable makes the job pretty hard.
>>> 
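To illustrate why an incomplete swap matters, here is a minimal C sketch. It does not use any GROMACS code: `toy_state_t` and every `toy_*` function are hypothetical names invented for this example, and the single cached kinetic energy merely stands in for the kind of data held in state->ekinstate. Swapping the velocities without the cached value leaves each replica with a cache that no longer matches its own velocities.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NATOMS 2

/* Hypothetical toy state, NOT the real GROMACS t_state. */
typedef struct {
    double v[TOY_NATOMS]; /* velocities (1-D, unit masses) */
    double ekin_cached;   /* kinetic energy remembered from the previous step */
} toy_state_t;

/* Kinetic energy recomputed from the current velocities. */
static double toy_ekin(const toy_state_t *s)
{
    double e = 0.0;
    size_t i;
    for (i = 0; i < TOY_NATOMS; i++) {
        e += 0.5 * s->v[i] * s->v[i];
    }
    return e;
}

static void toy_swap_double(double *a, double *b)
{
    double tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Incomplete exchange: velocities move, the cached energy stays behind. */
static void toy_exchange_partial(toy_state_t *a, toy_state_t *b)
{
    size_t i;
    assert(a != NULL && b != NULL);
    for (i = 0; i < TOY_NATOMS; i++) {
        toy_swap_double(&a->v[i], &b->v[i]);
    }
}

/* Complete exchange: every member of the toy state is swapped. */
static void toy_exchange_full(toy_state_t *a, toy_state_t *b)
{
    toy_exchange_partial(a, b);
    toy_swap_double(&a->ekin_cached, &b->ekin_cached);
}

/* Returns 1 when the cached energy agrees with the velocities, else 0. */
static int toy_is_consistent(const toy_state_t *s)
{
    double d = toy_ekin(s) - s->ekin_cached;
    return d > -1e-12 && d < 1e-12;
}
```

Calling `toy_exchange_partial` on two consistent states makes `toy_is_consistent` return 0 for both, while `toy_exchange_full` keeps them consistent. Whether the missing ekinstate exchange is what actually causes the crashes reported in this thread is not established here.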
>>> Other codes seem to exchange the ensemble label data (e.g. reference
>>> temperatures for T-coupling) because they write trajectories that are
>>> continuous with respect to atomic coordinates. I plan to move REMD in
>>> GROMACS to this approach, because it scales better, but it will not happen
>>> any time soon.
>>> 
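The two bookkeeping schemes contrasted in the preceding paragraph can be sketched with a toy C example. All names (`toy_replica_t`, `toy_swap_configs`, `toy_swap_labels`, `toy_temperature_of`) are hypothetical and are not GROMACS identifiers. An integer `config_id` stands in for the full set of coordinates and velocities. Both schemes end with the same pairing of configuration and temperature. The label swap moves a single number per replica, which is presumably why it is said to scale better.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy replica, NOT a GROMACS data structure. */
typedef struct {
    double ref_t;     /* reference temperature of the thermostat (K) */
    int    config_id; /* stands in for all coordinates and velocities */
} toy_replica_t;

/* Coordinate exchange: configurations move, each slot keeps its temperature. */
static void toy_swap_configs(toy_replica_t *a, toy_replica_t *b)
{
    int tmp;
    assert(a != NULL && b != NULL);
    tmp = a->config_id;
    a->config_id = b->config_id;
    b->config_id = tmp;
}

/* Label exchange: configurations stay put, the temperatures move. */
static void toy_swap_labels(toy_replica_t *a, toy_replica_t *b)
{
    double tmp;
    assert(a != NULL && b != NULL);
    tmp = a->ref_t;
    a->ref_t = b->ref_t;
    b->ref_t = tmp;
}

/* Temperature at which a given configuration is currently simulated,
 * or -1.0 if no replica holds that configuration. */
static double toy_temperature_of(const toy_replica_t *r, size_t n, int config_id)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (r[i].config_id == config_id) {
            return r[i].ref_t;
        }
    }
    return -1.0;
}
```

With two replicas at 300 K and 310 K, either swap leaves configuration 0 running at 310 K. Only the label swap keeps configuration 0 in slot 0, so that slot's trajectory stays continuous in coordinates. A real implementation would also have to deal with velocity rescaling and thermostat/barostat state, which this sketch ignores.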
>>> That would explain what I see, but let see what 4.5.7 has to say first.
>>>> 
>>> 
>>> Great. It may be that there were other issues in 4.5.3 that exacerbated any
>>> REMD problem.
>>> 
>>> Mark
>>> 
>>> Tks.
>>>> 
>>>> On Apr 25, 2013, at 22:40, Mark Abraham <mark.j.abraham at gmail.com> wrote:
>>>> 
>>>>> Thanks for the good report. There have been some known issues about the
>>>>> timing of coupling stages with respect to various intervals between GROMACS
>>>>> events for some algorithms. There are a lot of fixed problems in 4.5.7 that
>>>>> are not specific to REMD, but I have a few lingering doubts about whether
>>>>> we should be exchanging (scaled) coupling values along with the
>>>>> coordinates. (Unlike most REMD implementations, GROMACS swaps the
>>>>> coordinates, not the ensemble data.) If you can reproduce those kinds of
>>>>> symptoms in 4.5.7 (whether or not they then crash) then it looks like
>>>>> there may be a problem with the REMD implementation that is perhaps evident
>>>>> only with the kind of large time step Martini takes?
>>>>> 
>>>>> Mark
>>>>> 
>>>>> 
>>>>> On Thu, Apr 25, 2013 at 1:28 PM, XAvier Periole <x.periole at rug.nl> wrote:
>>>>> 
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I have recently been using the REMD code in gmx-407 and gmx-453 and got a
>>>>>> few systems crashing for reasons that are so far unclear. The main tests I made
>>>>>> use gmx407 but it is all reproducible with gmx453. The crashing was also
>>>>>> reproduced (not necessarily at the same time point) on several
>>>>>> architectures.
>>>>>> 
>>>>>> The system is made of a pair of proteins in a membrane patch, for which
>>>>>> the relative orientation is controlled by non-native bonds/angles/dihedrals
>>>>>> to perform umbrella sampling. I use the MARTINI force field but that
>>>>>> might not be relevant here.
>>>>>> 
>>>>>> The crashes follow exchanges that do not seem to be carried out
>>>>>> correctly and are preceded by pressure scaling warnings … indicative of a
>>>>>> strong destabilisation of the system and eventual explosion. Some
>>>>>> information seems to be exchanged inaccurately.
>>>>>> 
>>>>>> Trying to nail down the problem I got stuck, and maybe someone can help.
>>>>>> I placed a pdf file showing plots of bonded/nonbonded energies,
>>>>>> temperatures, box size etc … around an exchange that does not lead to a
>>>>>> crash (here: md.chem.rug.nl/~periole/remd-issue.pdf). I plotted everything
>>>>>> at every step, with the temperature colour coded as indicated in the first
>>>>>> figure.
>>>>>> 
>>>>>> From the figure it appears that at the step right after the exchange there is
>>>>>> a huge jump in potential energy coming from its LJ(SR) part. Although
>>>>>> there are some small discontinuities in the progression of the bond and
>>>>>> angle energies around the exchange, they seem to be fine. The temperature and box
>>>>>> size seem to respond a few steps later, while the pressure seems to be
>>>>>> affected right away, possibly because the Epot affects the virial and
>>>>>> thus the pressure.
>>>>>> 
>>>>>> The other potential clue is that the jumps get smaller as the pressure
>>>>>> coupling is weakened. A 1/2 ps tau_p (Berendsen) leads to a crash while
>>>>>> 5/10/20 ps does not. Inspection of the time evolution of the Epot, box …
>>>>>> indicates that the magnitude of the jumps is reduced and the system can
>>>>>> handle the problem.
>>>>>> 
>>>>>> One additional piece of information since I first posted the problem (delayed
>>>>>> by the file first attached, now given as a link): the problem is accentuated
>>>>>> when the replicas differ in conformation. I am looking at the actual
>>>>>> differences as you read this email.
>>>>>> 
>>>>>> That is as far as I could go. Any suggestion is welcome.
>>>>>> 
>>>>>> XAvier.
>>>>>> MD-Group / Univ. of Groningen
>>>>>> The Netherlands
>>>>>> --
>>>>>> gmx-users mailing list    gmx-users at gromacs.org
>>>>>> http://lists.gromacs.org/mailman/listinfo/gmx-users
>>>>>> * Please search the archive at
>>>>>> http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
>>>>>> * Please don't post (un)subscribe requests to the list. Use the
>>>>>> www interface or send it to gmx-users-request at gromacs.org.
>>>>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists



