[gmx-users] Looking for the source of the H-REMD slowdown when decoupling a lot of atoms

Nicolas Cheron nicolas.cheron.boulot at gmail.com
Sat Apr 2 12:19:01 CEST 2016


See: https://groups.google.com/forum/#!topic/plumed-users/eJ0xpnHPb_s and
https://github.com/GiovanniBussi/plumed2/tree/v2.2-hrex

Nicolas

2016-04-02 12:09 GMT+02:00 Mark Abraham <mark.j.abraham at gmail.com>:

> Hi,
>
> On Fri, Apr 1, 2016 at 11:39 PM Christopher Neale <
> chris.neale at alum.utoronto.ca> wrote:
>
> > Dear developers:
> >
> > Is it possible with only a small amount of work to modify calc_delta() in
> > src/programs/mdrun/repl_ex.cpp to add another case to re->type such that
> > the program sends the coordinates to the alternate .tpr information and
> > evaluates the energy completely?
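> >
> > Concretely, I imagine the new case would just implement the generic
> > acceptance test, something like this Python-style sketch (illustrative
> > only, not the actual repl_ex.cpp logic; u_a/u_b stand for whatever full
> > re-evaluation of a configuration under the other replica's .tpr we'd need):
> >
> > import math, random
> >
> > def accept_exchange(x_a, x_b, u_a, u_b, beta_a, beta_b):
> >     # u_a(x), u_b(x): potential energy of configuration x evaluated with
> >     # replica a's / b's Hamiltonian; beta = 1/(kB*T) for each replica.
> >     delta = (beta_a * (u_a(x_b) - u_a(x_a))
> >              + beta_b * (u_b(x_a) - u_b(x_b)))
> >     # Metropolis criterion: accept downhill moves, otherwise with exp(-delta)
> >     return delta <= 0 or random.random() < math.exp(-delta)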
>
>
> This is approximately what the implementation of plumed does.
>
> > This would allow for arbitrary exchanges, including bizarre things like
> > exchanging the cutoff distance or whatever (not what I want to do -- I'm
> > just emphasizing the generality of this approach). It will certainly cost
> > some communication and compute cycles to do this, but, e.g., I currently
> > have a 12x slowdown using the free energy architecture while decoupling 5K
> > atoms. So if I do an exchange attempt every 500 steps, we're still breaking
> > even if this complete energy re-evaluation takes 5,500x longer than a
> > regular integration step (the 12x slowdown costs ~11 extra step-equivalents
> > per step, i.e. ~5,500 per 500-step exchange interval) -- and I don't see
> > how it could possibly take anywhere near that long assuming
> > bMultiEx==FALSE.
> >
>
> One can also implement such schemes as a script that calls grompp and mdrun
> -rerun before relaunching whatever simulation. Horrible, but probably
> better.
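>
> Something along these lines would be the core of it (an untested sketch;
> the file names are placeholders):
>
> import subprocess
>
> def energy_under_other_hamiltonian(mdp_other, conf, top, frame):
>     # Build a .tpr for the other replica's Hamiltonian, then re-evaluate
>     # the current frame under it with mdrun -rerun; the potential energy
>     # then has to be pulled out of rerun.edr (e.g. with gmx energy) for
>     # the Metropolis test before relaunching the production runs.
>     subprocess.run(["gmx", "grompp", "-f", mdp_other, "-c", conf,
>                     "-p", top, "-o", "other.tpr"], check=True)
>     subprocess.run(["gmx", "mdrun", "-s", "other.tpr", "-rerun", frame,
>                     "-deffnm", "rerun"], check=True)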
>
> Possible problems that I can imagine are:
> > (a) the functions for a complete re-evaluation don't exist (either the
> > communication of coordinates or the energy evaluation)
> >
>
> The code's not modular enough for that, yet.
>
> (b) some issues with changes in temperature or pressure
> >
>
> Yes, such state is not well organised.
>
> > (c) the coordinates that are required to do this evaluation no longer exist
> > because the energy evaluation functions are coupled to the timestep, so we
> > really need to pass them the coordinates of the previous timestep.
> >
>
> No, those are separate stages, because there are multiple parts of the code
> computing forces.
>
> Any suggestions are really appreciated
> >
>
> Updating plumed for 5.1 is easily the path of least resistance, if it can
> do the job.
>
> Mark
>
> Thank you,
> > Chris.
> >
> > ________________________________________
> > From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <
> > gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Michael
> > Shirts <mrshirts at gmail.com>
> > Sent: 31 March 2016 14:00
> > To: Discussion list for GROMACS users
> > Subject: Re: [gmx-users] Looking for the source of the H-REMD slowdown
> > when decoupling a lot of atoms
> >
> > > One more question: why does the free energy code have to use its own
> > kernel? I realize that I'm going to sound like an idiot here, but why
> can't
> > one just tweak parameters outside of the kernel and then have the
> optimized
> > kernel do the dynamics? I presume that one has to step outside of the
> > kernel to do the replica exchanges, so why can't the code just use the
> > optimized kernel to do the dynamics between exchanges?
> >
> > The optimized energy code only implements very specific interactions:
> > vanilla Coulomb and Lennard-Jones.  If it implemented the more general
> > interactions (such as soft-core) it would be significantly slower.
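> >
> > (For reference, and from memory so double-check the manual: the
> > Beutler-style soft-core used for perturbed pairs is roughly
> >
> >   V_sc(r) = (1 - lambda) * V_A(r_A) + lambda * V_B(r_B),
> >   r_A = (alpha * sigma_A^6 * lambda^p + r^6)^(1/6)
> >
> > and similarly for r_B with (1 - lambda)^p, i.e. extra pow() calls and
> > divisions for every perturbed pair.)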
> >
> > We've come up with some ways to handle free energy calculations with
> > optimized inner loops (I think we're talking about the same thing Mark
> > -- though we should coordinate to be sure), but that will take a
> > little time.
> >
> > One alternate possibility would be to create N replicas of the system,
> > with slightly different parameters, but where all systems use the
> > standard functional form, and then combine the results internally.  For
> > something like REST2, this could work, since one is only linearly
> > scaling the interactions between two parts A and B of the system; so
> > it would be possible to represent each system as a separate physical
> > system. This could work fairly well for small numbers of replicas,
> > though it might have some problems with large numbers of replicas. If
> > each replica were an entirely new system, it could make replica exchange
> > quite slow, since calculating the energies of a given configuration
> > with a different replica's energy function would be very costly.
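> >
> > Roughly, instead of soft-core one would just pre-scale the parameters of
> > the "hot" part of the system per replica, e.g. (an illustrative sketch of
> > REST2-style scaling, not anything GROMACS does internally):
> >
> > def rest2_scale_solute_atom(epsilon, charge, lam):
> >     # lam = beta_m / beta_0 (1.0 for the unscaled replica).  Scaling the
> >     # LJ epsilon by lam and the charge by sqrt(lam) keeps the plain
> >     # LJ + Coulomb functional form, so solute-solute interactions end up
> >     # scaled by lam and solute-solvent by sqrt(lam) via the combination
> >     # rules, and the optimized kernels still apply.
> >     return epsilon * lam, charge * lam ** 0.5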
> >
> > Right now, for alchemical interactions, there are only two
> > representations of the system, and then lambda interpolates between
> > the two representations of the systems to create the alchemical
> > intermediates. Changing the way the code is structured would again
> > require some significant changes and time.
> >
> > On Thu, Mar 31, 2016 at 10:49 AM, Christopher Neale
> > <chris.neale at alum.utoronto.ca> wrote:
> > > Dear Szilárd, Mark, and Michael:
> > >
> > > Thank you for your suggestions.
> > >
> > > 1. I did use the icc compiler
> > > 2. I tested usage of nstlist=10 by setting verlet-buffer-tolerance=-1
> (I
> > had already set nstlist=10 but mdrun changes that to 25 in my previous
> > setup) -- this did not improve the performance
> > > 3. I tested increasing .mdp option nstdhdl from 200 to 20,000 both with
> > and without also increasing the mdrun -replex option from 200 to 20,000
> --
> > this did not improve the performance
> > > 4. I tried to test more than one OpenMP thread per rank. I am fuzzy on
> > this part of the usage, but I tried "ibrun -np 4 gmx_mpi mdrun -ntomp 6"
> > and it was a lot slower (I think it only used 4 cores?); I also tried as
> > above but with also -ntmpi 4 and it errored with a message about not
> being
> > able to set thread-MPI since I didn't compile with it. In any event, I
> > doubted that an optimization here would get back most of the 12x
> > performance loss so I didn't pursue this further.
> > >
> > > The good news is that there is a beta version of the H-REMD fork of
> > Plumed that works with gromacs 5.1, so I have a viable route for now.
> > >
> > > One more question: why does the free energy code have to use its own
> > kernel? I realize that I'm going to sound like an idiot here, but why
> can't
> > one just tweak parameters outside of the kernel and then have the
> optimized
> > kernel do the dynamics? I presume that one has to step outside of the
> > kernel to do the replica exchanges, so why can't the code just use the
> > optimized kernel to do the dynamics between exchanges?
> > >
> > > Thank you again for all of your help,
> > > Chris.
> > >
> > > ________________________________________
> > > From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <
> > gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Szilárd
> > Páll <pall.szilard at gmail.com>
> > > Sent: 30 March 2016 12:59
> > > To: Discussion list for GROMACS users
> > > Subject: Re: [gmx-users] Looking for the source of the H-REMD slowdown
> > when decoupling a lot of atoms
> > >
> > > Chris,
> > >
> > > I can suggest two possible tweaks; these won't do miracles, but they may
> > > give you a bit better performance.
> > >
> > > * icc is better than gcc at optimizing the naive free energy kernel
> > code.
> > > I observed in the past up to 1.5x faster free energy kernel performance
> > > with icc 15 vs gcc 4.9 or so.
> > >
> > > * The default nstlist heuristic/suggestion assumes fast non-bondeds. In
> > > your case, depending on how long the list buffer is with the nstlist you
> > > use (25?), you may be able to gain a bit of performance by shifting load
> > > back to the pair search with a smaller nstlist, e.g. 10; see the .mdp
> > > sketch below.
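> > >
> > > With the default verlet-buffer-tolerance mdrun may bump nstlist back up,
> > > so you may need to set it to -1 and then choose rlist by hand (the value
> > > below is just a placeholder):
> > >
> > >   nstlist                 = 10
> > >   verlet-buffer-tolerance = -1
> > >   rlist                   = 1.25   ; placeholder -- must cover your cutoff plus a pair-list buffer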
> > >
> > > These two combined could give in best case 2x, I guess.
> > >
> > > What I find slightly unusual in the logs you posted on redmine is the 3-4x
> > > slowdown in PME and constraints (I'd expect ~2x in PME and less in
> > > constraints), but it could be that this too is simply because most
> > > interactions in the system are perturbed, which triggers non-optimized
> > > code paths.
> > >
> > > Cheers,
> > > --
> > > Szilárd
> > >
> > > On Wed, Mar 30, 2016 at 3:05 PM, Mark Abraham <
> mark.j.abraham at gmail.com>
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> Yeah, unfortunately Michael's pretty much right - the free-energy
> > kernel is
> > >> currently that cousin nobody talks about (or to). It's essentially
> > >> unchanged since GROMACS 4.0 days, except that the Verlet scheme has
> some
> > >> kludge so that it can call the same kernel that the group scheme used
> to
> > >> call. But it has none of the optimizations and also some simplifying
> > >> pessimizations. The result is highly unsuitable for Chris's use case,
> > but
> > >> not horrible for the more normal case of perturbing a small part of a
> > >> system. For now, I can only suggest trying a run with more than one
> > OpenMP
> > >> thread per rank, but there's nothing in the log file snippets that
> > fills me
> > >> with any hope that it would be noticeably faster.
> > >>
> > >> We have a pile of infrastructure built and in the works that will lead
> > to
> > >> being able to offer similarly optimized free-energy kernels, but they
> > won't
> > >> see the light of day until next year, I'm afraid. A set of sample
> .tprs
> > at
> > >> Redmine 742 would be most welcome, however - it's very good for us to
> > know
> > >> when/that we're optimizing a workflow someone actually wants to run,
> but
> > >> currently has reason to avoid.
> > >>
> > >> Mark
> > >>
> > >> On Wed, Mar 30, 2016 at 8:05 AM Michael Shirts <mrshirts at gmail.com>
> > wrote:
> > >>
> > >> > Hi, Chris: I'm pretty sure that it's because the free-energy
> > >> > nonbondeds are much slower than the standard nonbondeds.  You state:
> > >> >
> > >> > > I took a look at gmxlib/nonbonded/nb_free_energy.c in v.5.1.2,
> but I
> > >> was
> > >> > unable to find a function called "gmx_waste_time_here()" and beyond
> > that
> > >> I
> > >> > was out of my depth.
> > >> >
> > >> > But it's much more the fact that the non-free energy nonbondeds are
> > >> > SUPER optimized.
> > >> >
> > >> > I don't see a particularly viable way around this for now. The only
> > >> > thing I can think of is splitting the neighbor lists into two force
> > >> > calls and scaling the forces and energies coming out of those.  That
> > >> > would be a huge pain.
> > >> >
> > >> > On Tue, Mar 29, 2016 at 9:50 PM, Christopher Neale
> > >> > <chris.neale at alum.utoronto.ca> wrote:
> > >> > > Dear Users:
> > >> > >
> > >> > > I am trying to do some Hamiltonian replica exchange (H-REMD) in
> > gromacs
> > >> > 5.1.2 and am running up against really large slowdowns when
> > decoupling a
> > >> > large number of atoms. I am decoupling 5360 atoms out of the 15520
> > atoms
> > >> in
> > >> > my system. The goal is not to get a PMF, but to enhance sampling
> using
> > >> the
> > >> > REST approach to partially decouple lipids in a bilayer. This
> approach
> > >> > enhances lipid relaxation times (
> > >> > http://pubs.acs.org/doi/pdf/10.1021/ct500305u ) though the authors
> of
> > >> > that paper modified the gromacs code to do their own H-REMD in order
> > to
> > >> > avoid the really slow speed they also got when decoupling lots of
> > atoms
> > >> via
> > >> > the free energy code.
> > >> > >
> > >> > > I have already posted details here
> > >> http://redmine.gromacs.org/issues/742
> > >> > , which includes .mdp options and some timing output. I compare the
> > >> timing
> > >> > output to a standard temperature REMD (T-REMD) run. For my usage,
> the
> > >> > slowdown is about 12x for H-REMD vs. T-REMD.
> > >> > >
> > >> > > I am motivated to find a solution within gromacs because the
> > >> > > alternative is to use gromacs 4.6.7 with plumed (or with the
> > >> > > aforementioned modified code, which is also gromacs v4). Normally that
> > >> > > would be a viable option, but I am using the CHARMM force field and
> > >> > > the CHARMM TIP3P water, and I would rather not give up the speed boost
> > >> > > that I see in gromacs v5.1.2, which allows the use of the Verlet
> > >> > > cutoff scheme and has been tested and shown to reproduce CHARMM forces
> > >> > > correctly (vs. the forces one would get using the CHARMM simulation
> > >> > > software).
> > >> > >
> > >> > > I took a look at gmxlib/nonbonded/nb_free_energy.c in v.5.1.2,
> but I
> > >> was
> > >> > unable to find a function called "gmx_waste_time_here()" and beyond
> > that
> > >> I
> > >> > was out of my depth.
> > >> > >
> > >> > > Thank you for any pointers,
> > >> > > Chris.
> > >> > >
> > >> > >