[gmx-users] Looking for the source of the H-REMD slowdown when decoupling a lot of atoms

Mark Abraham mark.j.abraham at gmail.com
Sat Apr 2 12:09:27 CEST 2016


Hi,

On Fri, Apr 1, 2016 at 11:39 PM Christopher Neale <
chris.neale at alum.utoronto.ca> wrote:

> Dear developers:
>
> Is it possible with only a small amount of work to modify calc_delta() in
> src/programs/mdrun/repl_ex.cpp to add another case to re->type such that
> the program sends the coordinates to the alternate .tpr information and
> evaluates the energy completely?


This is approximately what the implementation of plumed does.
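
For concreteness, all such a generic exchange step has to compute is the usual
Metropolis quantity from the four cross-evaluated energies. A minimal sketch
(plain Python, not GROMACS code; the function name and arguments are
hypothetical, and pressure/PV terms are ignored):

    import math
    import random

    def hremd_accept(beta_i, beta_j, u_i_xi, u_i_xj, u_j_xi, u_j_xj):
        """Metropolis test for swapping coordinates between replicas i and j.

        u_a_xb is replica a's Hamiltonian evaluated on replica b's
        coordinates -- the "complete re-evaluation" proposed above.
        """
        delta = beta_i * (u_i_xj - u_i_xi) + beta_j * (u_j_xi - u_j_xj)
        return delta <= 0 or random.random() < math.exp(-delta)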

> This would allow for arbitrary exchanges, including bizarre things like
> exchanging the cutoff distance or whatever (not what I want to do -- I'm
> just emphasizing the generality of this approach). It will certainly cost
> some communication and compute cycles to do this, but, for example, I
> currently have a 12x slowdown using the free energy architecture while
> decoupling 5K atoms. So if I do an exchange attempt every 500 steps, we're
> still breaking even if this complete energy re-evaluation takes 5,500x
> longer than a regular integration step -- and I don't see how it could
> possibly take anywhere near that long assuming bMultiEx==FALSE.
>
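
Your break-even arithmetic checks out, for what it's worth; using only the
numbers above (12x slowdown, exchange attempts every 500 steps):

    slowdown = 12        # cost of one free-energy-kernel step, in normal-step units
    nstexchange = 500    # steps between exchange attempts

    cost_fe_path = nstexchange * slowdown      # 6000 step-equivalents per window
    # With optimized kernels plus one full re-evaluation of cost C per attempt:
    #   nstexchange * 1 + C = cost_fe_path
    C = cost_fe_path - nstexchange
    print(C)             # 5500 -- the re-evaluation can cost 5,500 normal steps and still break even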

One can also implement such schemes as a script that calls grompp and mdrun
-rerun before relaunching whatever simulation. Horrible, but probably
better.
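
A rough sketch of that script idea (plain Python driving the command-line
tools; file names and the surrounding exchange bookkeeping are hypothetical,
and the potential energies still have to be pulled out of the .edr files,
e.g. with gmx energy, before applying the Metropolis test):

    import subprocess

    def cross_evaluate(tpr_other, frame, deffnm):
        """Evaluate one replica's frame under the other replica's Hamiltonian."""
        # The .tpr would be prepared once per replica, e.g.
        #   gmx grompp -f replica_b.mdp -c conf.gro -p topol_b.top -o replica_b.tpr
        subprocess.run(
            ["gmx", "mdrun", "-s", tpr_other, "-rerun", frame, "-deffnm", deffnm],
            check=True)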

> Possible problems that I can imagine are:
> (a) the functions for a complete re-evaluation don't exist (either the
> communication of coordinates or the energy evaluation)
>

The code's not modular enough for that, yet.

> (b) some issues with changes in temperature or pressure
>

Yes, such state is not well organised.

> (c) the coordinates that are required to do this evaluation no longer exist
> because the energy evaluation functions are coupled to the timestep, so we
> would really need to pass them the coordinates of the previous timestep.
>

No, those are separate stages, because there are multiple parts of the code
computing forces.

> Any suggestions are really appreciated
>

Updating plumed for 5.1 is easily the path of least resistance, if it can
do the job.

Mark

> Thank you,
> Chris.
>
> ________________________________________
> From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <
> gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Michael
> Shirts <mrshirts at gmail.com>
> Sent: 31 March 2016 14:00
> To: Discussion list for GROMACS users
> Subject: Re: [gmx-users] Looking for the source of the H-REMD slowdown
> when decoupling a lot of atoms
>
> > One more question: why does the free energy code have to use its own
> kernel? I realize that I'm going to sound like an idiot here, but why can't
> one just tweak parameters outside of the kernel and then have the optimized
> kernel do the dynamics? I presume that one has to step outside of the
> kernel to do the replica exchanges, so why can't the code just use the
> optimized kernel to do the dynamics between exchanges?
>
> The optimized nonbonded code only handles very specific interactions:
> vanilla Coulomb and Lennard-Jones.  If it handled the more general
> interactions (such as soft-core) it would be significantly slower.
>
> We've come up with some ways to handle free energy calculations with
> optimized inner loops (I think we're talking about the same thing, Mark
> -- though we should coordinate to be sure), but that will take a
> little time.
>
> One alternate possibility would be to create N replicas of the system,
> with slightly different parameters, but where all systems use the
> standard functional form, and then combine the results internally.  For
> something like REST2 this could work, since one is only linearly
> scaling the interactions between two parts A and B of the system, so
> it would be possible to represent each replica as a separate physical
> system. This could work fairly well for small numbers of replicas,
> though it might have some problems with large numbers of replicas. If
> each replica were an entirely new system, it could make replica exchange
> quite slow, since calculating the energies of a given configuration
> with a different replica's energy function would be very costly.
>
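
For reference, the per-replica parameter scaling used in the REST2 scheme
mentioned above looks roughly like the sketch below (the usual recipe: scale
the "hot" region's charges by sqrt(lambda) and its LJ epsilons by lambda, so
each replica is just an ordinary topology; bonded terms, which are typically
scaled too, are omitted, and all names are hypothetical):

    import math

    def rest2_scale(charges, epsilons, lam, hot):
        """Scale nonbonded parameters of the 'hot' atoms for one replica.

        lam = T0/Tm. Scaling q by sqrt(lam) makes hot-hot Coulomb scale by lam
        and hot-cold by sqrt(lam); scaling epsilon by lam does the same for LJ
        under geometric combination rules.
        """
        q = [qi * math.sqrt(lam) if i in hot else qi for i, qi in enumerate(charges)]
        eps = [ei * lam if i in hot else ei for i, ei in enumerate(epsilons)]
        return q, eps
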
> Right now, for alchemical interactions, there are only two
> representations of the system, and then lambda interpolates between
> the two representations of the system to create the alchemical
> intermediates. Changing the way the code is structured would again
> require some significant changes and time.
>
> On Thu, Mar 31, 2016 at 10:49 AM, Christopher Neale
> <chris.neale at alum.utoronto.ca> wrote:
> > Dear Szilárd, Mark, and Michael:
> >
> > Thank you for your suggestions.
> >
> > 1. I did use the icc compiler
> > 2. I tested nstlist=10 by setting verlet-buffer-tolerance=-1 (I
> had already set nstlist=10, but mdrun changed it to 25 in my previous
> setup) -- this did not improve the performance
> > 3. I tested increasing .mdp option nstdhdl from 200 to 20,000 both with
> and without also increasing the mdrun -replex option from 200 to 20,000 --
> this did not improve the performance
> > 4. I tried to test more than one OpenMP thread per rank. I am fuzzy on
> this part of the usage, but I tried "ibrun -np 4 gmx_mpi mdrun -ntomp 6"
> and it was a lot slower (I think it only used 4 cores?); I also tried as
> above but also with -ntmpi 4, and it errored with a message about not being
> able to set thread-MPI since I didn't compile with it. In any event, I
> doubted that an optimization here would get back most of the 12x
> performance loss, so I didn't pursue this further.
> >
> > The good news is that there is a beta version of the H-REMD fork of
> Plumed that works with gromacs 5.1, so I have a viable route for now.
> >
> > One more question: why does the free energy code have to use its own
> kernel? I realize that I'm going to sound like an idiot here, but why can't
> one just tweak parameters outside of the kernel and then have the optimized
> kernel do the dynamics? I presume that one has to step outside of the
> kernel to do the replica exchanges, so why can't the code just use the
> optimized kernel to do the dynamics between exchanges?
> >
> > Thank you again for all of your help,
> > Chris.
> >
> > ________________________________________
> > From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <
> gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Szilárd
> Páll <pall.szilard at gmail.com>
> > Sent: 30 March 2016 12:59
> > To: Discussion list for GROMACS users
> > Subject: Re: [gmx-users] Looking for the source of the H-REMD slowdown
> when decoupling a lot of atoms
> >
> > Chris,
> >
> > I can suggest two possible tweaks; they won't do miracles, but they may
> > give you a bit better performance.
> >
> > * icc is better than gcc at optimizing the naive free energy kernel
> > code. In the past I have observed up to 1.5x faster free energy kernel
> > performance with icc 15 vs. gcc 4.9 or so.
> >
> > * The default nstlist heuristic/suggestion assumes fast non-bondeds. In
> > your case, depending on how long the list buffer is with the nstlist you
> > use (25?), you may be able to gain a bit of performance by shifting load
> > back to the search with a smaller nstlist, e.g. 10.
> >
> > These two combined could give in best case 2x, I guess.
> >
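
In .mdp terms, the nstlist suggestion amounts to something like the excerpt
below (only a sketch; the appropriate rlist is system-dependent, and the icc
suggestion is a build-time matter rather than an .mdp one):

    ; shift load from the non-bonded kernels back to pair search
    nstlist                  = 10
    verlet-buffer-tolerance  = -1    ; use rlist as given instead of auto-sizing the buffer
    rlist                    = 1.2   ; set by hand; at least as large as the longest cut-off
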
> > What I find slightly unusual in the logs you posted on redmine is the
> > 3-4x slowdown in PME and constraints (I'd expect ~2x in PME and less in
> > constraints), but it could be that this too is simply because most
> > interactions in the system are perturbed, which triggers non-optimized
> > code paths.
> >
> > Cheers,
> > --
> > Szilárd
> >
> > On Wed, Mar 30, 2016 at 3:05 PM, Mark Abraham <mark.j.abraham at gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> Yeah, unfortunately Michael's pretty much right - the free-energy
> kernel is
> >> currently that cousin nobody talks about (or to). It's essentially
> >> unchanged since GROMACS 4.0 days, except that the Verlet scheme has some
> >> kludge so that it can call the same kernel that the group scheme used to
> >> call. But it has none of the optimizations and also some simplifying
> >> pessimizations. The result is highly unsuitable for Chris's use case,
> but
> >> not horrible for the more normal case of perturbing a small part of a
> >> system. For now, I can only suggest trying a run with more than one
> OpenMP
> >> thread per rank, but there's nothing in the log file snippets that
> fills me
> >> with any hope that it would be noticeably faster.
> >>
> >> We have a pile of infrastructure built and in the works that will lead
> to
> >> being able to offer similarly optimized free-energy kernels, but they
> won't
> >> see the light of day until next year, I'm afraid. A set of sample .tprs
> at
> >> Redmine 742 would be most welcome, however - it's very good for us to
> know
> >> when/that we're optimizing a workflow someone actually wants to run, but
> >> currently has reason to avoid.
> >>
> >> Mark
> >>
> >> On Wed, Mar 30, 2016 at 8:05 AM Michael Shirts <mrshirts at gmail.com>
> wrote:
> >>
> >> > Hi, Chris: I'm pretty sure that it's because the free-energy nonbonded
> >> > kernels are much slower than the standard nonbonded kernels.  You state:
> >> >
> >> > > I took a look at gmxlib/nonbonded/nb_free_energy.c in v.5.1.2, but I
> >> was
> >> > unable to find a function called "gmx_waste_time_here()" and beyond
> that
> >> I
> >> > was out of my depth.
> >> >
> >> > But it's much more the fact that the non-free energy nonbondeds are
> >> > SUPER optimized.
> >> >
> >> > I don't see a particularly viable way around this for now. The only
> >> > thing I can think of is splitting the neighbor lists into two force calls,
> >> > and scaling the forces and energies coming out of those.  That would
> >> > be a huge pain.
> >> >
> >> > On Tue, Mar 29, 2016 at 9:50 PM, Christopher Neale
> >> > <chris.neale at alum.utoronto.ca> wrote:
> >> > > Dear Users:
> >> > >
> >> > > I am trying to do some Hamiltonian replica exchange (H-REMD) in
> gromacs
> >> > 5.1.2 and am running up against really large slowdowns when
> decoupling a
> >> > large number of atoms. I am decoupling 5360 atoms out of the 15520
> atoms
> >> in
> >> > my system. The goal is not to get a PMF, but to enhance sampling using
> >> the
> >> > REST approach to partially decouple lipids in a bilayer. This approach
> >> > enhances lipid relaxation times (
> >> > http://pubs.acs.org/doi/pdf/10.1021/ct500305u ) though the authors of
> >> > that paper modified the gromacs code to do their own H-REMD in order
> to
> >> > avoid the really slow speed they also got when decoupling lots of
> atoms
> >> via
> >> > the free energy code.
> >> > >
> >> > > I have already posted details here
> >> http://redmine.gromacs.org/issues/742
> >> > , which includes .mdp options and some timing output. I compare the
> >> timing
> >> > output to a standard temperature REMD (T-REMD) run. For my usage, the
> >> > slowdown is about 12x for H-REMD vs. T-REMD.
> >> > >
> >> > > I am motivated to find a solution within gromacs because the
> >> alternative
> >> > is to use gromacs 4.6.7 with plumed (or with the aforementioned
> modified
> >> > code, which is also gromacs v4). Normally that would be a viable
> option,
> >> > but I am using the charmm force field and the charmm TIP3P water and I
> >> > would rather not give up the speed boost that I see in gromacs v5.1.2,
> >> > which allows the use of the verlet cutoff scheme and has been tested
> and
> >> > shown to give the correct reproduction of charmm forces (vs. the
> forces
> >> one
> >> > would get using the charmm simulation software).
> >> > >
> >> > > I took a look at gmxlib/nonbonded/nb_free_energy.c in v.5.1.2, but I
> >> was
> >> > unable to find a function called "gmx_waste_time_here()" and beyond
> that
> >> I
> >> > was out of my depth.
> >> > >
> >> > > Thank you for any pointers,
> >> > > Chris.
> >> > >
> >> > >