[gmx-users] FEP and loss of performance

Thu Apr 7 15:55:51 CEST 2011

Ok. I agree with you: FEP performance is an important issue to resolve, but I 
know that there are also other priorities. In any case, I would like to thank 
you for your interest and your suggestions.

Luca

> I would suggest that you take Chris' advice and post all of this as a
> feature request on redmine.gromacs.org so that it can be put on a to-do
> list.  Enhancing the performance of the free energy code is probably going
> to be a low-priority, long-term goal (in the absence of any proven bug),
> but at least it won't get lost in the shuffle of the mailing list.  If
> there's no record of it in redmine, it likely won't get addressed. 
> Gromacs is undergoing major changes at the moment, so the core developers
> are quite busy with other priorities.
> 
> -Justin
> 
> Luca Bellucci wrote:
> > I posted my test files at:
> > https://www.dropbox.com/link/17.-sUcJyMeEL?k=0f3b6fa098389405e7e15c886dcc83c1
> > This is a run for a dialanine peptide in a water box.
> > The side of the cubic box was 40 A.
> > The directory is organized as:
> > TEST/
> >         topol.top
> >         Run-00/confout.gro    ; Equilibrated structure
> >         Run-00/state.cp
> >         MD-std/Commands       ; commands to run the simulation (grompp and mdrun)
> >         MD-std/md.mdp
> >         MD-FEP/Commands
> >         MD-FEP/md.mdp
> > 
> > ~700 kb
> > 
> >> David Mobley wrote:
> >>> Hi,
> >>> 
> >>> This doesn't sound like normal behavior. In fact, this is not what I
> >>> typically observe. While there may be a small performance difference,
> >>> it is probably at the level of a few percent. Certainly not a factor
> >>> of more than 10.
> >> 
> >> I see about a 50% reduction in speed when decoupling small molecules in
> >> water. For me, I don't care if a nanosecond takes 2 or 3 hours.  For
> >> larger systems such as the ones considered here, it seems that the
> >> performance loss is much more dramatic.
> >> 
> >> I can reproduce the poor performance with a simple water box with the
> >> free energy code on.  Decoupling the whole system (or at least, a large
> >> part of it, as was the original intent of this thread, as I understand
> >> it) results in a 1500% slowdown.  Some observations:
> >> 
> >> 1. Water optimizations are turned off when decoupling the water, but
> >> this only accounts for 20% of the slowdown, which is relatively
> >> insignificant.
> >> 
> >> 2. Using lambda=0.9 (from a previous post) in my water box results in
> >> even worse performance, but much of this is due to DD instability.  The
> >> system I used has a few hundred water molecules in it, and after about
> >> 10-12 ps, they collapse in on one another and form clusters,
> >> dramatically shifting the balance of atoms between DD cells.  DLB gets
> >> activated but the force imbalances are around 40%, and the total
> >> slowdown (relative to non-perturbed trajectories) is 2000%.
> >> 
> >> 3. Using lambda=0 results in stable trajectories with very low
> >> imbalance, but also poor performance.  It seems that mdrun spends all
> >> of its time in the free energy innerloops:
> >> 
> >>  Computing:                  M-Number         M-Flops  % Flops
> >> --------------------------------------------------------------
> >>  Free energy innerloop   19064.187513     2859628.127     89.1
> >>  Outer nonbonded loop      325.153806        3251.538      0.1
> >>  Calc Weights              231.754635        8343.167      0.3
> >>  Spread Q Bspline         9888.197760       19776.396      0.6
> >>  Gather F Bspline         9888.197760       59329.187      1.8
> >>  3D-FFT                  24406.688124      195253.505      6.1
> >>  Solve PME                 485.109702       31047.021      1.0
> >>  NS-Pairs                  521.616615       10953.949      0.3
> >>  Reset In Box                2.575515           7.727      0.0
> >>  CG-CoM                      7.728090          23.184      0.0
> >>  Virial                      8.176635         147.179      0.0
> >>  Update                     77.251545        2394.798      0.1
> >>  Stop-CM                     0.774045           7.740      0.0
> >>  Calc-Ekin                  77.253090        2085.833      0.1
> >>  Constraint-V               77.253090         618.025      0.0
> >>  Constraint-Vir              7.726545         185.437      0.0
> >>  Settle                     51.502060       16635.165      0.5
> >> --------------------------------------------------------------
> >>  Total                                    3209687.978    100.0
> >> --------------------------------------------------------------
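
As a rough cross-check of the flop accounting above, the M-Flops column can be summed and the share of each kernel recomputed. This is only a sketch: the numbers are copied from the table, while grouping five rows into a "PME" bucket is an assumption made here for illustration and is not something mdrun reports.

```python
# M-Flops per kernel, copied from the flop accounting table above.
mflops = {
    "Free energy innerloop": 2859628.127,
    "Outer nonbonded loop": 3251.538,
    "Calc Weights": 8343.167,
    "Spread Q Bspline": 19776.396,
    "Gather F Bspline": 59329.187,
    "3D-FFT": 195253.505,
    "Solve PME": 31047.021,
    "NS-Pairs": 10953.949,
    "Reset In Box": 7.727,
    "CG-CoM": 23.184,
    "Virial": 147.179,
    "Update": 2394.798,
    "Stop-CM": 7.740,
    "Calc-Ekin": 2085.833,
    "Constraint-V": 618.025,
    "Constraint-Vir": 185.437,
    "Settle": 16635.165,
}
REPORTED_TOTAL = 3209687.978  # "Total" row of the table

# Assumption: these rows are counted together as PME-related work.
PME_ROWS = ("Calc Weights", "Spread Q Bspline", "Gather F Bspline",
            "3D-FFT", "Solve PME")


def share(rows, table):
    """Return the percentage of the summed M-Flops spent in the given rows."""
    total = sum(table.values())
    return 100.0 * sum(table[name] for name in rows) / total


total = sum(mflops.values())
fep_share = share(["Free energy innerloop"], mflops)
pme_share = share(PME_ROWS, mflops)

print(f"sum of rows   : {total:.3f} M-Flops (reported {REPORTED_TOTAL:.3f})")
print(f"free energy   : {fep_share:.1f} %")
print(f"PME (assumed) : {pme_share:.1f} %")
```

The point of the check is simply that the free energy inner loop dominates the flop count, while the PME rows together stay near one tenth of the total.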
> >> 
> >>> You may want to provide an mdp file and topology, etc. so someone can
> >>> see if they can reproduce your problem.
> >> 
> >> I agree that would be useful.  I can contribute my water box system if
> >> it would help, as well.
> >> 
> >> -Justin
> >> 
> >>> Thanks.
> >>> 
> >>> On Wed, Apr 6, 2011 at 7:59 AM, Luca Bellucci <lcbllcc at gmail.com> wrote:
> >>>> I followed your suggestions and I tried to perform an MD run with
> >>>> GROMACS and NAMD for a dialanine peptide in a water box. The side of
> >>>> the cubic box was 40 A.
> >>>> 
> >>>> GROMACS:
> >>>> With the free energy module there is a drop in gromacs performance of
> >>>> about 10- to 20-fold.
> >>>> Standard MD:      Time:          6.693       6.693    100.0
> >>>> Free energy MD:   Time:    136.113    136.113    100.0
> >>>> 
> >>>> NAMD:
> >>>> With the free energy module the drop in performance is not as
> >>>> evident as in gromacs.
> >>>> Standard MD   6.900000
> >>>> Free energy MD 9.600000
> >>>> 
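
The slowdown factors implied by the timings quoted above can be computed directly. This is a minimal sketch that assumes the two numbers reported for each program are in the same units, so that their ratio is meaningful; the dictionary layout and function name are illustrative only.

```python
# Timings quoted in the message above. The units are assumed to be the
# same within each program; only the ratios are used.
timings = {
    "GROMACS": {"standard": 6.693, "free_energy": 136.113},
    "NAMD": {"standard": 6.9, "free_energy": 9.6},
}


def slowdown(standard, free_energy):
    """Return how many times slower the free energy run is."""
    return free_energy / standard


factors = {code: slowdown(**t) for code, t in timings.items()}

for code, factor in factors.items():
    print(f"{code}: free energy run is {factor:.1f}x slower")
```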
> >>>> I would like to point out that this kind of calculation is common; in
> >>>> fact, the manual of gromacs 4.5.3 reports: "There is a
> >>>> special option system that couples all molecules types in the system.
> >>>> This can be useful for equilibrating a system [..]".
> >>>> 
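
For readers who have not set up such a run, the free energy block of an .mdp file might look like the output of the sketch below. The actual md.mdp from the test files is not shown in this thread, so every value here is an illustrative assumption, including the solvent molecule type name SOL; the keywords are the GROMACS 4.5 free energy options as far as I know them, and the manual remains the reference.

```python
# Hypothetical free energy settings. None of these values come from the
# md.mdp in the test files; they are assumptions for illustration.
fep_options = {
    "free_energy": "yes",
    "init_lambda": "0",
    "couple-moltype": "SOL",     # molecule type to decouple (assumed name)
    "couple-lambda0": "vdw-q",   # full interactions at lambda = 0
    "couple-lambda1": "none",    # no interactions at lambda = 1
    "couple-intramol": "no",
}


def render_mdp(options):
    """Format the options as aligned 'key = value' lines."""
    width = max(len(key) for key in options)
    return "\n".join(f"{key:<{width}} = {value}"
                     for key, value in options.items())


mdp_text = render_mdp(fep_options)
print(mdp_text)
```

A real input would normally also need soft-core and lambda-spacing settings, which are left out here on purpose.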
> >>>> I would like to understand whether there is a way to resolve the
> >>>> drop in gromacs performance for this kind of calculation.
> >>>> 
> >>>> Luca
> >>>> 
> >>>>> I don't know if it is possible or not. I think that you can enhance
> >>>>> your chances of developer attention if you develop a small and simple
> >>>>> test system that reproduces the slowdown and very explicitly state
> >>>>> your case for why you can't use some other method. I would suggest
> >>>>> posting that to the mailing list and, if you don't get any response,
> >>>>> post it as an enhancement request on the redmine page (or whatever
> >>>>> has taken over from bugzilla).
> >>>>> 
> >>>>> Good luck,
> >>>>> Chris.
> >>>>> 
> >>>>> -- original message --
> >>>>> 
> >>>>> 
> >>>>> Yes, I am testing the possibility of performing Hamiltonian REMD.
> >>>>> Energy barriers can be overcome by increasing the system temperature
> >>>>> or by scaling the potential energy with a lambda value; these methods
> >>>>> are "equivalent". Both have advantages and disadvantages, but this
> >>>>> is not the right place to debate them. The main problem seems to be
> >>>>> how to overcome the loss of gromacs performance in such
> >>>>> calculations.  At the moment it seems to be an intrinsic code problem.
> >>>>> Is it possible?
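
For context, the usual Metropolis test for swapping configurations between two Hamiltonian replicas at the same temperature can be sketched as follows. This is the textbook criterion and not code taken from GROMACS; the function name, the argument names and the example energies are hypothetical.

```python
import math

K_B = 0.0083144621  # molar Boltzmann constant (gas constant) in kJ/(mol K)


def swap_probability(u_ii, u_jj, u_ij, u_ji, temperature):
    """Metropolis probability of exchanging configurations i and j.

    u_ab is the potential energy (kJ/mol) of configuration b evaluated
    with the Hamiltonian (lambda value) of replica a. Both replicas are
    assumed to run at the same temperature (K).
    """
    beta = 1.0 / (K_B * temperature)
    # Energy after the swap minus energy before the swap, in units of kT.
    delta = beta * ((u_ij + u_ji) - (u_ii + u_jj))
    return 1.0 if delta <= 0.0 else math.exp(-delta)


# Made-up energies, only to exercise the function.
p = swap_probability(u_ii=-1000.0, u_jj=-950.0, u_ij=-998.0, u_ji=-950.5,
                     temperature=300.0)
print(f"swap accepted with probability {p:.3f}")
```

Each swap attempt needs every configuration evaluated under both lambda values, which is one reason the cost of the free energy kernels matters for this kind of simulation.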
> >>>>> 
> >>>>>>>> Dear Chris and Justin,
> >>>>>>>> 
> >>>>>>>> Thank you for your precious suggestions.
> >>>>>>>> 
> >>>>>>>> This is a test that I performed on a single machine with 8 cores
> >>>>>>>> and gromacs 4.5.4.
> >>>>>>>> 
> >>>>>>>> I am trying to enhance the sampling of a protein using the
> >>>>>>>> decoupling scheme of the free energy module of gromacs.
> >>>>>>>> However, when I decoupled only the protein, the protein
> >>>>>>>> collapsed. Because I simulated in NVT, I thought that this was
> >>>>>>>> an effect of the solvent. I was trying to decouple the solvent
> >>>>>>>> as well to understand the system behavior.
> >>>>>> 
> >>>>>> 
> >>>>>>> Rather than suspect that the solvent is the problem, it's more
> >>>>>>> likely that decoupling an entire protein simply isn't stable.  I
> >>>>>>> have never tried anything that enormous, but the volume change in
> >>>>>>> the system could be unstable, along with any number of factors,
> >>>>>>> depending on how you approach it.
> >>>>>>> 
> >>>>>>> If you're looking for better sampling, REMD is a much more robust
> >>>>>>> approach than trying to manipulate the interactions of huge parts
> >>>>>>> of your system using the free energy code.
> >>>>>> 
> >>>>>> Presumably Luca is interested in some type of Hamiltonian exchange
> >>>>>> where lambda represents the interactions between the protein and the
> >>>>>> solvent? This can actually be a useful method for enhancing
> >>>>>> sampling. I think it's dangerous if we rely too heavily on "try
> >>>>>> something else". I still see no methodological reason a priori why
> >>>>>> there should be any actual slowdown, so that makes me think that
> >>>>>> it's an implementation thing, and there is at least the possibility
> >>>>>> that this is something that could be fixed as an enhancement.
> >>>>>> 
> >>>>>> Chris.
> >>>>>> 
> >>>>>> 
> >>>>>> -Justin
> >>>>>> 
> >>>>>>> I expected a loss of performance, but not so drastic.
> >>>>>>> 
> >>>>>>> Luca
> >>>>>>> 
> >>>>>>>> Load balancing problems I can understand, but why would it
> >>>>>>>> take longer in absolute time? I would have thought that some
> >>>>>>>> nodes would simply be sitting idle, but this should not cause
> >>>>>>>> an increase in the overall simulation time (15x at that!).
> >>>>>>>> 
> >>>>>>>> There must be some extra communication?
> >>>>>>>> 
> >>>>>>>> I agree with Justin that this seems like a strange thing to
> >>>>>>>> do, but still I think that there must be some underlying
> >>>>>>>> coding issue (probably one that only exists because of a
> >>>>>>>> reasonable assumption that nobody would annihilate the
> >>>>>>>> largest part of their system).
> >>>>>>>> 
> >>>>>>>> Chris.
> >>>>>>>> 
> >>>>>>>> Luca Bellucci wrote:
> >>>>>>>>> Hi Chris,
> >>>>>>>>> thanks for the suggestions,
> >>>>>>>>> in the previous mail there is a mistake because
> >>>>>>>>> couple-moltype = SOL (for solvent) and not
> >>>>>>>>> "Protein_chaim_P". Now the load balance problem
> >>>>>>>>> seems reasonable, because the water box is large (~9.0 nm).
> >>>>>>>> 
> >>>>>>>> Now your outcome makes a lot more sense.  You're decoupling
> >>>>>>>> all of the solvent? I don't see how that is going to be
> >>>>>>>> physically stable or terribly
> >>>> 