[gmx-users] results produced by auto-tuning of Coulomb cut-off/grid for PME cannot be reproduced by manually setting the Coulomb cut-off and grid spacing
Mark Abraham
mark.j.abraham at gmail.com
Thu Jan 22 20:36:45 CET 2015
On Thu, Jan 22, 2015 at 8:34 PM, Mark Abraham <mark.j.abraham at gmail.com>
wrote:
>
>
> On Thu, Jan 22, 2015 at 8:09 PM, Jiaqi Lin <jqlin at mit.edu> wrote:
>
>> Hi Mark
>>
>> Thank you for your patient reply.
>>
>> I've tried using the exact same setup (rcoulomb = 3.258, fourierspacing
>> = 0.325, rlist = 1.4, rlistlong = 3.258, rvdw = 1.2) that the auto-tuning
>> would have changed the parameters to, and it still doesn't reproduce the
>> result obtained by auto-tuning, just the result I obtained previously
>> using similar parameters. Therefore, I believe either something is going
>> on in the auto-tuning code that is not what it appears to be, or the
>> auto-tuning changes the parameters in a special way that in theory
>> shouldn't affect the simulation result but in my case the system is very
>> sensitive to these changes. As you pointed out, the code is not
>> straightforward, so much can happen at this stage. I'd very much like to
>> figure out what is going on, but spending two months on the code is not
>> something I'm willing to do.
>
>
> One test you can do that is straightforward is an auto-tuning run that
> writes the final output as soon as the tuning completes.
>
Actually, get that frame and another one later; I don't remember whether
the (e.g.) step-1000 energy output is computed with the last-tried or the
newly-decided-as-optimal PME regime.
> That trajectory should have only one frame in it. Then do a
> http://www.gromacs.org/Documentation/How-tos/Single-Point_Energy
> calculation on that trajectory file, using a .tpr that should match the
> auto-tuning result. If that gave different energy components, then I think
> that would be good evidence of a code bug. (Either way, as I've already
> said, you should just go and use the Verlet scheme for running this system
> on that amount of hardware...)
>
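[Editorial note: the comparison step above could be scripted; a minimal sketch, assuming the energy terms from the two runs have already been dumped (e.g. with g_energy / gmx energy) into plain Python dicts. The term names, values, and tolerance here are illustrative, not real gmx output.]

```python
import math

def components_match(run_a, run_b, rel_tol=1e-4):
    """Compare energy components from two runs; return the names that differ.

    run_a, run_b: dicts mapping energy-term name -> value (kJ/mol).
    A term missing from either run also counts as a mismatch.
    """
    mismatched = []
    for name in sorted(set(run_a) | set(run_b)):
        a, b = run_a.get(name), run_b.get(name)
        if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol):
            mismatched.append(name)
    return mismatched

# Illustrative values only (taken from the log excerpt later in this thread):
tuned = {"Coulomb (SR)": -7.04736e6, "Coul. recip.": -2.32682e5}
rerun = {"Coulomb (SR)": -7.04736e6, "Coul. recip.": -2.32682e5}
print(components_match(tuned, rerun))  # [] -> energies agree, no evidence of a bug
```

A non-empty list for the auto-tuned frame versus its single-point rerun would be the "good evidence of a code bug" described above.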
>
>> So I think I will keep changing parameters and try to understand long
>> range electrostatics in my system. Also, the simulation produced by
>> auto-tuning is unlikely to be an outlier: I kept changing -npme, and
>> whenever the auto-tuning gives me rcoulomb larger than 3, I can always
>> observe the "interesting" result.
>>
>
> That is suggestive of a bug. If you can also reproduce it from runs that
> have different initial velocities, then that would be quite strong evidence.
>
>> I also tried LJ-PME, as you suggested, for long-range vdW; it compresses
>> the bilayer to a large extent.
>
>
> What force field was that?
>
>
>> The whole force field would have to be refined if I used LJ-PME, but
>> that's another story. I think electrostatics is the main driving force in
>> my simulation system, and I'll keep trying to figure out what really
>> happened.
>>
>
> Hmm, that sounds like your model physics might be over-parameterized on an
> invalid van der Waals approximation. I'd give serious consideration to
> reviewing its performance on other problems quite carefully. Using it only
> in a regime and/or software package where it is thought to be valid would
> be the standard advice.
>
> Anyway, there is no way,
> * if the model physics is correct and not over-tuned to the exact regime
> and software that produced it, and
> * if the GROMACS code is correct,
> that the simulation behaviour could depend in a statistically significant
> way on the tuning or the -npme choice. Those values do not enter the model
> physics.
>
> Just changing electrostatics regimes arbitrarily and cherry-picking some
> of the results to analyse is going to get questions raised about your work
> - either the model physics is not robust enough to use on your system, or
> the software is wrong, or you're ignoring valid behaviour because something
> else looks more interesting.
>
> Mark
>
>
>> Best
>> Jiaqi
>>
>>
>> On 1/20/2015 7:18 PM, Mark Abraham wrote:
>>
>>> On Tue, Jan 20, 2015 at 11:48 PM, Jiaqi Lin <jqlin at mit.edu> wrote:
>>>
>>> Hi Szilard,
>>>>
>>>> - I've tried 5.0.1 and it gives the same result. So 4.6.7 or 5.0.4 is
>>>> better, but in what way?
>>>>
>>>> - I've tried the Verlet scheme and it gives only a small change of
>>>> cutoff and grid. But what I'm really interested in is manually
>>>> reproducing the result that tune_pme gave me in the first case using
>>>> the Group scheme.
>>>>
>>>> If that's the only situation you can observe it, it could be an
>>> outlier or
>>> your method could be unsuitable...
>>>
>>>
>>>> - I've also tried lincs-order=4 and it makes no difference.
>>>>
>>>> - My Fourier spacing is 25% finer, but that shouldn't affect the
>>>> results, right? If it does affect the results, then I want to find out
>>>> how.
>>>>
>>>> It affects your results because you could do some more sampling with
>>> the
>>> computer time you are spending on PME at the moment. Where to choose your
>>> balance of model quality vs amount of sampling is poorly understood, and
>>> problem-dependent.
>>>
>>>> - I happen to use the PP/PME rank split (100+28) and it gives me
>>>> interesting results (the performance is not bad, actually). So I'm
>>>> very interested in how these cutoff and grid settings can affect my
>>>> simulation results.
>>>>
>>>
>>> If the implementation and model is right, then they have no effect.
>>> That's
>>> why we auto-tune with them. You're going to need a lot of replicates to
>>> show that -npme 28 gives a statistically different result from a
>>> different
>>> value, and you won't yet have established that it's somehow a more valid
>>> observation to pursue.
>>>
>>>> So I tried to manually control the parameters (turning off tune_PME).
>>>> But no matter how I tried, I cannot reproduce the result given by
>>>> tune_pme. So my biggest question is: how is tune_PME implemented in
>>>> the code? What parameters does it actually tune?
>>>>
>>> Like it says, it varies rcoulomb and the Fourier grid, keeping rvdw and
>>> beta fixed. Details of how rlist and rlistlong behave are a bit messy,
>>> but nothing you asked for is ever missed out.
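[Editorial note: to see what "beta fixed" implies, here is a minimal sketch, assuming the common convention that the Ewald splitting parameter solves erfc(beta * rcoulomb) = ewald-rtol, with the default ewald-rtol = 1e-5; the exact GROMACS routine may differ in detail. With beta held fixed, enlarging rcoulomb only shrinks the neglected direct-space tail, so the accuracy cost of tuning lives entirely in the coarser reciprocal-space grid.]

```python
import math

def ewald_beta(rc, rtol=1e-5):
    """Solve erfc(beta * rc) = rtol for beta by bisection.

    erfc is monotonically decreasing, so the bracket below is safe.
    """
    lo, hi = 1e-3, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid * rc) > rtol:
            lo = mid   # mid too small: tail still above tolerance
        else:
            hi = mid
    return 0.5 * (lo + hi)

beta = ewald_beta(1.4)                # beta chosen for the original rc = 1.4 nm
tail_tuned = math.erfc(beta * 3.253)  # neglected direct-space tail at tuned cutoff
print(beta)        # ~2.23 nm^-1
print(tail_tuned)  # far below 1e-5: direct space only gets *more* accurate
```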
>>>
>>>
>>>> - When PME tuned the cutoff to such a large value, the speed does not
>>>> go down noticeably. So what I suspect is that tune_PME
>>>>
>>>
>>> Please be careful with names. gmx tune_pme is a different thing from the
>>> mdrun auto-tuning.
>>>
>>>
>>>> does the direct-space calculation without changing the neighbor-list
>>>> search distance.
>>>>
>>>> Your auto-tuned run number 1 had rlist = rcoulomb at the start, so
>>> mdrun
>>> knows you wanted a PME model with an unbuffered list whose size equals
>>> rcoulomb, and a buffered VDW model with rlist 1.4 and rvdw 1.2. Thus,
>>> rlistlong will stay equal to rcoulomb as it changes. The details and code
>>> are horrible, though, and I am looking forward to nothing so much as
>>> ripping it all out in about 2 months!
>>>
>>> And like Szilard suggested, your runs are probably a long way from
>>> maximum
>>> throughput. Aim for lots of sampling, don't chase replicating rare events
>>> with brute-force simulation!
>>>
>>> Mark
>>>
>>> Thank you
>>>
>>>> Best
>>>> Jiaqi
>>>>
>>>>
>>>> On 1/20/2015 3:54 PM, Szilárd Páll wrote:
>>>>
>>>> Not (all) directly related, but a few comments/questions:
>>>>>
>>>>> - Have you tried 4.6.7 or 5.0.4?
>>>>> - Have you considered using the Verlet scheme instead of doing manual
>>>>> buffering?
>>>>> - lincs-order=8 is very large for 2fs production runs - typically 4 is
>>>>> used.
>>>>> - Your fourier spacing is a lot (~25%) finer than it needs to be.
>>>>>
>>>>> - The PP/PME rank split of 100+28 is _very_ inconvenient and it is the
>>>>> main cause of the horrible PME performance together with the overly
>>>>> coarse grid. That's why you get such a huge cut-off after the PP-PME
>>>>> load balancing. Even if you want to stick to these parameters, you
>>>>> should tune the rank split (manually or with tune_pme).
>>>>> - The above contributes to the high neighbor search cost too.
>>>>>
>>>>> --
>>>>> Szilárd
>>>>>
>>>>>
>>>>> On Tue, Jan 20, 2015 at 9:18 PM, Jiaqi Lin <jqlin at mit.edu> wrote:
>>>>>
>>>>> Hi Mark,
>>>>>>
>>>>>> Thanks for reply. I put the md.log files in the following link
>>>>>>
>>>>>> https://www.dropbox.com/sh/d1d2fbwreizr974/AABYhSRU03nmijbTIXKKr-rra?dl=0
>>>>>>
>>>>>> There are four log files
>>>>>> 1.GMX 4.6.5 -tunepme (the coulombic cutoff is tuned to 3.253)
>>>>>> 2.GMX 4.6.5 -notunepme rcoulomb= 3.3 , fourierspace = 0.33
>>>>>> 3.GMX 4.6.5 -notunepme rcoulomb= 3.3 , fourierspace = 0.14
>>>>>> 4.GMX 4.6.5 -notunepme rcoulomb= 1.4 , fourierspace = 0.14
>>>>>>
>>>>>> Note that the LR Coulombic energy in the first one is almost twice
>>>>>> the value of that in the second one, whereas the grid spacing in both
>>>>>> cases is nearly the same.
>>>>>>
>>>>>> Only the first one gives a strong electrostatic interaction of a
>>>>>> nanoparticle with a lipid bilayer under ionic imbalance. In other
>>>>>> cases
>>>>>> I do
>>>>>> not observe such a strong interaction.
>>>>>>
>>>>>> GMX 5.0.1 gives the same results as GMX 4.6.5 using the Group scheme.
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>> Regards
>>>>>> Jiaqi
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 1/19/2015 3:22 PM, Mark Abraham wrote:
>>>>>>
>>>>>> On Thu, Jan 15, 2015 at 3:21 AM, Jiaqi Lin <jqlin at mit.edu> wrote:
>>>>>>>
>>>>>>> Dear GMX developers,
>>>>>>>
>>>>>>>> I've encountered a problem in GROMACS concerning the auto-tuning
>>>>>>>> feature of PME that has bugged me for months. As stated in the
>>>>>>>> title, the auto-tuning feature of mdrun changed my Coulomb cutoff
>>>>>>>> from 1.4 nm to ~3.3 nm (stated in md.log) when I set -npme to 28
>>>>>>>> (128 total CPU cores), and this gives me interesting simulation
>>>>>>>> results. When I use -notunepme, I find Coulomb (SR) and recip.
>>>>>>>> giving me the same energy, but the actual simulation result is
>>>>>>>> different. This I can understand: scaling the Coulomb cut-off and
>>>>>>>> grid size together theoretically gives the same accuracy for
>>>>>>>> electrostatics (according to the GMX manual and PME papers), but
>>>>>>>> there is actually some numerical error due to grid mapping, and
>>>>>>>> even if the energy is the same, that does not mean the system
>>>>>>>> configuration has to be the same (NVE ensemble: constant energy,
>>>>>>>> different configurations).
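[Editorial note: that last point can be illustrated with a toy chaotic system, entirely outside MD. Two trajectories of the logistic map that agree to within 1e-12 decorrelate completely within a few dozen steps, so matching conserved quantities at one instant say nothing about later configurations.]

```python
# Toy illustration (a chaotic 1D map, not MD): trajectories that agree to
# near machine precision still diverge completely after enough steps.
def logistic(x, r=4.0):
    return r * x * (1.0 - x)

x, y = 0.3, 0.3 + 1e-12   # tiny initial difference
max_sep = 0.0
for _ in range(100):
    x, y = logistic(x), logistic(y)
    max_sep = max(max_sep, abs(x - y))

print(max_sep)  # grows from 1e-12 to order 1: full decorrelation
```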
>>>>>>>>
>>>>>>> Total electrostatic energy should be approximately the same with
>>>>>>> different PME partitions.
>>>>>>>
>>>>>>>
>>>>>>>> However, the thing I don't understand is the following. I am
>>>>>>>> interested in the result under a large Coulomb cut-off, so I tried
>>>>>>>> to manually set the cut-off and grid spacing with -notunepme, using
>>>>>>>> the values tuned by mdrun previously. This gives me a completely
>>>>>>>> different simulation result, and the energy is also different. I've
>>>>>>>> tried setting rlist, rlistlong, or both equal to rcoulomb (~3.3);
>>>>>>>> it still does not give me the result produced by the auto-tuning.
>>>>>>>>
>>>>>>>> In what sense is the result different?
>>>>>>>
>>>>>>>
>>>>>>>> In addition, the simulation speed dramatically reduces when I set
>>>>>>>> rcoulomb to ~3.3 (using -tunepme the speed remains nearly the same
>>>>>>>> no matter how large the cutoff is tuned to). I've tested this in
>>>>>>>> both GMX 4.6.5 and 5.0.1; the same thing happens, so clearly it's
>>>>>>>> not a version issue. Thus the question is: what exactly happens to
>>>>>>>> the PME calculation with the auto-tuning feature in mdrun, and why
>>>>>>>> does it give different results when I manually set the Coulomb
>>>>>>>> cutoff and grid spacing to the values tuned by mdrun, without the
>>>>>>>> auto-tuning feature (using -notunepme)? Thank you for your help.
>>>>>>>>
>>>>>>> For the group scheme, these should all lead to essentially the same
>>>>>>> result and (if tuned) performance. If you can share your various log
>>>>>>> files on a file-sharing service (rc 1.4, rc 3.3, various -tunepme
>>>>>>> settings, 4.6.5 and 5.0.1), then we will be in a position to comment
>>>>>>> further.
>>>>>>>
>>>>>>> Mark
>>>>>>>
>>>>>>>
>>>>>>>> Additional info: I use the Group cutoff-scheme; rvdw is 1.2.
>>>>>>>
>>>>>>>>
>>>>>>>> md.log file:
>>>>>>>> DD step 9 load imb.: force 29.4% pme mesh/force 3.627
>>>>>>>>
>>>>>>>> step 30: timed with pme grid 280 280 384, coulomb cutoff 1.400:
>>>>>>>> 1026.4
>>>>>>>> M-cycles
>>>>>>>> step 50: timed with pme grid 256 256 324, coulomb cutoff 1.464:
>>>>>>>> 850.3
>>>>>>>> M-cycles
>>>>>>>> step 70: timed with pme grid 224 224 300, coulomb cutoff 1.626:
>>>>>>>> 603.6
>>>>>>>> M-cycles
>>>>>>>> step 90: timed with pme grid 200 200 280, coulomb cutoff 1.822:
>>>>>>>> 555.2
>>>>>>>> M-cycles
>>>>>>>> step 110: timed with pme grid 160 160 208, coulomb cutoff 2.280:
>>>>>>>> 397.0
>>>>>>>> M-cycles
>>>>>>>> step 130: timed with pme grid 144 144 192, coulomb cutoff 2.530:
>>>>>>>> 376.0
>>>>>>>> M-cycles
>>>>>>>> step 150: timed with pme grid 128 128 160, coulomb cutoff 2.964:
>>>>>>>> 343.7
>>>>>>>> M-cycles
>>>>>>>> step 170: timed with pme grid 112 112 144, coulomb cutoff 3.294:
>>>>>>>> 334.8
>>>>>>>> M-cycles
>>>>>>>> Grid: 12 x 14 x 14 cells
>>>>>>>> step 190: timed with pme grid 84 84 108, coulomb cutoff 4.392:
>>>>>>>> 346.2
>>>>>>>> M-cycles
>>>>>>>> step 190: the PME grid restriction limits the PME load balancing
>>>>>>>> to a
>>>>>>>> coulomb cut-off of 4.392
>>>>>>>> step 210: timed with pme grid 128 128 192, coulomb cutoff 2.846:
>>>>>>>> 360.6
>>>>>>>> M-cycles
>>>>>>>> step 230: timed with pme grid 128 128 160, coulomb cutoff 2.964:
>>>>>>>> 343.6
>>>>>>>> M-cycles
>>>>>>>> step 250: timed with pme grid 120 120 160, coulomb cutoff 3.036:
>>>>>>>> 340.4
>>>>>>>> M-cycles
>>>>>>>> step 270: timed with pme grid 112 112 160, coulomb cutoff 3.253:
>>>>>>>> 334.3
>>>>>>>> M-cycles
>>>>>>>> step 290: timed with pme grid 112 112 144, coulomb cutoff 3.294:
>>>>>>>> 334.7
>>>>>>>> M-cycles
>>>>>>>> step 310: timed with pme grid 84 84 108, coulomb cutoff 4.392:
>>>>>>>> 348.0
>>>>>>>> M-cycles
>>>>>>>> optimal pme grid 112 112 160, coulomb cutoff 3.253
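[Editorial note: the "optimal" line is simply the minimum-cost entry of the timed scan; a toy reproduction using the (cutoff, M-cycles) pairs transcribed from the log excerpt above.]

```python
# (coulomb cutoff in nm, cost in M-cycles) pairs from the md.log excerpt:
timings = [
    (1.400, 1026.4), (1.464, 850.3), (1.626, 603.6), (1.822, 555.2),
    (2.280, 397.0),  (2.530, 376.0), (2.964, 343.7), (3.294, 334.8),
    (4.392, 346.2),  (2.846, 360.6), (2.964, 343.6), (3.036, 340.4),
    (3.253, 334.3),  (3.294, 334.7), (4.392, 348.0),
]
# The tuner keeps whichever scanned setting timed cheapest:
best_cutoff, best_cycles = min(timings, key=lambda t: t[1])
print(best_cutoff, best_cycles)  # 3.253 334.3 -- matching "optimal pme grid"
```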
>>>>>>>> DD step 999 load imb.: force 18.4% pme mesh/force 0.918
>>>>>>>>
>>>>>>>> At step 1000 the performance loss due to force load imbalance is
>>>>>>>> 6.3 %
>>>>>>>>
>>>>>>>> NOTE: Turning on dynamic load balancing
>>>>>>>>
>>>>>>>> Step Time Lambda
>>>>>>>> 1000 20.00000 0.00000
>>>>>>>>
>>>>>>>> Energies (kJ/mol)
>>>>>>>> Bond G96Angle LJ (SR) Coulomb (SR)
>>>>>>>> Coul.
>>>>>>>> recip.
>>>>>>>> 1.98359e+05 1.79181e+06 -1.08927e+07 -7.04736e+06
>>>>>>>> -2.32682e+05
>>>>>>>> Position Rest. Potential Kinetic En. Total Energy
>>>>>>>> Temperature
>>>>>>>> 6.20627e+04 -1.61205e+07 4.34624e+06 -1.17743e+07
>>>>>>>> 3.00659e+02
>>>>>>>> Pressure (bar) Constr. rmsd
>>>>>>>> 2.13582e+00 1.74243e-04
>>>>>>>>
>>>>>>>>
>>>>>>>> Best
>>>>>>>> Jiaqi
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jiaqi Lin
>>>>>>>> postdoc fellow
>>>>>>>> The Langer Lab
>>>>>>>>
>>>>>>>> --
>>>>>>>> Gromacs Users mailing list
>>>>>>>>
>>>>>>>> * Please search the archive at
>>>>>>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>>>>>>>>
>>>>>>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>>>>>>>
>>>>>>>> * For (un)subscribe requests visit
>>>>>>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users
>>>>>>>> or
>>>>>>>> send a mail to gmx-users-request at gromacs.org.
>>>>>>>>
>>>>>>
>>>>>> --
>>>> Jiaqi Lin
>>>> postdoc fellow
>>>> The Langer Lab
>>>> David H. Koch Institute for Integrative Cancer Research
>>>> Massachusetts Institute of Technology
>>>>
>>>>
>>>>
>>