[gmx-users] FW: Inconsistent results between 3.3.3 and 4.6 with various set-up options

Fri Jul 5 14:17:17 CEST 2013

> Date: Fri, 5 Jul 2013 12:38:15 +0200
> From: spoel at xray.bmc.uu.se
> To: gmx-users at gromacs.org
> Subject: Re: [gmx-users] FW: Inconsistent results between 3.3.3 and 4.6 with	various set-up options
> 
> On 2013-07-05 11:52, Cara Kreck wrote:
> > Sorry, the tables got all messed up. I've converted them to just text now:
> >
> > From: cara_a_k at hotmail.com
> > To: gmx-users at gromacs.org
> > Subject: Inconsistent results between 3.3.3 and 4.6 with various set-up options
> > Date: Fri, 5 Jul 2013 17:31:04 +0800
> >
> >
> >
> >
> >
> >
> >
> > Hi everyone,
> >
> > I have been doing some tests and benchmarks of Gromacs 4.6 on a GPU cluster node (with and without GPU) with a 128 lipid bilayer (G53A6L FF) in explicit solvent and comparing it to previous results from 3.3.3. Firstly I wanted to check if the reported reaction field issues of 4.5 was fixed (http://gromacs.5086.x6.nabble.com/Reaction-Filed-crash-tp4390619.html) and then I wanted to check which was the most efficient way to run. Since my simulation made it to 100ns without crashing, I'm hopeful that RF is no longer an issue. I then ran several shorter (4.5 ns) simulations with slightly different options but the same (equilibrated) starting point to compare run times. Not surprisingly for RF, it was much quicker to use just CPUs and forget about the GPU.
> >
> > However, when I did some basic analysis of my results, I found that there was some surprising differences between the runs. I then added in a couple of PME runs to verify that it wasn't RF specific. Temp and pressure were set to 303K and 1 bar, both with Berendsen.
> >
> >                                  Temperature        Potential E.       Pressure
> > System name    Details          Average    RMSD    Average    RMSD    Average    RMSD
> > 3.3.3 c_md     RF nst5 group    306.0       1.4    -439029    466     0.998      125
> > 4.6 c_md       RF nst5 group    303.9       1.4    -440455    461     0.0570     126
> > 4.6 c_vv       RF nst5 verlet   303.0       1.2    -438718    478     1.96       134
> > 4.6 g_md       RF nst20 verlet  303.0       1.4    -439359    3193    566        1139
> > 4.6 g_vv       RF nst20 verlet  303.0       1.2    -438635    3048    34.3       405
> > 4.6 c_pme      md nst5 group    303.0       1.4    -436138    461     0.135      125
> > 4.6 g_pme      md nst40 verlet  303.0       1.4    -431621    463     416        1016
> >
> > Where c_md indicates CPU only and md integrator, g_vv indicates GPU and md-vv integrator, etc. Verlet & group refer to cut-off scheme and nst# refers to nstlist frequency which was automatically changed by gromacs. I found very similar results (and run times) for the GPU runs when -nb was set to gpu or gpu_cpu. The only other difference between runs is that in 3.3.3 only the bilayer was listed for comm_grps. In 4.6 I added the solvent due to a grompp warning, but I don't know how significant that is.
> >
> > It looks like the thermostat in 4.6 is more effective than in 3.3.3. According to the 3.3.3 log file, the average temp of the bilayer and solvent were 302.0K and 307.6K respectively, whereas the difference between the two is much smaller in the 4.6 runs (1.3K for c_md and <0.2K for the rest). I don't know if this could be in any way related to the other discrepancies.
> >
> > I am concerned about the P.E. difference between 3.3.3 c_md and 4.6 c_md (~3x RMSD). As it gave the best run time, this is the set-up I had hoped to use. I'm also surprised by how inaccurate the pressure calculations are and how large the RMSDs are for P.E. (RF only) and pressure (RF & PME) are when the GPU is used.
> >
> > I then looked at the energies of step 0 in the log files and found that several of the reported energy types varied, which I would have expected to be identical (for RF+group) or similar (for Verlet or PME) to 3.3.3 as they are all continuations from the same starting point.
> >
> > System        LJ (SR)        Coulomb (SR)    Potential       Kinetic En.    Total Energy    Temperature    Pressure (bar)
> > 3.3.3 c_md    1.80072E+04    -4.30514E+05    -4.38922E+05    6.14932E+04    -3.77429E+05    3.06083E+02    1.53992E+02
> > 4.6 c_md      1.80072E+04    -4.30515E+05    -4.38922E+05    6.20484E+04    -3.76874E+05    3.08847E+02    1.56245E+02
> > 4.6 c_vv      1.15784E+04    -4.83639E+05    -4.37388E+05    6.14748E+04    -3.75913E+05    3.05992E+02    -1.40193E+03
> > 4.6 g_md      0.00000E+00     0.00000E+00     3.46728E+04    6.14991E+04    9.61719E+04    3.06113E+02    -1.70102E+04
> > 4.6 g_vv      0.00000E+00     0.00000E+00     3.46728E+04    6.14748E+04    9.61476E+04    3.05992E+02    -1.85758E+04
> > 4.6 c_pme     1.30512E+04    -3.37973E+05    -4.35821E+05    6.14989E+04    -3.74322E+05    3.06112E+02    4.50028E+02
> > 4.6 g_pme     1.76523E+04    -4.89006E+05    -4.31207E+05    6.14990E+04    -3.69708E+05    3.06112E+02    4.37951E+02
> >
> > Even 4.6 c_md has a different K.E. and therefore T.E, temp & pressure! How is that possible? There seems to be something weird going on when you combine RF with GPUs and/or the Verlet cut-off scheme, resulting in temporarily positive energies and/or negative pressures. I don't know if this matters in the end, but I thought it was odd that it only happens for RF. Recalculating the averages to ignore the weird step 0 values made negligible difference.
> >
> > So in summary:
> >
> > 1) GPUs still look a bit dodgy, particularly at pressure coupling, and
> > 2) There seems to be something fundamentally different between the way things are being calculated between 3.3.3 and 4.6 on CPUs as well. Would this be due to the Trotter scheme that Berk Hess mentioned here: http://gromacs.5086.x6.nabble.com/Reaction-Filed-crash-tp4390619p4390624.html ? Will I have to stick with 3.3.3 for as long as I want to be able to compare to existing results?
> >
> > Thanks in advance,
> 
> 
> Long story, but thanks for checking.
> 
> You might want to see what nstcalcenergy = 1 does for your 4.6 results 
> in particular the fluctuations.
> 

I hadn't noticed this new mdp parameter and didn't realise it was possible to not calculate the energies on every step.  I'm guessing it was implicitly set to 1 in 3.3.3. The manual says it should be a multiple of nstlist for twin-range cut-offs, so should I set it to 5 instead? It'll still be a big change from 100.

> The difference in kinetic energy could come from a changed notation for 
> continuation runs in the mdp file.

No, this one I had already changed. It used to be unconstrained-start and now it's continuation. Unless there's something else I'm missing. Nothing jumps out at me from the mdout.mdp.

Thanks again,

Cara

> 
> >
> > Cara
> >
> >
> > Example .mdp file:
> >
> > integrator               = md
> > dt                       = 0.002 ; 2fs
> > nsteps                   = 2250000 ; 4.5ns
> > comm_grps                = DOPC SOL
> > nstxout                  = 1000
> > nstvout                  = 1000
> > nstlog                   = 1000
> > nstenergy                = 1000
> > energygrps               = DOPC SOL
> > cutoff-scheme            = group
> > nstlist                  = 5
> > ns_type                  = grid
> > pbc                      = xyz
> > rlist                    = 0.8
> > coulombtype              = Reaction-Field
> > rcoulomb                 = 1.4
> > epsilon_rf               = 62
> > vdwtype                  = Cut-off
> > rvdw                     = 1.4
> > tcoupl                   = berendsen
> > tc-grps                  = DOPC SOL
> > tau_t                    = 0.1 0.1
> > ref_t                    = 303 303
> > Pcoupl                   = berendsen
> > pcoupltype               = semiisotropic
> > tau_p                    = 1.0 1.0
> > compressibility          = 4.6e-5 4.6e-5
> > ref_p                    = 1.0 1.0
> > gen_vel                  = no
> > constraints              = all-bonds
> > constraint_algorithm     = lincs
> > continuation             = yes
> >
> >
> >
> >   		 	   		   		 	   		  --
> > gmx-users mailing list    gmx-users at gromacs.org
> > http://lists.gromacs.org/mailman/listinfo/gmx-users
> > * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> > * Please don't post (un)subscribe requests to the list. Use the
> > www interface or send it to gmx-users-request at gromacs.org.
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >
> 
> 
> -- 
> David van der Spoel, Ph.D., Professor of Biology
> Dept. of Cell & Molec. Biol., Uppsala University.
> Box 596, 75124 Uppsala, Sweden. Phone:	+46184714205.
> spoel at xray.bmc.uu.se    http://folding.bmc.uu.se
> -- 
> gmx-users mailing list    gmx-users at gromacs.org
> http://lists.gromacs.org/mailman/listinfo/gmx-users
> * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> * Please don't post (un)subscribe requests to the list. Use the 
> www interface or send it to gmx-users-request at gromacs.org.
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists