[gmx-users] Energy/temperature drifts in Gromacs 4.0 / inconsistencies with Gromacs 3.3.1
David van der Spoel
spoel at xray.bmc.uu.se
Fri Mar 13 17:51:38 CET 2009
Pietro Amodeo wrote:
> Hi Gromacs users/developers,
>
> we have two Gromacs installations on two different clusters with the
> following sw versions:
>
> 1) Cluster: OLD(Myrinet)
> Gromacs 3.3.1
> (CentOS 4 / Rocks 4.1)
> kernel 2.6.9-22.ELsmp
> gcc 3.4.4
> fftw 3.1.2
> mpich-gm 1.2.7p1..18
>
> 2) Cluster: NEW(Infiniband)
> Gromacs 4.0.4 / 4.0.3
> (CentOS 5)
> kernel 2.6.18-53.el5
> gcc 4.1.2 20070626 (Red Hat 4.1.2-14) / icc 10.1 (Build 20070913
> Pack.ID: l_cc_p_10.1.008)
> fftw 3.2.1
> ofed131 - openmpi 1.2.6
>
> Both serial and parallel, both single- and double-precision versions of
> Gromacs 4.0.3 and 4.0.4 were compiled with gcc (deprecated 4.1.2, but
> tests either passed or failed with only minor discrepancies) and with
> Intel 10.1 compilers.
>
> We tried to reproduce on cluster NEW simple MD equilibrations on two
> different systems (proteins solvated in SPC water + counterions)
> successfully run on cluster OLD. We used as starting tpr files either the
> same ones used and produced in 3.3.1, or new 4.0.4 files.
> Although the starting energies for both systems were substantially equal:
> ------------------------------------------------------------------------------------------------------
> Cluster NEW system 2:
> Step Time Lambda
> 0 0.00000 0.00000
>
> Energies (kJ/mol)
> G96Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14
> 7.90343e+02 7.80369e+02 2.11086e+02 4.58020e+02 1.92904e+04
> LJ (SR) LJ (LR) Coulomb (SR) Coul. recip. Position Rest.
> 1.09286e+05 -1.43221e+03 -4.54033e+05 -5.15749e+04 2.74030e-01
> Potential Kinetic En. Total Energy Temperature Pressure (bar)
> -3.76224e+05 7.83268e+04 -2.97897e+05 3.12792e+02 9.77797e+03
> Cons. rmsd ()
> 2.19464e-05
> ------------------------------------------------------------------------------------------------------
> Cluster OLD system 2:
> Rel. Constraint Deviation: Max between atoms RMS
> Before LINCS 0.098014 1670 1671 0.006831
> After LINCS 0.000104 509 511 0.000022
>
> Energies (kJ/mol)
> G96Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14
> 7.90348e+02 7.80369e+02 2.11085e+02 4.58017e+02 1.92904e+04
> LJ (SR) LJ (LR) Coulomb (SR) Coul. recip. Position Rest.
> 1.09286e+05 -1.43221e+03 -4.54033e+05 -5.15750e+04 2.74027e-01
> Potential Kinetic En. Total Energy Temperature Pressure (bar)
> -3.76224e+05 7.83267e+04 -2.97897e+05 3.12791e+02 9.78296e+03
> ------------------------------------------------------------------------------------------------------
> in both cases the simulations with Gromacs 3.3.1 ran without any problem
> (and provided good starting points for very stable production runs), while
> those performed with Gromacs 4.0.3 or 4.0.4 systematically started, after
> 2 ps or less, to exhibit wide oscillations in total energy and temperature,
> with a net upward drift in energy on both systems and very rapidly
> increasing temperature variations in system 1. These led to premature run
> terminations, with errors in LINCS or in the routines that calculate 1-4
> interactions, for all runs on system 1. System 2 exhibited a smaller
> energy drift and rather steady, but still significant, temperature
> oscillations, so the 100 ps run (8 cores, double-precision parallel
> version compiled with the Intel compiler, starting from the original 3.3.1
> tpr file) ended (apparently) regularly.
> However, the avg. energy was higher than in the corresponding 3.3.1
> simulation and the avg. temperature failed to reach the targeted 300 K
> value. In particular, the protein suffered from poor thermal relaxation
> under the same conditions that worked flawlessly in the 3.3.1 simulations.
> The final, average and r.m.s. values from log files of the two
> corresponding runs on system 2 with 3.3.1 and 4.0.4 are:
>
> ----------------------------------------------------------------------------
> Cluster NEW system 2:
> Step Time Lambda
> 50000 100.00000 0.00000
>
> Writing checkpoint, step 50000 at Thu Mar 12 16:21:03 2009
>
> Energies (kJ/mol)
> G96Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14
> 5.20200e+03 1.40303e+03 1.56905e+03 5.80966e+02 1.89786e+04
> LJ (SR) LJ (LR) Coulomb (SR) Coul. recip. Position Rest.
> 7.93006e+04 -1.43727e+03 -4.96019e+05 -6.18250e+04 2.12927e+03
> Potential Kinetic En. Total Energy Temperature Pressure (bar)
> -4.50118e+05 8.53539e+04 -3.64764e+05 3.40853e+02 7.37194e+03
> Cons. rmsd ()
> 6.33529e-05
>
> <====== ############### ==>
> <==== A V E R A G E S ====>
> <== ############### ======>
>
> Energies (kJ/mol)
> G96Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14
> 5.22068e+03 1.38036e+03 1.53871e+03 1.11820e+03 1.90723e+04
> LJ (SR) LJ (LR) Coulomb (SR) Coul. recip. Position Rest.
> 7.38412e+04 -1.40961e+03 -4.82837e+05 -6.16567e+04 4.79992e+03
> Potential Kinetic En. Total Energy Temperature Pressure (bar)
> -4.38932e+05 8.23804e+04 -3.56552e+05 3.28979e+02 2.00692e+02
> Cons. rmsd ()
> 0.00000e+00
>
> Box-X Box-Y Box-Z Volume Density (SI)
> 7.45269e+00 7.02652e+00 6.08533e+00 3.18779e+02 9.90264e+02
> pV
> -6.99836e+03
>
> Total Virial (kJ/mol)
> 3.09153e+04 3.94735e+01 -3.38472e+00
> 3.94745e+01 3.11413e+04 2.26993e+02
> -3.38879e+00 2.26991e+02 3.08214e+04
>
> Pressure (bar)
> 1.19126e+02 3.73568e+00 8.31997e+00
> 3.73558e+00 4.54768e+02 -1.26308e+01
> 8.32041e+00 -1.26307e+01 2.81831e+01
>
> Total Dipole (Debye)
> 1.44566e+02 4.10437e+02 1.05756e+02
>
> Epot (kJ/mol) Coul-SR LJ-SR LJ-LR
> Coul-14 LJ-14
> Protein-Protein -6.23914e+03 -5.96547e+03 -1.93349e+02
> 1.90723e+04 1.11820e+03
> Protein-Non-Protein -5.33140e+03 -1.38636e+03 -1.87522e+02
> 0.00000e+00 0.00000e+00
> Non-Protein-Non-Protein -4.71266e+05 8.11930e+04 -1.02874e+03
> 0.00000e+00 0.00000e+00
>
> T-Protein T-SOL T-CL-
> 5.93312e+02 3.12714e+02 3.24327e+02
>
> <====== ############################### ==>
> <==== R M S - F L U C T U A T I O N S ====>
> <== ############################### ======>
>
> Energies (kJ/mol)
> G96Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14
> 9.66238e+02 1.07074e+02 1.99039e+02 8.27177e+02 7.63636e+02
> LJ (SR) LJ (LR) Coulomb (SR) Coul. recip. Position Rest.
> 2.05222e+04 4.49488e+01 2.39437e+04 2.70027e+02 3.11921e+03
> Potential Kinetic En. Total Energy Temperature Pressure (bar)
> 7.21911e+03 4.02779e+03 6.92839e+03 1.60846e+01 1.78298e+04
> Cons. rmsd ()
> 0.00000e+00
>
> Box-X Box-Y Box-Z Volume Density (SI)
> 8.06489e-02 7.60371e-02 6.58520e-02 1.03515e+01 3.21181e+01
> pV
> 3.41845e+05
>
> Total Virial (kJ/mol)
> 1.45509e+05 1.37533e+03 1.65462e+03
> 1.37534e+03 2.49318e+05 1.79716e+03
> 1.65462e+03 1.79717e+03 1.14997e+05
>
> Pressure (bar)
> 1.52957e+04 1.45547e+02 1.74844e+02
> 1.45548e+02 2.61231e+04 1.90420e+02
> 1.74843e+02 1.90421e+02 1.21129e+04
>
> Total Dipole (Debye)
> 3.03524e+02 2.43084e+02 2.30363e+02
>
> Epot (kJ/mol) Coul-SR LJ-SR LJ-LR
> Coul-14 LJ-14
> Protein-Protein 5.44055e+02 1.91174e+02 4.60987e+00
> 7.63636e+02 8.27177e+02
> Protein-Non-Protein 2.99090e+02 2.19500e+02 7.77253e+00
> 0.00000e+00 0.00000e+00
> Non-Protein-Non-Protein 2.32076e+04 2.01585e+04 3.30115e+01
> 0.00000e+00 0.00000e+00
>
> T-Protein T-SOL T-CL-
> 6.32406e+01 1.61761e+01 6.73077e+01
>
> ----------------------------------------------------------------------------
> Cluster OLD system 2:
> Step Time Lambda
> 50000 100.00001 0.00000
>
> Rel. Constraint Deviation: Max between atoms RMS
> Before LINCS 0.062015 369 370 0.007971
> After LINCS 0.000087 231 233 0.000021
>
> Energies (kJ/mol)
> G96Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14
> 2.70775e+03 1.04157e+03 7.93145e+02 5.36208e+02 1.91215e+04
> LJ (SR) LJ (LR) Coulomb (SR) Coul. recip. Position Rest.
> 7.67921e+04 -1.45280e+03 -4.98410e+05 -6.19374e+04 6.32109e+02
> Potential Kinetic En. Total Energy Temperature Pressure (bar)
> -4.60176e+05 7.54371e+04 -3.84739e+05 3.01252e+02 -1.61958e+02
>
>
> Total NODE time on node 0: 2449.05
> Average NODE time: 306.131
> Load imbalance reduced performance to 800% of max
> <====== ############### ==>
> <==== A V E R A G E S ====>
> <== ############### ======>
>
> Energies (kJ/mol)
> G96Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14
> 2.65581e+03 1.10522e+03 8.58218e+02 5.56582e+02 1.91397e+04
> LJ (SR) LJ (LR) Coulomb (SR) Coul. recip. Position Rest.
> 7.76235e+04 -1.45624e+03 -4.98446e+05 -6.19220e+04 6.48663e+02
> Potential Kinetic En. Total Energy Temperature Pressure (bar)
> -4.59237e+05 7.52388e+04 -3.83998e+05 3.00460e+02 2.82522e-01
>
> Box-X Box-Y Box-Z Volume Density (SI)
> 7.36717e+00 6.94587e+00 6.01552e+00 3.07828e+02 1.02446e+03
> pV
> -2.19380e+01
>
> Total Virial (kJ/mol)
> 2.50625e+04 3.74364e+01 -1.09807e+02
> -1.21909e+02 2.52114e+04 -2.24475e+01
> -1.09718e+02 6.28763e+01 2.49978e+04
>
> Pressure (bar)
> 5.85471e+00 -9.46889e-01 1.42992e+01
> 1.55935e+01 -1.28647e+01 5.65603e+00
> 1.42304e+01 -3.13115e+00 7.85754e+00
>
> Total Dipole (Debye)
> -4.27471e+02 1.56256e+03 1.39198e+02
>
> Epot (kJ/mol) Coul-SR LJ-SR LJ-LR
> Coul-14 LJ-14
> Protein-Protein -6.43284e+03 -6.06852e+03 -1.93594e+02
> 1.91397e+04 5.56582e+02
> Protein-Non-Protein -6.07176e+03 -1.49620e+03 -1.99600e+02
> 0.00000e+00 0.00000e+00
> Non-Protein-Non-Protein -4.85942e+05 8.51882e+04 -1.06305e+03
> 0.00000e+00 0.00000e+00
>
> T-Protein T-SOL T-CL-
> 2.99892e+02 3.00493e+02 3.02237e+02
>
> <====== ############################### ==>
> <==== R M S - F L U C T U A T I O N S ====>
> <== ############################### ======>
>
> Energies (kJ/mol)
> G96Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14
> 7.87191e+01 3.92754e+01 3.97988e+01 4.07835e+01 6.59014e+01
> LJ (SR) LJ (LR) Coulomb (SR) Coul. recip. Position Rest.
> 1.42174e+03 9.85270e+00 3.66032e+03 1.38822e+02 5.13795e+01
> Potential Kinetic En. Total Energy Temperature Pressure (bar)
> 2.79433e+03 1.14191e+03 3.71906e+03 4.56013e+00 4.15482e+02
>
> Box-X Box-Y Box-Z Volume Density (SI)
> 1.73750e-02 1.63839e-02 1.41834e-02 2.20002e+00 7.10821e+00
> pV
> 7.74026e+03
>
> Total Virial (kJ/mol)
> 4.40353e+03 3.95656e+03 3.53923e+03
> 4.91163e+03 6.62942e+03 4.84215e+03
> 3.02705e+03 3.46706e+03 3.99411e+03
>
> Pressure (bar)
> 4.80956e+02 4.27103e+02 3.82114e+02
> 5.31242e+02 7.15522e+02 5.21285e+02
> 3.27257e+02 3.74634e+02 4.39706e+02
>
> Total Dipole (Debye)
> 2.65079e+02 3.01101e+02 2.66824e+02
>
> Epot (kJ/mol) Coul-SR LJ-SR LJ-LR
> Coul-14 LJ-14
> Protein-Protein 6.55075e+01 4.63548e+01 6.28258e-01
> 6.59014e+01 4.07835e+01
> Protein-Non-Protein 2.40165e+02 1.09021e+02 3.92781e+00
> 0.00000e+00 0.00000e+00
> Non-Protein-Non-Protein 3.50328e+03 1.43654e+03 6.95241e+00
> 0.00000e+00 0.00000e+00
>
> T-Protein T-SOL T-CL-
> 5.26064e+00 4.78751e+00 5.93233e+01
> ----------------------------------------------------------------------------
>
> What could be the origin of such discrepancies between 3.3.1 and 4.0.3/4?
> Is any change in MD protocol strongly suggested on converting input/script
> files from 3.3 to 4.0?
>
> I searched Gromacs mailing-lists and docs, but I could not identify any
> useful hint or other cases of the same problem, so I apologize in advance
> if I may have missed this information.
>
> Best regards,
> Pietro
>
That was a long mail. How about T-coupling? Which algorithm did you use?
Did you do a diff on the md.log to check for differences in the mdp
parameters?
Did you run these in parallel? What happens when you run it
sequentially? And what happens in single precision?
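
The md.log comparison suggested above could be sketched in Python roughly as
follows. This is a minimal sketch, not a definitive tool: it assumes that the
parameter section of each log consists of lines of the form "name = value" or
"name: value", and the function names (read_params, diff_params) are
hypothetical helpers introduced only for illustration.

```python
import re

# Assumed line format: optional indent, a single-token parameter name,
# then "=" or ":" and the value. Other log lines are ignored.
PARAM_RE = re.compile(r"^\s*([\w\-]+)\s*[=:]\s*(.*)$")


def read_params(text):
    """Collect parameter name/value pairs from log text into a dict."""
    params = {}
    for line in text.splitlines():
        match = PARAM_RE.match(line)
        if match:
            # Collapse runs of whitespace so column alignment does not matter.
            params[match.group(1)] = " ".join(match.group(2).split())
    return params


def diff_params(old_text, new_text):
    """Return {name: (old_value, new_value)} for parameters that differ.

    A value of None means the parameter is absent from that log.
    """
    old, new = read_params(old_text), read_params(new_text)
    diffs = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            diffs[name] = (old.get(name), new.get(name))
    return diffs
```

With two log files on disk (the file names here are placeholders), one could
call diff_params(open("md_331.log").read(), open("md_404.log").read()) and
print each entry. Parameters that exist in only one version show up with None
on the other side, which may help spot options whose defaults changed between
3.3 and 4.0.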
--
David van der Spoel, Ph.D., Professor of Biology
Molec. Biophys. group, Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone: +46184714205. Fax: +4618511755.
spoel at xray.bmc.uu.se spoel at gromacs.org http://folding.bmc.uu.se