[gmx-users] Energy/temperature drifts in Gromacs 4.0 / inconsistencies with Gromacs 3.3.1

David van der Spoel spoel at xray.bmc.uu.se
Fri Mar 13 17:51:38 CET 2009


Pietro Amodeo wrote:
> Hi Gromacs users/developers,
> 
> we have two Gromacs installations on two different clusters with the
> following software versions:
> 
> 1) Cluster: OLD(Myrinet)
>    Gromacs 3.3.1
>    (CentOS 4 / Rocks 4.1)
>    kernel 2.6.9-22.ELsmp
>    gcc 3.4.4
>    fftw 3.1.2
>    mpich-gm 1.2.7p1..18
> 
> 2) Cluster: NEW(Infiniband)
>    Gromacs 4.0.4 / 4.0.3
>    (CentOS 5)
>    kernel 2.6.18-53.el5
>    gcc 4.1.2 20070626 (Red Hat 4.1.2-14) / icc 10.1 (Build 20070913 Pack.ID: l_cc_p_10.1.008)
>    fftw 3.2.1
>    ofed131 - openmpi 1.2.6
> 
> We compiled serial and parallel, single- and double-precision versions of
> Gromacs 4.0.3 and 4.0.4 with both gcc (the deprecated 4.1.2, although the
> test suite either passed or failed only with minor discrepancies) and the
> Intel 10.1 compilers.
> 
> On cluster NEW we tried to reproduce simple MD equilibrations of two
> different systems (proteins solvated in SPC water + counterions) that had
> run successfully on cluster OLD. As starting tpr files we used either the
> original ones produced with 3.3.1 or new ones generated with 4.0.4.
> Although for both systems the starting energies were essentially identical
> on the two clusters (system 2 shown):
> ------------------------------------------------------------------------------------------------------
> Cluster NEW system 2:
>            Step           Time         Lambda
>               0        0.00000        0.00000
> 
>    Energies (kJ/mol)
>        G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>     7.90343e+02    7.80369e+02    2.11086e+02    4.58020e+02    1.92904e+04
>         LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
>     1.09286e+05   -1.43221e+03   -4.54033e+05   -5.15749e+04    2.74030e-01
>       Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
>    -3.76224e+05    7.83268e+04   -2.97897e+05    3.12792e+02    9.77797e+03
>   Cons. rmsd ()
>     2.19464e-05
> ------------------------------------------------------------------------------------------------------
> Cluster OLD system 2:
>    Rel. Constraint Deviation:  Max    between atoms     RMS
>        Before LINCS         0.098014   1670   1671   0.006831
>         After LINCS         0.000104    509    511   0.000022
> 
>    Energies (kJ/mol)
>        G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>     7.90348e+02    7.80369e+02    2.11085e+02    4.58017e+02    1.92904e+04
>         LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
>     1.09286e+05   -1.43221e+03   -4.54033e+05   -5.15750e+04    2.74027e-01
>       Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
>    -3.76224e+05    7.83267e+04   -2.97897e+05    3.12791e+02    9.78296e+03
> ------------------------------------------------------------------------------------------------------
> in both cases the Gromacs 3.3.1 simulations ran without any problem
> (and provided good starting points for very stable production runs), while
> those performed with Gromacs 4.0.3 or 4.0.4 systematically started, after
> 2 ps or less, to show wide oscillations in total energy and temperature,
> with a net upward drift in energy on both systems. In system 1 the
> temperature fluctuations grew very rapidly, and every run terminated
> prematurely with errors from LINCS or from the routines that compute
> 1-4 interactions. System 2 showed a smaller energy drift and rather
> steady, but still significant, temperature oscillations, so the 100 ps
> run (8 cores, double-precision parallel version compiled with the Intel
> compiler, starting from the original 3.3.1 tpr file) ended (apparently)
> normally.
> However, the average energy was higher than in the corresponding 3.3.1
> simulation, and the average temperature failed to settle at the 300 K
> target. In particular, the protein suffered from poor thermal relaxation
> under the same conditions that worked flawlessly in the 3.3.1 simulations.
> The final, average and r.m.s. values from the log files of the two
> corresponding runs on system 2 with 3.3.1 and 4.0.4 are:
> 
> ----------------------------------------------------------------------------
> Cluster NEW system 2:
>            Step           Time         Lambda
>           50000      100.00000        0.00000
> 
> Writing checkpoint, step 50000 at Thu Mar 12 16:21:03 2009
> 
>    Energies (kJ/mol)
>        G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>     5.20200e+03    1.40303e+03    1.56905e+03    5.80966e+02    1.89786e+04
>         LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
>     7.93006e+04   -1.43727e+03   -4.96019e+05   -6.18250e+04    2.12927e+03
>       Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
>    -4.50118e+05    8.53539e+04   -3.64764e+05    3.40853e+02    7.37194e+03
>   Cons. rmsd ()
>     6.33529e-05
> 
>         <======  ###############  ==>
>         <====  A V E R A G E S  ====>
>         <==  ###############  ======>
> 
>    Energies (kJ/mol)
>        G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>     5.22068e+03    1.38036e+03    1.53871e+03    1.11820e+03    1.90723e+04
>         LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
>     7.38412e+04   -1.40961e+03   -4.82837e+05   -6.16567e+04    4.79992e+03
>       Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
>    -4.38932e+05    8.23804e+04   -3.56552e+05    3.28979e+02    2.00692e+02
>   Cons. rmsd ()
>     0.00000e+00
> 
>           Box-X          Box-Y          Box-Z         Volume   Density (SI)
>     7.45269e+00    7.02652e+00    6.08533e+00    3.18779e+02    9.90264e+02
>              pV
>    -6.99836e+03
> 
>    Total Virial (kJ/mol)
>     3.09153e+04    3.94735e+01   -3.38472e+00
>     3.94745e+01    3.11413e+04    2.26993e+02
>    -3.38879e+00    2.26991e+02    3.08214e+04
> 
>    Pressure (bar)
>     1.19126e+02    3.73568e+00    8.31997e+00
>     3.73558e+00    4.54768e+02   -1.26308e+01
>     8.32041e+00   -1.26307e+01    2.81831e+01
> 
>    Total Dipole (Debye)
>     1.44566e+02    4.10437e+02    1.05756e+02
> 
> Epot (kJ/mol)                 Coul-SR        LJ-SR        LJ-LR      Coul-14        LJ-14
> Protein-Protein          -6.23914e+03 -5.96547e+03 -1.93349e+02  1.90723e+04  1.11820e+03
> Protein-Non-Protein      -5.33140e+03 -1.38636e+03 -1.87522e+02  0.00000e+00  0.00000e+00
> Non-Protein-Non-Protein  -4.71266e+05  8.11930e+04 -1.02874e+03  0.00000e+00  0.00000e+00
> 
>       T-Protein          T-SOL          T-CL-
>     5.93312e+02    3.12714e+02    3.24327e+02
> 
>         <======  ###############################  ==>
>         <====  R M S - F L U C T U A T I O N S  ====>
>         <==  ###############################  ======>
> 
>    Energies (kJ/mol)
>        G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>     9.66238e+02    1.07074e+02    1.99039e+02    8.27177e+02    7.63636e+02
>         LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
>     2.05222e+04    4.49488e+01    2.39437e+04    2.70027e+02    3.11921e+03
>       Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
>     7.21911e+03    4.02779e+03    6.92839e+03    1.60846e+01    1.78298e+04
>   Cons. rmsd ()
>     0.00000e+00
> 
>           Box-X          Box-Y          Box-Z         Volume   Density (SI)
>     8.06489e-02    7.60371e-02    6.58520e-02    1.03515e+01    3.21181e+01
>              pV
>     3.41845e+05
> 
>    Total Virial (kJ/mol)
>     1.45509e+05    1.37533e+03    1.65462e+03
>     1.37534e+03    2.49318e+05    1.79716e+03
>     1.65462e+03    1.79717e+03    1.14997e+05
> 
>    Pressure (bar)
>     1.52957e+04    1.45547e+02    1.74844e+02
>     1.45548e+02    2.61231e+04    1.90420e+02
>     1.74843e+02    1.90421e+02    1.21129e+04
> 
>    Total Dipole (Debye)
>     3.03524e+02    2.43084e+02    2.30363e+02
> 
> Epot (kJ/mol)                 Coul-SR        LJ-SR        LJ-LR      Coul-14        LJ-14
> Protein-Protein           5.44055e+02  1.91174e+02  4.60987e+00  7.63636e+02  8.27177e+02
> Protein-Non-Protein       2.99090e+02  2.19500e+02  7.77253e+00  0.00000e+00  0.00000e+00
> Non-Protein-Non-Protein   2.32076e+04  2.01585e+04  3.30115e+01  0.00000e+00  0.00000e+00
> 
>       T-Protein          T-SOL          T-CL-
>     6.32406e+01    1.61761e+01    6.73077e+01
> 
> ----------------------------------------------------------------------------
> Cluster OLD system 2:
>            Step           Time         Lambda
>           50000      100.00001        0.00000
> 
>    Rel. Constraint Deviation:  Max    between atoms     RMS
>        Before LINCS         0.062015    369    370   0.007971
>         After LINCS         0.000087    231    233   0.000021
> 
>    Energies (kJ/mol)
>        G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>     2.70775e+03    1.04157e+03    7.93145e+02    5.36208e+02    1.91215e+04
>         LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
>     7.67921e+04   -1.45280e+03   -4.98410e+05   -6.19374e+04    6.32109e+02
>       Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
>    -4.60176e+05    7.54371e+04   -3.84739e+05    3.01252e+02   -1.61958e+02
> 
> 
> Total NODE time on node 0: 2449.05
> Average NODE time: 306.131
> Load imbalance reduced performance to 800% of max
>         <======  ###############  ==>
>         <====  A V E R A G E S  ====>
>         <==  ###############  ======>
> 
>    Energies (kJ/mol)
>        G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>     2.65581e+03    1.10522e+03    8.58218e+02    5.56582e+02    1.91397e+04
>         LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
>     7.76235e+04   -1.45624e+03   -4.98446e+05   -6.19220e+04    6.48663e+02
>       Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
>    -4.59237e+05    7.52388e+04   -3.83998e+05    3.00460e+02    2.82522e-01
> 
>           Box-X          Box-Y          Box-Z         Volume   Density (SI)
>     7.36717e+00    6.94587e+00    6.01552e+00    3.07828e+02    1.02446e+03
>              pV
>    -2.19380e+01
> 
>    Total Virial (kJ/mol)
>     2.50625e+04    3.74364e+01   -1.09807e+02
>    -1.21909e+02    2.52114e+04   -2.24475e+01
>    -1.09718e+02    6.28763e+01    2.49978e+04
> 
>    Pressure (bar)
>     5.85471e+00   -9.46889e-01    1.42992e+01
>     1.55935e+01   -1.28647e+01    5.65603e+00
>     1.42304e+01   -3.13115e+00    7.85754e+00
> 
>    Total Dipole (Debye)
>    -4.27471e+02    1.56256e+03    1.39198e+02
> 
> Epot (kJ/mol)                 Coul-SR        LJ-SR        LJ-LR      Coul-14        LJ-14
> Protein-Protein          -6.43284e+03 -6.06852e+03 -1.93594e+02  1.91397e+04  5.56582e+02
> Protein-Non-Protein      -6.07176e+03 -1.49620e+03 -1.99600e+02  0.00000e+00  0.00000e+00
> Non-Protein-Non-Protein  -4.85942e+05  8.51882e+04 -1.06305e+03  0.00000e+00  0.00000e+00
> 
>       T-Protein          T-SOL          T-CL-
>     2.99892e+02    3.00493e+02    3.02237e+02
> 
>         <======  ###############################  ==>
>         <====  R M S - F L U C T U A T I O N S  ====>
>         <==  ###############################  ======>
> 
>    Energies (kJ/mol)
>        G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>     7.87191e+01    3.92754e+01    3.97988e+01    4.07835e+01    6.59014e+01
>         LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
>     1.42174e+03    9.85270e+00    3.66032e+03    1.38822e+02    5.13795e+01
>       Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
>     2.79433e+03    1.14191e+03    3.71906e+03    4.56013e+00    4.15482e+02
> 
>           Box-X          Box-Y          Box-Z         Volume   Density (SI)
>     1.73750e-02    1.63839e-02    1.41834e-02    2.20002e+00    7.10821e+00
>              pV
>     7.74026e+03
> 
>    Total Virial (kJ/mol)
>     4.40353e+03    3.95656e+03    3.53923e+03
>     4.91163e+03    6.62942e+03    4.84215e+03
>     3.02705e+03    3.46706e+03    3.99411e+03
> 
>    Pressure (bar)
>     4.80956e+02    4.27103e+02    3.82114e+02
>     5.31242e+02    7.15522e+02    5.21285e+02
>     3.27257e+02    3.74634e+02    4.39706e+02
> 
>    Total Dipole (Debye)
>     2.65079e+02    3.01101e+02    2.66824e+02
> 
> Epot (kJ/mol)                 Coul-SR        LJ-SR        LJ-LR      Coul-14        LJ-14
> Protein-Protein           6.55075e+01  4.63548e+01  6.28258e-01  6.59014e+01  4.07835e+01
> Protein-Non-Protein       2.40165e+02  1.09021e+02  3.92781e+00  0.00000e+00  0.00000e+00
> Non-Protein-Non-Protein   3.50328e+03  1.43654e+03  6.95241e+00  0.00000e+00  0.00000e+00
> 
>       T-Protein          T-SOL          T-CL-
>     5.26064e+00    4.78751e+00    5.93233e+01
> ----------------------------------------------------------------------------
> 
> What could be the origin of such discrepancies between 3.3.1 and 4.0.3/4?
> Are any changes to the MD protocol strongly recommended when converting
> input/script files from 3.3 to 4.0?
> 
> I searched the Gromacs mailing lists and documentation but could not find
> any useful hint or other reports of the same problem, so I apologize in
> advance if I have missed this information.
> 
> Best regards,
> Pietro
> 
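The T-Protein, T-SOL and T-CL- averages in the two logs are per-group temperatures. Each follows T = 2 E_kin / (N_df k_B) for that group's kinetic energy and degrees of freedom, so at a fixed N_df a protein average of 593 K (4.0.4) versus 300 K (3.3.1) means the protein carried roughly twice the kinetic energy per degree of freedom while the solvent stayed near the target. The minimal Python sketch below makes that relation explicit; the function names are illustrative (not GROMACS code), and the k_B value is an assumed modern constant that may differ in the last digits from the one compiled into a given GROMACS version.

```python
# Sketch: relation between a coupling group's kinetic energy and its temperature.
# BOLTZ is the molar gas constant in kJ/(mol K); the exact value compiled into a
# given GROMACS version may differ in the last digits (assumption).
BOLTZ = 0.0083144626


def group_temperature(ekin: float, ndf: float) -> float:
    """Temperature (K) from kinetic energy (kJ/mol) and degrees of freedom."""
    if ndf <= 0:
        raise ValueError("ndf must be positive")
    return 2.0 * ekin / (ndf * BOLTZ)


def kinetic_energy(temperature: float, ndf: float) -> float:
    """Inverse relation: kinetic energy (kJ/mol) for a given temperature (K)."""
    return 0.5 * ndf * BOLTZ * temperature


def degrees_of_freedom(ekin: float, temperature: float) -> float:
    """Effective number of degrees of freedom implied by Ekin and T."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return 2.0 * ekin / (BOLTZ * temperature)


if __name__ == "__main__":
    # Whole-system values from the step-0 energies of system 2 (cluster NEW).
    ndf_total = degrees_of_freedom(7.83268e04, 3.12792e02)
    print(f"effective Ndf of the whole system: {ndf_total:.0f}")
    # At a fixed Ndf, the ratio of group temperatures equals the ratio of
    # kinetic energies: compare the T-Protein averages of the two runs.
    print(f"protein Ekin ratio NEW/OLD: {5.93312e02 / 2.99892e02:.2f}")
```

With the whole-system step-0 values (E_kin = 7.83268e+04 kJ/mol at 312.792 K) it gives an effective N_df of about 6.02e4, and the NEW/OLD T-Protein ratio is about 1.98. A hot solute next to a solvent near 300 K is the kind of symptom that makes the choice of T-coupling groups and algorithm worth checking.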
That was a long mail. How about T-coupling? Which algorithm did you use?
Did you do a diff on the md.log to check for differences in the mdp 
parameters?
Did you run these in parallel? What happens when you run it 
sequentially? And what happens in single precision?
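Diffing the md.log files for mdp differences can be scripted. The Python sketch below assumes the input parameters are echoed in the log as `name = value` lines (check this against your own logs); it collects those pairs from two logs and lists every parameter whose value differs or that appears in only one of them. The function names and the script invocation are hypothetical. Because 3.3.1 and 4.0 may name or print some options differently, a parameter missing on one side can be a renaming rather than a real change.

```python
import re
import sys

# Assumed log format: one parameter per line, "   name   = value".
PARAM_LINE = re.compile(r"^\s*([A-Za-z][\w\-\[\]]*)\s*=\s*(.*?)\s*$")


def parse_params(text):
    """Return {name: value} for every 'name = value' line in a log."""
    params = {}
    for line in text.splitlines():
        match = PARAM_LINE.match(line)
        if match:
            # Keep the first occurrence if a name is printed more than once.
            params.setdefault(match.group(1), match.group(2))
    return params


def diff_params(old_text, new_text):
    """Map each differing parameter to (old_value, new_value); None if absent."""
    old, new = parse_params(old_text), parse_params(new_text)
    diffs = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            diffs[name] = (old.get(name), new.get(name))
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        for name, (a, b) in diff_params(f_old.read(), f_new.read()).items():
            print(f"{name}: {a!r} -> {b!r}")
```

Run it as, for example, `python diff_mdlog.py old/md.log new/md.log`, and look first at T-coupling, constraint and cut-off related entries.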

-- 
David van der Spoel, Ph.D., Professor of Biology
Molec. Biophys. group, Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone:	+46184714205. Fax: +4618511755.
spoel at xray.bmc.uu.se	spoel at gromacs.org   http://folding.bmc.uu.se
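
To put a number on the "net increasing drift in energy" reported in the question, one can fit a straight line to the total energy versus time, for example from an .xvg file exported with g_energy. This Python sketch assumes the usual .xvg layout ('#' and '@' header lines, time in the first column, the chosen energy term in the second); the function names are illustrative. Note that with a thermostat active the total energy is not a conserved quantity, so the fitted slope mixes integration error with the heat exchanged with the bath; a short run without T- and P-coupling gives a cleaner measure of pure integration drift.

```python
"""Sketch: estimate a linear energy drift from a time series."""
import sys


def read_xvg(path):
    """Read time and first data column from an .xvg file, skipping '#'/'@' lines."""
    times, values = [], []
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped or stripped.startswith(("#", "@")):
                continue
            cols = stripped.split()
            times.append(float(cols[0]))
            values.append(float(cols[1]))
    return times, values


def linear_drift(times, values):
    """Ordinary least-squares fit: values = slope * time + intercept."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) pairs of equal length")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all time values are identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


if __name__ == "__main__" and len(sys.argv) == 2:
    t, e = read_xvg(sys.argv[1])  # e.g. total energy (kJ/mol) vs time (ps)
    slope, _ = linear_drift(t, e)
    print(f"drift: {slope:.4g} per ps over {t[-1] - t[0]:.4g} ps")
```

Comparing the slope of the 3.3.1 and 4.0.4 runs, preferably from otherwise identical inputs, turns "wide oscillations with a net drift" into a single figure per run.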


