[gmx-users] Energy/temperature drifts in Gromacs 4.0 / inconsistencies with Gromacs 3.3.1

Pietro Amodeo pamodeo at icmib.na.cnr.it
Fri Mar 13 17:07:33 CET 2009


Hi Gromacs users/developers,

we have two Gromacs installations on two different clusters, with the
following software versions:

1) Cluster: OLD(Myrinet)
   Gromacs 3.3.1
   (CentOS 4 / Rocks 4.1)
   kernel 2.6.9-22.ELsmp
   gcc 3.4.4
   fftw 3.1.2
   mpich-gm 1.2.7p1..18

2) Cluster: NEW(Infiniband)
   Gromacs 4.0.4 / 4.0.3
   (CentOS 5)
   kernel 2.6.18-53.el5
   gcc 4.1.2 20070626 (Red Hat 4.1.2-14) /
   icc 10.1 (Build 20070913 Pack.ID: l_cc_p_10.1.008)
   fftw 3.2.1
   ofed131 - openmpi 1.2.6

Both serial and parallel, single- and double-precision versions of
Gromacs 4.0.3 and 4.0.4 were compiled with gcc (the deprecated 4.1.2; the
tests either passed or failed with minor discrepancies) and with the
Intel 10.1 compilers.

On cluster NEW we tried to reproduce simple MD equilibrations of two
different systems (proteins solvated in SPC water + counterions) that had
run successfully on cluster OLD. As starting tpr files we used either the
same files produced with 3.3.1 or new files generated with 4.0.4.
Although the starting energies for both systems were substantially equal:
------------------------------------------------------------------------------------------------------
Cluster NEW system 2:
           Step           Time         Lambda
              0        0.00000        0.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    7.90343e+02    7.80369e+02    2.11086e+02    4.58020e+02    1.92904e+04
        LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
    1.09286e+05   -1.43221e+03   -4.54033e+05   -5.15749e+04    2.74030e-01
      Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
   -3.76224e+05    7.83268e+04   -2.97897e+05    3.12792e+02    9.77797e+03
  Cons. rmsd ()
    2.19464e-05
------------------------------------------------------------------------------------------------------
Cluster OLD system 2:
   Rel. Constraint Deviation:  Max    between atoms     RMS
       Before LINCS         0.098014   1670   1671   0.006831
        After LINCS         0.000104    509    511   0.000022

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    7.90348e+02    7.80369e+02    2.11085e+02    4.58017e+02    1.92904e+04
        LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
    1.09286e+05   -1.43221e+03   -4.54033e+05   -5.15750e+04    2.74027e-01
      Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
   -3.76224e+05    7.83267e+04   -2.97897e+05    3.12791e+02    9.78296e+03
------------------------------------------------------------------------------------------------------
in both cases the simulations with Gromacs 3.3.1 ran without any problem
(and provided good starting points for very stable production runs), while
those performed with Gromacs 4.0.3 or 4.0.4 systematically started, after
2 ps or less, to exhibit wide oscillations in total energy and temperature,
with a net upward drift in energy for both systems. In system 1 the
temperature variations grew very rapidly, and all runs terminated
prematurely with errors from LINCS or from the routines that calculate
1-4 interactions. System 2 exhibited a smaller energy drift and rather
steady, but still significant, temperature oscillations, so the 100 ps run
(8 cores, double-precision parallel version compiled with the Intel
compiler, starting from the original 3.3.1 tpr file) ended (apparently)
normally.
However, the average energy was higher than in the corresponding 3.3.1
simulation, and the average temperature failed to reach the target value
of 300 K. In particular, the protein suffered from poor thermal relaxation
under the same conditions that worked flawlessly in the 3.3.1 simulations.
The final, average and r.m.s. values from log files of the two
corresponding runs on system 2 with 3.3.1 and 4.0.4 are:

----------------------------------------------------------------------------
Cluster NEW system 2:
           Step           Time         Lambda
          50000      100.00000        0.00000

Writing checkpoint, step 50000 at Thu Mar 12 16:21:03 2009

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    5.20200e+03    1.40303e+03    1.56905e+03    5.80966e+02    1.89786e+04
        LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
    7.93006e+04   -1.43727e+03   -4.96019e+05   -6.18250e+04    2.12927e+03
      Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
   -4.50118e+05    8.53539e+04   -3.64764e+05    3.40853e+02    7.37194e+03
  Cons. rmsd ()
    6.33529e-05

        <======  ###############  ==>
        <====  A V E R A G E S  ====>
        <==  ###############  ======>

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    5.22068e+03    1.38036e+03    1.53871e+03    1.11820e+03    1.90723e+04
        LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
    7.38412e+04   -1.40961e+03   -4.82837e+05   -6.16567e+04    4.79992e+03
      Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
   -4.38932e+05    8.23804e+04   -3.56552e+05    3.28979e+02    2.00692e+02
  Cons. rmsd ()
    0.00000e+00

          Box-X          Box-Y          Box-Z         Volume   Density (SI)
    7.45269e+00    7.02652e+00    6.08533e+00    3.18779e+02    9.90264e+02
             pV
   -6.99836e+03

   Total Virial (kJ/mol)
    3.09153e+04    3.94735e+01   -3.38472e+00
    3.94745e+01    3.11413e+04    2.26993e+02
   -3.38879e+00    2.26991e+02    3.08214e+04

   Pressure (bar)
    1.19126e+02    3.73568e+00    8.31997e+00
    3.73558e+00    4.54768e+02   -1.26308e+01
    8.32041e+00   -1.26307e+01    2.81831e+01

   Total Dipole (Debye)
    1.44566e+02    4.10437e+02    1.05756e+02

Epot (kJ/mol)                  Coul-SR          LJ-SR          LJ-LR        Coul-14          LJ-14
Protein-Protein           -6.23914e+03   -5.96547e+03   -1.93349e+02    1.90723e+04    1.11820e+03
Protein-Non-Protein       -5.33140e+03   -1.38636e+03   -1.87522e+02    0.00000e+00    0.00000e+00
Non-Protein-Non-Protein   -4.71266e+05    8.11930e+04   -1.02874e+03    0.00000e+00    0.00000e+00

      T-Protein          T-SOL          T-CL-
    5.93312e+02    3.12714e+02    3.24327e+02

        <======  ###############################  ==>
        <====  R M S - F L U C T U A T I O N S  ====>
        <==  ###############################  ======>

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    9.66238e+02    1.07074e+02    1.99039e+02    8.27177e+02    7.63636e+02
        LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
    2.05222e+04    4.49488e+01    2.39437e+04    2.70027e+02    3.11921e+03
      Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
    7.21911e+03    4.02779e+03    6.92839e+03    1.60846e+01    1.78298e+04
  Cons. rmsd ()
    0.00000e+00

          Box-X          Box-Y          Box-Z         Volume   Density (SI)
    8.06489e-02    7.60371e-02    6.58520e-02    1.03515e+01    3.21181e+01
             pV
    3.41845e+05

   Total Virial (kJ/mol)
    1.45509e+05    1.37533e+03    1.65462e+03
    1.37534e+03    2.49318e+05    1.79716e+03
    1.65462e+03    1.79717e+03    1.14997e+05

   Pressure (bar)
    1.52957e+04    1.45547e+02    1.74844e+02
    1.45548e+02    2.61231e+04    1.90420e+02
    1.74843e+02    1.90421e+02    1.21129e+04

   Total Dipole (Debye)
    3.03524e+02    2.43084e+02    2.30363e+02

Epot (kJ/mol)                  Coul-SR          LJ-SR          LJ-LR        Coul-14          LJ-14
Protein-Protein            5.44055e+02    1.91174e+02    4.60987e+00    7.63636e+02    8.27177e+02
Protein-Non-Protein        2.99090e+02    2.19500e+02    7.77253e+00    0.00000e+00    0.00000e+00
Non-Protein-Non-Protein    2.32076e+04    2.01585e+04    3.30115e+01    0.00000e+00    0.00000e+00

      T-Protein          T-SOL          T-CL-
    6.32406e+01    1.61761e+01    6.73077e+01

----------------------------------------------------------------------------
Cluster OLD system 2:
           Step           Time         Lambda
          50000      100.00001        0.00000

   Rel. Constraint Deviation:  Max    between atoms     RMS
       Before LINCS         0.062015    369    370   0.007971
        After LINCS         0.000087    231    233   0.000021

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    2.70775e+03    1.04157e+03    7.93145e+02    5.36208e+02    1.91215e+04
        LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
    7.67921e+04   -1.45280e+03   -4.98410e+05   -6.19374e+04    6.32109e+02
      Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
   -4.60176e+05    7.54371e+04   -3.84739e+05    3.01252e+02   -1.61958e+02


Total NODE time on node 0: 2449.05
Average NODE time: 306.131
Load imbalance reduced performance to 800% of max
        <======  ###############  ==>
        <====  A V E R A G E S  ====>
        <==  ###############  ======>

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    2.65581e+03    1.10522e+03    8.58218e+02    5.56582e+02    1.91397e+04
        LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
    7.76235e+04   -1.45624e+03   -4.98446e+05   -6.19220e+04    6.48663e+02
      Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
   -4.59237e+05    7.52388e+04   -3.83998e+05    3.00460e+02    2.82522e-01

          Box-X          Box-Y          Box-Z         Volume   Density (SI)
    7.36717e+00    6.94587e+00    6.01552e+00    3.07828e+02    1.02446e+03
             pV
   -2.19380e+01

   Total Virial (kJ/mol)
    2.50625e+04    3.74364e+01   -1.09807e+02
   -1.21909e+02    2.52114e+04   -2.24475e+01
   -1.09718e+02    6.28763e+01    2.49978e+04

   Pressure (bar)
    5.85471e+00   -9.46889e-01    1.42992e+01
    1.55935e+01   -1.28647e+01    5.65603e+00
    1.42304e+01   -3.13115e+00    7.85754e+00

   Total Dipole (Debye)
   -4.27471e+02    1.56256e+03    1.39198e+02

Epot (kJ/mol)                  Coul-SR          LJ-SR          LJ-LR        Coul-14          LJ-14
Protein-Protein           -6.43284e+03   -6.06852e+03   -1.93594e+02    1.91397e+04    5.56582e+02
Protein-Non-Protein       -6.07176e+03   -1.49620e+03   -1.99600e+02    0.00000e+00    0.00000e+00
Non-Protein-Non-Protein   -4.85942e+05    8.51882e+04   -1.06305e+03    0.00000e+00    0.00000e+00

      T-Protein          T-SOL          T-CL-
    2.99892e+02    3.00493e+02    3.02237e+02

        <======  ###############################  ==>
        <====  R M S - F L U C T U A T I O N S  ====>
        <==  ###############################  ======>

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    7.87191e+01    3.92754e+01    3.97988e+01    4.07835e+01    6.59014e+01
        LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
    1.42174e+03    9.85270e+00    3.66032e+03    1.38822e+02    5.13795e+01
      Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
    2.79433e+03    1.14191e+03    3.71906e+03    4.56013e+00    4.15482e+02

          Box-X          Box-Y          Box-Z         Volume   Density (SI)
    1.73750e-02    1.63839e-02    1.41834e-02    2.20002e+00    7.10821e+00
             pV
    7.74026e+03

   Total Virial (kJ/mol)
    4.40353e+03    3.95656e+03    3.53923e+03
    4.91163e+03    6.62942e+03    4.84215e+03
    3.02705e+03    3.46706e+03    3.99411e+03

   Pressure (bar)
    4.80956e+02    4.27103e+02    3.82114e+02
    5.31242e+02    7.15522e+02    5.21285e+02
    3.27257e+02    3.74634e+02    4.39706e+02

   Total Dipole (Debye)
    2.65079e+02    3.01101e+02    2.66824e+02

Epot (kJ/mol)                  Coul-SR          LJ-SR          LJ-LR        Coul-14          LJ-14
Protein-Protein            6.55075e+01    4.63548e+01    6.28258e-01    6.59014e+01    4.07835e+01
Protein-Non-Protein        2.40165e+02    1.09021e+02    3.92781e+00    0.00000e+00    0.00000e+00
Non-Protein-Non-Protein    3.50328e+03    1.43654e+03    6.95241e+00    0.00000e+00    0.00000e+00

      T-Protein          T-SOL          T-CL-
    5.26064e+00    4.78751e+00    5.93233e+01
----------------------------------------------------------------------------
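
For anyone who wants to compare the two sets of averages without reading
the columns by eye, here is a minimal Python sketch. It is only an
illustration: the helper name parse_energy_rows and the 15-character field
width are my own assumptions, inferred from the log excerpts above, and not
part of any Gromacs tool. The input strings are the averaged
"Potential ... Pressure" and "T-Protein ... T-CL-" rows copied from the two
logs of system 2.

```python
def parse_energy_rows(text, width=15):
    """Parse alternating name/value rows of a log energy block into a dict.

    Assumption: every name is right-aligned in a field of `width`
    characters, as in the excerpts quoted in this message.
    """
    rows = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    energies = {}
    # Rows come in pairs: a row of names followed by a row of values.
    for names_row, values_row in zip(rows[0::2], rows[1::2]):
        values = [float(v) for v in values_row.split()]
        # Re-pad on the left in case leading blanks were lost in transit.
        padded = names_row.rjust(len(values) * width)
        names = [padded[i:i + width].strip()
                 for i in range(0, len(values) * width, width)]
        energies.update(zip(names, values))
    return energies


# Averages from the 4.0.4 run on cluster NEW (system 2).
NEW_AVG = """
      Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
   -4.38932e+05    8.23804e+04   -3.56552e+05    3.28979e+02    2.00692e+02
      T-Protein          T-SOL          T-CL-
    5.93312e+02    3.12714e+02    3.24327e+02
"""

# Averages from the 3.3.1 run on cluster OLD (system 2).
OLD_AVG = """
      Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
   -4.59237e+05    7.52388e+04   -3.83998e+05    3.00460e+02    2.82522e-01
      T-Protein          T-SOL          T-CL-
    2.99892e+02    3.00493e+02    3.02237e+02
"""

new_avg = parse_energy_rows(NEW_AVG)
old_avg = parse_energy_rows(OLD_AVG)

# Difference (4.0.4 minus 3.3.1) for every quantity present in both runs.
diff = {name: new_avg[name] - old_avg[name] for name in new_avg if name in old_avg}

for name, delta in diff.items():
    print(f"{name:>15s}  NEW={new_avg[name]: .5e}  OLD={old_avg[name]: .5e}  diff={delta: .5e}")
```

With the values above, the average temperature of the 4.0.4 run is about
28.5 K higher than that of the 3.3.1 run, and the average T-Protein is
about 293 K higher.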

What could be the origin of such discrepancies between 3.3.1 and 4.0.3/4?
Is any change in the MD protocol strongly recommended when converting
input/script files from 3.3 to 4.0?

I searched the Gromacs mailing lists and documentation, but could not find
any useful hint or other reports of the same problem, so I apologize in
advance if I have missed this information.

Best regards,
Pietro

-- 
Dr. Pietro Amodeo, PhD.
Istituto di Chimica Biomolecolare del CNR
Comprensorio "A. Olivetti", Edificio 70
Via Campi Flegrei 34
I-80078 Pozzuoli (Napoli) - Italy
Phone      +39-0818675072
Fax        +39-0818041770
Email    pamodeo at icmib.na.cnr.it



