[gmx-users] Problem on continuing MD

YanhuaOuyang 15901283893 at 163.com
Wed Nov 1 10:20:35 CET 2017


Dear GROMACS users,


       Today I continued an MD run twice, in two separate directories, from the same point of the trajectory (100 ns), using the same CPU, the same checkpoint file, and the same server node. To my surprise, the energies reported in the two continuation log files differ, as shown below.


continue_md_01.log:

Started mdrun on rank 0 Wed Nov  1 23:17:08 2017

           Step           Time         Lambda

       50000000   100000.00000        0.00000

   Energies (kJ/mol)

           Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.

    4.67507e+02    1.47390e+03    1.44019e+03    6.93280e+01    8.29478e+01

          LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)

    4.17297e+02    9.82076e+03    1.12930e+05   -8.68735e+03   -1.15522e+06

   Coul. recip.      Potential    Kinetic En.   Total Energy  Conserved En.

    5.88613e+03   -1.03132e+06    1.63796e+05   -8.67527e+05   -1.90771e+05

    Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd

    2.82366e+02   -2.04258e+02   -7.74514e+02    2.31539e-06

DD  step 50000019 load imb.: force 29.5%  pme mesh/force 0.964

At step 50000020 the performance loss due to force load imbalance is 11.1 %

           Step           Time         Lambda

       50001000   100002.00000        0.00000

   Energies (kJ/mol)

           Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.

    4.65447e+02    1.50124e+03    1.50444e+03    7.92082e+01    1.64421e+01

          LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)

    4.18616e+02    9.80230e+03    1.11198e+05   -8.68735e+03   -1.15433e+06

   Coul. recip.      Potential    Kinetic En.   Total Energy  Conserved En.

    5.95908e+03   -1.03208e+06    1.64017e+05   -8.68059e+05   -1.90761e+05

    Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd

    2.82747e+02   -2.04258e+02   -9.31332e+02    3.06123e-06

DD  step 50001999  vol min/aver 0.880  load imb.: force 10.0%  pme mesh/force 1.059

    ...




continue_md_02.log:

Started mdrun on rank 0 Wed Nov  1 23:39:51 2017

           Step           Time         Lambda

       50000000   100000.00000        0.00000

   Energies (kJ/mol)

           Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.

    4.67507e+02    1.47390e+03    1.44019e+03    6.93280e+01    8.29478e+01

          LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)

    4.17297e+02    9.82076e+03    1.12930e+05   -8.68735e+03   -1.15522e+06

   Coul. recip.      Potential    Kinetic En.   Total Energy  Conserved En.

    5.88613e+03   -1.03132e+06    1.63796e+05   -8.67527e+05   -1.90771e+05

    Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd

    2.82366e+02   -2.04258e+02   -7.74505e+02    2.31539e-06

DD  step 50000019 load imb.: force 18.3%  pme mesh/force 0.950

At step 50000020 the performance loss due to force load imbalance is 6.9 %

           Step           Time         Lambda

       50001000   100002.00000        0.00000

   Energies (kJ/mol)

           Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.

    4.51321e+02    1.43914e+03    1.56368e+03    9.15439e+01    1.64274e+01

          LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)

    4.07426e+02    9.84156e+03    1.12168e+05   -8.68735e+03   -1.15440e+06

   Coul. recip.      Potential    Kinetic En.   Total Energy  Conserved En.

    5.88309e+03   -1.03123e+06    1.62769e+05   -8.68456e+05   -1.90745e+05

    Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd

    2.80597e+02   -2.04258e+02   -8.80279e+02    2.64599e-06

DD  step 50001999  vol min/aver 0.905  load imb.: force 32.7%  pme mesh/force 1.034

       ...




The energies clearly differ from 100002 ps onward (the MD was continued from 100 ns). I would expect the two continued runs to be identical, since the conditions are the same.
Why are they different? Does this mean that an MD run cannot be stopped and restarted, or moved from one server to another, without changing the trajectory, which would matter if we want to investigate dynamic properties?
Does anyone know what causes this?
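
To put numbers on the divergence, the two logs can be compared term by term. This is a minimal sketch in Python, with the values copied by hand from the log excerpts above; the names relative_difference and compare are only illustrative helpers, not part of GROMACS.

```python
# Values copied by hand from continue_md_01.log and continue_md_02.log above.
# Each entry is (run 01, run 02); energies in kJ/mol, temperature in K.
step_50000000 = {
    "Potential": (-1.03132e+06, -1.03132e+06),
    "Kinetic En.": (1.63796e+05, 1.63796e+05),
    "Pressure (bar)": (-7.74514e+02, -7.74505e+02),
}
step_50001000 = {
    "Potential": (-1.03208e+06, -1.03123e+06),
    "Kinetic En.": (1.64017e+05, 1.62769e+05),
    "Total Energy": (-8.68059e+05, -8.68456e+05),
    "Conserved En.": (-1.90761e+05, -1.90745e+05),
    "Temperature": (2.82747e+02, 2.80597e+02),
    "Pressure (bar)": (-9.31332e+02, -8.80279e+02),
}


def relative_difference(a, b):
    """Return |a - b| scaled by the larger magnitude of the two values."""
    denom = max(abs(a), abs(b))
    return 0.0 if denom == 0 else abs(a - b) / denom


def compare(terms):
    """Map each energy term to the relative difference between the two runs."""
    return {name: relative_difference(a, b) for name, (a, b) in terms.items()}


for label, terms in (("step 50000000", step_50000000),
                     ("step 50001000", step_50001000)):
    print(label)
    for name, diff in compare(terms).items():
        print(f"  {name:15s} relative difference = {diff:.3e}")
```

In these excerpts the potential and kinetic energies at step 50000000 agree to every printed digit, while the pressure already differs in its last printed digits. By step 50001000 every term listed differs between the two runs.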




Best regards,
Ouyang.







