[gmx-users] Problem on continuing MD

Mark Abraham mark.j.abraham at gmail.com
Wed Nov 1 10:39:30 CET 2017


Hi,

See http://www.gromacs.org/Documentation/Terminology/Reproducibility. You
have non-reproducible load balancing: your two logs report different load
imbalance from the first DD step onward, so the work is divided differently
in each run, and the floating-point arithmetic is not done in the same order.
However, this is not a problem unless your experimental design hinges upon
being able to reproduce an exact trajectory (in which case you will have a
tough time getting performance). You would also get a different trajectory
if you rotated your initial system by 90 degrees. Which one is "right"?
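
To illustrate why a different division of work can change the numbers at
all, here is a minimal Python sketch (not GROMACS code). It assumes a toy
model in which each "domain" sums its own contributions and the partial sums
are then added together; the function names and the three sample values are
made up for the illustration. Floating-point addition is not associative, so
moving the domain boundary changes the rounding of the total.

```python
def accumulate(values):
    """Add values left to right in plain double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


def decomposed_sum(values, boundaries):
    """Sum each 'domain' separately, then add up the partial sums.

    boundaries lists the slice edges, e.g. [0, 2, 3] gives the
    domains values[0:2] and values[2:3].
    """
    partials = [
        accumulate(values[start:stop])
        for start, stop in zip(boundaries[:-1], boundaries[1:])
    ]
    return accumulate(partials)


# Hypothetical contributions; the same three numbers in both "runs".
contributions = [0.1, 0.2, 0.3]

run_a = decomposed_sum(contributions, [0, 2, 3])  # (0.1 + 0.2) + 0.3
run_b = decomposed_sum(contributions, [0, 1, 3])  # 0.1 + (0.2 + 0.3)

print(run_a)           # 0.6000000000000001
print(run_b)           # 0.6
print(run_a == run_b)  # False
```

The two totals differ only in the last bit. In a chaotic system such as an
MD simulation, differences of that size grow over subsequent steps, which is
consistent with your logs agreeing at step 50000000 and differing visibly by
step 50001000.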

Mark

On Wed, Nov 1, 2017 at 10:20 AM YanhuaOuyang <15901283893 at 163.com> wrote:

> Dear GROMACS users,
>
>
>        Today I continued an MD run twice, in two directories, from the same
> point of the trajectory (100 ns), using the same CPU, the same checkpoint
> file and the same server node. To my surprise, the energies in the two
> continued log output files differ, as shown below.
>
>
> continue_md_01.log:
>
> Started mdrun on rank 0 Wed Nov  1 23:17:08 2017
>
>            Step           Time         Lambda
>
>        50000000   100000.00000        0.00000
>
>    Energies (kJ/mol)
>
>            Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
>
>     4.67507e+02    1.47390e+03    1.44019e+03    6.93280e+01    8.29478e+01
>
>           LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)
>
>     4.17297e+02    9.82076e+03    1.12930e+05   -8.68735e+03   -1.15522e+06
>
>    Coul. recip.      Potential    Kinetic En.   Total Energy  Conserved En.
>
>     5.88613e+03   -1.03132e+06    1.63796e+05   -8.67527e+05   -1.90771e+05
>
>     Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd
>
>     2.82366e+02   -2.04258e+02   -7.74514e+02    2.31539e-06
>
> DD  step 50000019 load imb.: force 29.5%  pme mesh/force 0.964
>
> At step 50000020 the performance loss due to force load imbalance is 11.1 %
>
>            Step           Time         Lambda
>
>        50001000   100002.00000        0.00000
>
>    Energies (kJ/mol)
>
>            Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
>
>     4.65447e+02    1.50124e+03    1.50444e+03    7.92082e+01    1.64421e+01
>
>           LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)
>
>     4.18616e+02    9.80230e+03    1.11198e+05   -8.68735e+03   -1.15433e+06
>
>    Coul. recip.      Potential    Kinetic En.   Total Energy  Conserved En.
>
>     5.95908e+03   -1.03208e+06    1.64017e+05   -8.68059e+05   -1.90761e+05
>
>     Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd
>
>     2.82747e+02   -2.04258e+02   -9.31332e+02    3.06123e-06
>
> DD  step 50001999  vol min/aver 0.880  load imb.: force 10.0%  pme mesh/force 1.059
>
>     ...
>
>
>
>
> continue_md_02.log:
>
> Started mdrun on rank 0 Wed Nov  1 23:39:51 2017
>
>            Step           Time         Lambda
>
>        50000000   100000.00000        0.00000
>
>    Energies (kJ/mol)
>
>            Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
>
>     4.67507e+02    1.47390e+03    1.44019e+03    6.93280e+01    8.29478e+01
>
>           LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)
>
>     4.17297e+02    9.82076e+03    1.12930e+05   -8.68735e+03   -1.15522e+06
>
>    Coul. recip.      Potential    Kinetic En.   Total Energy  Conserved En.
>
>     5.88613e+03   -1.03132e+06    1.63796e+05   -8.67527e+05   -1.90771e+05
>
>     Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd
>
>     2.82366e+02   -2.04258e+02   -7.74505e+02    2.31539e-06
>
> DD  step 50000019 load imb.: force 18.3%  pme mesh/force 0.950
>
> At step 50000020 the performance loss due to force load imbalance is 6.9 %
>
>            Step           Time         Lambda
>
>        50001000   100002.00000        0.00000
>
>    Energies (kJ/mol)
>
>            Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
>
>     4.51321e+02    1.43914e+03    1.56368e+03    9.15439e+01    1.64274e+01
>
>           LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)
>
>     4.07426e+02    9.84156e+03    1.12168e+05   -8.68735e+03   -1.15440e+06
>
>    Coul. recip.      Potential    Kinetic En.   Total Energy  Conserved En.
>
>     5.88309e+03   -1.03123e+06    1.62769e+05   -8.68456e+05   -1.90745e+05
>
>     Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd
>
>     2.80597e+02   -2.04258e+02   -8.80279e+02    2.64599e-06
>
> DD  step 50001999  vol min/aver 0.905  load imb.: force 32.7%  pme mesh/force 1.034
>
>        ...
>
>
>
>
> The energies clearly differ from 100002 ps onward (the MD was continued
> from 100 ns). I expected the two continued runs to be identical, since the
> conditions are the same.
> Why are they different? Does this mean that an MD run cannot be stopped and
> restarted, or transferred from one server to another, without changing the
> results, if we want to investigate dynamic properties?
> Does anyone know what the problem is?
>
>
>
>
> Best regards,
> Ouyang.