[gmx-users] Problem on continuing MD
YanhuaOuyang
15901283893 at 163.com
Wed Nov 1 13:39:45 CET 2017
Dear Mark,
Thank you so much. I have read the page you linked and now understand why this happens.
Best regards,
Ouyang
At 2017-11-01 17:39:17, "Mark Abraham" <mark.j.abraham at gmail.com> wrote:
>Hi,
>
>See http://www.gromacs.org/Documentation/Terminology/Reproducibility. You
>have non-reproducible load balancing. However this is not a problem unless
>your experimental design hinges upon being able to reproduce an exact
>trajectory (in which case you will have a tough time getting performance).
>You would get a different trajectory if you rotated your initial system by
>90 degrees too. Which one is "right?"
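
To make the reproducibility point concrete: floating-point addition is not associative, so when dynamic load balancing changes which rank sums which contributions, the last bits of a total can change. A minimal Python sketch of that effect follows. The function names and the two-way "domain" split are illustrative assumptions, not GROMACS code.

```python
def naive_sum(values):
    """Add values left to right in plain double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_by_domains(values, split):
    """Sum two 'domains' separately, then combine the partial sums.

    `split` is a hypothetical stand-in for where a load balancer
    places the boundary between two ranks.
    """
    left = naive_sum(values[:split])
    right = naive_sum(values[split:])
    return left + right


contributions = [0.1, 0.2, 0.3]

# Same numbers, two different placements of the domain boundary.
run_a = sum_by_domains(contributions, split=2)  # (0.1 + 0.2) + 0.3
run_b = sum_by_domains(contributions, split=1)  # 0.1 + (0.2 + 0.3)

print(repr(run_a), repr(run_b), run_a == run_b)
```

The two totals differ only in the last bit, but that is enough for two otherwise identical runs to stop being bitwise identical. If exact reproducibility really is required, mdrun has a -reprod option that avoids such optimizations, at the performance cost mentioned above.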
>
>Mark
>
>On Wed, Nov 1, 2017 at 10:20 AM YanhuaOuyang <15901283893 at 163.com> wrote:
>
>> Dear GROMACS users,
>>
>>
>> Today I continued the MD twice, in two directories, from the same
>> point of the trajectory (100 ns), using the same CPU, the same
>> checkpoint file, and the same server node. To my surprise, the energies
>> in the two continued log output files differ, as shown
>> below.
>>
>>
>> continue_md_01.log:
>>
>> Started mdrun on rank 0 Wed Nov 1 23:17:08 2017
>>
>>            Step           Time         Lambda
>>        50000000   100000.00000        0.00000
>>
>>    Energies (kJ/mol)
>>            Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
>>     4.67507e+02    1.47390e+03    1.44019e+03    6.93280e+01    8.29478e+01
>>           LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)
>>     4.17297e+02    9.82076e+03    1.12930e+05   -8.68735e+03   -1.15522e+06
>>    Coul. recip.      Potential    Kinetic En.   Total Energy  Conserved En.
>>     5.88613e+03   -1.03132e+06    1.63796e+05   -8.67527e+05   -1.90771e+05
>>     Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd
>>     2.82366e+02   -2.04258e+02   -7.74514e+02    2.31539e-06
>>
>> DD step 50000019 load imb.: force 29.5% pme mesh/force 0.964
>>
>> At step 50000020 the performance loss due to force load imbalance is 11.1 %
>>
>>            Step           Time         Lambda
>>        50001000   100002.00000        0.00000
>>
>>    Energies (kJ/mol)
>>            Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
>>     4.65447e+02    1.50124e+03    1.50444e+03    7.92082e+01    1.64421e+01
>>           LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)
>>     4.18616e+02    9.80230e+03    1.11198e+05   -8.68735e+03   -1.15433e+06
>>    Coul. recip.      Potential    Kinetic En.   Total Energy  Conserved En.
>>     5.95908e+03   -1.03208e+06    1.64017e+05   -8.68059e+05   -1.90761e+05
>>     Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd
>>     2.82747e+02   -2.04258e+02   -9.31332e+02    3.06123e-06
>>
>> DD step 50001999 vol min/aver 0.880 load imb.: force 10.0% pme mesh/force 1.059
>>
>> ...
>>
>>
>>
>>
>> continue_md_02.log:
>>
>> Started mdrun on rank 0 Wed Nov 1 23:39:51 2017
>>
>>            Step           Time         Lambda
>>        50000000   100000.00000        0.00000
>>
>>    Energies (kJ/mol)
>>            Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
>>     4.67507e+02    1.47390e+03    1.44019e+03    6.93280e+01    8.29478e+01
>>           LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)
>>     4.17297e+02    9.82076e+03    1.12930e+05   -8.68735e+03   -1.15522e+06
>>    Coul. recip.      Potential    Kinetic En.   Total Energy  Conserved En.
>>     5.88613e+03   -1.03132e+06    1.63796e+05   -8.67527e+05   -1.90771e+05
>>     Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd
>>     2.82366e+02   -2.04258e+02   -7.74505e+02    2.31539e-06
>>
>> DD step 50000019 load imb.: force 18.3% pme mesh/force 0.950
>>
>> At step 50000020 the performance loss due to force load imbalance is 6.9 %
>>
>>            Step           Time         Lambda
>>        50001000   100002.00000        0.00000
>>
>>    Energies (kJ/mol)
>>            Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
>>     4.51321e+02    1.43914e+03    1.56368e+03    9.15439e+01    1.64274e+01
>>           LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)
>>     4.07426e+02    9.84156e+03    1.12168e+05   -8.68735e+03   -1.15440e+06
>>    Coul. recip.      Potential    Kinetic En.   Total Energy  Conserved En.
>>     5.88309e+03   -1.03123e+06    1.62769e+05   -8.68456e+05   -1.90745e+05
>>     Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd
>>     2.80597e+02   -2.04258e+02   -8.80279e+02    2.64599e-06
>>
>> DD step 50001999 vol min/aver 0.905 load imb.: force 32.7% pme mesh/force 1.034
>>
>> ...
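
For scale, here is a small Python sketch of how far apart the two runs are at 100002 ps. The numbers are copied by hand from the two logs; the dictionary names and the helper function are only illustrative.

```python
# Values copied from continue_md_01.log and continue_md_02.log
# at step 50001000 (t = 100002 ps).
run_01 = {
    "Potential": -1.03208e+06,
    "Kinetic En.": 1.64017e+05,
    "Conserved En.": -1.90761e+05,
    "Temperature": 2.82747e+02,
}
run_02 = {
    "Potential": -1.03123e+06,
    "Kinetic En.": 1.62769e+05,
    "Conserved En.": -1.90745e+05,
    "Temperature": 2.80597e+02,
}


def relative_difference(a, b):
    """Absolute difference scaled by the larger magnitude."""
    return abs(a - b) / max(abs(a), abs(b))


rel_diff = {
    name: relative_difference(run_01[name], run_02[name]) for name in run_01
}
for name, value in rel_diff.items():
    print(f"{name:>14s}: {100 * value:.3f} %")

# Pressure (bar) reported by each run at the restart step itself (step 50000000).
pressure_start = (-7.74514e+02, -7.74505e+02)
start_gap = abs(pressure_start[0] - pressure_start[1])
print(f"pressure gap at restart: {start_gap:.3f} bar")
```

Each of these four terms differs by less than 1 % between the two runs. Note also that the reported pressure already differs in the fifth significant digit at step 50000000, while every other printed term at that step is identical.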
>>
>>
>>
>>
>> The energies clearly differ from 100002 ps onward
>> (the MD was continued from 100 ns). Generally speaking, the two continued
>> runs should be identical, since the conditions are the same.
>> Why are they different? Does this mean that an MD run cannot be stopped and
>> restarted, or transferred from one server to another, without changing the
>> results, if we want to investigate dynamic properties?
>> Does anyone know what causes this?
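
On the expectation that identical conditions give identical trajectories: MD is chaotic, so a last-bit difference in one step grows until the two trajectories are unrelated in detail. The Python sketch below shows that growth on a toy system, the logistic map. It is a stand-in chosen for brevity, not a model of an MD integrator, and all names in it are illustrative.

```python
def logistic_step(x):
    """One step of the logistic map x -> 4x(1 - x), a standard chaotic toy system."""
    return 4.0 * x * (1.0 - x)


def divergence(x0, perturbation, steps):
    """Iterate two copies that start `perturbation` apart; return the gap after each step."""
    a, b = x0, x0 + perturbation
    gaps = []
    for _ in range(steps):
        a, b = logistic_step(a), logistic_step(b)
        gaps.append(abs(a - b))
    return gaps


# Start the two copies about one rounding error apart.
gaps = divergence(x0=0.3, perturbation=1e-15, steps=200)

# First step (1-based) at which the two copies differ by more than 0.1.
first_large = next((i + 1 for i, g in enumerate(gaps) if g > 0.1), None)
print("gap after 1 step:", gaps[0])
print("first step with gap > 0.1:", first_large)
```

Both copies are equally valid trajectories of the same system. In the same way, either continued run can be used; what should agree between them are averaged properties, not the step-by-step numbers.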
>>
>>
>>
>>
>> Best regards,
>> Ouyang.
>>
>>
>>
>>
>>
>>
>> --
>> Gromacs Users mailing list
>>
>> * Please search the archive at
>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
>> posting!
>>
>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>
>> * For (un)subscribe requests visit
>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>> send a mail to gmx-users-request at gromacs.org.
>>