[gmx-developers] Following forces with domain decomposition
Justin Lemkul
jalemkul at vt.edu
Wed Jun 10 01:13:18 CEST 2020
On 6/9/20 3:57 PM, Berk Hess wrote:
> Hi,
>
> I think Mark's suggestion is very good. This avoids all special
> communication for Drude oscillators. This only requires adding Drude
> connections to the update group setup.
>
Thanks to you both - I will give this a try. Seems like a better idea
and a very nice feature.
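
As a rough illustration of why this helps (a minimal sketch in Python, not GROMACS code): if each Drude and its parent form one update group, and the whole group is assigned to a domain by the group's centre, the pair cannot end up on different ranks. The function names, the 1D slab layout, and the neglect of periodic boundaries are all assumptions made for this example.

```python
from collections import defaultdict


def assign_domains_per_atom(positions, n_domains, box_length):
    """Strict geometric criterion: each atom goes to the slab it sits in."""
    slab = box_length / n_domains
    return [min(int(x // slab), n_domains - 1) for x in positions]


def assign_domains_by_update_group(positions, group_of_atom, n_domains, box_length):
    """Assign every atom to the slab containing the centre of its update group.

    Hypothetical 1D model; periodic boundaries are ignored for brevity.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for atom, x in enumerate(positions):
        group = group_of_atom[atom]
        sums[group] += x
        counts[group] += 1

    slab = box_length / n_domains
    domain_of_group = {}
    for group in sums:
        centre = sums[group] / counts[group]
        domain_of_group[group] = min(int(centre // slab), n_domains - 1)

    return [domain_of_group[group_of_atom[a]] for a in range(len(positions))]


# Atoms 0 and 1 are a parent/Drude pair straddling the slab boundary at x = 1.0.
positions = [0.98, 1.03, 0.40, 0.45]
group_of_atom = [0, 0, 1, 1]

per_atom = assign_domains_per_atom(positions, 2, 2.0)
per_group = assign_domains_by_update_group(positions, group_of_atom, 2, 2.0)
print("per atom: ", per_atom)
print("per group:", per_group)
```

With the per-atom criterion the pair is split across the two slabs; with the group criterion both atoms land in the same slab, so no Drude-specific communication is needed for them.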
> Is what you are printing based on the global atom number? If not, that
> explains a lot.
>
The atom numbers are global. I've been tracking this very carefully for
a long time.
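
For context on why the distinction matters: with domain decomposition each rank stores its atoms in its own local order, so the same local index can refer to different atoms on different ranks. A minimal sketch of looking up one global atom across ranks follows. It uses plain Python lists as a stand-in, and the names are hypothetical rather than the GROMACS data structures.

```python
def find_global_atom(local_to_global_per_rank, global_index):
    """Return (rank, local_index) for every rank that holds global_index.

    local_to_global_per_rank[r][i] is the global number of local atom i on
    rank r (an assumed layout for illustration only).
    """
    hits = []
    for rank, local_to_global in enumerate(local_to_global_per_rank):
        for local_index, g in enumerate(local_to_global):
            if g == global_index:
                hits.append((rank, local_index))
    return hits


# Rank 0 owns global atom 54; rank 1 holds a communicated copy of it.
layout = [
    [50, 51, 52, 53, 54],
    [55, 56, 54],
]
print(find_global_atom(layout, 54))
```

Here global atom 54 sits at local index 4 on rank 0 and local index 2 on rank 1, so a trace keyed on the local index would follow two different array slots.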
-Justin
> Cheers,
>
> Berk
>
> On 2020-06-09 20:26, Mark Abraham wrote:
>> Hi,
>>
>> Perhaps not the solution you're looking for, but since 2019, DD has
>> been based on the notion of a domain being a compact collection of
>> update groups (which are indivisible units like -CH2-) rather than a
>> strict geometric criterion. That was done so that h-bond only
>> constraints need not communicate, but is probably also a good choice
>> for Drude+parent. You should still be able to validate the
>> single-domain cases with your old code based on a long-ago version.
>>
>> Mark
>>
>> On Tue, 9 Jun 2020 at 19:14, Justin Lemkul <jalemkul at vt.edu> wrote:
>>
>> ...
>> Hi All,
>>
>> I'm trying (once again) to get back into figuring out the lingering
>> bugs with the Drude implementation when using domain decomposition.
>> Since I last asked for help, I have gotten coordinate and velocity
>> communication working properly. Now, I'm stuck on forces. To quickly
>> recap the issue, it is possible that Drudes and their parent atoms get
>> separated into different domains. This requires communication of
>> coordinates, velocities, and forces via treatment as "special atoms",
>> as is the case with virtual sites. As such, my implementation largely
>> follows what happens for the virtual sites (communicate after any
>> update).
>>
>> I have been tracing the forces at every step of do_force - basically
>> printing out the force on a Drude that I know is in a different domain
>> from its parent atom. I use the OpenMP output as reference. I can
>> reproduce the OpenMP forces with domain decomposition but no
>> communication (e.g. gmx mdrun -ntmpi 2 -npme 1 -deffnm md -nb cpu),
>> based on Berk's suggestion from a long time ago. So the issue I'm
>> having must be coming from communicating somewhere, but I can't nail
>> it down. Here is an example of the output I'm looking at.
>>
>> First, from OpenMP (my reference, the correct output):
>>
>> === Step 0 ===
>> DO FORCE: top f[54] = 0.000000 0.000000 0.000000
>> DO FORCE: after do_nb_verlet #1 f[54] = 0.000000 0.000000 0.000000
>> DO FORCE: after do_nb_verlet #2 f[54] = 0.000000 0.000000 0.000000
>> DO FORCE: after nbnxn_atomdata_add_nbat_f_to_f f[54] = 1271.383667 -3106.622803 2148.540283
>> DO FORCE: after nbnxn_atomdata_add_nbat_fshift_to_fshift f[54] = 1271.383667 -3106.622803 2148.540283
>> DO FORCE: after do_force_lowlevel f[54] = 82.651733 130.833740 82.218506
>> DO FORCE: b4 move_f f[54] = 82.651733 130.833740 82.218506
>> DO FORCE: after move_f f[54] = 82.651733 130.833740 82.218506
>> DO FORCE: after GPU use/emulate f[54] = 82.651733 130.833740 82.218506
>> DO FORCE: after vsite_spread f[54] = 82.651733 130.833740 82.218506
>> DO FORCE: b4 post f[54] = 82.651733 130.833740 82.218506
>> DO FORCE: end f[54] = 58.264297 16.147758 43.956337
>> === Step 1 ===
>> DO FORCE: top f[54] = 58.264297 16.147758 43.956337
>> DO FORCE: after do_nb_verlet #1 f[54] = 0.000000 0.000000 0.000000
>> DO FORCE: after do_nb_verlet #2 f[54] = 0.000000 0.000000 0.000000
>> DO FORCE: after nbnxn_atomdata_add_nbat_f_to_f f[54] = 1205.647705 -3128.451904 2138.944580
>> DO FORCE: after nbnxn_atomdata_add_nbat_fshift_to_fshift f[54] = 1205.647705 -3128.451904 2138.944580
>> DO FORCE: after do_force_lowlevel f[54] = 200.794189 -175.644287 -279.924072
>> DO FORCE: b4 move_f f[54] = 200.794189 -175.644287 -279.924072
>> DO FORCE: after move_f f[54] = 200.794189 -175.644287 -279.924072
>> DO FORCE: after GPU use/emulate f[54] = 200.794189 -175.644287 -279.924072
>> DO FORCE: after vsite_spread f[54] = 200.794189 -175.644287 -279.924072
>> DO FORCE: b4 post f[54] = 200.794189 -175.644287 -279.924072
>> DO FORCE: end f[54] = 162.370026 -306.717041 -321.102356
>>
>>
>> Now, my implementation with domain decomposition:
>>
>> === Step 0 ===
>> DO FORCE: top f[54] = 0.000000 0.000000 0.000000
>> DO FORCE: after do_nb_verlet #1 f[54] = 0.000000 0.000000 0.000000
>> DO FORCE: after do_nb_verlet #2 f[54] = 0.000000 0.000000 0.000000
>> DO FORCE: after nbnxn_atomdata_add_nbat_f_to_f f[54] = 338.912842 -2940.618164 2357.080078
>> DO FORCE: after do_force_lowlevel f[54] = 1899.546387 -1663.452881 1703.655273
>> DO FORCE: b4 move_f f[54] = 1899.546387 -1663.452881 1703.655273
>> DO FORCE: after move_f f[54] = 82.647949 130.835449 82.213165
>> DO FORCE: after GPU use/emulate f[54] = 82.647949 130.835449 82.213165
>> DO FORCE: after vsite_spread f[54] = 82.647949 130.835449 82.213165
>> DO FORCE: b4 post f[54] = 82.647949 130.835449 82.213165
>> DO FORCE: end f[54] = 58.260483 16.149330 43.951458
>> === Step 1 ===
>> DO FORCE: top f[54] = 58.260483 16.149330 43.951458
>> DO FORCE: after do_nb_verlet #1 f[54] = 0.000000 0.000000 0.000000
>> DO FORCE: after do_nb_verlet #2 f[54] = 0.000000 0.000000 0.000000
>> DO FORCE: after nbnxn_atomdata_add_nbat_f_to_f f[54] = 265.444092 -2965.024170 2346.120117
>> DO FORCE: after do_force_lowlevel f[54] = 1834.273926 -1685.225830 1654.119141
>> DO FORCE: b4 move_f f[54] = 1834.273926 -1685.225830 1654.119141
>> DO FORCE: after move_f f[54] = 258.300781 -122.286865 -219.277039
>> DO FORCE: after GPU use/emulate f[54] = 258.300781 -122.286865 -219.277039
>> DO FORCE: after vsite_spread f[54] = 258.300781 -122.286865 -219.277039
>> DO FORCE: b4 post f[54] = 258.300781 -122.286865 -219.277039
>> DO FORCE: end f[54] = 229.446487 -248.274734 -255.144485
>>
>> From this output, I can see that communication works in step 0 and
>> between steps 0 and 1, since the force is correctly propagated. I also
>> do not know to what extent I can expect forces to match before the
>> "move_f" step (which is where I communicate non-local Drude forces and
>> follows the existing "dd_move_f" in do_force_cutsVERLET). But the
>> forces should certainly be the same after communicating so they are
>> correctly input to post_process_forces.
>>
>> Can anyone suggest how the code paths might differ between these two
>> steps? I've debugged every step along the way that I can figure out
>> and all I can come up with is that the forces end up different. I know
>> that may be a big request without seeing the code, but I'm simply
>> determining non-local Drudes the same way we do with vsites, and
>> communicating their forces with the existing dd_move_f_specat function
>> that vsites also use.
>>
>> Any help would be greatly appreciated. I've been stuck on this forever
>> and it is clear that our user community really wants this feature. I
>> can give them OpenMP easily, but that's rather restrictive...
>>
>> -Justin
>>
>> --
>> ==================================================
>>
>> Justin A. Lemkul, Ph.D.
>> Assistant Professor
>> Office: 301 Fralin Hall
>> Lab: 303 Engel Hall
>>
>> Virginia Tech Department of Biochemistry
>> 340 West Campus Dr.
>> Blacksburg, VA 24061
>>
>> jalemkul at vt.edu | (540) 231-3129
>> http://www.thelemkullab.com
>>
>> ==================================================
>>
>
>
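
To make the comparison in the quoted traces concrete, here is a small Python sketch that checks the "after move_f" values for f[54] from the OpenMP reference against the domain decomposition run and reports the first step at which they diverge. The numbers are copied from the traces above. The tolerance of 0.01 (in the same units as the traces) is an assumption chosen only for illustration.

```python
def max_abs_diff(a, b):
    """Largest absolute component-wise difference between two force vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


def first_divergent_step(reference, candidate, tolerance):
    """Return the first step whose forces differ by more than tolerance, else None."""
    for step in sorted(reference):
        if max_abs_diff(reference[step], candidate[step]) > tolerance:
            return step
    return None


# f[54] "after move_f", copied from the two traces quoted in this thread.
openmp_after_move_f = {
    0: (82.651733, 130.833740, 82.218506),
    1: (200.794189, -175.644287, -279.924072),
}
dd_after_move_f = {
    0: (82.647949, 130.835449, 82.213165),
    1: (258.300781, -122.286865, -219.277039),
}

TOLERANCE = 0.01  # assumed tolerance, not a GROMACS default

for step in sorted(openmp_after_move_f):
    diff = max_abs_diff(openmp_after_move_f[step], dd_after_move_f[step])
    print(f"step {step}: max |delta f| = {diff:.6f}")

print("first divergent step:",
      first_divergent_step(openmp_after_move_f, dd_after_move_f, TOLERANCE))
```

Under that assumed tolerance, step 0 agrees and step 1 is the first step where the post-communication forces differ, which matches the reading of the traces given above.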