[gmx-developers] Following forces with domain decomposition

Mark Abraham mark.j.abraham at gmail.com
Tue Jun 9 20:26:26 CEST 2020


Hi,

Perhaps not the solution you're looking for, but since GROMACS 2019, DD has
been based on the notion of a domain being a compact collection of update
groups (indivisible units such as -CH2-) rather than a strict geometric
criterion. That was done so that h-bond-only constraints need not
communicate, but it is probably also a good choice for Drude+parent. You
should still be able to validate the single-domain cases against your old
code, which is based on a long-ago version.
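
As a rough illustration of the update-group idea, here is a toy 1-D sketch in Python. It is not GROMACS code: the function name assign_domains, the data layout, and the coordinates are all hypothetical, and periodic wrapping is ignored. Each group is assigned to a domain by the position of its center, so all members land on the same rank even when the atoms straddle a geometric boundary.

```python
# Toy 1-D illustration of assigning whole update groups to domains.
# All names and numbers are hypothetical; this is not the GROMACS implementation.

def assign_domains(x, groups, box_length, n_domains):
    """Return one domain index per atom, assigning whole groups by their center.

    x          : list of 1-D atom coordinates (assumed to lie inside the box)
    groups     : list of lists of atom indices (indivisible units)
    box_length : length of the 1-D box
    n_domains  : number of equal-width domains
    """
    width = box_length / n_domains
    domain_of_atom = [None] * len(x)
    for members in groups:
        center = sum(x[i] for i in members) / len(members)
        # Clamp so a center exactly at box_length maps to the last domain.
        domain = min(int(center // width), n_domains - 1)
        for i in members:
            domain_of_atom[i] = domain
    return domain_of_atom


# Atom 1 (think: a Drude) sits just across the boundary at 5.0 from atom 0
# (its parent). A strict per-atom geometric split would separate them.
x = [4.98, 5.01, 7.50]
groups = [[0, 1], [2]]
domains = assign_domains(x, groups, box_length=10.0, n_domains=2)
per_atom_split = [int(xi // 5.0) for xi in x]
print(domains)         # group-based assignment
print(per_atom_split)  # strict geometric assignment, for comparison
```

If a Drude and its parent were placed in one such group, the pair would never be split across ranks in this picture, which is the property the paragraph above relies on.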

Mark

On Tue, 9 Jun 2020 at 19:14, Justin Lemkul <jalemkul at vt.edu> wrote:

>
> Hi All,
>
> I'm trying (once again) to get back into figuring out the lingering bugs
> with the Drude implementation when using domain decomposition. Since I
> last asked for help, I have gotten coordinate and velocity communication
> working properly. Now, I'm stuck on forces. To quickly recap the issue,
> it is possible for a Drude and its parent atom to end up in different
> domains. This requires communication of coordinates, velocities, and
> forces by treating them as "special atoms", as is done for virtual
> sites. As such, my implementation largely follows what happens for the
> virtual sites (communicate after any update).
>
> I have been tracing the forces at every step of do_force - basically
> printing out the force on a Drude that I know is in a different domain
> from its parent atom. I use the OpenMP output as reference. I can
> reproduce the OpenMP forces with domain decomposition but no
> communication between domains (e.g. gmx mdrun -ntmpi 2 -npme 1 -deffnm md
> -nb cpu, which leaves a single PP rank), based on Berk's suggestion from
> a long time ago. So the issue must come from the communication
> somewhere, but I can't nail it down. Here is an example of the output
> I'm looking at.
>
> First, from OpenMP (my reference, the correct output):
>
> === Step 0 ===
> DO FORCE: top f[54] = 0.000000 0.000000 0.000000
> DO FORCE: after do_nb_verlet #1 f[54] = 0.000000 0.000000 0.000000
> DO FORCE: after do_nb_verlet #2 f[54] = 0.000000 0.000000 0.000000
> DO FORCE: after nbnxn_atomdata_add_nbat_f_to_f f[54] = 1271.383667 -3106.622803 2148.540283
> DO FORCE: after nbnxn_atomdata_add_nbat_fshift_to_fshift f[54] = 1271.383667 -3106.622803 2148.540283
> DO FORCE: after do_force_lowlevel f[54] = 82.651733 130.833740 82.218506
> DO FORCE: b4 move_f f[54] = 82.651733 130.833740 82.218506
> DO FORCE: after move_f f[54] = 82.651733 130.833740 82.218506
> DO FORCE: after GPU use/emulate f[54] = 82.651733 130.833740 82.218506
> DO FORCE: after vsite_spread f[54] = 82.651733 130.833740 82.218506
> DO FORCE: b4 post f[54] = 82.651733 130.833740 82.218506
> DO FORCE: end f[54] = 58.264297 16.147758 43.956337
> === Step 1 ===
> DO FORCE: top f[54] = 58.264297 16.147758 43.956337
> DO FORCE: after do_nb_verlet #1 f[54] = 0.000000 0.000000 0.000000
> DO FORCE: after do_nb_verlet #2 f[54] = 0.000000 0.000000 0.000000
> DO FORCE: after nbnxn_atomdata_add_nbat_f_to_f f[54] = 1205.647705 -3128.451904 2138.944580
> DO FORCE: after nbnxn_atomdata_add_nbat_fshift_to_fshift f[54] = 1205.647705 -3128.451904 2138.944580
> DO FORCE: after do_force_lowlevel f[54] = 200.794189 -175.644287 -279.924072
> DO FORCE: b4 move_f f[54] = 200.794189 -175.644287 -279.924072
> DO FORCE: after move_f f[54] = 200.794189 -175.644287 -279.924072
> DO FORCE: after GPU use/emulate f[54] = 200.794189 -175.644287 -279.924072
> DO FORCE: after vsite_spread f[54] = 200.794189 -175.644287 -279.924072
> DO FORCE: b4 post f[54] = 200.794189 -175.644287 -279.924072
> DO FORCE: end f[54] = 162.370026 -306.717041 -321.102356
>
>
> Now, my implementation with domain decomposition:
>
> === Step 0 ===
> DO FORCE: top f[54] = 0.000000 0.000000 0.000000
> DO FORCE: after do_nb_verlet #1 f[54] = 0.000000 0.000000 0.000000
> DO FORCE: after do_nb_verlet #2 f[54] = 0.000000 0.000000 0.000000
> DO FORCE: after nbnxn_atomdata_add_nbat_f_to_f f[54] = 338.912842 -2940.618164 2357.080078
> DO FORCE: after do_force_lowlevel f[54] = 1899.546387 -1663.452881 1703.655273
> DO FORCE: b4 move_f f[54] = 1899.546387 -1663.452881 1703.655273
> DO FORCE: after move_f f[54] = 82.647949 130.835449 82.213165
> DO FORCE: after GPU use/emulate f[54] = 82.647949 130.835449 82.213165
> DO FORCE: after vsite_spread f[54] = 82.647949 130.835449 82.213165
> DO FORCE: b4 post f[54] = 82.647949 130.835449 82.213165
> DO FORCE: end f[54] = 58.260483 16.149330 43.951458
> === Step 1 ===
> DO FORCE: top f[54] = 58.260483 16.149330 43.951458
> DO FORCE: after do_nb_verlet #1 f[54] = 0.000000 0.000000 0.000000
> DO FORCE: after do_nb_verlet #2 f[54] = 0.000000 0.000000 0.000000
> DO FORCE: after nbnxn_atomdata_add_nbat_f_to_f f[54] = 265.444092 -2965.024170 2346.120117
> DO FORCE: after do_force_lowlevel f[54] = 1834.273926 -1685.225830 1654.119141
> DO FORCE: b4 move_f f[54] = 1834.273926 -1685.225830 1654.119141
> DO FORCE: after move_f f[54] = 258.300781 -122.286865 -219.277039
> DO FORCE: after GPU use/emulate f[54] = 258.300781 -122.286865 -219.277039
> DO FORCE: after vsite_spread f[54] = 258.300781 -122.286865 -219.277039
> DO FORCE: b4 post f[54] = 258.300781 -122.286865 -219.277039
> DO FORCE: end f[54] = 229.446487 -248.274734 -255.144485
>
> From this output, I can see that communication works in step 0 and
> between steps 0 and 1, since the force is correctly propagated. I also
> do not know to what extent I can expect forces to match before the
> "move_f" step (which is where I communicate non-local Drude forces and
> follows the existing "dd_move_f" in do_force_cutsVERLET). But the forces
> should certainly be the same after communication, so that they are
> correctly input to post_process_forces.
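
A comparison like the one described above can be automated. The following is a minimal Python sketch, assuming the trace format shown in this message ("=== Step N ===" markers and "DO FORCE: <stage> f[i] = x y z" records) and raw program output without mail quote markers. The names parse_trace and first_divergence are hypothetical, and the tolerance of 0.05 is an arbitrary choice that treats the small step-0 differences as a match, as the message does. The sample input is a subset of the post-communication records from the two traces.

```python
import re

# One token is either a step marker or a force record in the format above.
TOKEN_RE = re.compile(
    r"=== Step (?P<step>\d+) ===|"
    r"DO FORCE: (?P<stage>.+?) f\[\d+\] = "
    r"(?P<x>-?\d+\.\d+) (?P<y>-?\d+\.\d+) (?P<z>-?\d+\.\d+)"
)

def parse_trace(text):
    """Return {(step, stage): (fx, fy, fz)} in the order the records appear."""
    flat = " ".join(text.split())  # undo line wrapping inside a record
    forces = {}
    step = None
    for m in TOKEN_RE.finditer(flat):
        if m.group("step") is not None:
            step = int(m.group("step"))
        else:
            key = (step, m.group("stage"))
            forces[key] = tuple(float(m.group(c)) for c in "xyz")
    return forces

def first_divergence(ref, test, tol):
    """Return (key, f_ref, f_test) for the first record differing by more than tol."""
    for key, f_ref in ref.items():
        f_test = test.get(key)
        if f_test is None:
            continue  # stage not printed in the other trace
        if max(abs(a - b) for a, b in zip(f_ref, f_test)) > tol:
            return key, f_ref, f_test
    return None

REF = """
=== Step 0 ===
DO FORCE: after move_f f[54] = 82.651733 130.833740 82.218506
DO FORCE: end f[54] = 58.264297 16.147758 43.956337
=== Step 1 ===
DO FORCE: after move_f f[54] = 200.794189 -175.644287 -279.924072
DO FORCE: end f[54] = 162.370026 -306.717041 -321.102356
"""

DD = """
=== Step 0 ===
DO FORCE: after move_f f[54] = 82.647949 130.835449 82.213165
DO FORCE: end f[54] = 58.260483 16.149330 43.951458
=== Step 1 ===
DO FORCE: after move_f f[54] = 258.300781
-122.286865 -219.277039
DO FORCE: end f[54] = 229.446487 -248.274734 -255.144485
"""

ref = parse_trace(REF)
dd = parse_trace(DD)
result = first_divergence(ref, dd, tol=0.05)
print(result[0])  # first (step, stage) where the two traces disagree
```

On this subset the first record flagged is the step-1 "after move_f" entry. Stages before "move_f" are left out of the sample on purpose, since whether they should match at all is the open question raised above.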
>
> Can anyone suggest how the code paths might differ between these two
> steps? I've debugged every step along the way that I can figure out and
> all I can come up with is that the forces end up different. I know that
> may be a big request without seeing the code, but I'm simply determining
> non-local Drudes the same way we do with vsites, and communicating their
> forces with the existing dd_move_f_specat function that vsites also use.
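
For readers less familiar with this kind of force communication, here is a generic Python sketch of the idea, not of dd_move_f_specat itself. It assumes that each rank accumulates only the interactions it computes for an atom, so a rank's value before the reduction is a partial sum. The function name reduce_to_home and all numbers are hypothetical.

```python
# Toy model of summing per-rank force contributions for one atom.
# Hypothetical names and made-up numbers; this is not GROMACS code.

def reduce_to_home(contributions):
    """Sum the partial forces that each rank accumulated for one atom.

    contributions : {rank: (fx, fy, fz)}
    Returns the total force that the atom's home rank would hold afterwards.
    """
    total = (0.0, 0.0, 0.0)
    for f in contributions.values():
        total = tuple(t + fi for t, fi in zip(total, f))
    return total


# Force on the atom in a single-domain run (made-up value).
single_domain = (10.0, -4.0, 2.5)

# The same interactions split over two ranks: rank 0 is the home rank,
# rank 1 holds a communicated copy and computes the remaining interactions.
partials = {0: (7.0, -1.0, 0.5), 1: (3.0, -3.0, 2.0)}

after = reduce_to_home(partials)
print(partials[0] == single_domain)  # before the reduction
print(after == single_domain)        # after the reduction
```

Under that assumption, a mismatch before the communication step is not by itself a sign of a bug, whereas a mismatch after it means that some contribution was dropped, counted twice, or added to the wrong atom.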
>
> Any help would be greatly appreciated. I've been stuck on this forever
> and it is clear that our user community really wants this feature. I can
> give them OpenMP easily, but that's rather restrictive...
>
> -Justin
>
> --
> ==================================================
>
> Justin A. Lemkul, Ph.D.
> Assistant Professor
> Office: 301 Fralin Hall
> Lab: 303 Engel Hall
>
> Virginia Tech Department of Biochemistry
> 340 West Campus Dr.
> Blacksburg, VA 24061
>
> jalemkul at vt.edu | (540) 231-3129
> http://www.thelemkullab.com
>
> ==================================================