[gmx-developers] Following forces with domain decomposition

Justin Lemkul jalemkul at vt.edu
Wed Jun 10 01:13:18 CEST 2020



On 6/9/20 3:57 PM, Berk Hess wrote:
> Hi,
>
> I think Mark's suggestion is very good. It avoids all special
> communication for Drude oscillators and only requires adding Drude
> connections to the update group setup.
>
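Here is a minimal sketch of what adding Drude connections to the update group setup could look like. It assumes that update groups are simply the connected components of a list of atom pairs. The function `build_update_groups` and its inputs are hypothetical, and the actual GROMACS setup applies its own additional criteria, which are not modeled here. The sketch only shows one thing: once each Drude-parent pair counts as one more connection, a Drude can never end up in a different group (and hence domain) from its parent.

```python
def build_update_groups(n_atoms, constraint_pairs, drude_pairs):
    """Return a list mapping each atom index to an update-group id.

    Hypothetical sketch: atoms linked by a constraint or by a Drude-parent
    connection end up in the same group (union-find with path halving).
    """
    root = list(range(n_atoms))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving keeps the trees shallow
            i = root[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            root[max(ri, rj)] = min(ri, rj)  # lower root index becomes the root

    # Drude-parent pairs are treated exactly like constraint connections.
    for i, j in list(constraint_pairs) + list(drude_pairs):
        union(i, j)

    # Relabel roots as consecutive group ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(a), len(ids)) for a in range(n_atoms)]


# -CH2- (C=0, H=1, H=2) with a Drude (3) on C, plus an O (4) with its Drude (5)
constraints = [(0, 1), (0, 2)]
drudes = [(0, 3), (4, 5)]
print(build_update_groups(6, constraints, []))      # [0, 0, 0, 1, 2, 3]
print(build_update_groups(6, constraints, drudes))  # [0, 0, 0, 0, 1, 1]
```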

Thanks to you both; I will give this a try. It seems like a better idea
and a very nice feature.

> Is what you are printing based on the global atom number? If not, that
> explains a lot.
>

The atom numbers are global. I've been tracking this very carefully for 
a long time.
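The index space matters when tracing a single atom under DD. On each rank, local index 54 is simply whatever atom is stored in that slot, so a trace has to translate the global index through a global-to-local lookup. The helper names and data below are hypothetical; the DD code maintains its own lookup for this purpose. A global atom can also be present as a halo copy on a rank that does not own it, and only its home rank holds the complete force once forces have been communicated.

```python
def make_global_to_local(local_to_global):
    """Invert this rank's local->global atom index list (home + halo atoms)."""
    return {g: l for l, g in enumerate(local_to_global)}


def traced_force(global_atom, local_to_global, f_local):
    """Hypothetical helper: return the force this rank stores for a global
    atom, or None if the atom is neither a home nor a halo atom here."""
    local = make_global_to_local(local_to_global).get(global_atom)
    return None if local is None else f_local[local]


# Hypothetical rank whose local atoms are global atoms 10, 54 and 3.
local_to_global = [10, 54, 3]
f_local = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(traced_force(54, local_to_global, f_local))  # (2.0, 0.0, 0.0)
print(traced_force(7, local_to_global, f_local))   # None
```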

-Justin

> Cheers,
>
> Berk
>
> On 2020-06-09 20:26, Mark Abraham wrote:
>> Hi,
>>
>> Perhaps not the solution you're looking for, but since 2019, DD has
>> been based on the notion of a domain being a compact collection of
>> update groups (indivisible units such as -CH2-) rather than on a
>> strict geometric criterion. That was done so that h-bond-only
>> constraints need no communication between domains, but it is probably
>> also a good choice for Drude+parent. You should still be able to
>> validate the single-domain cases with your old code based on a
>> long-ago version.
>>
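This one-dimensional sketch contrasts assigning individual atoms to domains with assigning whole update groups. It assumes equal slabs along x and uses a group's center of geometry as its position. Both are simplifications for illustration, not the GROMACS partitioning code, and `assign_domains` is a hypothetical name. With per-atom assignment, a Drude sitting just across a boundary lands in a different domain from its parent. With per-group assignment, the pair stays together.

```python
def assign_domains(x, groups, box_x, n_domains, by_group=True):
    """Hypothetical sketch: assign each atom to one of n_domains equal slabs along x.

    by_group=False: each atom goes to the slab containing its own x.
    by_group=True:  all atoms of an update group go to the slab containing
                    the group's center of geometry, so a group is never split.
    Simplification: the group center ignores periodic images, i.e. the group
    is assumed to be whole.
    """
    width = box_x / n_domains

    def slab(coord):
        return int((coord % box_x) // width)

    if not by_group:
        return [slab(xi) for xi in x]

    members = {}
    for atom, g in enumerate(groups):
        members.setdefault(g, []).append(atom)

    domain = [None] * len(x)
    for atoms in members.values():
        cog = sum(x[a] for a in atoms) / len(atoms)
        for a in atoms:
            domain[a] = slab(cog)
    return domain


# Parent (x=2.9) and its Drude (x=3.05) form group 0; atom 2 is its own group.
x = [2.9, 3.05, 5.0]
groups = [0, 0, 1]
print(assign_domains(x, groups, box_x=6.0, n_domains=2, by_group=False))  # [0, 1, 1]
print(assign_domains(x, groups, box_x=6.0, n_domains=2, by_group=True))   # [0, 0, 1]
```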
>> Mark
>>
>> On Tue, 9 Jun 2020 at 19:14, Justin Lemkul <jalemkul at vt.edu> wrote:
>>
>>     ...
>>     Hi All,
>>
>>     I'm trying (once again) to track down the lingering bugs in the
>>     Drude implementation when using domain decomposition. Since I last
>>     asked for help, I have gotten coordinate and velocity communication
>>     working properly. Now I'm stuck on forces. To quickly recap the
>>     issue: Drudes and their parent atoms can end up in different
>>     domains. This requires communicating coordinates, velocities, and
>>     forces by treating them as "special atoms", as is done for virtual
>>     sites. As such, my implementation largely follows what happens for
>>     virtual sites (communicate after any update).
>>
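The "special atom" bookkeeping described above can be sketched as follows. The sketch assumes each rank knows its home atoms by global index and has the list of (parent, Drude) pairs. `nonlocal_partners` is a hypothetical name, not the vsite code path. Each pair with exactly one member at home contributes the other member. That member's coordinates and velocities have to be received on this rank, and any force computed on it here has to be sent back to its home rank.

```python
def nonlocal_partners(home_atoms, drude_pairs):
    """Hypothetical sketch: return the sorted global indices of Drude/parent
    partners that this rank needs but does not own.

    home_atoms:  global indices of the atoms this rank owns.
    drude_pairs: (parent, drude) global index pairs.
    A pair with exactly one member at home contributes the other member;
    pairs that are entirely home or entirely elsewhere need nothing.
    """
    home = set(home_atoms)
    needed = set()
    for parent, drude in drude_pairs:
        if (parent in home) != (drude in home):
            needed.add(drude if parent in home else parent)
    return sorted(needed)


# Rank owning atoms 0-3: pair (0, 4) needs Drude 4, pair (7, 1) needs parent 7.
print(nonlocal_partners([0, 1, 2, 3], [(0, 4), (2, 3), (5, 6), (7, 1)]))  # [4, 7]
```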
>>     I have been tracing the forces at every step of do_force, printing
>>     out the force on a Drude that I know is in a different domain from
>>     its parent atom. I use the OpenMP output as the reference. I can
>>     reproduce the OpenMP forces with domain decomposition but no
>>     communication (e.g. gmx mdrun -ntmpi 2 -npme 1 -deffnm md -nb cpu),
>>     based on Berk's suggestion from a long time ago. So the issue I'm
>>     having must come from the communication somewhere, but I can't pin
>>     it down. Here is an example of the output I'm looking at.
>>
>>     First, from OpenMP (my reference, the correct output):
>>
>>     === Step 0 ===
>>     DO FORCE: top f[54] = 0.000000 0.000000 0.000000
>>     DO FORCE: after do_nb_verlet #1 f[54] = 0.000000 0.000000 0.000000
>>     DO FORCE: after do_nb_verlet #2 f[54] = 0.000000 0.000000 0.000000
>>     DO FORCE: after nbnxn_atomdata_add_nbat_f_to_f f[54] = 1271.383667 -3106.622803 2148.540283
>>     DO FORCE: after nbnxn_atomdata_add_nbat_fshift_to_fshift f[54] = 1271.383667 -3106.622803 2148.540283
>>     DO FORCE: after do_force_lowlevel f[54] = 82.651733 130.833740 82.218506
>>     DO FORCE: b4 move_f f[54] = 82.651733 130.833740 82.218506
>>     DO FORCE: after move_f f[54] = 82.651733 130.833740 82.218506
>>     DO FORCE: after GPU use/emulate f[54] = 82.651733 130.833740 82.218506
>>     DO FORCE: after vsite_spread f[54] = 82.651733 130.833740 82.218506
>>     DO FORCE: b4 post f[54] = 82.651733 130.833740 82.218506
>>     DO FORCE: end f[54] = 58.264297 16.147758 43.956337
>>     === Step 1 ===
>>     DO FORCE: top f[54] = 58.264297 16.147758 43.956337
>>     DO FORCE: after do_nb_verlet #1 f[54] = 0.000000 0.000000 0.000000
>>     DO FORCE: after do_nb_verlet #2 f[54] = 0.000000 0.000000 0.000000
>>     DO FORCE: after nbnxn_atomdata_add_nbat_f_to_f f[54] = 1205.647705 -3128.451904 2138.944580
>>     DO FORCE: after nbnxn_atomdata_add_nbat_fshift_to_fshift f[54] = 1205.647705 -3128.451904 2138.944580
>>     DO FORCE: after do_force_lowlevel f[54] = 200.794189 -175.644287 -279.924072
>>     DO FORCE: b4 move_f f[54] = 200.794189 -175.644287 -279.924072
>>     DO FORCE: after move_f f[54] = 200.794189 -175.644287 -279.924072
>>     DO FORCE: after GPU use/emulate f[54] = 200.794189 -175.644287 -279.924072
>>     DO FORCE: after vsite_spread f[54] = 200.794189 -175.644287 -279.924072
>>     DO FORCE: b4 post f[54] = 200.794189 -175.644287 -279.924072
>>     DO FORCE: end f[54] = 162.370026 -306.717041 -321.102356
>>
>>
>>     Now, my implementation with domain decomposition:
>>
>>     === Step 0 ===
>>     DO FORCE: top f[54] = 0.000000 0.000000 0.000000
>>     DO FORCE: after do_nb_verlet #1 f[54] = 0.000000 0.000000 0.000000
>>     DO FORCE: after do_nb_verlet #2 f[54] = 0.000000 0.000000 0.000000
>>     DO FORCE: after nbnxn_atomdata_add_nbat_f_to_f f[54] = 338.912842 -2940.618164 2357.080078
>>     DO FORCE: after do_force_lowlevel f[54] = 1899.546387 -1663.452881 1703.655273
>>     DO FORCE: b4 move_f f[54] = 1899.546387 -1663.452881 1703.655273
>>     DO FORCE: after move_f f[54] = 82.647949 130.835449 82.213165
>>     DO FORCE: after GPU use/emulate f[54] = 82.647949 130.835449 82.213165
>>     DO FORCE: after vsite_spread f[54] = 82.647949 130.835449 82.213165
>>     DO FORCE: b4 post f[54] = 82.647949 130.835449 82.213165
>>     DO FORCE: end f[54] = 58.260483 16.149330 43.951458
>>     === Step 1 ===
>>     DO FORCE: top f[54] = 58.260483 16.149330 43.951458
>>     DO FORCE: after do_nb_verlet #1 f[54] = 0.000000 0.000000 0.000000
>>     DO FORCE: after do_nb_verlet #2 f[54] = 0.000000 0.000000 0.000000
>>     DO FORCE: after nbnxn_atomdata_add_nbat_f_to_f f[54] = 265.444092 -2965.024170 2346.120117
>>     DO FORCE: after do_force_lowlevel f[54] = 1834.273926 -1685.225830 1654.119141
>>     DO FORCE: b4 move_f f[54] = 1834.273926 -1685.225830 1654.119141
>>     DO FORCE: after move_f f[54] = 258.300781 -122.286865 -219.277039
>>     DO FORCE: after GPU use/emulate f[54] = 258.300781 -122.286865 -219.277039
>>     DO FORCE: after vsite_spread f[54] = 258.300781 -122.286865 -219.277039
>>     DO FORCE: b4 post f[54] = 258.300781 -122.286865 -219.277039
>>     DO FORCE: end f[54] = 229.446487 -248.274734 -255.144485
>>
>>     From this output, I can see that communication works in step 0 and
>>     between steps 0 and 1, since the force is correctly propagated. I'm
>>     not sure to what extent I can expect the forces to match before the
>>     "move_f" step (which is where I communicate non-local Drude forces,
>>     following the existing "dd_move_f" call in do_force_cutsVERLET). But
>>     the forces should certainly be the same after communication, so that
>>     the correct forces are passed to post_process_forces. In step 1,
>>     however, they already differ after move_f (e.g. 258.300781 vs.
>>     200.794189 in x).
>>
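One way to make this comparison mechanical is to record the traced force per (step, checkpoint) in both runs and report the first checkpoint where they disagree. The sketch below uses the f[54] values from the two traces in this message. It is restricted to checkpoints after move_f, because before that point the DD run need not hold the complete force. The helper names and the relative tolerance of 1e-3 are assumptions. The step 0 values differ only at the 1e-5 relative level, which is plausibly just a different summation order in single precision.

```python
import math


def rel_diff(ref, test):
    """Largest component difference, relative to the norm of the reference."""
    norm = math.sqrt(sum(c * c for c in ref))
    return max(abs(a - b) for a, b in zip(ref, test)) / max(norm, 1e-12)


def first_mismatch(ref_trace, test_trace, rel_tol=1e-3):
    """Hypothetical helper: return the first (step, checkpoint) label, in
    reference order, whose forces differ by more than rel_tol, or None if
    all labels present in both traces agree."""
    test = dict(test_trace)
    for label, f_ref in ref_trace:
        if label in test and rel_diff(f_ref, test[label]) > rel_tol:
            return label
    return None


# f[54] from the OpenMP (reference) and DD traces above, after communication only.
ref = [((0, "after move_f"), (82.651733, 130.833740, 82.218506)),
       ((0, "end"), (58.264297, 16.147758, 43.956337)),
       ((1, "after move_f"), (200.794189, -175.644287, -279.924072)),
       ((1, "end"), (162.370026, -306.717041, -321.102356))]
dd = [((0, "after move_f"), (82.647949, 130.835449, 82.213165)),
      ((0, "end"), (58.260483, 16.149330, 43.951458)),
      ((1, "after move_f"), (258.300781, -122.286865, -219.277039)),
      ((1, "end"), (229.446487, -248.274734, -255.144485))]
print(first_mismatch(ref, dd))  # (1, 'after move_f')
```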
>>     Can anyone suggest how the code paths might differ between these two
>>     steps? I've debugged every step along the way that I can think of,
>>     and all I can conclude is that the forces end up different. I know
>>     that may be a big request without seeing the code, but I'm simply
>>     determining non-local Drudes the same way we do for vsites and
>>     communicating their forces with the existing dd_move_f_specat
>>     function that vsites also use.
>>
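The result any such force communication has to produce can be modeled independently of how dd_move_f_specat is actually implemented; its internals are not reproduced here. Forces that a rank computed on halo copies are added to the owning rank's force exactly once, rather than assigned. `move_forces` and its dict-per-rank layout are hypothetical and model only the outcome, not the MPI exchange. Comparing the real per-rank forces after the communication against such a reference is one way to check whether a contribution is lost, overwritten, or counted twice.

```python
def move_forces(rank_forces, home_rank):
    """Hypothetical model of the result of a halo force communication.

    rank_forces: one dict per rank, {global_atom: (fx, fy, fz)}, holding the
                 partial forces that rank computed on its home and halo atoms.
    home_rank:   {global_atom: index of the rank that owns the atom}.
    Returns one dict per rank holding only its home atoms. Each force is the
    sum of all contributions to that atom, each added exactly once.
    """
    result = [{} for _ in rank_forces]
    for forces in rank_forces:
        for atom, f in forces.items():
            owner = home_rank[atom]
            # Accumulate into the owner's entry; never overwrite it.
            acc = result[owner].setdefault(atom, [0.0, 0.0, 0.0])
            for d in range(3):
                acc[d] += f[d]
    return result


# Atom 54 is owned by rank 0 and atom 60 by rank 1; each rank also computed
# a partial force on the other rank's atom through its halo copy.
rank_forces = [{54: (1.0, 2.0, 3.0), 60: (0.5, 0.0, 0.0)},
               {60: (1.0, 1.0, 1.0), 54: (-1.0, 0.0, 1.0)}]
home_rank = {54: 0, 60: 1}
print(move_forces(rank_forces, home_rank))
# [{54: [0.0, 2.0, 4.0]}, {60: [1.5, 1.0, 1.0]}]
```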
>>     Any help would be greatly appreciated. I've been stuck on this
>>     forever, and it is clear that our user community really wants this
>>     feature. I can give them OpenMP easily, but that's rather
>>     restrictive...
>>
>>     -Justin
>>
>>     -- 
>>     ==================================================
>>
>>     Justin A. Lemkul, Ph.D.
>>     Assistant Professor
>>     Office: 301 Fralin Hall
>>     Lab: 303 Engel Hall
>>
>>     Virginia Tech Department of Biochemistry
>>     340 West Campus Dr.
>>     Blacksburg, VA 24061
>>
>>     jalemkul at vt.edu | (540) 231-3129
>>     http://www.thelemkullab.com
>>
>>     ==================================================

