[gmx-developers] force calculations, especially in do_fnbf()
Erik Lindahl
lindahl at sbc.su.se
Fri May 12 10:40:03 CEST 2006
Hi Dave,
You shouldn't have to mess with the assembly, but you might have to
tweak the #defines used during build.
First recommendation: upgrade to 3.3 or later where the calls are
much easier to understand!
Here's what happens - forget the assembly loops for now:
To optimize performance, it is really important to avoid doing
unnecessary computations. For instance, if a particle doesn't have
charge we shouldn't do any coulomb interactions for it. This might
seem excessive, but can easily save us a factor 2 in performance.
A lot of codes do this with conditionals inside the innermost loop,
which is hopeless for modern pipelined CPUs.
In Gromacs, we have instead moved the logic to the neighborlist
generation, and keep separate neighborlists for coulomb-only,
coulomb&lj, water-water interactions, etc.
For each of these combinations we then have a tailor-made nonbonded
routine that only calculates _exactly_ what we need - not a single
extra floating-point operation, and no conditionals whatsoever.
After building gromacs, these routines are present in src/gmxlib/
nonbonded/nb_kernel.
In practice, there is very little difference between the different
interactions, so to avoid bugs and enable optimizations like software-
based sqrt(), we generate them with a small program during the build.
This way it is also easy for us to do it in Fortran77 (which is still
faster on some systems), and we can do either single or double
precision.
For a closer look, you can cd to that directory and generate kernels
with comments this way:
./mknb -comments
or why not double precision kernels in fortran:
./mknb -double -fortran
(-h lists available options).
OK, back to the calling:
A long, long time ago in a galaxy far away, we had a bauta switch()
that called the correct loop, with different arguments for each loop.
At one point we even used dirty m4 macros to create them :-)
In more recent versions of Gromacs, we have changed all nonbonded
kernels to use the same call sequence and arguments. The first time
we call do_nonbonded(), we assign function pointers to each
neighborlist based on the interactions it should calculate (which is
still represented as a stupid enumerated index).
Let's revive the assembly loops now. Since the kernel is represented
with a function pointer, it is trivial to point to a hardware-
optimized routine for an individual kernel.
In recent Gromacs versions it is even possible to use e.g. assembly
code for a handful of really important kernels, but fallback to the C/
Fortran versions for the rest. This is done by first running the
"vanilla" nb_kernel_setup(), and then we call the setup routines for
accelerated kernels that overwrite function pointers only for the
choices where they are available.
This, it isn't quite as dirty as it might seem - the nonbonded force
part is almost entirely isolated from the rest.
Unless you do free energy, the do_nonbonded() routine only updates
* Forces in f[]
* Shift forces (read the manual for expl.) in fshift[]
* Coulomb energies for groups in egcoul[]
* LJ energies for groups in egnb[]
Cheers,
Erik
On May 12, 2006, at 1:47 AM, D. Ensign wrote:
> Greetings,
>
> I've been spending long days in the GROMACS code mines, swinging my
> pick axe, whistling a
> working song (which annoys my coworkers), and tyring to do
> constraints and integration in
> double precision while doing the forces in single precision. Four
> canaries have died from
> the poisonous gases. I'm pleased to report that I haven't
> noticeably broken anything.
> Yet. (And I may have made MRS's RATTLE implementation work a tiny
> bit better in "single"
> precision.)
>
> I've seen, deep down in the bowels of do_fnbf(), the calls to
> assembler code, which I
> don't understand (yet), but which look importantly like something
> different happens when
> I ./configure --disable-float instead of ./configure the regular way.
>
> If I want to do the forces in single precision, then everything
> that gets to do_fnbf(),
> and everything advertised in the ARGS, and is subsequently sent to
> the assembler code
> ought to be in obligate single precision. That is, if I'm working
> as if the "forces are
> in single" then the appropriate assembler functions ought to be
> used. Of course it's
> possible to copy over ordinary precision (implied double) arrays
> into obligate single
> precision arrays. But is it necessary to copy them back?
>
> Notice that I'm taking the strategy of, don't 8888 with the
> assembly code.
>
> The question boils down to, do these mysterious functions called by
> do_fnbf() modify
> anything other than f[]? For everything that they modify, I either
> have to make sure it's
> obligate single precision throughout the code, or copy it back to
> an obligate double form
> after do_fnbf() is done messing with it.
>
> Any insight, answers, wisdom, or ridicule would be most appreciated.
>
> Dan Ensign
> Stanford University
> --
> "Give up learning, and put an end to your troubles." -- Lao Tzu
> _______________________________________________
> gmx-developers mailing list
> gmx-developers at gromacs.org
> http://www.gromacs.org/mailman/listinfo/gmx-developers
> Please don't post (un)subscribe requests to the list. Use the
> www interface or send it to gmx-developers-request at gromacs.org.
>
More information about the gromacs.org_gmx-developers
mailing list