[gmx-developers] Making libgromacs a dynamic loadable object
Mark Abraham
mark.j.abraham at gmail.com
Mon Sep 30 16:02:37 CEST 2013
On Mon, Sep 30, 2013 at 11:23 AM, Erik Lindahl
<erik.lindahl at scilifelab.se> wrote:
> Hi,
>
> On Sep 30, 2013, at 11:14 AM, Mark Abraham <mark.j.abraham at gmail.com>
> wrote:
>
>
> > I think this is unlikely to make it for 5.0, but long-term I would like
> > to support multiple hardware accelerations in a single binary again, by
> > making the actual binaries very small and loading one of several
> > libraries as a dynamic module at runtime. This is not technically
> > difficult to do, but there is one step that will be a little pain for
> > us: Each symbol we want to use from the library must be resolved
> > manually with a call to dlsym().
>
> I can see three possible division levels: mdrun vs tools, md-loop vs rest,
> hardware-tuned inner loops vs rest. The third is by far the easiest to do.
>
> We already discussed this in Redmine (see the thread Teemu linked to), and
> unfortunately the problem is not limited to inner loops - CPU-specific
> optimization flags have a significant impact on large parts of the code, and
> will improve performance by ~20% above the inner kernels - I don't think
> we're willing to sacrifice 20% performance.
>
OK, great. I'm glad someone has measured some numbers (even if those
reported in http://redmine.gromacs.org/issues/1165 are 9% and 17%). If
premature optimization is the root of all evil, then optimization based on
assumption is the tuber of all evil!
> Cray still requires static linking, and BlueGene/Q encourages it, so I
> think it is important that the implementation does not require dynamic
> linking in the cases where portability of the binary is immaterial.
>
> I don't think we can both have our cake and eat it. For special-purpose
> highly parallel architectures that require static linking I think it is
> reasonable that the Gromacs binary will be specific to that particular
> architecture.
>
Agreed.
> > This means we should start thinking of two things to make life simpler
> > in the future:
> >
> > 1) Decide on what level we want the interface between library and
> > executables, and keep this interface _really_ small (in the sense that
> > we want to resolve as few symbols as possible).
> > 2) Since we will have to compile the high-level binaries with generic
> > compiler flags, any code that is performance-sensitive should go in the
> > architecture-specific optimized library.
>
> I think the third option I give above is the most achievable. I do not
> know whether the dynamic function calls incur overhead per call, or whether
> that can be mitigated by the helper object Teemu suggested, but he sounds
> right (as usual). I hope the libraries would share the same address space.
> Since we anyway plan for tasks to wrap function calls, the implementations
> converge.
>
> See above. It would lose ~20% performance, which I think is unacceptable.
> The main md loop and all functions under it need to be compiled with
> CPU-specific optimization, so that's the lowest level we can split on.
> Otherwise we can just as well disable AVX optimization and ship SSE4.1
> binaries to be portable :-)
>
OK, so if the division is based on "code called from the integrator loop,"
then that starts to make for a sensible division of the code base. The
general API (i.e. the part that might need to be resolved with dlsym() on
x86) would be the integrator functions, unless/until someone identifies
specific needs. Organizational sanity suggests that we start moving code from
src/gromacs to src/core when someone is fairly sure it belongs there. The
criterion for that should be along the lines that "it has been measured to
benefit from machine-specific compiler optimization flags, and is not
called directly by code in src/gromacs." So for starters that is kernels,
neighbour search code, update code, PME.
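A minimal sketch of how a small dlsym()-based API for the integrator might
look on Linux/x86. Everything named here is hypothetical: the core_module
struct, the integrator_fn signature, the "run_md" symbol and the library file
names are illustrative assumptions, not existing GROMACS code. Only dlopen(),
dlsym(), dlerror() and dlclose() are real POSIX calls.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical signature of an integrator entry point (an assumption,
 * not the real GROMACS interface). */
typedef int (*integrator_fn)(int nsteps);

/* Hypothetical handle for one hardware-specific core library. Keeping the
 * table this small means only one symbol has to be resolved by hand. */
typedef struct {
    void         *handle; /* value returned by dlopen(), or NULL */
    integrator_fn run_md; /* resolved entry point, or NULL */
} core_module;

/* Load the library at `path` and resolve `symbol` into m->run_md.
 * Returns 0 on success and -1 on failure; on failure m is left empty. */
int core_module_load(core_module *m, const char *path, const char *symbol)
{
    void       *h;
    void       *sym;
    const char *err;

    m->handle = NULL;
    m->run_md = NULL;

    h = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (h == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror(); /* clear any stale error before calling dlsym() */
    sym = dlsym(h, symbol);
    err = dlerror();
    if (err != NULL || sym == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
        dlclose(h);
        return -1;
    }

    /* Idiom from the POSIX dlsym() documentation for turning the returned
     * object pointer into a function pointer. */
    *(void **)(&m->run_md) = sym;
    m->handle = h;
    return 0;
}

/* Release the library and reset the table. */
void core_module_unload(core_module *m)
{
    if (m->handle != NULL) {
        dlclose(m->handle);
    }
    m->handle = NULL;
    m->run_md = NULL;
}
```

A front-end binary built with generic flags could try
core_module_load(&m, "libgromacs_core_avx.so", "run_md") first and fall back
to an SSE4.1 build of the same library if that fails (both file names are
made up for illustration). After loading, each call goes through one function
pointer. On glibc older than 2.34 the program has to be linked with -ldl.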
What about data structures (like parts of forcerec) that get filled during
setup? If the compiler might want to control alignment and padding to suit
the hardware, then they will have to be declared in src/core, and their
setting machinery must be in the API.
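To illustrate the point about alignment and padding, here is a rough sketch in
C11 of a structure whose layout is fixed by the hardware-specific library and
which setup code fills only through setter functions. The type core_nb_params,
its fields, the core_nb_params_* functions and the 32-byte alignment (chosen
here with 256-bit AVX loads in mind) are all assumptions for illustration and
do not correspond to the real forcerec.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed alignment for this build; a different hardware-specific build of
 * the library could choose another value without touching the callers. */
#define CORE_SIMD_ALIGN 32
#define CORE_NB_NTYPES  8

/* Hypothetical stand-in for a part of forcerec. In a real split only a
 * forward declaration would be visible outside the core library. */
typedef struct core_nb_params {
    alignas(CORE_SIMD_ALIGN) float c6[CORE_NB_NTYPES];
    alignas(CORE_SIMD_ALIGN) float c12[CORE_NB_NTYPES];
    float rcut;
} core_nb_params;

/* Allocate a zeroed parameter block with the alignment the core wants.
 * sizeof() of a type is always a multiple of its alignment, which is what
 * aligned_alloc() requires. Returns NULL on allocation failure. */
core_nb_params *core_nb_params_create(void)
{
    core_nb_params *p = aligned_alloc(alignof(core_nb_params),
                                      sizeof(core_nb_params));
    if (p != NULL) {
        memset(p, 0, sizeof(*p));
    }
    return p;
}

/* Setter used by setup code; returns -1 if the type index is out of range. */
int core_nb_params_set_pair(core_nb_params *p, size_t type, float c6, float c12)
{
    if (p == NULL || type >= CORE_NB_NTYPES) {
        return -1;
    }
    p->c6[type]  = c6;
    p->c12[type] = c12;
    return 0;
}

/* Setter for the cut-off radius. */
void core_nb_params_set_cutoff(core_nb_params *p, float rcut)
{
    p->rcut = rcut;
}

/* Memory from aligned_alloc() is released with free(). */
void core_nb_params_destroy(core_nb_params *p)
{
    free(p);
}
```

With this arrangement the generic code never needs to know sizeof() or the
field offsets of the structure, so each build of the core library remains free
to pad and align it differently. The cost is that every field the setup code
needs to write gets a function in the API.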
Mark