[gmx-developers] 4.6 Binaries and Acceleration levels

Fri Jul 6 09:55:02 CEST 2012

On Fri, Jul 6, 2012 at 3:16 AM, Berk Hess <hess at kth.se> wrote:
> On 07/06/2012 05:51 AM, Nicholas Breen wrote:
>> On Thu, Jul 05, 2012 at 11:05:53PM +0200, Szilárd Páll wrote:
>>> On Thu, Jul 5, 2012 at 10:59 PM, Christoph Junghans <junghans at votca.org> wrote:
>>>> Roland has a good point, but Debian and Fedora already compile Gromacs
>>>> for different MPI versions.
>>>>
>>>> 4 acc. x 3 (serial, openmpi & mpich2) = 12 packages!
>>>>
>>>> @ Jussi: Would that still be possible?
>>> I guess possible is one thing and probable is a totally different one...
>>>
>>> Perhaps it would be good to ask the Ubuntu/Debian maintainer as well...
>> I would not want to create that much package complexity (the gromacs source
>> package already builds five binary packages!), especially if it would all be on
>> only one of the many architectures supported, and I cannot think of any other
>> package in the Debian archive that operates that way -- everything that I know
>> of with multiple CPU optimizations uses run-time detection, except for the
>> unavoidable cases with packaging the Linux kernel itself.  If it is
>> functionality that is contained purely within the shared libraries, then
>> glibc's hwcap support might be a workaround if the build system can permit
>> compiling one copy of the libraries for every supported variant.  Otherwise, if
>> it's all within mdrun itself, maybe just stick to run-time detection and
>> downgrade or eliminate the warnings it issues for "suboptimal" CPUs?
>>
>>
> Erik's non-bonded kernels are present as source in all flavors.
> My non-bonded kernel sources can easily be put in all flavors.
> Then we would still need compilation and call selection support.
> But there is also x86 SIMD code in the PME and bonded code,
> as well as in some other parts of the code.
How big is the maximum speedup we get in anything but the non-bonded
kernels (PME, bonded, ...) from AVX/SSE4 over SSE2?
If it isn't very dramatic compared to the speedup of just the
non-bonded kernels, I think there would be great value in having
multi-acceleration non-bonded kernels, even if that means that all
other functions are limited to the lowest common denominator targeted
by the binary, because it would at least prevent people from using the
SSE2 non-bonded kernels when using binaries. But if the speedup of
bonded/PME is also very significant, then it is probably better to
advise people not to use binaries for mdrun but only for the analysis
tools, and to always compile mdrun.
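
As a minimal sketch of the multi-acceleration idea (in Python for
brevity, not actual GROMACS code): the binary would ship several
non-bonded kernels, pick the fastest one the CPU supports at run time,
and leave everything else at the baseline. The kernel names and both
function names below are made up for illustration; the flag names are
the ones Linux reports in /proc/cpuinfo.

```python
# Hypothetical kernel names, ordered from fastest to slowest.
# These are illustrative labels, not GROMACS identifiers.
KERNELS_BY_PREFERENCE = [
    ("avx", "nonbonded_avx"),
    ("sse4_1", "nonbonded_sse4_1"),
    ("sse2", "nonbonded_sse2"),
]


def select_nonbonded_kernel(cpu_flags):
    """Return the fastest kernel whose SIMD flag is in cpu_flags."""
    for flag, kernel in KERNELS_BY_PREFERENCE:
        if flag in cpu_flags:
            return kernel
    # No SIMD flag found: fall back to a plain C kernel.
    return "nonbonded_plain_c"


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the CPU feature flags on Linux; empty set if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


if __name__ == "__main__":
    print(select_nonbonded_kernel(read_cpu_flags()))
```

Only the non-bonded call is dispatched this way; PME, bonded and the
rest would still be compiled for the lowest common denominator.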

> Unless we have a
> high-level way of dealing with this, things will get messy.
> Also the compiler can add e.g. AVX instructions in plain C code.
> So if it can be handled by compiling different shared libraries,
> that would be far simpler. All compute intensive functionality
> is in the shared libraries, so that's no issue.
I don't see any easy solution for that. Certainly not something which
can be done in a week or two. C++ templates might give us a tool in
the long run.

Thus I suggest the following, depending on the AVX speedup of the
bonded/PME/compiler-generated SIMD/other code:
- If the speedup is large: Document clearly that, to obtain decent
performance, mdrun should be compiled manually and the version from
Linux distributions should not be used.
- If the speedup is small relative to selecting the architecture for
the non-bonded kernels only: Have support for multi-SIMD kernels and
have a small warning that for the best performance one should compile
from source.
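
To make "large" versus "small" concrete, one could compare the overall
speedup with only the non-bonded kernels accelerated against the
overall speedup with everything accelerated. A rough Amdahl-style
sketch follows; the function name is made up, and the time fractions
and per-part speedups are placeholders, not measurements.

```python
def overall_speedup(parts):
    """Overall speedup from per-part time fractions and speedups.

    parts maps a name to (fraction_of_baseline_time, speedup).
    The fractions must sum to 1.
    """
    total_fraction = sum(fraction for fraction, _ in parts.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    new_time = sum(fraction / speedup for fraction, speedup in parts.values())
    return 1.0 / new_time


if __name__ == "__main__":
    # Placeholder numbers only: 60% of SSE2 run time in non-bonded.
    nonbonded_only = overall_speedup({
        "nonbonded": (0.6, 2.0),   # multi-SIMD kernel selected at run time
        "other": (0.4, 1.0),       # PME/bonded/etc. stay at SSE2
    })
    everything = overall_speedup({
        "nonbonded": (0.6, 2.0),
        "other": (0.4, 1.3),       # compiled from source for AVX
    })
    print(nonbonded_only, everything)
```

If the two results are close, the second option above is good enough;
if they differ a lot, the first option applies.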

Roland

>
> Cheers,
>
> Berk


-- 
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309