[gmx-developers] Slow PME with git master and gcc 4.3.3 & 4.3.4
roland at utk.edu
Sun May 16 02:21:26 CEST 2010
I think I found out what's going on.
It is a combination of gcc and libc. If expf is used, the time is wasted in
feholdexcept, fesetenv and fesetround,
because libc doesn't have an x86-64 optimized version of expf. See:
It only shows up with 4.2.x and 4.3.x because other gcc versions don't replace
exp with expf.
Thus one could file a bug against gcc 4.3 (4.2 isn't maintained
anymore) asking it not to do the replacement, but I doubt this has high enough
priority to get in. It makes more sense to file a bug against libc to fix expf. I
don't understand why they didn't include the optimized expf which was
submitted (see the links in the message linked above).
As a workaround in GROMACS I suggest committing the patch from my
earlier message (the temporary double variable), probably best wrapped in an
#ifdef so that the workaround only applies to gcc 4.2.x and 4.3.x and doesn't
affect any other compiler negatively. Should I commit that?
BTW: One can save 5% in PME solve with -ffast-math (gcc 4.4.2). Is it safe to
use -ffast-math when compiling pme.c? If so should we make this default in
On Sat, May 15, 2010 at 5:40 PM, Roland Schulz <roland at utk.edu> wrote:
> Two more things:
> I can't reproduce Berk's measurement that 4.3.2 is fine. PME solve is as
> slow for me compiled with 4.3.2 as with 4.3.4 (both 8.9s instead of 2.9s
> for 4.4.2 for 20^3 grid on 8 threads). And this agrees with the bug report I
> linked: 4.2.x and 4.3.x are doing this replacement.
> I'm running this on Xeon X5550.
> I can't figure out so far whether the difference is in the function call
> overhead or in the exp/expf function itself. The data is in (4 packed
> floats) xmm registers and thus the call to exp/expf requires
> a cvtpd2ps/movss. That the expf is fast in a test program suggests it is the
> data conversion. The profiling with hpctoolkit suggests it is the expf
> function itself. I'm not sure whether my test program is not capturing the
> problem correctly or whether I misread the hpctoolkit output. Or whether it
> is some interplay of both (e.g. save/restore of registers).
> If someone could run some profiling to verify my hpctoolkit results and find
> out what the reason is, this would help. As far as I know it has to be
> sampling-based profiling: e.g. Apple Shark but not e.g. gprof/tau.
> On Sat, May 15, 2010 at 4:30 AM, Alexey Shvetsov <alexxyum at gmail.com> wrote:
>> Hi all!
>> I made some tests with current git master and gcc-4.4.3 and gcc-4.5.0.
>> It seems gromacs compiled with gcc-4.5.0 is a little bit faster than
>> gcc-4.4.3 (~2-5%).
>> I can do these tests with profiling if needed.
>> 2010/5/15 Roland Schulz <roland at utk.edu>
>>> it is caused by gcc using expf instead of exp.
>>> By changing pme.c in the two places where exp is called this way:
>>> - tmp1[kx] = exp(tmp1[kx]);
>>> + double t = tmp1[kx];
>>> + tmp1[kx] = exp(t);
>>> gcc 4.3.4 is as fast as gcc 4.4.2.
>>> See also http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35202
>>> (The bug is not related to performance but it discusses the replacement
>>> from exp->expf)
>>> But what I don't understand at all is why expf is so much slower (expf is
>>> 7-8x slower than exp in pme.c (*)). A small test program which computes exp
>>> on an array is actually faster compiled with gcc 4.3.4 than with 4.4.2
>>> (30%). Thus for the test program expf seems to be faster. Why expf is so
>>> slow in pme.c is odd.
>>> *) Got that measurement from HPCToolkit, which shows performance line by line.
>>> Thus it was easy to pinpoint the cause.
>>> On Fri, May 14, 2010 at 10:43 AM, Berk Hess <hess at cbr.su.se> wrote:
>>>> The PME code in git master is a lot faster than the current 4.0.7
>>>> release code.
>>>> However, there seems to be a bug in gcc version 4.3.3 and 4.3.4 (4.3.2
>>>> and 4.4.1 are fine)
>>>> that makes the pme_solve part 5 times (!) slower than with proper
>>>> compiler versions.
>>>> This is between 10% and 20% of the total mdrun performance.
>>>> We are currently trying to figure out what triggers this issue and we
>>>> are sending
>>>> a bug report to the gcc mailing list.
>>>> If someone has a hint, please reply.
>>>> gmx-developers mailing list
>>>> gmx-developers at gromacs.org
>>>> Please don't post (un)subscribe requests to the list. Use the
>>>> www interface or send it to gmx-developers-request at gromacs.org.
>>> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
>>> 865-241-1537, ORNL PO BOX 2008 MS6309
>> Best Regards,
>> Alexey 'Alexxy' Shvetsov
>> Petersburg Nuclear Physics Institute, Russia
>> Department of Molecular and Radiation Biophysics
>> Gentoo Team Ru
>> Gentoo Linux Dev
>> mailto:alexxyum at gmail.com
>> mailto:alexxy at gentoo.org
>> mailto:alexxy at omrb.pnpi.spb.ru