[gmx-users] Benchs on gcc and pgi
Jones de Andrade
johannesrs at gmail.com
Wed May 17 08:22:49 CEST 2006
Hi all!
Well, first of all, sorry if this is on the wrong GROMACS list, but from what I
could see on the website there is no clear indication of where benchmark
results should be posted.
Anyway, some time ago I asked the list for help with these benchmarks, in which
I want to compare different compilers. I have been able to compile and run the
benchmarks with GCC (single and double precision) and Portland (single
precision). Unfortunately, I could not get it to work with the Intel compiler
(yes, I will ask for help on that again later... ;) ).
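In case it matters for interpreting the numbers, these are roughly the
configure lines I used for each build (GROMACS 3.3 autoconf; the flag names are
written from memory, so please double-check against ./configure --help rather
than taking this sketch literally):

# GCC, single precision (the default precision):
./configure CC=gcc F77=g77
make && make install

# GCC, double precision:
./configure CC=gcc F77=g77 --disable-float --program-suffix=_d
make && make install

# Portland (PGI), single precision:
./configure CC=pgcc F77=pgf77
make && make install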
Well, here we go. First, the benchmarks as I actually measured them, at a CPU
usage of about 98% (it varies between tests). Below those, marked {100%}, I
list the same benchmarks with the performance of each test rescaled to 100% CPU
usage (i.e. the measured performance divided by the fraction of CPU actually
used):
Machine Type      N  Compiler                 Clock (MHz)  Cache (kB)  Villin  Lys/Cut  Lys/PME  DPPC  Poly-CH2  Average  Rate
Linux Athlon      1  gcc                              800         512    2412      622      456    41      1001      100  1.00
Linux Athlon 64   1  gcc                             1800         512    9607     2686     1778   178      4344      410  1.82
Linux Athlon 64   1  gcc + acml                      1800         512    9604     2687     1782   178      4336      410  1.82
Linux Athlon 64   1  gcc (dp)                        1800         512    5607     1633     1175   117      3420      264  1.17
Linux Athlon 64   1  gcc + acml (dp)                 1800         512    5604     1637     1174   118      3423      264  1.17
Linux Athlon 64   1  portland                        1800         512    9177     2500     1638   166      3905      384  1.71
Linux Athlon 64   1  portland + acml                 1800         512    9181     2499     1639   166      3905      384  1.71
Linux Athlon 64   1  gcc {100%}                      1800         512    9823     2730     1844   186      4455      420  1.87
Linux Athlon 64   1  gcc + acml {100%}               1800         512    9820     2762     1815   182      4438      420  1.87
Linux Athlon 64   1  gcc (dp) {100%}                 1800         512    5716     1675     1205   120      3519      270  1.20
Linux Athlon 64   1  gcc + acml (dp) {100%}          1800         512    5707     1672     1203   121      3500      269  1.20
Linux Athlon 64   1  portland {100%}                 1800         512    9355     2546     1668   171      3989      392  1.74
Linux Athlon 64   1  portland + acml {100%}          1800         512    9368     2545     1671   169      3989      392  1.74
(N = number of CPUs/cores; Villin, Lys/Cut, Lys/PME, DPPC and Poly-CH2 are the
standard GROMACS benchmark systems.)
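As far as I understand the usual GROMACS benchmark-table convention, the Rate
column is the clock-normalized performance relative to the 800 MHz Athlon
reference machine in the first row. A worked example for the Athlon 64 gcc row:

Rate = (Average / Average_ref) / (Clock / Clock_ref)
     = (410 / 100) / (1800 / 800)
     = 4.10 / 2.25
     ~ 1.82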
Well, let us see what I can conclude from this. First, Portland is worse than
the GCC compilers (not by a huge margin, but consistently worse). That is
already bad. Even worse is the fact that using the ACML libraries either yields
very little extra performance or simply loses against the plain gcc build.
Could anyone tell me whether this kind of behaviour, both for PGI and for ACML
used as external BLAS and LAPACK, is expected?
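For reference, this is roughly how ACML was pulled in as the external
BLAS/LAPACK in the "+ acml" builds. The install path is only an example from my
machine, and the --with-external-* option names are quoted from memory, so
treat this as a sketch and trust ./configure --help over it:

# Hypothetical sketch: GROMACS 3.3 with gcc, linking ACML as external BLAS/LAPACK.
./configure CC=gcc F77=g77 \
    --with-external-blas --with-external-lapack \
    LDFLAGS="-L/opt/acml/gnu64/lib" LIBS="-lacml"
make && make install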
Also, is there any extra performance to be gained from using the Intel
compilers on this architecture? Has anybody seen the following type of error
during compilation (in the optimized 1/sqrt() function) before?
***********************************************************************************
./mknb -software_invsqrt
>>> Gromacs nonbonded kernel generator (-h for help)
>>> Generating single precision functions in C.
>>> Using Gromacs software version of 1/sqrt(x).
make[5]: *** [kernel-stamp] Segmentation fault
make[5]: Leaving directory
`/home/johannes/src/gromacs/gromacs-3.3/src/gmxlib/nonbonded/nb_kernel'
make[4]: ** [all-recursive] Error 1
***********************************************************************************
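In case it helps anyone reproduce this: the step that dies is the nonbonded
kernel generator (mknb), which the build compiles and then runs in the
directory shown above. What follows is only a sketch of how I am trying to
narrow it down, not a known fix; the gdb and optimization-level ideas are
guesses on my part:

cd /home/johannes/src/gromacs/gromacs-3.3/src/gmxlib/nonbonded/nb_kernel

# Run the kernel generator by hand, to confirm it is mknb itself that crashes:
./mknb -software_invsqrt

# If it segfaults, try to get a backtrace (only useful if mknb was built with -g):
gdb --args ./mknb -software_invsqrt

# Another thing worth trying: reconfigure with a less aggressive optimization
# level for icc (e.g. CFLAGS="-O1", or even -O0) and rebuild.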
Hope this can be of use to someone...
Also, thanks a lot in advance for any and all help. :)
Jones
P.S.: I saw on the Folding@Home website that they are already trying, AND
getting some useful results from, making GROMACS run on certain GPUs. I was
wondering: if it becomes a reality there, how long would it be expected to take
before it is available as a patch, or before the official GROMACS can be
compiled for it? Those co-processors are like a dream for many people in the
field, and a GPU GROMACS like the one they are developing would be a real leap
forward in this area! :D