[gmx-users] Benchs on gcc and pgi
Jones de Andrade
johannesrs at gmail.com
Wed May 17 08:22:49 CEST 2006
Hi all!
Well, first of all, sorry if this is on the wrong GROMACS list, but from what I
could see on the website there is no clear indication of where benchmark
results should be posted.
Anyway, some time ago I asked the list for help with these benchmarks, in which
I want to compare different compilers. I have been able to compile and run the
benchmarks with GCC (single and double precision) and Portland (single
precision). Unfortunately, I could not get it to work with the Intel compiler
(yes, I will ask for help on that again later... ;) ).
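In case it matters for interpreting the numbers, these are roughly the
configure lines I used for each build (GROMACS 3.3 autoconf; the flag names are
written from memory, so please double-check against ./configure --help rather
than taking this sketch literally):

# GCC, single precision (the default precision):
./configure CC=gcc F77=g77
make && make install

# GCC, double precision:
./configure CC=gcc F77=g77 --disable-float --program-suffix=_d
make && make install

# Portland (PGI), single precision:
./configure CC=pgcc F77=pgf77
make && make install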
Well, here we go. First, the benchmarks as I actually measured them, at a CPU
usage of about 98% (it varies between tests). Below those, marked {100%}, I
list the same benchmarks with the performance of each test rescaled to 100% CPU
usage (i.e. the measured performance divided by the fraction of CPU actually
used):
Machine Type      N  Compiler                 Clock (MHz)  Cache (kB)  Villin  Lys/Cut  Lys/PME  DPPC  Poly-CH2  Average  Rate
Linux Athlon      1  gcc                              800         512    2412      622      456    41      1001      100  1.00
Linux Athlon 64   1  gcc                             1800         512    9607     2686     1778   178      4344      410  1.82
Linux Athlon 64   1  gcc + acml                      1800         512    9604     2687     1782   178      4336      410  1.82
Linux Athlon 64   1  gcc (dp)                        1800         512    5607     1633     1175   117      3420      264  1.17
Linux Athlon 64   1  gcc + acml (dp)                 1800         512    5604     1637     1174   118      3423      264  1.17
Linux Athlon 64   1  portland                        1800         512    9177     2500     1638   166      3905      384  1.71
Linux Athlon 64   1  portland + acml                 1800         512    9181     2499     1639   166      3905      384  1.71
Linux Athlon 64   1  gcc {100%}                      1800         512    9823     2730     1844   186      4455      420  1.87
Linux Athlon 64   1  gcc + acml {100%}               1800         512    9820     2762     1815   182      4438      420  1.87
Linux Athlon 64   1  gcc (dp) {100%}                 1800         512    5716     1675     1205   120      3519      270  1.20
Linux Athlon 64   1  gcc + acml (dp) {100%}          1800         512    5707     1672     1203   121      3500      269  1.20
Linux Athlon 64   1  portland {100%}                 1800         512    9355     2546     1668   171      3989      392  1.74
Linux Athlon 64   1  portland + acml {100%}          1800         512    9368     2545     1671   169      3989      392  1.74
(N = number of CPUs/cores; Villin, Lys/Cut, Lys/PME, DPPC and Poly-CH2 are the
standard GROMACS benchmark systems.)
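As far as I understand the usual GROMACS benchmark-table convention, the Rate
column is the clock-normalized performance relative to the 800 MHz Athlon
reference machine in the first row. A worked example for the Athlon 64 gcc row:

Rate = (Average / Average_ref) / (Clock / Clock_ref)
     = (410 / 100) / (1800 / 800)
     = 4.10 / 2.25
     ~ 1.82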
Well, let us see what I can conclude from this. First, Portland is worse than
the GCC compilers (not by a huge margin, but consistently worse). That is
already bad. Even worse is the fact that using the ACML libraries either yields
very little extra performance or simply loses against the plain gcc build.
Could anyone tell me whether this kind of behaviour, both for PGI and for ACML
used as external BLAS and LAPACK, is expected?
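For reference, this is roughly how ACML was pulled in as the external
BLAS/LAPACK in the "+ acml" builds. The install path is only an example from my
machine, and the --with-external-* option names are quoted from memory, so
treat this as a sketch and trust ./configure --help over it:

# Hypothetical sketch: GROMACS 3.3 with gcc, linking ACML as external BLAS/LAPACK.
./configure CC=gcc F77=g77 \
    --with-external-blas --with-external-lapack \
    LDFLAGS="-L/opt/acml/gnu64/lib" LIBS="-lacml"
make && make install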
Also, is there any extra performance to be gained from using the Intel
compilers on this architecture? Has anybody seen the following type of error
during compilation (in the optimized 1/sqrt() function) before?
***********************************************************************************
./mknb -software_invsqrt
>>> Gromacs nonbonded kernel generator (-h for help)
>>> Generating single precision functions in C.
>>> Using Gromacs software version of 1/sqrt(x).
make[5]: *** [kernel-stamp] Segmentation fault
make[5]: Leaving directory
`/home/johannes/src/gromacs/gromacs-3.3/src/gmxlib/nonbonded/nb_kernel'
make[4]: ** [all-recursive] Error 1
***********************************************************************************
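In case it helps anyone reproduce this: the step that dies is the nonbonded
kernel generator (mknb), which the build compiles and then runs in the
directory shown above. What follows is only a sketch of how I am trying to
narrow it down, not a known fix; the gdb and optimization-level ideas are
guesses on my part:

cd /home/johannes/src/gromacs/gromacs-3.3/src/gmxlib/nonbonded/nb_kernel

# Run the kernel generator by hand, to confirm it is mknb itself that crashes:
./mknb -software_invsqrt

# If it segfaults, try to get a backtrace (only useful if mknb was built with -g):
gdb --args ./mknb -software_invsqrt

# Another thing worth trying: reconfigure with a less aggressive optimization
# level for icc (e.g. CFLAGS="-O1", or even -O0) and rebuild.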
Hope this can be of use to someone...
Also, thanks a lot in advance for any and all help. :)
Jones
P.S.: I saw on the Folding@Home website that they are already trying, AND
getting some useful results from, making GROMACS run on certain GPUs. I was
wondering: if it becomes a reality there, how long would it be expected to take
before it is available as a patch, or before the official GROMACS can be
compiled for it? Those co-processors are like a dream for many people in the
field, and a GPU GROMACS like the one they are developing would be a real leap
forward in this area! :D