[gmx-users] Re: Compiling Gromacs 3.2.1 on IBM p690+ Power4
Erik Lindahl
lindahl at csb.stanford.edu
Sun Aug 1 17:32:30 CEST 2004
Hi Pim,
Yes, it's because of the Altivec loops; they more than double the
performance. The Power4 chip doesn't contain any SIMD unit, so there is
no way we could do something similar there. If somebody proficient
in PowerPC assembly wants to help write plain (non-SIMD) assembly loops,
I'd be happy to cooperate on it, though. I don't have the time to learn
a new instruction set right now :-)
I just found the bug in --enable-vectorized-sqrt, by the way. This
option makes it possible to use the MASS libraries more efficiently, and
it used to boost performance by about 20% on Power3. I haven't tried it
on Power4 yet, though.
I'll put this in CVS in the next couple of days, but if you're interested,
I've attached the necessary patches to two files in src/gmxlib.
Cheers,
Erik
--- mkinl_calcdist.orig Sun Jan 25 15:46:15 2004
+++ mkinl_calcdist.c Fri Jul 30 01:19:34 2004
@@ -160,8 +160,6 @@
}
}
nflop += calc_rsq(i,j);
- if(DO_VECTORIZE) /* calc square separately later, but we */
- increment("m","1"); /* need to know the number of items */
}
return nflop;
--- fnbf.old Thu Jul 29 12:18:25 2004
+++ fnbf.c Thu Jul 29 12:14:45 2004
@@ -202,14 +202,20 @@
#ifdef USE_LOCAL_BUFFERS
- if (buflen==0) {
+ if (buflen==0)
+ {
buflen=VECTORIZATION_BUFLENGTH;
snew(drbuf,3*buflen);
snew(_buf1,buflen+31);
snew(_buf2,buflen+31);
- /* use cache aligned buffer pointers */
+ /* use cache aligned buffer pointers when we might call SSE/Altivec */
+#if (defined USE_X86_SSE_AND_3DNOW || defined USE_X86_SSE2 || defined USE_PPC_ALTIVEC)
buf1=(real *) ( ( (unsigned long int)_buf1 + 31 ) & (~0x1f) );
buf2=(real *) ( ( (unsigned long int)_buf2 + 31 ) & (~0x1f) );
+#else
+ buf1 = _buf1;
+ buf2 = _buf2;
+#endif
fprintf(log,"Using buffers of length %d for innerloop vectorization.\n",buflen);
}
#endif
@@ -257,9 +263,14 @@
srenew(drbuf,3*buflen);
srenew(_buf1,buflen+31);
srenew(_buf2,buflen+31);
- /* make cache aligned buffer pointers */
+ /* use cache aligned buffer pointers when we might call SSE/Altivec */
+#if (defined USE_X86_SSE_AND_3DNOW || defined USE_X86_SSE2 || defined USE_PPC_ALTIVEC)
buf1=(real *) ( ( (unsigned long int)_buf1 + 31 ) & (~0x1f) );
buf2=(real *) ( ( (unsigned long int)_buf2 + 31 ) & (~0x1f) );
+#else
+ buf1 = _buf1;
+ buf2 = _buf2;
+#endif
}
#endif
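
The pointer arithmetic in the patch (add 31, then mask with ~0x1f) rounds an address up to the next 32-byte boundary. A minimal standalone sketch of the same idea follows; the names ALIGNMENT, align_up and alloc_aligned_floats are hypothetical and not part of GROMACS, and the sketch uses uintptr_t rather than the unsigned long int cast in the patch, on the assumption that a C99 compiler is available.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 32 /* matches the 31 and ~0x1f constants in the patch */

/* Round a pointer up to the next multiple of ALIGNMENT.
 * An already aligned pointer is returned unchanged. */
static void *align_up(void *p)
{
    uintptr_t addr = (uintptr_t)p;
    return (void *)((addr + (ALIGNMENT - 1)) & ~(uintptr_t)(ALIGNMENT - 1));
}

/* Allocate room for n floats plus ALIGNMENT-1 bytes of padding.
 * Returns the aligned pointer; *raw receives the pointer to pass to free(). */
static float *alloc_aligned_floats(size_t n, void **raw)
{
    assert(raw != NULL);
    *raw = malloc(n * sizeof(float) + (ALIGNMENT - 1));
    if (*raw == NULL) {
        return NULL;
    }
    return (float *)align_up(*raw);
}
```

As in the patch, the unaligned pointer has to be kept around (there _buf1 and _buf2, here *raw), because that is the one the allocator knows about when the buffer is resized or freed.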
On Jul 30, 2004, at 3:20 PM, Pim Schravendijk wrote:
>
> Thanks to Fiona for her tips.
>
> Just one additional note: --disable-float results in a double-precision
> build; the name is a bit confusing, I guess.
>
> I have now managed to install it on a 1.7 GHz Power4 machine. The benchmark
> result for d.villin is quite disappointing: 5997 ps/day on one node,
> which is roughly half the result of a single-node simulation on a 2 GHz
> Power Mac (11220 ps/day)! Is this the result of the Altivec loops on the
> Mac? Is there no performance-boosting alternative for the Power4? If not,
> the Power4s are quite a bad buy for GROMACS users, I guess, in terms of
> price/performance.
>
> Greetings, Pim
>
> --
> Pim Schravendijk - PhD Student
> Max Planck Institute for Polymer Research
> http://www.mpip-mainz.mpg.de/~schraven/