[gmx-users] Re: Compiling Gromacs 3.2.1 on IBM p690+ Power4

Sun Aug 1 17:32:30 CEST 2004

Hi Pim,

Yes, it's because of the Altivec loops; they more than double the
performance. The Power4 chip doesn't contain any SIMD unit, so there is
no way we could do something similar there. If somebody proficient in
PowerPC assembly wants to help write plain (non-SIMD) assembly loops, I'd
be happy to cooperate on it, though. I don't have the time to learn a new
instruction set right now :-)


I just found the bug in --enable-vectorized-sqrt, by the way. Fixing it
makes it possible to use the MASS libraries more efficiently; on Power3
this used to boost performance by about 20%. I haven't tried it on Power4
yet, though.


I'll put this in CVS in the next couple of days, but if you're interested,
I've attached the necessary patches to two files in src/gmxlib.

Cheers,

Erik

--- mkinl_calcdist.orig Sun Jan 25 15:46:15 2004
+++ mkinl_calcdist.c    Fri Jul 30 01:19:34 2004
@@ -160,8 +160,6 @@
         }
        }
        nflop += calc_rsq(i,j);
-      if(DO_VECTORIZE) /* calc square separately later, but we  */
-       increment("m","1"); /* need to know the number of items      */
      }

    return nflop;

--- fnbf.old	Thu Jul 29 12:18:25 2004
+++ fnbf.c	Thu Jul 29 12:14:45 2004
@@ -202,14 +202,20 @@


  #ifdef USE_LOCAL_BUFFERS
-  if (buflen==0) {
+  if (buflen==0)
+  {
      buflen=VECTORIZATION_BUFLENGTH;
      snew(drbuf,3*buflen);
      snew(_buf1,buflen+31);
      snew(_buf2,buflen+31);
-    /* use cache aligned buffer pointers */
+    /* use cache aligned buffer pointers when we might call SSE/Altivec */
+#if (defined USE_X86_SSE_AND_3DNOW || defined USE_X86_SSE2 || defined USE_PPC_ALTIVEC)
      buf1=(real *) ( ( (unsigned long int)_buf1 + 31 ) & (~0x1f) );	
      buf2=(real *) ( ( (unsigned long int)_buf2 + 31 ) & (~0x1f) );	
+#else
+    buf1 = _buf1;
+    buf2 = _buf2;
+#endif
      fprintf(log,"Using buffers of length %d for innerloop vectorization.\n",buflen);
    }
  #endif
@@ -257,9 +263,14 @@
      	srenew(drbuf,3*buflen);
      	srenew(_buf1,buflen+31);
      	srenew(_buf2,buflen+31);
-        /* make cache aligned buffer pointers */
+        /* use cache aligned buffer pointers when we might call SSE/Altivec */
+#if (defined USE_X86_SSE_AND_3DNOW || defined USE_X86_SSE2 || defined USE_PPC_ALTIVEC)
          buf1=(real *) ( ( (unsigned long int)_buf1 + 31 ) & (~0x1f) );	
          buf2=(real *) ( ( (unsigned long int)_buf2 + 31 ) & (~0x1f) );	
+#else
+        buf1 = _buf1;
+        buf2 = _buf2;
+#endif
        }	
  #endif




On Jul 30, 2004, at 3:20 PM, Pim Schravendijk wrote:

>
> Thanks to Fiona for her tips.
>
> Just one additional note: --disable-float will result in a double
> precision compilation; the name is a bit confusing, I guess.
>
> I have now managed to install it on a 1.7 GHz Power4 machine. The
> benchmark result for d.villin is quite disappointing: 5997 ps/day on one
> node, which is roughly half the result of a single-node simulation on a
> 2 GHz PowerMac (11220 ps/day)! Is this due to the Altivec loops on the
> Mac? Is there no performance-boosting alternative for the Power4? If
> not, I guess Power4 machines are quite a bad buy for GROMACS users in
> terms of price/performance.
>
> Greetings, Pim
>
> --
> Pim Schravendijk - PhD Student
> Max Planck Institute for Polymer Research
> http://www.mpip-mainz.mpg.de/~schraven/
>
> _______________________________________________
> gmx-users mailing list
> gmx-users at gromacs.org
> http://www.gromacs.org/mailman/listinfo/gmx-users
> Please don't post (un)subscribe requests to the list. Use the
> www interface or send it to gmx-users-request at gromacs.org.