Dear David,

Here are the VERY preliminary results. I ran d.dppc exactly as 
downloaded from the gromacs site.

Please do not consider these numbers as significant for any comparisons 
between processors/architectures. Nothing here is optimized.

I think that it should be possible to improve performance by changing 
compilation options (especially in fftw and gromacs).
For the time being I just wanted to check that the code compiles and runs.

Is there any news on the new assembly loops for the Opteron?
I read a message mentioning that gromacs 4 will know about the k8 architecture.

What do you think of creating a page on the gromacs site with the best 
optimization flags for the major processors? I guess that most users 
have tried this, but it is quite complex and time-consuming. I'm 
thinking of something like a Wiki.

It would also be great to have a place where people could share 
protocols for the most common simulations. This would make it easier to 
learn (for novices) and to compare (for experts).


System: Opteron quad processor
vendor_id : AuthenticAMD
cpu family : 15
model : 5
model name : AMD Opteron(tm) Processor 846
stepping : 8
cpu MHz : 1993.080
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow
bogomips : 3948.54
TLB size : 1088 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts ttp

OS: SuSE Professional 9.1, 64-bit version

All tests run with mdrun -np 4 -shuffle -sort

LAM 7.0.6


fftw (no_386_hacks)


Reading specs from /usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/specs

Configured with: ../configure --enable-threads=posix --prefix=/usr 
--with-local-prefix=/usr/local --infodir=/usr/share/info 
--mandir=/usr/share/man --enable-languages=c,c++,f77,objc,java,ada 
--disable-checking --libdir=/usr/lib64 --enable-libgcj 
--with-gxx-include-dir=/usr/include/g++ --with-slibdir=/lib64 
--with-system-zlib --enable-shared --enable-__cxa_atexit x86_64-suse-linux

Thread model: posix

gcc version 3.3.3 (SuSE Linux)


mpirun -ssi rpi usysv -v C

               NODE (s)   Real (s)        (%)
       Time:   1413.000   1413.000      100.0

               (Mnbf/s)   (GFlops)   (ps/NODE hour)   (NODE hour/ns)
Performance:     82.451      2.973           25.478           39.250


mpirun -ssi rpi sysv -v C

               NODE (s)   Real (s)        (%)
       Time:   1412.000   1412.000      100.0

               (Mnbf/s)   (GFlops)   (ps/NODE hour)   (NODE hour/ns)
Performance:     82.509      2.975           25.496           39.222

With a different setup for usysv (short message size)

mpirun -ssi rpi usysv -v C

               NODE (s)   Real (s)        (%)
       Time:   1427.000   1427.000      100.0

               (Mnbf/s)   (GFlops)   (ps/NODE hour)   (NODE hour/ns)
Performance:     81.642      2.944           25.228           39.639

PGI – Benchmarks

uname -m = x86_64

uname -r = 2.6.5-7.75-smp

uname -s = Linux

uname -v = #1 SMP Mon Jun 14 10:44:37 UTC 2004

/usr/bin/uname -p = unknown

/bin/uname -X = unknown

/bin/arch = x86_64

/usr/bin/arch -k = unknown

/usr/convex/getsysinfo = unknown

hostinfo = unknown

/bin/machine = unknown

/usr/bin/oslevel = unknown

/bin/universe = unknown

/usr/pgi/linux86-64/5.1/bin/pgcc -V

pgcc 5.1-6

Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.

Copyright 2000-2003, STMicroelectronics, Inc. All Rights Reserved.

PGI COMPILER (using -tp=k8-64)

LAM 7.0.6



               NODE (s)   Real (s)        (%)
       Time:   1753.000   1753.000      100.0

               (Mnbf/s)   (GFlops)   (ps/NODE hour)   (NODE hour/ns)
Performance:     66.459      2.397           20.536           48.694


mpirun -ssi rpi usysv -v C

               NODE (s)   Real (s)        (%)
       Time:   1745.000   1745.000      100.0

               (Mnbf/s)   (GFlops)   (ps/NODE hour)   (NODE hour/ns)
Performance:     66.764      2.408           20.630           48.472
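
In case it helps when reading the numbers above, here is a rough Python sketch of how the last two performance columns relate to each other (NODE hour/ns is just 1000 divided by ps/NODE hour) and how the wall times compare against the first gcc run. The run labels and function names are only my own shorthand, not anything printed by mdrun, and the "first run" PGI entry is the one listed without an mpirun line. As I said, these runs are not optimized, so the ratios are only a sanity check and not a real comparison.

```python
# Wall time (s) and ps/NODE hour copied from the mdrun output in this message.
# The labels are shorthand chosen for this sketch only.
runs = {
    "gcc 3.3.3, rpi usysv":             {"wall_s": 1413.0, "ps_per_hour": 25.478},
    "gcc 3.3.3, rpi sysv":              {"wall_s": 1412.0, "ps_per_hour": 25.496},
    "gcc 3.3.3, rpi usysv (short msg)": {"wall_s": 1427.0, "ps_per_hour": 25.228},
    "pgcc 5.1-6, first run":            {"wall_s": 1753.0, "ps_per_hour": 20.536},
    "pgcc 5.1-6, rpi usysv":            {"wall_s": 1745.0, "ps_per_hour": 20.630},
}


def hours_per_ns(ps_per_hour):
    """Convert ps/NODE hour into NODE hour/ns (1 ns = 1000 ps)."""
    return 1000.0 / ps_per_hour


def relative_time(wall_s, baseline_s):
    """Wall time as a multiple of the baseline run's wall time."""
    return wall_s / baseline_s


baseline_s = runs["gcc 3.3.3, rpi usysv"]["wall_s"]
for label, r in runs.items():
    h_ns = hours_per_ns(r["ps_per_hour"])
    rel = relative_time(r["wall_s"], baseline_s)
    print(f"{label:34s} {h_ns:7.3f} h/ns   {rel:5.2f}x baseline")
```

The recomputed hour/ns values can differ from the printed ones in the last digit, presumably because mdrun works from unrounded numbers.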

