[gmx-users] Scaling on IBM/SP2 vs. Linux/AMD
Alan Wilter Sousa da Silva
alan at biof.ufrj.br
Thu Nov 14 17:36:48 CET 2002
Weird! Is your ps/hour rate right?
Here is my incomplete benchmark for the same DPPC system.
GMX 3.1.3, default single precision, LAM-MPI, with -sort -shuffle.
Cluster of 8 dual-CPU Dell nodes, only half tested, Gigabit Ethernet (ps/day):
CPU    Clock   1 proc.   2 proc.   4 proc.   6 proc.   8 proc.
PIII   1000    54.720    136.056   257.136   354.096   423.528
scale          100%      124.3%    117.5%    107.9%    96.7%
Cluster with 4 procs. via 100 Mbps Ethernet (rest same as above):
CPU     Clock   1 proc.   2 proc.   3 proc.   4 proc.
Athlon  1000    54.456    118.848   170.760   218.184
scale           100%      109%      104.5%    100%
And a comparison among different processors (ps/day on 1 proc.):

CPU/Clock   Athlon-T 1000   PIII 1000   P4 1700   XP 1533
ps/day      54.456          54.720      72.576    89.808
scale       100%            100.5%      133.3%    164.9%
The first cluster runs better when using CPUs from different computers:
using 2 CPUs on the same node scales worse (124.488 ps/day) than using
2 CPUs on separate nodes (136.056 ps/day, first table).
I could use MPICH, but LAM-MPI is much better (faster, friendlier).
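The "scale" rows in the tables above are per-CPU efficiency relative to the single-CPU rate, and ps/day divides by 24 for comparison with ps/hour figures. A minimal Python sketch of that arithmetic (the numbers come from the first table; the function name is my own, not from the posts):

```python
def efficiency(throughput_n, n, throughput_1):
    """Per-CPU scaling efficiency: throughput on n CPUs relative to
    n times the single-CPU throughput (the 'scale' rows above)."""
    return throughput_n / (n * throughput_1)

# PIII cluster throughput in ps/day (first table above)
piii = {1: 54.720, 2: 136.056, 4: 257.136, 6: 354.096, 8: 423.528}

for n, ps_day in piii.items():
    print(f"{n} proc.: {100 * efficiency(ps_day, n, piii[1]):.1f}%")

# ps/day -> ps/hour, for comparison with figures quoted in ps/hour
print(f"1 PIII proc.: {piii[1] / 24:.2f} ps/hour")
```

Note the superlinear efficiency at 2 and 4 CPUs (124.3%, 117.5%), presumably a cache effect once the domain per CPU shrinks.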
On Thu, 14 Nov 2002, Anton Feenstra wrote:
> Early this week I reported on bad scaling on an IBM/SP2 Power3 & Power4
> machine. I've continued these benchmarks, and additionally ran some on our
> Linux AMD 1G-single & 1.3G-dual cluster. The system is the 120000-atom
> DPPC bilayer/water system from the Gromacs benchmark suite. This makes
> for a nice comparison of networking performance, since the CPU performances
> of the IBM and AMD are comparable. The IBM/SP2 should be quite well
> optimized. The MPICH on the Linux cluster can probably be improved
> (mainly the short-message-length setting...).
>
> SP2/Power3 SP2/Power4 Linux/AMD 1G Linux/2 AMD 1.3G
> #CPU ps/hour scale ps/hour scale ps/hour scale ps/hour scale
> 1 2.0 1.00 3.9 1.00 2.0 1.00 3.1 1.00
> 2 4.2 1.03 8.5 1.08 3.9 0.98 8.5 1.37
> 3 6.4 1.04 11.9 1.01 4.8 0.80 5.8 0.62
> 4 8.5 1.04 18.3 1.16 6.5 0.82 12.1 0.97
> 6 12.6 1.03 22.8 0.97 6.0 0.51 12.6 0.67
> 8 16.3 1.00 30.5 0.97 7.3 0.46 12.4 0.50
> 10 20.3 1.00 37.9 0.97 11.8 0.38
> 16 31.3 0.96 55.0 0.88
> 18 34.2 0.93
> 20 37.0 0.91
> 24 42.2 0.86
> 28 46.8 0.82
> 32 50.7 0.78
>
> No differences at 1 or 2 CPUs. Still comparable at 4 (and you can see
> you shouldn't use 3 CPUs on a dual-CPU cluster; it's slower than 2!).
> But after that (at >= 6 CPUs) the IBM really rules, with scaling around
> 90% or better up to 16 (Power4) or 20 (Power3) CPUs! The dual AMD
> nodes peak at 6 CPUs with 12.6 ps/h. Scaling for the single AMDs
> is slightly better, which is to be expected since those nodes are slower.
-----------------------
Alan Wilter S. da Silva
-----------------------
Laboratório de Física Biológica
Instituto de Biofísica Carlos Chagas Filho
Universidade do Brasil/UFRJ
Rio de Janeiro, Brasil