[gmx-users] Scaling on IBM/SP2 vs. Linux/AMD

Alan Wilter Sousa da Silva alan at biof.ufrj.br
Thu Nov 14 17:36:48 CET 2002


Weird! Is your ps/hour rate right?

	Here is my incomplete benchmark for the same DPPC system.

GMX 3.1.3 default single precision, LAM-MPI, -sort -shuffle

Cluster of 8 dual-CPU Dell nodes (only half tested), Gigabit Ethernet; clock in MHz, throughput in ps/day:
CPU	Clock	1 proc.	2 proc.	4 proc.	6 proc.	8 proc.
PIII	1000	54.720	136.056	257.136	354.096	423.528
scale		100%	124.3%	117.5%	107.9%	 96.7%

Cluster with 4 processors via 100 Mbps Ethernet (rest as above):
CPU	Clock	1 proc.	2 proc.	3 proc.	4 proc.
Athlon	1000	54.456	118.848	170.760	218.184
scale		100%	109%	104.5%	100%

And a comparison among different processors (1 proc., ps/day; "scale" is relative to the Athlon):
CPU/Clock	AthlonT 1000	PIII 1000	P4 1700	XP 1533
ps/day		54.456		54.720	72.576	89.808
scale		100%		100.5%	133.3%	164.9%
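
For clarity: in the two cluster tables the "scale" row is the parallel
efficiency, i.e. throughput(N proc.) / (N * throughput(1 proc.)), while in
the processor comparison it is simply the ps/day relative to the Athlon.
A small Python sketch that reproduces the percentages of the first table:

# Reproduce the "scale" row of the PIII / Gigabit Ethernet table above.
# efficiency(N) = throughput(N) / (N * throughput(1)); >100% is superlinear.
throughput = {1: 54.720, 2: 136.056, 4: 257.136, 6: 354.096, 8: 423.528}  # ps/day
base = throughput[1]
for n in sorted(throughput):
    eff = throughput[n] / (n * base)
    print(f"{n} proc.: {throughput[n]:8.3f} ps/day  efficiency {eff:6.1%}")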

The first cluster runs better when the CPUs come from different nodes. For
example, running on 2 CPUs within the same node scales worse (124.488 ps/day)
than running on 2 CPUs on separate nodes (136.056 ps/day, first table above).

I could use MPICH, but LAM-MPI is much better (faster and friendlier).

On Thu, 14 Nov 2002, Anton Feenstra wrote:

> Early this week I reported on bad scaling on an IBM/SP2 Power3 & Power4
> machine. I've continued these benchmarks, and additionally ran some on our
> Linux AMD 1G-single & 1.3G-dual cluster. The system is the 120000 atom
> DPPC bilayer/water system from the Gromacs benchmark suite. This makes
> for a nice comparison of networking performance, since the per-CPU
> performance of the IBM and AMD machines is comparable. The IBM/SP2 should
> be quite well optimized. The MPICH setup on the Linux cluster can probably
> be improved (mainly the short message length setting...).
>
> 	SP2/Power3	SP2/Power4	Linux/AMD 1G	Linux/2 AMD 1.3G
> #CPU	ps/hour	scale	ps/hour	scale	ps/hour	scale	ps/hour	scale
>  1	 2.0	1.00	 3.9	1.00	 2.0	1.00	 3.1	1.00
>  2	 4.2	1.03	 8.5	1.08	 3.9	0.98	 8.5	1.37
>  3	 6.4	1.04	11.9	1.01	 4.8	0.80	 5.8	0.62
>  4	 8.5	1.04	18.3	1.16	 6.5	0.82	12.1	0.97
>  6	12.6	1.03	22.8	0.97	 6.0	0.51	12.6	0.67
>  8	16.3	1.00	30.5	0.97	 7.3	0.46	12.4	0.50
> 10	20.3	1.00	37.9	0.97			11.8	0.38
> 16	31.3	0.96	55.0	0.88
> 18	34.2	0.93
> 20	37.0	0.91
> 24	42.2	0.86
> 28	46.8	0.82
> 32	50.7	0.78
>
> No differences at 1 or 2 CPUs. Still comparable at 4 (and you can see
> you shouldn't use 3 CPUs on a dual-CPU cluster; it's slower than 2!).
> But after that (at >= 6 CPUs) the IBM really rules, with scaling around
> 90% or better up to 16 (Power4) or 20 (Power3) CPUs! The dual AMD
> nodes peak at 6 CPUs with 12.6 ps/h. Scaling for the single AMDs
> is slightly better, which is to be expected since the nodes are slower.
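
About the units (my ps/hour question above): my numbers are in ps/day and
Anton's are in ps/hour. A quick Python cross-check using the single-CPU
AMD 1 GHz entries from both sets of numbers:

# My Athlon 1000, 1 proc. result (ps/day) converted to ps/hour,
# next to the Linux/AMD 1G, 1 CPU entry from the quoted table.
ps_per_day = 54.456
ps_per_hour = ps_per_day / 24
print(f"{ps_per_day} ps/day = {ps_per_hour:.2f} ps/hour")  # ~2.27 ps/hour
print("quoted Linux/AMD 1G, 1 CPU: 2.0 ps/hour")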

-----------------------
Alan Wilter S. da Silva
-----------------------
 Laboratório de Física Biológica
  Instituto de Biofísica Carlos Chagas Filho
   Universidade do Brasil/UFRJ
    Rio de Janeiro, Brasil



