[gmx-users] Preliminary report on benchmark on Opteron Cluster Infiniband on current CVS version
Florian Haberl
Florian.Haberl at chemie.uni-erlangen.de
Tue Apr 17 19:31:29 CEST 2007
Hi,
On Tuesday, 17. April 2007 18:26, Luca Ferraro wrote:
> Hello GROMACS world,
>
> I would like to report a small benchmark activity with some preliminary
> results I recently obtained on two different clusters (Opteron-SC and
> Opteron-DC on InfiniBand) using the current CVS version of GROMACS.
>
> From these preliminary tests, it seems that the speedup and scaling are
> quite poor on the DPPC test. As reported below, the best speedup is obtained
> with the PGI6.2+ACML3.6 combination, giving a speedup of 10.9 on 16 CPUs
> with 68% scaling. The same DPPC test run with GROMACS v3.3.1 gives better
> results, showing an almost linear speedup up to 16 CPUs with 80% scaling
> [results not reported here, but available on request].
>
> It seems I will never reach the impressive scaling claimed for this new
> domain-decomposition release in many posts... What's wrong with my
> benchmarks? Am I missing something?
Have you really used domain decomposition, or is it just a copy-paste error in
your script? The -d option doesn't enable domain decomposition. Have you also
looked in the output log to check whether it is really used? Your scaling
looks really poor in comparison to other results.
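
A quick way to verify is to search the log for the decomposition grid (a
minimal check, assuming your log is md0.log as in your results and that the
CVS mdrun prints this information, as it does in my builds):

  # show any line mentioning the domain decomposition setup
  grep -i "domain decomposition" md0.log

If nothing shows up there, that run most likely did not use domain
decomposition at all.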
I tested it on a Woodcrest architecture (3 GHz) with an InfiniBand
interconnect, but without separate PME nodes (a rather old CVS version, I
think from somewhere in January):
cores  time (m:ss)  GFlops   ns/day   DD grid (x y z)
  4       6:00       8.621    2.400   2 2 1
  8       3:06      16.697    4.645   4 2 1
  8       3:06      16.747    4.645   8 1 1
 16       5:39      12.491    2.549   1 1 1
 16       2:18      30.771    6.261   8 2 1
 16       1:40      31.114    8.640   4 4 1
 16       1:48      28.765    8.000   4 2 2
 16       1:49      28.547    7.927   2 4 2
 32       3:06      22.843    4.645   4 4 2
 32       1:51      38.287    7.784   8 4 1
 32       1:16      55.920   11.368   8 4 1
 64       0:34      92.098   25.412   8 8 1
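
For completeness, the launch line I would try instead of "-d 1 ${proc} 1" is
something like the sketch below (only a sketch: in my CVS snapshot the
relevant mdrun options were -dd for the decomposition grid and -npme for
separate PME nodes, but please check mdrun -h for the exact names in yours):

  # 16 MPI processes, 4x4x1 domain decomposition grid, no separate PME nodes
  mpiexec -n 16 mdrun_mpi -dd 4 4 1 -npme 0

Setting -npme to a positive value dedicates that many processes to PME, which
I have not benchmarked here.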
greetings,
Florian
>
> Thank you in advance for your kind attention.
>
> luca
>
> #####################################################################
>
> METHOD:
>
> I have checked out the CVS version and built it for several targets, one
> for each of the supported compilers installed on our cluster, using the
> MVAPICH-0.9.8 driver (as reported below in the ### CONFIG DETAILS ###
> section).
>
> I have run all the benchmarks from the gmxbench-3.0.tar.gz package
> (to fix: the "title" keyword in grompp.mdp is always set to dppc in all
> benchmarks). I have used separate directories for the lzm case (one for
> the PME setup and one for the simple-cutoff one).
>
> Anyway, since we are looking at speedup and scaling up to 32 CPUs, I
> focused on the DPPC test case (about 130000 atoms): for the smaller systems
> the number of atoms per process would become too small and fit into the
> cache, distorting the benchmark.
>
> The mdrun program from the CVS uses a domain-decomposition scheme, so I
> chose to split the domain along the Y-axis. However, further benchmarks
> have also been performed using the full decomposition on all axes (for
> example: -d 2 2 2 on 8 CPUs, -d 4 4 2 on 32 CPUs), without any
> significant improvement.
>
> Benchmarks have been run with 1, 2, 4, 8, 16 and 32 processes, using
> the following commands (taken from my script):
>
> [SNIP]
>
> for dir in d.dppc d.lzm-cutoff d.lzm-pme d.poly-ch2 d.villin; do
>     for proc in 1 2 4 8 16 32; do
>         # setting up benchmark directory
>         [SNIP]
>
>         # running benchmark
>         grompp
>         /usr/bin/time mpiexec -n ${proc} mdrun_mpi -d 1 ${proc} 1
>
>         # collect results
>         [SNIP]
>     done
> done
>
>
>
> RESULTS:
>
> I report some results from the DPPC case, where:
> - proc is the number of MPI processes (not processors!)
> - (DC) means the run was done on dual-core CPUs of the DCORE Cluster;
>   otherwise it was run on single-core CPUs of the INODE Cluster.
> - no multi-threading has been used.
>
> # Real time in seconds for the run (taken from md0.log)
> #proc gnu4.1 pgi6.2 intel9.1 gnu(DC) pgi(DC)
> 1 3084.450 3929.020 3632.470 3053.370 3304.330
> 2 1810.000 1977.000 1771.000 1805.000 2080.000
> 4 1093.000 1182.000 1077.000 1101.000 1206.000
> 8 610.000 650.000 604.000 599.000 653.000
> 16 336.000 360.000 340.000 339.000 364.000
> 32 202.000 210.000 243.000 207.000 210.000
>
> # speedup = p_1/p_N, where p_N is the run time with N = $proc processes
> #proc gnu4.1 pgi6.2 intel9.1 gnu(DC) pgi(DC)
> 1 1.00 1.00 1.00 1.00 1.00
> 2 1.70 1.99 2.05 1.69 1.59
> 4 2.82 3.32 3.37 2.77 2.74
> 8 5.06 6.04 6.01 5.10 5.06
> 16 9.18 10.91 10.68 9.01 9.08
> 32 15.27 18.71 14.95 14.75 15.73
>
> # scaling = p_1/(N*p_N), where p_N is the run time with N = $proc processes
> #proc gnu4.1 pgi6.2 intel9.1 gnu(DC) pgi(DC)
> 1 100.00% 100.00% 100.00% 100.00% 100.00%
> 2 85.21% 99.37% 102.55% 84.58% 79.43%
> 4 70.55% 83.10% 84.32% 69.33% 68.50%
> 8 63.21% 75.56% 75.18% 63.72% 63.25%
> 16 57.37% 68.21% 66.77% 56.29% 56.74%
> 32 47.72% 58.47% 46.71% 46.10% 49.17%
>
>
>
> ############## PLATFORM DETAILS ######################
>
> INODE CLUSTER:
> - 24 nodes - 2way Opteron (single-core rev 250) at 2.4GHz with 4 GB RAM
> - InfiniBand - Silverstorm InfiniHost III SDR
> - switch SilverStorm 9120 InfiniBand 4X DDR 20Gb/s
>
> DCORE CLUSTER:
> - 24 nodes - 2way Opteron (dual-core rev 280) at 2.4GHz with 8 GB RAM
> - InfiniBand - Silverstorm InfiniHost III DDR
> - switch SilverStorm 9120 InfiniBand 4X DDR 20Gb/s
>
>
> ############## CONFIG DETAILS ######################
> BUILT TARGETS:
> - intel-9.1, MKL 8.1, FFTW3
> - gnu-4.1, ACML 3.6, FFTW3
> - pgi-6.2, ACML 3.6, FFTW3
>
> MPIMODULE:
> MPI on driver mvapich-0.9.8
>
> # Configure for general installation:
> ./configure --prefix=$PWD/gromax4_${TARGET} \
> --with-fft=fftw3 \
> --without-xml --disable-threads \
> --with-external-blas --with-external-lapack
>
> # Configure for the MPI version of mdrun program:
> ./configure --prefix=$PWD/gromax4_${TARGET} \
> --enable-mpi --program-suffix=_${MPIMODULE} \
> --with-fft=fftw3 \
> --without-xml --disable-threads \
> --with-external-blas --with-external-lapack
> ##################################################
--
-------------------------------------------------------------------------------
Florian Haberl
Computer-Chemie-Centrum
Universitaet Erlangen/ Nuernberg
Naegelsbachstr 25
D-91052 Erlangen
Telephone: +49 (0) 9131 - 85 26581
Mailto: florian.haberl AT chemie.uni-erlangen.de
-------------------------------------------------------------------------------