[gmx-developers] Re: Gromacs 3.3.1 parallel benchmarking

Tue Aug 15 18:51:28 CEST 2006

Thanks for the feedback, all.

We're using single-CPU machines with 2 GB of memory, all
isolated on the same switch.  I've also tried a shared-memory
4-processor machine, and we're still seeing sub-linear scaling.
For the single-CPU runs we are getting about 240-250 ns/day
with the executables that we build in-house, and a little
under 200 ns/day using the downloaded binaries.  At 8
processors we're getting only 800 ns/day.
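
To put a number on "sub-linear", here is a minimal sketch that turns the
figures above into speedup and parallel efficiency.  The function name
is hypothetical, and it assumes the in-house single-CPU rate (240-250
ns/day) is the right baseline for the 8-processor run; if the 8-way run
used the downloaded binaries, the baseline would be different.

```python
def parallel_efficiency(single_rate, parallel_rate, n_procs):
    """Return (speedup, efficiency) for a parallel run.

    single_rate   -- throughput on one processor (e.g. ns/day)
    parallel_rate -- throughput on n_procs processors, same units
    n_procs       -- number of processors used in the parallel run
    """
    speedup = parallel_rate / single_rate
    efficiency = speedup / n_procs  # 1.0 would be ideal linear scaling
    return speedup, efficiency


# Figures quoted in this message; the baseline choice is an assumption.
for baseline in (240.0, 250.0):
    speedup, eff = parallel_efficiency(baseline, 800.0, 8)
    print(f"baseline {baseline:.0f} ns/day: "
          f"speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

With these inputs the speedup comes out a little over 3x on 8
processors, i.e. roughly 40% parallel efficiency.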

My execution of grompp and mdrun has been very simple, just
using the "-np number_of_processors" flag, except on the
shared-memory machines, where I used "-np number_of_processors
-nt number_of_processors" for the mdrun flags.  I've also
tested it with MPICH, LAM, and Intel-MPI builds, but can't get
away from the sub-linear scaling.  We're seeing communication
between nodes of around 20 megabits and latency between 50 and
250 ns.  We're running the simulations on rack systems.
Originally they were targeted for use as batch serial systems,
but we've upgraded things such as the switch to gigabit so
that we could get better scaling, and we learned to run within
a single switch to get good scaling with DFT codes up to the
40-60 processor range.  We're starting to think it may be an
operating system issue, so we're going to meet with computing
support later today to explore that.

Mike

--- Michael Haverty <mghav at yahoo.com> wrote:

> Hello all,
>   I'm doing some benchmarking of gromacs 3.3.1 on SUSE 9
> systems using Intel Xeon processors on Gigabit ethernet,
> but have been unable to reproduce the scaling at
> http://www.gromacs.org/gromacs/benchmark/scaling-benchmarks.html
> for Gromacs 3.0.0 and am trying to diagnose why.  I'm
> getting sublinear scaling on distributed single-processor
> 3.4 GHz Intel Xeons with gigabit connections.  I'm
> compiling using the 9.X versions of the Intel compilers and
> have used a wide variety of FFT and BLAS libraries with no
> success in reproducing the linear scaling shown in the
> online benchmarking results for the "large DPPC membrane
> system".
>   Have any changes been implemented in the code since 3.0.0
> that would likely change this scaling behavior, and/or has
> anyone done similar parallel benchmarking with 3.3.1?  We'd
> like to start using this code for systems of up to hundreds
> of millions of atoms, but are currently limited by this
> poor scaling.
>   Thanks for any input or suggestions you can provide!
> Mike
