[gmx-developers] Gromacs 3.3.1 parallel benchmarking

Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu
Tue Aug 15 01:37:04 CEST 2006

On 8/14/06, Michael Haverty <mghav at yahoo.com> wrote:
> Hello all,
>   I'm doing some benchmarking of gromacs 3.3.1 on SUSE
> 9 systems using Intel Xeon processors on Gigabit
> ethernet, but have been unable to reproduce the
> scaling at
> http://www.gromacs.org/gromacs/benchmark/scaling-benchmarks.html
> for Gromacs 3.0.0 and am trying to diagnose why.  I'm
> getting sublinear scaling on distributed
> single-processor 3.4 GHz Intel Xeons with gigabit


The way parallelization works in GROMACS 3.x makes it
extremely sensitive to communication latency. I have not
followed the development closely, but especially on gigabit
ethernet, with its high latency, a few additional MPI calls
can make all the difference. The fact that you have faster
CPUs with additional SSE functionality may have an impact
as well (how does the single-processor speed compare?). I
assume the machine in the published benchmark has
dual-processor nodes, so compared to your single-processor
nodes it will suffer less from the gigabit network in its
per-processor scaling, since a significant part of the
communication stays within each node.
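To see why latency dominates here, a toy model helps: if every MD step costs compute_time/P plus a fixed number of latency-bound MPI exchanges, speedup saturates quickly on a high-latency network. The sketch below uses purely illustrative numbers (10 ms of compute per step, 8 messages per step, ~60 µs gigabit latency vs ~7 µs for a low-latency interconnect); these are assumptions for the model, not measured GROMACS figures.

```python
# Toy model of latency-limited MD scaling.
# All constants are illustrative assumptions, not GROMACS measurements.

def step_time(procs, t_compute=10e-3, msgs=8, latency=60e-6):
    """Per-step wall time: compute divided over procs, plus a fixed
    number of latency-bound MPI exchanges (none on a single CPU)."""
    comm = msgs * latency if procs > 1 else 0.0
    return t_compute / procs + comm

def speedup(procs, **kw):
    """Speedup relative to the single-CPU step time."""
    return step_time(1, **kw) / step_time(procs, **kw)

for p in (2, 4, 8, 16):
    print(f"{p:2d} CPUs: gigabit {speedup(p, latency=60e-6):5.2f}x, "
          f"low-latency {speedup(p, latency=7e-6):5.2f}x")
```

Even with this crude model, the fixed per-step latency cost eats most of the speedup on gigabit by 16 CPUs, while the low-latency network stays close to linear, which matches the scaling difference described above.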

OTOH, I have recently seen _extremely_ good scaling with
GROMACS 3.3.x on a Cray XT3 machine, which is built around
a special low-latency 3D-torus network and runs a specially
trimmed-down kernel to reduce latencies injected by the OS.

In fact, if you are running a standard desktop Linux on
your nodes, you may want to cut daemon processes, hotplug,
and the like down to the bare minimum, and perhaps even try
a minimal custom kernel to reduce latencies from so-called
OS jitter...
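One way to get a rough feel for OS jitter on a node is to time a tight loop and look for outlier gaps where the scheduler stole the CPU for daemons or kernel housekeeping. The probe below is a hypothetical diagnostic of my own, not a GROMACS tool; the iteration count is an arbitrary choice.

```python
import time

def jitter_probe(iterations=100_000):
    """Time successive iterations of a tight loop. The median gap is
    the loop's own cost; much larger worst-case gaps hint at OS jitter
    (daemon wakeups, interrupts, scheduler preemption)."""
    prev = time.perf_counter()
    gaps = []
    for _ in range(iterations):
        now = time.perf_counter()
        gaps.append(now - prev)
        prev = now
    gaps.sort()
    median = gaps[len(gaps) // 2]
    worst = gaps[-1]
    return median, worst

median, worst = jitter_probe()
print(f"median gap {median * 1e6:.2f} us, worst gap {worst * 1e6:.2f} us")
```

On a quiet, stripped-down node the worst-case gap should stay close to the median; on a stock desktop install, millisecond-scale outliers are common, and in a tightly coupled MPI run every node waits for the slowest one at each exchange.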

> connections.  I'm compiling using the 9.X versions of
> Intel compilers and used a wide variety of FFT and
> BLAS libraries with no success in reproducing the

Compilers and libraries should not have a significant speed
impact, since the majority of the MD work is done in the
assembly loops.

hope that helps,

> linear scaling shown in the online benchmarking
> results for the "large DPPC membrane system".
>   Have any changes in the code been implemented since
> 3.0.0 that would likely change this scaling behavior
> and/or has anyone done similar parallel benchmarking
> with 3.3.1?  We'd like to start using this code for up
> to systems with hundreds of millions of atoms, but
> are currently limited by this poor scaling.
>   Thanks for any input or suggestions you can provide!
> Mike

Axel Kohlmeyer   akohlmey at cmm.chem.upenn.edu   http://www.cmm.upenn.edu
  Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
If you make something idiot-proof, the universe creates a better idiot.
