[gmx-users] No improvement in scaling on introducing flow control

Carsten Kutzner ckutzne at gwdg.de
Thu Oct 25 14:10:49 CEST 2007


Hi Himanshu,

maybe your problem is not even flow control, but the limited network
bandwidth, which in your case is shared among 4 CPUs. I have also done
benchmarks on Woodcrests (2.33 GHz) and was not able to scale an 80000
atom system beyond 1 node with Gbit Ethernet. Looking at it in more
detail, the time gained by the additional 4 CPUs of a second node was
exactly balanced by the extra communication. I used only 1 network
interface for that benchmark, leaving effectively only a quarter of the
bandwidth for each CPU. Using two interfaces with OpenMPI did not double
the network performance on our cluster: in my tests, nodes with 2 CPUs
sharing one NIC were faster than nodes with 4 CPUs sharing two NICs.
This could be on-node contention, since both interfaces probably end up
on the same bus internally.
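
To put rough numbers on this, here is a minimal Python sketch of the
per-rank share of one Gbit Ethernet link (only the nominal 1000 Mbit/s
link speed goes in; protocol overhead is ignored, and the rank counts
simply mirror the 2-CPU and 4-CPU cases above):

    GBE_LINK_MBIT = 1000.0  # nominal Gigabit Ethernet bandwidth, Mbit/s

    # Share of one NIC per MPI rank, assuming all ranks on the node
    # communicate off-node at the same time and split the link evenly.
    for ranks_per_nic in (1, 2, 4):
        share_mbit = GBE_LINK_MBIT / ranks_per_nic
        print(f"{ranks_per_nic} rank(s) per NIC: ~{share_mbit:.0f} Mbit/s "
              f"(~{share_mbit / 8:.0f} MB/s) each")

With 4 ranks per NIC that is roughly 250 Mbit/s (about 31 MB/s) per
rank, which is easily eaten up by PME's all-to-all communication.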

Regards,
  Carsten


himanshu khandelia wrote:
> Hi,
> 
> We tried turning on flow control on the switches of our local cluster
> (www.dcsc.sdu.dk) but were unable to achieve any improvement in
> scale-up whatsoever. I was wondering if you folks could shed some
> light on how we should proceed with this. (We have not installed the
> all-to-all patch yet.)
> 
> The cluster architecture is as follows:
> ##########
> * Computing nodes
> 160x Dell PowerEdge 1950 1U rack-mountable servers with 2x 2.66 GHz
> Intel Woodcrest CPUs, 4 GB RAM, 2x 160 GB HDD (7200 rpm, 8 MB buffer,
> SATA150), 2x Gigabit Ethernet
> 40x Dell PowerEdge 1950 1U rack-mountable servers with 2x 2.66 GHz
> Intel Woodcrest CPUs, 8 GB RAM, 2x 160 GB HDD (7200 rpm, 8 MB buffer,
> SATA150), 2x Gigabit Ethernet
> ##########
> * Switches
> 9 D-link SR3324
> 2 D-link SRi3324
> The switches are organised in two stacks, each connected to the
> infrastructure switch with an 8 Gb/s LACP trunk.
> ##########
> * Firmware Build on the switches: 3.00-B16
> There are newer firmware builds available, but according to the update
> logs, there is no update to the IEEE flow control protocol in the
> newer firmware.
> ##########
> * Tests (run using OpenMPI, not LAM/MPI)
> DPPC bilayer system of ~40000 atoms, with PME and cutoffs, 1 fs time
> step. The scale-up data are as follows. We are also currently running
> some tests with larger systems.
> 
> # Procs    nanoseconds/day    Scale-up
> 1          0.526              1
> 2          1.0                1.90
> 4          1.768              3.36
> 8          1.089              2.07
> 16         0.39               0.74
> 
> Any inputs will be very helpful, thank you
> 
> Best,
> 
> -himanshu

-- 
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics Department
Am Fassberg 11
37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
http://www.mpibpc.mpg.de/research/dep/grubmueller/
http://www.gwdg.de/~ckutzne
