[gmx-users] Parallel Gromacs Benchmarking with Opteron Dual-Core & Gigabit Ethernet
Siavoush Dastmalchi
dastmalchi.s at tbzmed.ac.ir
Sun Jul 29 12:55:34 CEST 2007
Dear Mr Jahanbakhsh,
I hope everything is going well with your work. While you are trying
different avenues to increase the performance of the cluster (by the way,
what is the name of the cluster?), I was wondering if you could kindly
download the MODELLER program from the site below and install it on the
cluster.
The license key for the program is: MODELIRANJE
If the license key doesn't work, you can either ask me to register on their
site and obtain a key, or you can do it yourself.
http://www.salilab.org/modeller/
Regards, Dastmalchi
----- Original Message -----
From: "Kazem Jahanbakhsh" <jahanbakhsh at ee.sharif.edu>
To: "Discussion list for GROMACS users" <gmx-users at gromacs.org>
Sent: Friday, July 27, 2007 1:11 AM
Subject: Re: [gmx-users] Parallel Gromacs Benchmarking with Opteron
Dual-Core & Gigabit Ethernet
Erik Lindahl wrote:
>Built-in network cards are usually of lower quality, so there's
>probably only a single processor controlling both ports, and since
>the card probably only has a single driver requests might even be
>serialized.
>
My cluster nodes have two on-board Intel i82541PI GbE LAN controllers.
To test the hypothesis that the built-in ethernet cards perform poorly
and hurt the cluster speed-up, I set up the following benchmark: I ran
the DPPC system first with a single mdrun process on a single node, and
then ran the same simulation on three nodes with one process per node.
A rough sketch of such a parallel launch follows, and then the results.
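This is only an illustrative sketch of how such a run is typically launched
(assuming GROMACS 3.3-era tools with LAM-MPI and an MPI-enabled binary named
mdrun_mpi; binary names and options may differ on a given installation):
---------------
#!/bin/bash
# Illustrative only: one mdrun process on each of three nodes via LAM-MPI.
lamboot hostfile                       # hostfile lists the three node names
grompp -np 3 -f grompp.mdp -c conf.gro -p topol.top -o topol.tpr
mpirun -np 3 mdrun_mpi -s topol.tpr -v
lamhalt                                # shut the LAM daemons down afterwards
---------------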
- Running a single mdrun process on a single node:
M E G A - F L O P S   A C C O U N T I N G

               NODE (s)   Real (s)      (%)
       Time:   6752.650   6753.000    100.0
                          1h52:32
               (Mnbf/s)   (MFlops)   (ns/day)   (hour/ns)
Performance:      6.573    626.776      0.128    187.574
- Running a single mdrun process on each of three nodes:
M E G A - F L O P S   A C C O U N T I N G

               NODE (s)   Real (s)      (%)
       Time:   2547.000   2547.000    100.0
                          42:27
               (Mnbf/s)   (GFlops)   (ns/day)   (hour/ns)
Performance:     20.309      1.663      0.339     70.750
The speed-up factor obtained from these two runs is
Sp = 1.663 / 0.627 = 2.65
This is well below the ideal value of 3.0, even though the speed-up should
stay close to ideal while the number of parallel nodes is this small (up to
about 8 nodes, depending on the application). In other words, I think the
gap between the measured and ideal factors is very abnormal, and I haven't
seen anyone else report such poor scalability.
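To put the same numbers in terms of parallel efficiency (a quick sketch
using the figures quoted above):
---------------
#!/bin/bash
# Speed-up and parallel efficiency of the 3-node run relative to the
# single-node run (0.627 GFlops vs 1.663 GFlops, as reported above).
awk 'BEGIN {
    single = 0.627; parallel = 1.663; nodes = 3
    sp = parallel / single
    printf "speed-up   = %.2f\n", sp                       # ~2.65
    printf "efficiency = %.0f%% of ideal\n", 100*sp/nodes  # ~88%
}'
---------------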
All of this led me to conclude that something (network hardware or software)
is not working as it should in my cluster. I monitored the statistics of
my gigabit ethernet switch, and it behaved perfectly normally: its
input/output queues were empty the whole time, there were no collisions and
no error packets, and the load on the switch fabric for rx and tx on the
active ports stayed below 7-8%. I therefore think the bottleneck is the
built-in LAN cards, and I want to replace them with PCI cards. I would
greatly appreciate any advice or help I could apply to my case.
>If you have lots of available ports on your gigabit switches and both
>switches+cards support "port trunking" you could try to connect two
>cables to each node and get a virtual 2Gb bandwidth connection. For
>more advanced stuff you'll have to consult the lam-mpi mailing list,
>although I'd be (quite pleasantly) surprised if it's possible to
>tweak gigabit performance a lot!
As I said above, the gigabit ethernet switch fabric worked fine, without
any problems, during the simulation (at least with three nodes).
For this reason I bonded the two built-in gigabit ethernet ports on every
node as below (boot-up shell script):
---------------
#!/bin/bash
modprobe bonding mode=6 miimon=100        # load the bonding module
ifconfig bond0 hw ether 00:11:22:33:44:55 # MAC address of the bond0 interface
ifconfig bond0 192.168.55.55 up           # bond0 IP address
ifenslave bond0 eth0                      # put eth0 in slave mode for bond0
ifenslave bond0 eth1                      # put eth1 in slave mode for bond0
---------------
mode=6 is adaptive load balancing (balance-alb): it includes balance-tlb
plus receive load balancing (rlb) for IPv4 traffic, and does not require
any special switch support.
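As a quick sanity check (a minimal sketch using the standard Linux bonding
driver's /proc interface; exact paths may vary between kernels), one can
confirm that both slaves are up and actually carrying traffic:
---------------
#!/bin/bash
# Bonding mode and MII status of each enslaved interface
cat /proc/net/bonding/bond0
# Per-interface byte counters; both eth0 and eth1 should keep growing
# while an mdrun job is communicating over bond0
grep -E 'eth0|eth1' /proc/net/dev
---------------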
Anyway, running a single mdrun process on each of the three nodes, now with
bonding enabled, gave the following results:
M E G A - F L O P S   A C C O U N T I N G

               NODE (s)   Real (s)      (%)
       Time:   1195.000   1195.000    100.0
                          19:55
               (Mnbf/s)   (GFlops)   (ns/day)   (hour/ns)
Performance:     39.424      3.546      0.723     33.194
The performance obtained without bonding was 3.333 GFlops, which means that
trunking the ethernet ports on the nodes gives us roughly a 6-7% improvement.
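For completeness, the gain worked out from those two figures (same style of
sketch as above):
---------------
#!/bin/bash
# Relative gain from channel bonding: (3.546 - 3.333) / 3.333
awk 'BEGIN { printf "bonding gain = %.1f%%\n", 100*(3.546-3.333)/3.333 }'  # ~6.4%
---------------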
regards,
Kazem