[gmx-users] Parallel Gromacs Benchmarking with Opteron Dual-Core & Gigabit Ethernet

Sun Jul 22 18:34:25 CEST 2007

Hi,

On Jul 22, 2007, at 6:08 PM, Kazem Jahanbakhsh wrote:
>
> mpirun -np 8 mdrun_d -v -deffnm grompp
>

First, when you run in double precision you will communicate exactly
twice as much data. Since gigabit ethernet usually limits you on both
latency and bandwidth, you might get better scaling (as well as
better absolute performance) in single precision.
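To put a number on the factor of two, here is a minimal sketch. It assumes a hypothetical 100,000-atom system whose x, y, z coordinates are each sent once; the atom count and the function name are made up for illustration and are not taken from the benchmark in this thread.

```python
# Rough payload estimate for sending all coordinates once,
# in single vs. double precision.
# n_atoms is a hypothetical system size, not from this thread.

def coord_bytes(n_atoms: int, bytes_per_real: int) -> int:
    """Bytes needed to send x, y and z for every atom once."""
    return 3 * n_atoms * bytes_per_real

n_atoms = 100_000
single = coord_bytes(n_atoms, 4)  # 4-byte float
double = coord_bytes(n_atoms, 8)  # 8-byte double

print("single precision:", single, "bytes")
print("double precision:", double, "bytes")
print("ratio:", double / single)
```

The ratio is independent of the system size, so the same factor applies to whatever is actually being communicated.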
>
> Very bad scalability!
> I expected about 4.5 GFlops, but the results are like 2 nodes
> execution. In other words, the third node did nothing for us at
> all. I googled Gmx mailing lists, and saw many topics in this
> regard. I think that gigabit ethernet's latency is the performance
> killer here. I want to know is there any solution for this problem
> like recompiling kernel, tcp/ip stack parameters tuning, LAM
> recompilation, setting up simulations in a different way or anything else?

Yes, ethernet is definitely limiting you. Not only because the
latency is high, but also because 4 processors share a single network
card, so they will only get 1/4 of the bandwidth each (and gigabit
ethernet is often far from a gigabit in practice). Additionally, when
the cores try to send messages simultaneously, three out of four will
have to wait, which makes the latency even worse.
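A back-of-the-envelope sketch of that sharing effect follows. It assumes the nominal 1000 Mbit/s line rate (real throughput is lower, as noted above), a hypothetical message size, and that simultaneous sends are simply queued one after another on the card. The function names are illustrative only.

```python
# Back-of-the-envelope model of several cores sharing one network card.
# Assumptions: nominal link rate, strictly serialized sends, and a
# made-up message size. None of these numbers are measurements.

def per_core_bandwidth(link_mbit_s: float, cores_per_nic: int) -> float:
    """Average share of the link per core, in Mbit/s."""
    return link_mbit_s / cores_per_nic

def worst_case_wait(cores_per_nic: int, msg_bytes: int,
                    link_mbit_s: float) -> float:
    """Seconds the last core waits before its own message can start,
    if every core sends msg_bytes at the same moment."""
    seconds_per_msg = msg_bytes * 8 / (link_mbit_s * 1e6)
    return (cores_per_nic - 1) * seconds_per_msg

share = per_core_bandwidth(1000.0, 4)
wait = worst_case_wait(4, 125_000, 1000.0)

print("per-core share: %.0f Mbit/s" % share)
print("extra wait for the last core: %.1f ms" % (wait * 1e3))
```

The wait comes on top of the normal network latency, which is why contention hurts more than the bandwidth share alone suggests.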

I forgot the exact parameter names, but there are some environment
variables you can set in LAM-MPI to raise the size limit for "short"
messages. Small messages are normally sent directly to a buffer, but
for a large message LAM first sends an extra pre-message to tell the
receiving node to allocate memory. By increasing this limit you'll
avoid that extra round of latency for messages that would otherwise
count as large.
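The effect of the limit can be illustrated with a toy cost model. This is only a sketch: it assumes the pre-message costs one extra network round trip (two latencies), and the latency, bandwidth and limit values are hypothetical, not LAM defaults or measured numbers.

```python
# Toy cost model: "short" messages go straight to a buffer, "large"
# messages first pay for a pre-message handshake.
# Assumption: the handshake costs one round trip (2 * latency).
# All numbers below are hypothetical, not LAM defaults.

def transfer_time(msg_bytes: int, short_limit: int,
                  latency_s: float, bandwidth_bytes_s: float) -> float:
    """Estimated seconds to deliver one message."""
    handshake = 0.0 if msg_bytes <= short_limit else 2 * latency_s
    return handshake + latency_s + msg_bytes / bandwidth_bytes_s

latency = 50e-6      # 50 microseconds, hypothetical
bandwidth = 100e6    # 100 MB/s, hypothetical
msg = 100_000        # bytes, hypothetical

low_limit = transfer_time(msg, 65_536, latency, bandwidth)
high_limit = transfer_time(msg, 262_144, latency, bandwidth)

print("limit below message size: %.0f us" % (low_limit * 1e6))
print("limit above message size: %.0f us" % (high_limit * 1e6))
```

In this model, raising the limit only helps messages whose size falls between the old and the new limit; anything larger still pays for the handshake.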

If you're lucky, your network cards might also work with a solution
like MPI-GAMMA, which bypasses the TCP/IP stack completely.

Finally, you can experiment with the node order, and e.g. make a  
hostfile like

node1
node2
node3
node1
node2
node3
...

This might help reduce packet contention, but it could also make  
things worse...
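If you want to try both orderings, a small script can generate the hostfiles. This is a sketch only: the node names are the ones from the example above, the number of slots per node is an assumption, and how ranks are actually mapped onto these lines depends on your MPI installation.

```python
# Generate hostfile contents in two orderings so they can be compared.
# slots_per_node is an assumption; set it to the number of processes
# you want on each machine.

def round_robin_hosts(nodes, slots_per_node):
    """node1, node2, node3, node1, node2, node3, ..."""
    return [node for _ in range(slots_per_node) for node in nodes]

def block_hosts(nodes, slots_per_node):
    """node1, node1, ..., node2, node2, ..., node3, ..."""
    return [node for node in nodes for _ in range(slots_per_node)]

nodes = ["node1", "node2", "node3"]

print("\n".join(round_robin_hosts(nodes, 2)))
print()
print("\n".join(block_hosts(nodes, 2)))
```

Redirect the output of either ordering to a file and run the same benchmark with each to see which one behaves better on your network.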

Gromacs 4 (CVS) will improve scaling significantly in many cases, but  
sooner or later you'll want infiniband :-)

Cheers,

Erik