[gmx-users] Parallel Gromacs Benchmarking with Opteron Dual-Core & Gigabit Ethernet

Kazem Jahanbakhsh jahanbakhsh at ee.sharif.edu
Mon Jul 23 12:13:45 CEST 2007


First of all, thanks for your reply.
On Sun, 22 Jul 2007 18:34, Erik Lindahl wrote:

>Yes, ethernet is definitely limiting you. Not only because the
>latency is high, but since 4 processors share a single network card
>they will only get 1/4 of the bandwidth each (and gigabit ethernet is
>often far from a gigabit in practice). Additionally, when the cores
>try to send messages simultaneously, three out of four will have to
>wait, which makes the latency even worse.

I read on the GROMACS site that the DPPC benchmark system is
composed of 121,856 atoms. Looking at the topology files, it seems
that GROMACS decomposes the input data to run in parallel (in our
case, running with "-np 12" on 3 nodes, each process handles about
10,156 atoms).
I think the DPPC system is not big enough to show the scalability
of parallel execution over Gigabit Ethernet. To see the cluster's
scalability with our configuration, I think we should set up a
larger simulation. Please correct me if I'm mistaken.
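The per-process atom count mentioned above can be estimated with a simple even split (only a rough estimate; the actual decomposition will not be perfectly even):

```python
# Rough estimate of atoms per MPI process for the DPPC benchmark.
# GROMACS's actual decomposition assigns slightly uneven chunks,
# so this even split is only an approximation.
total_atoms = 121856  # DPPC benchmark system size
nprocs = 12           # mpirun -np 12 across 3 nodes

atoms_per_proc = total_atoms / nprocs
print(f"~{atoms_per_proc:.0f} atoms per process")  # ~10155
```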

>I forgot the exact parameter names, but there are some environment
>variables you can set in LAM-MPI to force it to send larger "short"
>messages.  Small messages are normally sent directly to a buffer, but
>for a large message LAM sends an extra pre-message to tell the
>receiving node to allocate memory - by increasing this limit you'll
>reduce latency for large messages.

Yes, I saw this solution on the GROMACS site before and tried it on
my Linux cluster by recompiling LAM as follows:

./configure --with-fc=gfortran --with-rpi=usysv --with-tcp-short=524288
(I have no feel for the best value for the TCP short-message size;
I used the value above instead of the 64k default.)
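If I remember correctly, LAM/MPI 7.x also lets you change this threshold at run time through an SSI parameter, so a full recompile may not be necessary. A sketch, assuming LAM 7.x and that the GROMACS binary is named mdrun_mpi (parameter name taken from my reading of the LAM 7 documentation, so please verify it on your installation):

```shell
# Assuming LAM/MPI 7.x: select the TCP rpi module and raise the
# short-message threshold to 512 KiB at run time, no rebuild needed.
mpirun -np 12 -ssi rpi tcp -ssi rpi_tcp_short 524288 mdrun_mpi -s topol.tpr
```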

After this, I repeated the DPPC simulation again and got the
following results in summary:

	M E G A - F L O P S   A C C O U N T I N G

               NODE (s)   Real (s)      (%)
       Time:   1257.000   1257.000    100.0
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:     37.487      3.373      0.687     34.917

Unfortunately, this shows no noticeable improvement.
Should this tuning have made the results better, or are the
parameters I chose simply not well suited to my configuration?

>Finally, you can experiment with the node order, and e.g. make a
>hostfile like

I reordered my hostfile as you suggested, but again the performance
did not improve at all:

	M E G A - F L O P S   A C C O U N T I N G

               NODE (s)   Real (s)      (%)
       Time:   1277.000   1277.000    100.0
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:     36.900      3.320      0.677     35.472

>Gromacs 4 (CVS) will improve scaling significantly in many cases, but
>sooner or later you'll want infiniband :-)

Do you mean that I should download and install GROMACS from your CVS
repository? Will this make parallel execution over Gigabit Ethernet
more efficient than before?

Computing Center,
Sharif University of Tech.

