[gmx-users] Parallel Gromacs Benchmarking with Opteron Dual-Core & Gigabit Ethernet

Kazem Jahanbakhsh jahanbakhsh at ee.sharif.edu
Mon Jul 23 12:13:45 CEST 2007


First of all, thanks for your reply.
On Sun, 22 Jul 2007 18:34, Erik Lindahl wrote:

>Yes, ethernet is definitely limiting you. Not only because the
>latency is high, but since 4 processors share a single network card
>they will only get 1/4 of the bandwidth each (and gigabit ethernet is
>often far from a gigabit in practice). Additionally, when the cores
>try to send messages simultaneously, three out of four will have to
>wait, which makes the latency even worse.

I read on the GROMACS site that the DPPC benchmark system is
composed of 121,856 atoms. Looking at the topology files, it seems
that GROMACS decomposes the input data to run in parallel (in our
case, running with "-np 12" on 3 nodes, each process handles about
10,156 atoms).
I think the DPPC system is not big enough to show the scalability
of parallel execution over Gigabit Ethernet. To see the cluster's
scalability with our configuration, I think we should set up a
larger simulation. Please correct me if I'm mistaken.
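The per-process atom count mentioned above can be estimated with a simple even split (only a rough estimate; the actual decomposition will not be perfectly even):

```python
# Rough estimate of atoms per MPI process for the DPPC benchmark.
# GROMACS's actual decomposition assigns slightly uneven chunks,
# so this even split is only an approximation.
total_atoms = 121856  # DPPC benchmark system size
nprocs = 12           # mpirun -np 12 across 3 nodes

atoms_per_proc = total_atoms / nprocs
print(f"~{atoms_per_proc:.0f} atoms per process")  # ~10155
```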

>I forgot the exact parameter names, but there are some environment
>variables you can set in LAM-MPI to force it to send larger "short"
>messages.  Small messages are normally sent directly to a buffer, but
>for a large message LAM sends an extra pre-message to tell the
>receiving node to allocate memory - by increasing this limit you'll
>reduce latency for large messages.

Yes, I saw this solution on the GROMACS site before and tried it on
my Linux cluster by recompiling LAM as follows:

./configure --with-fc=gfortran --with-rpi=usysv --with-tcp-short=524288
(I have no feel for the best value for the TCP short-message size;
I used the value above instead of the 64k default.)
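If I remember correctly, LAM/MPI 7.x also lets you change this threshold at run time through an SSI parameter, so a full recompile may not be necessary. A sketch, assuming LAM 7.x and that the GROMACS binary is named mdrun_mpi (parameter name taken from my reading of the LAM 7 documentation, so please verify it on your installation):

```shell
# Assuming LAM/MPI 7.x: select the TCP rpi module and raise the
# short-message threshold to 512 KiB at run time, no rebuild needed.
mpirun -np 12 -ssi rpi tcp -ssi rpi_tcp_short 524288 mdrun_mpi -s topol.tpr
```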

After this, I repeated the DPPC simulation again and got the
following results in summary:

	M E G A - F L O P S   A C C O U N T I N G

               NODE (s)   Real (s)      (%)
       Time:   1257.000   1257.000    100.0
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:     37.487      3.373      0.687     34.917

Unfortunately, this shows no noticeable improvement.
Should this tuning have made the results better, or are the
parameters I chose simply not well suited to my configuration?

>Finally, you can experiment with the node order, and e.g. make a
>hostfile like

I reordered my hostfile as you suggested, but again the performance
did not improve at all:

	M E G A - F L O P S   A C C O U N T I N G

               NODE (s)   Real (s)      (%)
       Time:   1277.000   1277.000    100.0
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:     36.900      3.320      0.677     35.472

>Gromacs 4 (CVS) will improve scaling significantly in many cases, but
>sooner or later you'll want infiniband :-)

Do you mean that I should download and install GROMACS from your CVS
repository? Will this make parallel execution over Gigabit Ethernet
more efficient than before?

Computing Center,
Sharif University of Tech.

