[gmx-users] gromacs-4.0.2, parallel performance in two quad core xeon machines

Claus Valka lastexile7gr at yahoo.de
Wed Feb 18 21:56:16 CET 2009


Hello, thank you for your answer. I am wondering, though: how am I supposed to have a system with more than 99999 atoms, when the .gro file has a fixed format allowing only 5 digits for the atom numbers?
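
(As a side note on the fixed-width issue: only the per-atom number columns of a .gro file are 5 characters wide; the atom count on the second line is a free-format integer and may be larger, and GROMACS assigns atom indices from the order of the lines, so the 5-digit column can simply wrap. The helper below is a minimal, purely illustrative sketch of that layout, with made-up names.)

    # Illustrative sketch: writing a .gro file for a system with more than 99999
    # atoms.  The atom count on line 2 is free-format (not limited to 5 digits);
    # only the fixed 5-column residue/atom number fields wrap modulo 100000,
    # since GROMACS takes the atom order from the file, not from that column.

    def write_gro(filename, title, atoms, box):
        """atoms: list of (resnr, resname, atomname, x, y, z), coordinates in nm."""
        with open(filename, "w") as f:
            f.write(title + "\n")
            f.write("%d\n" % len(atoms))          # may exceed 99999
            for i, (resnr, resname, atomname, x, y, z) in enumerate(atoms, start=1):
                f.write("%5d%-5s%5s%5d%8.3f%8.3f%8.3f\n"
                        % (resnr % 100000, resname, atomname,
                           i % 100000,             # 5-column field simply wraps
                           x, y, z))
            f.write("%10.5f%10.5f%10.5f\n" % box)

    # Example: 150000 dummy atoms in a 10 nm cubic box.
    atoms = [(n + 1, "SOL", "OW", 0.0, 0.0, 0.0) for n in range(150000)]
    write_gro("big.gro", "large test system", atoms, (10.0, 10.0, 10.0))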
What else should I change to get better performance from my hardware, assuming I can build a much bigger system? You say that Ethernet has reached its limits. I was considering using a supercomputing centre in Europe; as far as I know, their nodes use the 9-core Cell processor technology. How could someone there achieve better performance with Gromacs 4 on more nodes, and what might the scaling limit be on such machines?

Thank you once again,
Nikos
--- Berk Hess <gmx3 at hotmail.com> wrote on Wed, 18 Feb 2009:
From: Berk Hess <gmx3 at hotmail.com>
Subject: RE: [gmx-users] gromacs-4.0.2, parallel performance in two quad core xeon machines
To: lastexile7gr at yahoo.de
Date: Wednesday, 18 February 2009, 19:16




Hi,

You cannot scale a system of just 7200 atoms to 16 cores that are connected by Ethernet. 400 atoms per core is already the scaling limit of Gromacs on current hardware with the fastest available network.

On Ethernet a system 100 times as large might scale well to two nodes.
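
To put numbers on this: 7200 atoms spread over 16 cores leaves only about 450 atoms per core, which is already at that ~400 atoms/core limit even before the Ethernet latency is taken into account. The small sketch below just spells out the arithmetic (illustrative only):

    # Back-of-the-envelope arithmetic for the numbers quoted above (sketch only).
    n_atoms = 7200          # size of the system under discussion

    for cores in (4, 8, 16):
        print("%2d cores: %6.0f atoms per core" % (cores, n_atoms / cores))

    # 16 cores -> 450 atoms/core, i.e. already at the ~400 atoms/core floor that
    # applies even with the fastest interconnects; over Gigabit Ethernet the
    # communication cost dominates long before that point.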

Berk


Date: Wed, 18 Feb 2009 09:40:28 -0800
From: lastexile7gr at yahoo.de
To: gmx-users at gromacs.org
Subject: [gmx-users] gromacs-4.0.2, parallel performance in two quad core xeon machines

Hello,

we have built a cluster whose nodes each contain a dual-core Intel(R) Xeon(R) CPU E3110 @ 3.00GHz. Each node has 16 GB of memory. The switch we use is a Dell PowerConnect model, and each node has a Gigabit Ethernet card.

I tested the performance for a system of 7200 atoms on 4 cores of one node, on 8 cores of one node, and on 16 cores of two nodes. Within one node the performance improves with more cores.
The problem is that moving from one node to two, the performance drops dramatically (almost two days for a run that otherwise finishes in less than 3 hours!).

I have compiled Gromacs with the --enable-mpi option. I have also read previous archive posts from Mr Kutzner, but from what I saw they focus on errors in Gromacs 4 or on problems that previous versions of Gromacs had. I get no errors, just low performance.

Is there any option that I must enable in order to get better performance on more than one node? Or do you think, based on your experience, that the switch we use might be the problem? Or is there perhaps something we have to enable on the nodes themselves?
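
For what it is worth, a typical way to start such a run across two nodes with an MPI-enabled mdrun looks like the sketch below; the binary name (mdrun_mpi), the hostfile, and the file names are assumptions for illustration and depend on the local MPI and Gromacs installation:

    # Illustrative sketch only: launching an MPI-enabled mdrun on 16 cores spread
    # over two nodes via Open MPI.  The binary name "mdrun_mpi", the hostfile and
    # the .tpr/output names are assumptions, not taken from this thread.
    import subprocess

    hostfile = "hosts"   # e.g. two lines: "node1 slots=8" and "node2 slots=8"
    cmd = [
        "mpirun", "-np", "16",        # total number of MPI processes
        "--hostfile", hostfile,       # which nodes to use
        "mdrun_mpi",                  # mdrun built with --enable-mpi
        "-s", "topol.tpr",            # run input file
        "-deffnm", "run16",           # common prefix for output files
    ]
    subprocess.run(cmd, check=True)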

Thank you in advance,
Nikos




