[gmx-users] gromacs-4.0.2, parallel performance in two quadcore xeon machines

Antoine FORTUNE Antoine.Fortune at ujf-grenoble.fr
Tue Mar 3 15:32:23 CET 2009

Hi Nikos,

I experienced the same kind of thing with a Core i7 on one node and a
quad-core on a second node (Gromacs 4.0.3).
Running on 8 threads (i7) or 4 cores on a single node is 30% faster than
running on 8 or 12 "cores" across 2 nodes. I noticed that my gigabit
switch is not the bandwidth limit when using Open MPI over rsh
(~30 Gbps/70 max).
On a single node each CPU is 100% used by user processes (the mdruns),
whereas on 2 nodes each CPU is only 50% used by user processes, the
remaining 50% being used by the system. The top command shows 4 mdrun
jobs using 100% CPU.
I guess the system usage is for network transfers... Using ssh, system
usage is about the same and bandwidth is doubled.

Any ideas about that system activity and how to reduce it?
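
To put numbers on the user/system split rather than reading it off top, a minimal sketch along these lines could be run on each node while the job is active. It assumes a Linux node exposing /proc/stat; the function names are hypothetical, and only the first seven counters of the aggregate "cpu" line are used. Kernel network handling may also show up under "softirq", not only under "system".

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]
    values = [int(v) for v in fields[1:1 + len(names)]]
    return dict(zip(names, values))


def cpu_shares(before, after):
    """Fraction of elapsed CPU time spent in each category between two samples."""
    delta = {name: after[name] - before[name] for name in before}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: ticks / total for name, ticks in delta.items()}


def sample_and_report(interval=5.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and print the shares (Linux only)."""
    with open(path) as fh:
        first = parse_cpu_line(fh.readline())
    time.sleep(interval)
    with open(path) as fh:
        second = parse_cpu_line(fh.readline())
    for name, share in cpu_shares(first, second).items():
        print(f"{name:8s} {100 * share:5.1f}%")
```

Calling sample_and_report() during a single-node run and again during a two-node run would show whether the missing user time really goes to "system" or to another category.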


Berk Hess wrote:
> Hi,
> Oops, I meant 72000, which is only a factor of 10.
> I guess it might be faster on two nodes then, but probably not 2 times.
> If you use PME you can also experiment with putting all the PME nodes
> on one machine and the non-PME nodes on the other,
> probably with mdrun -ddorder pp_pme
> Gromacs supports close to maxint atoms.
> The question is much more what kind of system size you are
> scientifically interested in.
> Ethernet will never scale very well for small numbers of atoms per core.
> Infiniband will scale very well.
> Berk
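
On the 99999-atom question, a small sketch of the fixed-width .gro atom record may help. The format string below is the one documented for .gro files. That the 5-digit number columns are simply wrapped modulo 100000 for larger systems, and that the atom count is taken from the free-format second line of the file, is my understanding and not something stated in this thread; the function name is hypothetical.

```python
def gro_atom_line(resnr, resname, atomname, atomnr, x, y, z):
    """Format one atom record in the fixed-width .gro layout (positions only).

    Residue and atom numbers are wrapped modulo 100000 so that they always
    fit their 5-character columns (assumption, see the text above).
    """
    return "%5d%-5s%5s%5d%8.3f%8.3f%8.3f" % (
        resnr % 100000, resname, atomname, atomnr % 100000, x, y, z)


# Atom 100001 still produces a well-formed 44-character record.
print(gro_atom_line(33334, "SOL", "OW", 100001, 1.0, 2.0, 3.0))
```

Under that assumption the 5-digit column limits only what is printed per atom, not how many atoms a file can hold.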
> ------------------------------------------------------------------------
> Date: Wed, 18 Feb 2009 12:56:16 -0800
> From: lastexile7gr at yahoo.de
> Subject: RE: [gmx-users] gromacs-4.0.2, parallel performance in two
> quad core xeon machines
> To: gmx-users at gromacs.org
> Hello,
> thank you for your answer. I am just wondering, though: how am I supposed
> to have a system with more than 99999 atoms when the gro file has a
> fixed format allowing only 5 digits for the atom number?
> What else should I change to get better performance from
> my hardware if I manage to set up a much bigger system? You are saying
> that Ethernet has reached its limits.
> I was considering using a supercomputing center in Europe; as far
> as I know, its nodes use Cell 9-core processors.
> How can one get better performance there with Gromacs 4 when using
> more nodes? What might be the limit on such machines?
> Thank you once again,
> Nikos
> --- Berk Hess <gmx3 at hotmail.com> wrote on Wed, 18.2.2009:
>
>     From: Berk Hess <gmx3 at hotmail.com>
>     Subject: RE: [gmx-users] gromacs-4.0.2, parallel performance in
>     two quad core xeon machines
>     To: lastexile7gr at yahoo.de
>     Date: Wednesday, 18 February 2009, 19:16
>
>     Hi,
>     You cannot scale a system of just 7200 atoms
>     to 16 cores which are connected by ethernet.
>     400 atoms per core is already the scaling limit of Gromacs
>     on current hardware with the fastest available network.
>     On ethernet a system 100 times as large might scale well to two nodes.
>     Berk
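
The atoms-per-core arithmetic behind this can be sketched as follows. The atom counts (7200 and 72000) and core counts come from this thread; the helper name is hypothetical.

```python
def atoms_per_core(n_atoms, n_cores):
    """Average number of atoms each core has to handle."""
    return n_atoms / n_cores


# System sizes and core counts mentioned in the thread.
for n_atoms in (7200, 72000):
    for n_cores in (4, 8, 16):
        print(f"{n_atoms:6d} atoms on {n_cores:2d} cores: "
              f"{atoms_per_core(n_atoms, n_cores):7.1f} atoms/core")
```

With 7200 atoms on 16 cores each core gets 450 atoms, which is already next to the 400 atoms per core quoted above as the limit on the fastest available network, so Ethernet has no chance at that size.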
>     ------------------------------------------------------------------------
>     Date: Wed, 18 Feb 2009 09:40:28 -0800
>     From: lastexile7gr at yahoo.de
>     To: gmx-users at gromacs.org
>     Subject: [gmx-users] gromacs-4.0.2, parallel performance in two
>     quad core xeon machines
>
>     Hello,
>     we have built a cluster whose nodes each have the
>     following: dual-core Intel(R) Xeon(R) CPU E3110 @ 3.00GHz. Each
>     node has 16 GB of memory. The switch that we use is
>     a Dell PowerConnect model. Each node has a Gigabit Ethernet card.
>     I tested the performance for a system of 7200 atoms on 4 cores of
>     one node, on 8 cores of one node and on 16 cores of two nodes. Within
>     one node the performance improves with more cores.
>     The problem is that when moving from one node to two, the
>     performance decreases dramatically (almost two days for a run that
>     otherwise finishes in less than 3 hours!).
>     I have compiled gromacs with the --enable-mpi option. I have also
>     read previous archive posts from Mr Kurtzner, but from what I saw
>     they focus on errors in gromacs 4 or on problems that
>     previous versions of gromacs had. I get no errors, just low
>     performance.
>     Is there any option that I must enable in order to get better
>     performance on more than one node? Or do you think, from
>     your experience, that the switch we use might be the problem? Or
>     should we activate something on the nodes?
>     Thank you in advance,
>     Nikos
