[gmx-users] Best performance with 0 cores for PME calculation
Berk Hess
gmx3 at hotmail.com
Mon Jan 12 10:57:10 CET 2009
> From: ckutzne at gwdg.de
> To: gmx-users at gromacs.org
> Subject: Re: [gmx-users] Best performance with 0 cores for PME calculation
> Date: Mon, 12 Jan 2009 10:41:26 +0100
>
> On Jan 10, 2009, at 8:32 PM, Nicolas wrote:
>
> > Berk Hess wrote:
> >> Hi,
> >>
> >> Setting -npme 2 is ridiculous.
> >> mdrun estimates the number of PME nodes by itself when you do not
> >> specify -npme.
> >> In most cases you need 1/3 or 1/4 of the nodes doing PME.
> >> The default -npme guess of mdrun is usually not bad,
> >> but might need to be tuned a bit.
> >> At the end of the md.log file you find the relative PP/PME load,
> >> so you can see in which direction you might need to change -npme,
> >> if necessary.
> > Actually, I have tested npme ranging from 0 to 5, and 2 is
> > representative of what happens. For example, with 5 cores doing
> > PME, the performance reaches a plateau at 14-15 cores. So setting
> > npme to 0 systematically gives the best results. I have also tested
> > -1. With npme set to -1, the performance is the same as for 0 up to
> > 8 cores. Above that, the guess is not so efficient.
>
> Hi Nicolas,
>
> as Berk mentioned, you should expect a different optimal number of
> PME nodes for each number of total nodes you test on. So the way to
> go is to fix the total number of nodes and vary the number of PME
> nodes until you find the best performance for that number of nodes.
> Then move on to another number of total nodes. I have written a small
> tool that does part of this job for you by finding the optimum number
> of PME nodes for a given number of total nodes. If you want to give
> it a try, I can send it to you. Typically the optimum number of PME
> nodes should not be too far off the mdrun estimate. If it is far off,
> this could point to a network or MPI problem. Note that separate PME
> nodes can only work if the MPI ranks are not scattered among the
> nodes, i.e. on 4-core nodes ranks 0-3 should be on the same node, as
> should ranks 4-7 and so on. This mapping is printed at the very start
> of a parallel simulation.
>
> Carsten
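
For reference, here is a minimal sketch of the kind of scan Carsten describes:
fix the total number of MPI ranks, run mdrun once per -npme value, and read
the performance out of each log file. The launcher name (mpirun), the binary
name (mdrun_mpi), the input file topol.tpr, and the position of ns/day on the
"Performance:" line are assumptions here; adjust them to your installation.

#!/usr/bin/env python
# Sketch only: scan -npme for a fixed total rank count and report ns/day.
# Assumptions: an "mpirun -np" launcher, an MPI-enabled binary "mdrun_mpi",
# a run input "topol.tpr", and a log line starting with "Performance:"
# whose third value is ns/day (GROMACS 4.0-style md.log).
import subprocess

NTOTAL = 16            # total number of MPI ranks to benchmark on
TPR = "topol.tpr"      # run input file

def ns_per_day(logfile):
    # Pull ns/day from the "Performance:" line, if the run finished.
    with open(logfile) as f:
        for line in f:
            if line.startswith("Performance:"):
                return float(line.split()[3])  # Mnbf/s, GFlops, ns/day, hours/ns
    return None

results = {}
for npme in range(0, NTOTAL // 2 + 1):   # more than half the ranks on PME rarely helps
    deffnm = "bench_npme%02d" % npme
    cmd = ["mpirun", "-np", str(NTOTAL),
           "mdrun_mpi", "-s", TPR, "-deffnm", deffnm,
           "-npme", str(npme), "-maxh", "0.05"]  # short runs suffice for timing
    subprocess.call(cmd)
    results[npme] = ns_per_day(deffnm + ".log")

for npme in sorted(results):
    print("npme = %2d : %s ns/day" % (npme, results[npme]))

Run it from the directory holding the run input; the best -npme from such a
scan can then be compared against mdrun's own estimate, as Carsten suggests.
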
"Can only work if" should be rephrased as "Will be most efficient when".
If the MPI ranks are scattered over the nodes, you should probably use
-ddorder pp_pme.
In most cases, using separate PME nodes will become more efficient
somewhere between 8 and 12 total nodes.
Berk
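
Whether the ranks are scattered can also be checked independently of mdrun.
Below is a minimal sketch, assuming the mpi4py package is available: launched
with the same mpirun command (and hostfile) you use for mdrun, it prints which
host each MPI rank lands on, so you can see whether ranks 0-3, 4-7, ... really
share a node or whether -ddorder pp_pme is the safer choice.

# rankmap.py -- sketch only, assumes mpi4py is installed.
# Launch with the same MPI setup as mdrun, e.g. "mpirun -np 16 python rankmap.py".
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
host = MPI.Get_processor_name()

# Collect (rank, host) pairs on rank 0 and print them in rank order.
pairs = comm.gather((rank, host), root=0)
if rank == 0:
    for r, h in sorted(pairs):
        print("rank %3d -> %s" % (r, h))
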