[gmx-users] Best performance with 0 cores for PME calculation
Berk Hess
gmx3 at hotmail.com
Mon Jan 12 10:57:10 CET 2009
> From: ckutzne at gwdg.de
> To: gmx-users at gromacs.org
> Subject: Re: [gmx-users] Best performance with 0 cores for PME calculation
> Date: Mon, 12 Jan 2009 10:41:26 +0100
>
> On Jan 10, 2009, at 8:32 PM, Nicolas wrote:
>
> > Berk Hess wrote:
> >> Hi,
> >>
> >> Setting -npme 2 is ridiculous.
> >> mdrun estimates the number of PME nodes by itself when you do not
> >> specify -npme.
> >> In most cases you need 1/3 or 1/4 of the nodes doing PME.
> >> The default -npme guess of mdrun is usually not bad,
> >> but might need to be tuned a bit.
> >> At the end of the md.log file you find the relative PP/PME load,
> >> so you can see in which direction you might need to change -npme,
> >> if necessary.
> > Actually, I have tested npme ranging from 0 to 5, and 2 is
> > representative of what happens. For example, with 5 cores doing
> > PME, the performance reaches a plateau at 14-15 cores. So setting
> > npme to 0 systematically gives the best results. I have also tested
> > -1. With npme set to -1, the performance is the same as for 0 up to
> > 8 cores. Above that, the guess is not so efficient.
>
> Hi Nicolas,
>
> as Berk mentioned, you should expect a different optimal number of
> PME nodes for each number of total nodes you test on. So the way to
> go is to fix the total number of nodes and vary the number of PME
> nodes until you find the best performance for that number of nodes.
> Then move on to another number of total nodes. I have written a small
> tool that does part of this job for you by finding the optimum number
> of PME nodes for a given number of total nodes. If you want to give
> it a try, I can send it to you. Typically the optimum number of PME
> nodes should not be too far off the mdrun estimate. If it is far off,
> this could point to a network or MPI problem. Note that separate PME
> nodes can only work if the MPI ranks are not scattered among the
> nodes, i.e. on 4-core nodes ranks 0-3 should be on the same node, as
> should ranks 4-7 and so on. This mapping is printed at the very start
> of a parallel simulation.
>
> Carsten
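
For reference, here is a minimal sketch of the kind of scan Carsten describes:
fix the total number of MPI ranks, run mdrun once per -npme value, and read
the performance out of each log file. The launcher name (mpirun), the binary
name (mdrun_mpi), the input file topol.tpr, and the position of ns/day on the
"Performance:" line are assumptions here; adjust them to your installation.

#!/usr/bin/env python
# Sketch only: scan -npme for a fixed total rank count and report ns/day.
# Assumptions: an "mpirun -np" launcher, an MPI-enabled binary "mdrun_mpi",
# a run input "topol.tpr", and a log line starting with "Performance:"
# whose third value is ns/day (GROMACS 4.0-style md.log).
import subprocess

NTOTAL = 16            # total number of MPI ranks to benchmark on
TPR = "topol.tpr"      # run input file

def ns_per_day(logfile):
    # Pull ns/day from the "Performance:" line, if the run finished.
    with open(logfile) as f:
        for line in f:
            if line.startswith("Performance:"):
                return float(line.split()[3])  # Mnbf/s, GFlops, ns/day, hours/ns
    return None

results = {}
for npme in range(0, NTOTAL // 2 + 1):   # more than half the ranks on PME rarely helps
    deffnm = "bench_npme%02d" % npme
    cmd = ["mpirun", "-np", str(NTOTAL),
           "mdrun_mpi", "-s", TPR, "-deffnm", deffnm,
           "-npme", str(npme), "-maxh", "0.05"]  # short runs suffice for timing
    subprocess.call(cmd)
    results[npme] = ns_per_day(deffnm + ".log")

for npme in sorted(results):
    print("npme = %2d : %s ns/day" % (npme, results[npme]))

Run it from the directory holding the run input; the best -npme from such a
scan can then be compared against mdrun's own estimate, as Carsten suggests.
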
"Can only work if" should be rephrased as "Will be most efficient when".
If the MPI ranks are scattered over the nodes, you should probably use
-ddorder pp_pme.
In most cases, using separate PME nodes will become more efficient
somewhere between 8 and 12 total nodes.
Berk
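
Whether the ranks are scattered can also be checked independently of mdrun.
Below is a minimal sketch, assuming the mpi4py package is available: launched
with the same mpirun command (and hostfile) you use for mdrun, it prints which
host each MPI rank lands on, so you can see whether ranks 0-3, 4-7, ... really
share a node or whether -ddorder pp_pme is the safer choice.

# rankmap.py -- sketch only, assumes mpi4py is installed.
# Launch with the same MPI setup as mdrun, e.g. "mpirun -np 16 python rankmap.py".
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
host = MPI.Get_processor_name()

# Collect (rank, host) pairs on rank 0 and print them in rank order.
pairs = comm.gather((rank, host), root=0)
if rank == 0:
    for r, h in sorted(pairs):
        print("rank %3d -> %s" % (r, h))
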