[gmx-users] Best performance with 0 cores for PME calculation
gmx3 at hotmail.com
Mon Jan 12 10:57:10 CET 2009
> From: ckutzne at gwdg.de
> To: gmx-users at gromacs.org
> Subject: Re: [gmx-users] Best performance with 0 cores for PME calculation
> Date: Mon, 12 Jan 2009 10:41:26 +0100
> On Jan 10, 2009, at 8:32 PM, Nicolas wrote:
> > Berk Hess wrote:
> >> Hi,
> >> Setting -npme 2 is ridiculous.
> >> mdrun estimates the number of PME nodes by itself when you do not
> >> specify -npme.
> >> In most cases you need 1/3 or 1/4 of the nodes doing pme.
> >> The default -npme guess of mdrun is usually not bad,
> >> but might need to be tuned a bit.
> >> At the end of the md.log file you find the relative PP/PME load
> >> so you can see in which direction you might need to change -npme,
> >> if necessary.
> > Actually, I have tested npme values ranging from 0 to 5, but 2 is
> > representative of what happens. For example, with 5 cores for PME,
> > the performance reaches a plateau at 14-15 cores. So setting npme to 0
> > systematically gives the best results. I have also tested -1. With
> > npme set to -1, the performance is the same as for 0 up to 8
> > cores. Above that, the guess is not as efficient.
> Hi Nicolas,
> as Berk mentioned, you should expect a different optimal number of PME
> nodes for each number of total nodes you test on. So the way to go is
> to fix the number of total nodes and vary the number of PME nodes until
> you find the best performance for that number of nodes. Then move on to
> another number of total nodes. I have a small tool that does part of
> this job for you by finding the optimum number of PME nodes for a given
> number of total nodes. If you want to give it a try, I can send it to
> you. Typically the optimum number of PME nodes should not be too far
> off the mdrun estimate. If it is far off, this could point to some
> network or MPI problem. Note that separate PME nodes can only work if
> the MPI ranks are not scattered among the nodes, i.e. on 4-core nodes
> ranks 0-3 should be on the same node, as should ranks 4-7, and so on.
> This is printed at the very start of a
"Can only work if" should be rephrased as "Will be most efficient when".
If the MPI ranks are scattered over the nodes you should probably use
In most cases using separate PME nodes will become more efficient
somewhere between 8 and 12 total nodes.
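
The scan described in the quoted message (fix the total node count, vary
-npme, keep the fastest setting) could be sketched roughly as follows in
Python. This is only an illustration and not the tool mentioned above: the
function names, the mpirun launcher, the mdrun binary name and the upper
bound of half the nodes are all assumptions. Only the -npme option itself
comes from this thread.

```python
def npme_candidates(total_nodes, max_fraction=0.5):
    """Return the -npme values to try for one fixed total node count.

    0 means no separate PME nodes. The upper bound (half of the nodes by
    default) is an assumption of this sketch, not an mdrun rule.
    """
    upper = int(total_nodes * max_fraction)
    return list(range(0, upper + 1))


def mdrun_command(total_nodes, npme, launcher="mpirun", binary="mdrun"):
    """Build one benchmark command line as an argument list.

    The launcher and binary names are placeholders; adapt them to the
    local MPI installation and GROMACS build.
    """
    return [launcher, "-np", str(total_nodes), binary, "-npme", str(npme)]


def best_npme(performance):
    """Pick the npme value with the highest measured performance.

    `performance` maps npme -> a figure where larger is better
    (for example ns/day), collected from real benchmark runs.
    """
    return max(performance, key=performance.get)


if __name__ == "__main__":
    # Print the commands one would run for a 12-node scan.
    for npme in npme_candidates(12):
        print(" ".join(mdrun_command(12, npme)))
```

The performance figures passed to best_npme would have to come from real
runs, for instance from the end of each md.log. The scan is then repeated
for each total node count of interest, since the optimum is expected to
differ between them.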