[gmx-users] Best performance with 0 core for PME calculation
nsapay at ucalgary.ca
Mon Jan 12 20:34:35 CET 2009
Carsten Kutzner wrote:
> On Jan 10, 2009, at 8:32 PM, Nicolas wrote:
>> Berk Hess wrote:
>>> Setting -npme 2 is ridiculous.
>>> mdrun estimates the number of PME nodes by itself when you do not
>>> specify -npme.
>>> In most cases you need 1/3 or 1/4 of the nodes doing PME.
>>> The default -npme guess of mdrun is usually not bad,
>>> but might need to be tuned a bit.
>>> At the end of the md.log file you find the relative PP/PME load
>>> so you can see in which direction you might need to change -npme,
>>> if necessary.
>> Actually, I have tested npme ranging from 0 to 5, but 2 is
>> representative of what happens. For example, with 5 cores for the
>> PME, the performance reaches a plateau at 14-15 cores. So setting
>> npme to 0 systematically gives the best results. I have also tested
>> -1. With npme set to -1, the performance is the same as for 0 up to
>> 8 cores. Above that, the guess is not so efficient.
> Hi Nicolas,
> as Berk mentioned, you should expect a different optimal number of PME
> nodes for each number of total nodes you test on. So the way to go is
> to fix the number of total nodes and vary the number of PME nodes
> until you find the best performance for that number of nodes. Then
> move on to another number of total nodes. I have a small tool that
> does a part of this job for you by finding out the optimum number of
> PME nodes for a given number of total nodes. If you want to give it a
> try, I can send it to you. Typically the optimum number of PME nodes
> should not be too far off the mdrun estimate. If it is far off, this
> could point out some network or MPI problem. Note that separate PME
> nodes can only work if the MPI ranks are not scattered among the
> nodes, i.e. on 4-core nodes the ranks 0-3 should be on the same node
> as well as ranks 4-7 and so on. This is printed at the very start of a
If you could send me that script, that would be great. I've got a couple
of clusters to benchmark.

By the way, I probably didn't explain my case well enough. I'm trying to
benchmark Gromacs 4 as a function of both the total number of cores AND
the number of cores dedicated to PME at the same time. My point was that
I didn't obtain better performance with npme >= 1 than with npme=0,
whatever the numbers of cores and PME cores are. So I was not sure
whether the result was due to the cluster itself or to a wrong usage of
Gromacs.
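
For reference, the scan I have in mind (fix the total number of cores,
vary npme, keep the best, then move to the next total) can be sketched
roughly as below. This is only an illustration, not Carsten's tool: the
function names candidate_npme and best_npme, the max_fraction cutoff and
the layout of the results dictionary are all my own assumptions, and the
performance numbers have to come from real mdrun runs.

```python
from typing import Dict, List, Tuple


def candidate_npme(total_cores: int, max_fraction: float = 0.5) -> List[int]:
    """List the PME core counts to try for one fixed total core count.

    Starts at 0 (no separate PME cores) and stops at max_fraction of the
    total. The 0.5 default is an arbitrary assumption, not a Gromacs rule.
    """
    upper = int(total_cores * max_fraction)
    return list(range(0, upper + 1))


def best_npme(
    results: Dict[Tuple[int, int], float]
) -> Dict[int, Tuple[int, float]]:
    """Pick the best npme for each total core count.

    results maps (total_cores, npme) to a measured performance where
    higher is better (e.g. ns/day taken from the end of md.log).
    Returns {total_cores: (best_npme, best_performance)}.
    """
    best: Dict[int, Tuple[int, float]] = {}
    for (total, npme), perf in results.items():
        # Keep the first entry seen for a total, then replace it only
        # when a strictly better performance shows up.
        if total not in best or perf > best[total][1]:
            best[total] = (npme, perf)
    return best
```

One would run mdrun once per (total, npme) pair returned by
candidate_npme, store the measured performance in the dictionary, and
call best_npme at the end. If npme=0 wins for every total, as in my
case, it shows up directly in the returned table.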
> Dr. Carsten Kutzner
> Max Planck Institute for Biophysical Chemistry
> Theoretical and Computational Biophysics
> Am Fassberg 11
> 37077 Goettingen, Germany
> Tel. +49-551-2012313, Fax: +49-551-2012302