[gmx-users] Best performance with 0 cores for PME calculation

Mon Jan 12 20:34:35 CET 2009

Carsten Kutzner wrote:
> On Jan 10, 2009, at 8:32 PM, Nicolas wrote:
>
>> Berk Hess wrote:
>>> Hi,
>>>
>>> Setting -npme 2 is ridiculous.
>>> mdrun estimates the number of PME nodes by itself when you do not
>>> specify -npme.
>>> In most cases you need 1/3 or 1/4 of the nodes doing PME.
>>> The default -npme guess of mdrun is usually not bad,
>>> but might need to be tuned a bit.
>>> At the end of the md.log file you find the relative PP/PME load,
>>> so you can see in which direction you might need to change -npme,
>>> if necessary.
>> Actually, I have tested npme ranging from 0 to 5, but 2 is quite
>> representative of what happens. For example, with 5 cores for PME, the
>> performance reaches a plateau at 14-15 cores. So setting npme to 0
>> systematically gives the best results. I have also tested -1: with
>> npme set to -1, the performance is the same as for 0 up to 8 cores.
>> Above that, the guess is not as efficient.
>
> Hi Nicolas,
>
> as Berk mentioned, you should expect a different optimal number of PME
> nodes for each number of total nodes you test on. So the way to go is to
> fix the number of total nodes and vary the number of PME nodes until you
> find the best performance for that number of nodes. Then move on to
> another number of total nodes. I have written a small tool that does part
> of this job for you by finding the optimum number of PME nodes for a
> given number of total nodes. If you want to give it a try, I can send it
> to you. Typically the optimum number of PME nodes should not be too far
> off the mdrun estimate. If it is far off, this could point to a network
> or MPI problem. Note that separate PME nodes can only work if the MPI
> ranks are not scattered among the nodes, i.e. on 4-core nodes ranks 0-3
> should be on the same node, as should ranks 4-7, and so on. This
> rank-to-node mapping is printed at the very start of a parallel
> simulation.
>
> Carsten
If you could send me that script, that would be great. I've got a couple 
of clusters to benchmark.
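In the meantime, here is a minimal sketch of the bookkeeping for the scan you describe. It is not your tool: the helper names are hypothetical, the candidate range simply brackets Berk's rule of thumb (1/4 to 1/3 of the nodes doing PME, plus npme = 0), and the performance values (e.g. ns/day) are assumed to be read off md.log by hand after running mdrun with each -npme value.

```python
from typing import Dict, List, Tuple


def candidate_npme(total_ranks: int) -> List[int]:
    """npme values to try at a fixed total rank count (hypothetical helper).

    Always includes 0 (no separate PME ranks) and brackets the rule of
    thumb that roughly 1/4 to 1/3 of the ranks do PME, one step either side.
    """
    low = max(1, total_ranks // 4 - 1)
    high = min(total_ranks - 1, -(-total_ranks // 3) + 1)  # ceil(N/3) + 1
    return [0] + list(range(low, high + 1))


def best_npme(perf: Dict[int, float]) -> int:
    """npme value with the highest measured performance (e.g. ns/day)."""
    return max(perf, key=perf.get)


def summarize(results: Dict[int, Dict[int, float]]) -> Dict[int, Tuple[int, float]]:
    """For each total rank count: (best npme, speedup over npme=0).

    results[total][npme] holds the performance measured at that setting.
    The speedup is NaN when no npme=0 run was recorded for that total.
    """
    summary = {}
    for total, perf in sorted(results.items()):
        npme = best_npme(perf)
        base = perf.get(0)
        speedup = perf[npme] / base if base else float("nan")
        summary[total] = (npme, speedup)
    return summary
```

For each total core count N, benchmark every value in candidate_npme(N), fill results[N], then call summarize(results). A speedup of 1.0 (best npme = 0) at every N would reproduce what I am seeing.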

By the way, I probably didn't explain my case well enough. I'm trying to
benchmark Gromacs 4 as a function of both the number of cores AND the
number of cores dedicated to PME at the same time. My point was that I
did not obtain better performance with npme >= 1 than with npme = 0,
whatever the number of cores and PME cores. So I was not sure whether
the result was due to the cluster itself or to incorrect use of Gromacs.
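Since separate PME nodes only work when the MPI ranks are not scattered among the nodes, one cluster-side cause worth ruling out is rank placement. A minimal sketch, assuming the host name of each rank has already been collected (for example from the mapping printed at the start of a parallel run); the function name is hypothetical and parsing that output is not shown:

```python
from typing import List


def ranks_are_blocked(hosts: List[str], cores_per_node: int) -> bool:
    """Check that MPI ranks fill nodes in consecutive blocks (hypothetical helper).

    hosts[i] is the host name that rank i runs on. With cores_per_node=4,
    ranks 0-3 must share one host, ranks 4-7 must share one host, and so on.
    A round-robin placement (rank 0 on node A, rank 1 on node B, ...) fails.
    """
    if cores_per_node < 1:
        raise ValueError("cores_per_node must be at least 1")
    for start in range(0, len(hosts), cores_per_node):
        block = hosts[start:start + cores_per_node]
        if len(set(block)) != 1:
            return False
    return True
```

If this returns False for a run, the placement options of the MPI launcher (not shown here) would be the first thing to check before drawing conclusions about -npme.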

Nicolas

> -- 
> Dr. Carsten Kutzner
> Max Planck Institute for Biophysical Chemistry
> Theoretical and Computational Biophysics
> Am Fassberg 11
> 37077 Goettingen, Germany
> Tel. +49-551-2012313, Fax: +49-551-2012302
> http://www.mpibpc.mpg.de/home/grubmueller/ihp/ckutzne