[gmx-users] Best performance with 0 cores for PME calculation
Nicolas
nsapay at ucalgary.ca
Sat Jan 10 20:42:04 CET 2009
Mark Abraham wrote:
> Nicolas wrote:
>> Hello,
>>
>> I'm trying to do a benchmark with Gromacs 4 on our cluster, but I
>> don't completely understand the results I obtain. The system I used
>> is a 128-lipid DOPC bilayer hydrated by ~18800 SPC waters, for a total of ~70200
>> atoms. The size of the system is 9.6x9.6x10.1 nm^3. I'm using the
>> following parameters:
>>
>> * nstlist = 10
>> * rlist = 1
>> * Coulombtype = PME
>> * rcoulomb = 1
>> * fourier spacing = 0.12
>> * vdwtype = Cutoff
>> * rvdw = 1
>>
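For completeness, the relevant part of the .mdp looks roughly like this (mdp keywords as I have them, indentation only for readability):

  nstlist         = 10
  rlist           = 1.0
  coulombtype     = PME
  rcoulomb        = 1.0
  fourierspacing  = 0.12
  vdwtype         = Cut-off
  rvdw            = 1.0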
>> The cluster itself has 2 processors per node connected by 100 Mbit/s Ethernet.
>
> Ethernet and Gigabit ethernet are not fast enough to get reasonable
> scaling. There've been quite a few posts on this topic in the last six
> months.
>
> Hmm I see you've corrected your post to refer to Infiniband with four
> cores/node. That should be reasonable, I understand (but search the
> archive).
>
> You should also check that your benchmark calculation is long enough
> that you are measuring a simulation time that isn't being dominated by
> setup costs. Some years ago a non-MD sysadmin complained of poor
> scaling when he was testing over 10 or so MD steps!
My computations last at least 10 min (20000 steps). I think that's enough. By
the way, could the message-passing interface significantly influence the
performance? I'm using MPICH-1.2. Should I consider using LAM or MPICH2?
Nicolas
>
>> I'm using mpiexec to run Gromacs. When I use -npme 2 -ddorder
>> interleave, I get:
>> ncores   Perf (ns/day)   PME load (%)
>>    1        0.00              0
>>    2        0.00              0
>>    3        0.00              0
>>    4        1.35             28
>>    5        1.84             31
>>    6        2.08             27
>>    8        2.09             21
>>   10        2.25             17
>>   12        2.02             15
>>   14        2.20             13
>>   16        2.04             11
>>   18        2.18             10
>>   20        2.29              9
>>
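For reference, each run is launched with something along these lines (the
binary and file names are just placeholders for whatever the installation
uses):

  mpiexec -n 8 mdrun_mpi -npme 2 -ddorder interleave -deffnm dopc_bench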
>> So, above 6-8 cores, the PP nodes spend too much time waiting for the
>> PME nodes and the performance reaches a plateau.
>
> That's not surprising - the heuristic is that about a third to a
> quarter of the cores want to be PME-only nodes. Of course, that
> depends on the relative size of the real- and reciprocal-space parts
> of the calculation.
>
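If I follow that heuristic, on 16 cores I should be dedicating roughly 4-5
cores to PME rather than 2, i.e. something like:

  mpiexec -n 16 mdrun_mpi -npme 4 -ddorder interleave -deffnm dopc_bench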
>> When I use -npme 0, I get:
>>
>> ncores   Perf (ns/day)   PME load (%)
>>    1        0.43             33
>>    2        0.92             34
>>    3        1.34             35
>>    4        1.69             36
>>    5        2.17             33
>>    6        2.56             32
>>    8        3.24             33
>>   10        3.84             34
>>   12        4.34             35
>>   14        5.05             32
>>   16        5.47             34
>>   18        5.54             37
>>   20        6.13             36
>>
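As a rough check on how well the shared-duty runs scale: taking parallel
efficiency as perf(N) / (N * perf(1)), the 20-core run gives
6.13 / (20 * 0.43) ≈ 0.71, i.e. roughly 70% efficiency relative to a single
core.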
>> I obtain much better performance when there are no PME nodes, whereas I
>> was expecting the opposite. Does someone have an explanation for
>> that? Does that mean domain decomposition is useless below a certain
>> real-space cutoff? I'm quite confused.
>
> The relevant observations are for 4,5,6 and 8, for which shared-duty
> is out-performing -npme 2. I think your observations support the
> conclusion that your network hardware is more limiting for PME-only
> nodes than shared-duty nodes. They don't support the conclusion that
> DD is useless, since you haven't compared with PD.
>
> You can play with the PME parameters to shift more load into the
> real-space part - IIRC Carsten suggested a heuristic a few months back.
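If I understand the idea correctly, that means scaling rcoulomb (and rlist)
and the PME grid spacing by the same factor, which should keep the overall
Ewald accuracy roughly constant while moving work from the PME nodes to the
PP nodes. A sketch of what I could try (I don't know the exact heuristic
Carsten suggested, so the factor 1.2 is purely illustrative):

  rlist           = 1.2
  rcoulomb        = 1.2
  fourierspacing  = 0.144   ; 0.12 * 1.2, scaled together with rcoulomb

Whether rvdw then also needs adjusting depends on the cut-off setup, so I
would check the grompp output.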
>
> Mark