[gmx-developers] max cpu scale-up
hessb at mpip-mainz.mpg.de
Fri Sep 19 19:31:09 CEST 2008
> in fact I tried 128 cpu with just -s -v and I get this error:
> Fatal error:
> Could not find an appropriate number of separate PME nodes. i.e. >=
> 0.419288*#nodes (51) and <= #nodes/2 (64) and reasonable performance
> wise (grid_x=144, grid_y=120).
> Use the -npme option of mdrun or change the number of processors or the
> grid dimensions.
> Using -npme 64 it works and on 128 cpu it goes 2ns (dt=2fs) in 3h45'!
I don't know your other timings, so I don't know how much better this is.
But a relative PME load of 0.42 is quite high.
Something between 0.25 and 0.33 is better.
grompp prints this load estimate.
It is probably better (faster) to slightly increase your cut-off
and your PME grid spacing by the same factor.
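
As a rough illustration of why scaling both parameters together lowers the PME load, here is a minimal Python sketch. It assumes a deliberately crude cost model that is not taken from GROMACS: real-space (PP) work grows with the cut-off volume (factor cubed), and mesh (PME) work shrinks with the number of grid points (one over factor cubed). The function name and the starting values of 1.0 nm and 0.12 nm are hypothetical; the load that grompp prints is the number to trust.

```python
def scaled_pme_load(pme_load, factor):
    """Estimate the relative PME load after scaling the cut-off and the
    PME grid spacing by the same factor (crude model, an assumption)."""
    pp_work = (1.0 - pme_load) * factor ** 3   # more pairs inside the cut-off
    pme_work = pme_load / factor ** 3          # fewer grid points
    return pme_work / (pp_work + pme_work)


# Hypothetical starting parameters (not from the thread).
rcoulomb_nm = 1.0
fourier_spacing_nm = 0.12

for factor in (1.0, 1.05, 1.1):
    print(
        f"factor {factor:.2f}: "
        f"cut-off {rcoulomb_nm * factor:.3f} nm, "
        f"spacing {fourier_spacing_nm * factor:.3f} nm, "
        f"estimated PME load {scaled_pme_load(0.42, factor):.2f}"
    )
```

In this model a factor of about 1.1 already brings a load of 0.42 into the 0.25 to 0.33 range, but the model ignores the parts of the PME cost that do not depend on the grid, so check the load grompp prints after regenerating the run input.
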
> What does -ddorder cartesian do?
This determines the order of the PP and PME nodes.
If the XT4 really supported cartesian communication, this would
be a lot faster, but unfortunately it does not.
Still, it will probably be faster when all the PME nodes are physically
close together; -ddorder cartesian will do this.
-ddorder pp_pme will also work well, but I expect cartesian
to be slightly faster. You can try both.
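
To try both orderings, the two runs differ only in the -ddorder value. Here is a minimal Python sketch that builds the two mdrun command lines, using -npme 64 and the -s and -v options from the message above. The run input name topol.tpr and the helper name are assumptions, and the MPI launcher for your machine still has to be put in front.

```python
import shlex


def mdrun_command(ddorder, npme=64, tpr="topol.tpr"):
    """Build an mdrun argument list for one -ddorder setting.

    tpr is a hypothetical file name; the parallel launcher prefix
    is machine specific and left out here.
    """
    return ["mdrun", "-s", tpr, "-v", "-npme", str(npme), "-ddorder", ddorder]


for order in ("cartesian", "pp_pme"):
    print(shlex.join(mdrun_command(order)))
```

Running each command for the same number of steps and comparing the timings reported at the end of the log shows which ordering is faster on a given allocation.
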
> Anyway, really good job!
> Andrea Spitaleri PhD
> Dulbecco Telethon Institute
> c/o DIBIT Scientific Institute
> Biomolecular NMR, 1B4
> Via Olgettina 58
> 20132 Milano (Italy)
> Tel: 0039-0226434348/5622/3497/4922
> Fax: 0039-0226434153
> ----- Original Message -----
> From: Berk Hess <hessb at mpip-mainz.mpg.de>
> Date: Friday, September 19, 2008 5:03 pm
> Subject: Re: [gmx-developers] max cpu scale-up
>> Just try!
>> This depends a lot on your system.
>> But with 200000 atoms you should be able to scale much further on a
>> You usually don't need to use -dd option, mdrun will optimize it
>> for you.
>> If you use pme, you will get far higher performance with separate
>> pme nodes,
>> mdrun will automatically select this (when you do not specify -dd).
>> You might need to optimize -npme.
>> On an XT4 you probably want to use -ddorder cartesian.
>> No other options should be needed.
>> The only real disadvantage of an xt4 is that you can not get/select
>> processor blocks.
>> Therefore 4x4x4 might fit perfectly in a torus of the machine, but
>> with never give you such a nice assignment.
>> andrea spitaleri wrote:
>> > Hi there,
>> > I am using the latest cvs on hector cluster
>> > and normally I set the simulation to 64 cpus and -dd 4 4 4. This
>> > to me a good performance for a simulation with ca. 200,000 atoms
>> > (6ns/day with dt=2fs).
>> > How far can the new gromacs scale up? Do you have any
>> > parameters? Do you think that 128 cpus can perform better in speed?
>> > thanks in advance
>> > regards
>> > andrea
>> gmx-developers mailing list
>> gmx-developers at gromacs.org
>> Please don't post (un)subscribe requests to the list. Use the
>> www interface or send it to gmx-developers-request at gromacs.org.