[gmx-users] PME grid parameters in large system run in parallel in Gromacs 4 CVS
Berk Hess
gmx3 at hotmail.com
Wed Feb 27 12:09:11 CET 2008
----------------------------------------
> From: larsson at xray.bmc.uu.se
> Date: Wed, 27 Feb 2008 11:48:30 +0100
> To: gmx-users at gromacs.org
> Subject: [gmx-users] PME grid parameters in large system run in parallel in Gromacs 4 CVS
>
> Ok, this will be a long one.
>
> Currently I am simulating large systems (20-35 nm sides; rhombic
> dodecahedron or rectangular boxes) in the latest CVS version of
> Gromacs on 70-500 nodes using PME to calculate long range
> electrostatic interactions. Now I am looking for some advice
> regarding how to set up my system for good performance.
>
> There is a correlation between the short-range cutoff (rlist/rcoulomb/
> rvdw), the PME grid spacing (fourierspacing, fourier_nx/fourier_ny/
> fourier_nz), and the PME interpolation order (pme_order). Grompp tells
> me that for best performance there should be about 25-33% load on the
> dedicated PME nodes compared to the regular particle-particle nodes.
> I guess this value is due to the communication overhead of the PME
> nodes.
No, this is because of the balance of the computational cost.
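To make that concrete: ignoring all prefactors and the FFT log factor, the
PP work per atom goes roughly as rcoulomb^3 and the PME work per atom as
1/fourierspacing^3, so scaling both by the same factor f shifts the balance
towards PP by roughly f^6. A toy sketch in Python (the cost model here is
deliberately crude and is not what grompp actually computes):

    def pp_over_pme(rc, spacing):
        # PP work per atom ~ volume inside the cut-off (~ rc^3);
        # PME work per atom ~ number of grid points (~ 1/spacing^3).
        # Prefactors and the FFT log factor are dropped; they cancel below.
        return rc**3 * spacing**3

    base = pp_over_pme(1.1, 0.137)
    for f in (1.0, 1.1, 1.2):
        scaled = pp_over_pme(1.1 * f, 0.137 * f)
        print(f"scale factor {f:.1f}: PP/PME work ratio up by x{scaled / base:.2f}")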
>
> Two of my setups are:
>
> SYSTEM 1
> Dimensions: 24.12032 24.12032 17.05213 0.00000 0.00000 0.00000 0.00000 12.06016 12.06016
> Fourier grid spacing: 0.137 x 0.137 x 0.137
> Cutoffs: 1.1
> Interpolation order: 4
> Estimated PME load: 0.28
>
> SYSTEM 2
> Dimensions: 23.53603 24.05188 35.23882
> Fourier grid spacing: 0.134 0.137 0.133
> Cutoffs: 1.1
> Interpolation order: 4
> Estimated PME load: 0.51
>
> So I am interested to hear from you how far I can push the Fourier
> grid spacing before I lose accuracy in terms of force and energy.
> How much must I compensate with an increased interpolation order and an
> increased cutoff? A larger cutoff will put more load on the PP nodes,
> which is what I want in this case, but would an increased interpolation
> order also be required/advantageous? I use the fft optimization option.
>
> In the original paper on PME (Darden et al. 1993) there is a
> comparison of PME orders 1-4 and grid spacings of 0.1-0.05 nm which shows
> that a grid spacing of 0.075 nm and an interpolation order of 3 give
> an rms force error of 2x10^-4, which they say is reasonable. Does
> any of you know of a more recent, equivalent test on larger systems?
>
The answer to this is really simple.
I tested it, and the accuracy does not change when you scale
the cut-off and the grid spacing by the same factor.
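The underlying reason is that the Ewald splitting parameter beta is set from
the cut-off: as I understand ewald_rtol, beta is chosen such that
erfc(beta*rcoulomb) = ewald_rtol, so scaling rcoulomb up scales beta down by
the same factor, and the product beta*spacing, which controls the
reciprocal-space (grid) error for a given pme_order, stays constant. A small
sketch in Python (the bisection and the rtol value used are just for
illustration):

    from math import erfc

    def ewald_beta(rc, rtol=1.0e-5):
        # Bisection for the splitting parameter: erfc(beta*rc) = rtol.
        lo, hi = 0.1, 20.0
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if erfc(mid * rc) > rtol:
                lo = mid   # beta still too small
            else:
                hi = mid
        return 0.5 * (lo + hi)

    for f in (1.0, 1.2, 1.5):
        rc, spacing = 1.1 * f, 0.137 * f
        beta = ewald_beta(rc)
        print(f"rc={rc:.3f}  spacing={spacing:.3f}  "
              f"beta={beta:.3f}  beta*spacing={beta*spacing:.3f}")

So a cut-off of 1.1 with a spacing of 0.137 should give roughly the same
relative PME accuracy as a cut-off of 1.32 with a spacing of 0.164, and so on.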
>
> A somewhat unrelated question is about the heuristics mdrun uses when
> it decides how many PME nodes versus PP nodes to use and when
> it does the PP domain decomposition. It would be quite useful to have a
> tool that could suggest a desirable number of nodes within a given
> interval. If I can understand in which order the decisions are made
> and which constraints are imposed, I will gladly write such a tool
> myself. (More than once my simulations have failed after a long time
> in the queue due to an incompatible number of nodes. A useful feature
> for mdrun would be to choose not to use some of the nodes if that would
> improve performance, rather than just dying.)
When you are running on large numbers of nodes and using a large amount
of computational resources, it is better (and worth the effort) to set the number
of PME-only nodes by hand.
It is desirable to use a number of particle-particle (PP) only nodes that is
a multiple of the number of PME-only nodes. So you choose the number of PME nodes,
which should be compatible with the grid spacing, and then you add
a number of PP nodes that is 3 or 4 times the number of PME nodes, depending on
what grompp has told you, or better, on what you determined from a (very short)
benchmark run.
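In case it helps while no such tool exists, here is the kind of helper you
describe, sketched in Python. The check that the number of PME nodes should
divide the Fourier grid size in x is my assumption about what "compatible
with the grid spacing" means in practice, and the example numbers (80 nodes,
a 180-point grid in x) are made up:

    def suggest_splits(ntot, fourier_nx, lo=3.0, hi=4.0):
        # Enumerate (npp, npme) with npp + npme = ntot, npp a multiple of
        # npme and a PP/PME ratio between lo and hi, flagging whether npme
        # divides the grid dimension in x.
        splits = []
        for npme in range(1, ntot // 2 + 1):
            npp = ntot - npme
            if npp % npme:
                continue
            ratio = npp / npme
            if lo <= ratio <= hi:
                splits.append((npp, npme, ratio, fourier_nx % npme == 0))
        return splits

    for npp, npme, ratio, grid_ok in suggest_splits(80, 180):
        print(f"npp={npp:3d}  npme={npme:3d}  "
              f"PP/PME={ratio:.1f}  divides nx: {grid_ok}")

You would then start mdrun with -npme set to the value you picked.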
Note that version 4 is still not out, and the documentation is not complete.
I am open to suggestions on how to improve the automated setup of mdrun.
But I don't know if it would be better to automatically fall back to an
inefficient setup (such as simply not using some of the CPUs) rather than
producing a fatal error.
For small numbers of CPUs things should nearly always work, while for large
numbers of CPUs the user should do a bit of work to get optimal performance
out of the large computational resources.
Berk.