[gmx-developers] Optimization of PME parameters
Mark Abraham
Mark.Abraham at anu.edu.au
Sun Oct 26 08:20:30 CET 2008
Hi,
Short version:
I'm hacking 3.3.1 mdrun to loop over PME parameter space to facilitate
finding points that are optimal for both error in the approximation and
cost. The PME initialization code was only meant to be called once, so
I've worked around and written clean-up code to free memory and throw
away old FFT plans. I can see that on the second iteration the PME
initialization runs correctly (except that I can't inspect the actual
FFT plans), but the first FFT call then produces results that disagree
with those from a run that simply starts at the second grid point. Thus
I think the plans being used are different. This
problem doesn't arise in serial, or when only scanning rcoulomb (while
holding Ewald beta fixed). Restating: when changing fourier_n[xyz] for a
second PME initialization in a parallel run with everything else held
constant, either a wrong FFT plan is generated or some spurious one is used
subsequently. I'm wondering what to do next.
1) Does anybody have any ideas about how to ensure my PME "state" is fresh?
2) Is it worth my while compiling a debug-enabled FFTW library so I can
inspect the FFT plans to see whether they differ? (See also the sketch
after this list.)
3) I can work around this by constraining the user to do only single
grid points for reciprocal-space "scans". Since the whole thing will be
controlled by shell scripts, the only actual loss from moving the
looping out to the script level is having to read in multiple instances
of identical run input files. That would be an acceptable solution.
4) I can see the PME code is much better-behaved in GROMACS 4.0. Does
this suggest I should finish my port to that version and hope things magically
get better?
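For what it's worth on 2), if the build links against FFTW 3 then
something like the following might let me compare plans between runs
without a debug build of the library. This is only a minimal sketch,
assuming FFTW 3 (fftw_fprint_plan) and that the plan handles are
reachable from the calling code; the helper name is hypothetical:

#include <stdio.h>
#include <fftw3.h>

/* Minimal sketch: write a human-readable description of an FFTW 3 plan
   so that plans from two runs can be diffed. The helper name is
   hypothetical; fftw_fprint_plan is part of the FFTW 3 API. */
static void dump_fft_plan(const char *label, const fftw_plan plan, FILE *fp)
{
    fprintf(fp, "FFT plan '%s':\n", label);
    fftw_fprint_plan(plan, fp);
    fprintf(fp, "\n");
}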
Thanks very much for any input!
Mark
----------------------
Long version:
I am part-way through a project to develop a package of software
intended to allow users to optimize PME parameters in GROMACS. That is,
to determine the combination of rcoulomb, pme_order, fourier_n[xyz], and
(in GROMACS 4) the number of PME-only nodes that gives a known RMS error
in the electrostatic component of the force for lowest compute time.
This strikes me as a useful tool to have available to the community, as
the optimal point can vary with the number of atoms in the system,
force field, machine architecture, number of CPUs, type of interconnect,
and probably the number of goats sacrificed three full moons ago!
My idea is for the user to run some simulation to generate a sample
trajectory, and then use it in a series of mdrun -rerun jobs that do a
grid scan over PME parameter space, measuring the RMS error in the
direct- and reciprocal-space forces by comparison with another rerun
whose PME parameters converge those force components to (or near)
machine precision. (This idea comes from the discussion of accuracy in
Essmann et al., Journal of Chemical Physics 103(19):8577 (1995).)
Having found points
in parameter space that achieve the desired level of approximation, the
user then applies those parameter sets to some test simulations to find
the set that gives the fastest simulations.
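For concreteness, the error measure I have in mind is just the RMS
difference between the trial forces and the near-machine-precision
reference forces, in the spirit of Essmann et al. A minimal sketch,
assuming both runs write forces for the same atoms in the same order
(all names here are illustrative):

#include <math.h>

/* RMS difference between trial and reference force components over
   natoms atoms; arrays are assumed to hold x,y,z per atom. */
static double rms_force_error(const double *f_trial, const double *f_ref,
                              int natoms)
{
    double sum = 0.0;
    int    i;

    for (i = 0; i < 3*natoms; i++)
    {
        double d = f_trial[i] - f_ref[i];
        sum += d*d;
    }
    return sqrt(sum/natoms);
}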
This approach requires being able to calculate the RMS difference in
the force components between a "normal" run and the (quite
resource-intensive) reference run. It seems best to do the comparison "off-line",
which means one would like an efficient way to dump the full-precision
real- and reciprocal-space force components to disk while doing a full
rerun, then possibly modify the PME parameters to go to the next point
in the grid scan, and start the rerun again. Then later a custom
analysis tool can do the RMS comparisons on the components and construct
the error results.
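The dumping side need not be fancy; a minimal sketch of appending one
frame's force components to a raw binary file for the later analysis
tool (the helper is hypothetical, and error handling is omitted):

#include <stdio.h>

/* Append one frame's force components (3*natoms values) to a raw
   binary file for off-line RMS comparison. Hypothetical helper. */
static void dump_frame_forces(FILE *fp, const double *f, int natoms)
{
    fwrite(&natoms, sizeof(natoms), 1, fp);
    fwrite(f, sizeof(double), 3*(size_t)natoms, fp);
}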
I implemented this in 3.3.1 by using a flag to mdrun to select either a
scan of the real- or reciprocal-space parameters. In md.c the contents
of userreal[123] or userint[1234] are then used to define extrema for
the grid scan. For the real-space scan, the Ewald splitting parameter is
defined by the normal rcoulomb in the run input file, but everything
else is done using the rcoulomb value for the grid point. Then one loops
over most of the contents of the mdrunner() function, tweaking things to
just evaluate the electrostatic forces. This approach works well for
both real- and reciprocal-space scans in serial, and for real-space
scans in parallel.
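Schematically, the reciprocal-space scan looks like the following. This
is a sketch of the control flow only, with the GROMACS calls elided and
every name hypothetical:

/* Purely schematic driver for the reciprocal-space grid scan. The
   bounds are taken from the user fields of the run input; every name
   below is hypothetical. */
typedef struct {
    int nk_min, nk_max, nk_step;   /* e.g. from userint1..userint3 */
} t_recip_scan;

static void scan_recip_space(const t_recip_scan *scan)
{
    int nk;

    for (nk = scan->nk_min; nk <= scan->nk_max; nk += scan->nk_step)
    {
        /* 1. set fourier_nx/ny/nz for this grid point               */
        /* 2. (re)initialize PME and the parallel FFT plans          */
        /* 3. rerun the trajectory, dumping force components to disk */
        /* 4. tear down all PME state so the next pass starts fresh  */
    }
}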
However I am having trouble with the reciprocal-space scan in parallel.
The first point of the grid scan (i.e. a full trajectory rerun)
completes successfully, but the second and future points of the scan
produce garbage. The PME code is not very well-behaved - there is global
data held in static variables, and no formal cleanup procedure, since it
would normally be initialized only once. I developed work-arounds for
all of that to allow for re-initialization. I can see that
gmx_parallel_3dfft_init for the second rerun executes correctly: apart
from the FFT plans themselves, which I can't inspect in my debugger, its
results are identical to those of the same call made first in a grid
scan that starts at that second grid point. So the FFT plans *ought* to
be set up correctly.
The charge-spreading to the grid in do_pme is identical (easily checked
in the -debug file using -DDEBUG on pme.c). However, the immediately
subsequent call to gmxfft3D produces wildly different values on the grid
points for the two runs I'm comparing in the debugger. Stepping inside
that, I can see that the difference arises during the actual FFT call.
This doesn't make any sense, as I'm calling gmx_fft_destroy at the end
of the first grid point to throw away the plans generated in the first
invocation of gmx_parallel_3dfft_init and free the memory. The FFT
plans should be generated afresh.
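For reference, the pattern I am trying to reproduce at the FFT level is
simply destroy-then-recreate. A minimal serial FFTW 3 illustration (the
parallel case adds the MPI decomposition on top of this, and the helper
name is hypothetical):

#include <fftw3.h>

/* Destroy an old 3D plan and create a fresh one for new grid
   dimensions. Note that planning with FFTW_MEASURE overwrites the
   contents of the arrays, so the grid must be refilled after planning;
   FFTW_ESTIMATE leaves it untouched. */
static fftw_plan remake_3d_plan(fftw_plan old, fftw_complex *grid,
                                int nx, int ny, int nz)
{
    if (old != NULL)
    {
        fftw_destroy_plan(old);
    }
    return fftw_plan_dft_3d(nx, ny, nz, grid, grid,
                            FFTW_FORWARD, FFTW_ESTIMATE);
}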