[gmx-developers] Optimization of PME parameters
Mark Abraham
Mark.Abraham at anu.edu.au
Sun Oct 26 08:20:30 CET 2008
Hi,
Short version:
I'm hacking 3.3.1 mdrun to loop over PME parameter space to facilitate
finding points that are optimal for both error in the approximation and
cost. The PME initialization code was only meant to be called once, so
I've worked around and written clean-up code to free memory and throw
away old FFT plans. I can see that on the second iteration the PME
initialization runs correctly (except that I can't inspect the actual
FFT plans), but the first FFT call then produces results that disagree
with those from a run that simply starts at the second grid point. Thus
I think the plans being used are different. This
problem doesn't arise in serial, or when only scanning rcoulomb (while
holding Ewald beta fixed). Restating: when changing fourier_n[xyz] for a
second PME initialization in a parallel run with everything else held
constant, either a wrong FFT plan is generated or some spurious one is used
subsequently. I'm wondering what to do next.
1) Does anybody have any ideas about how to ensure my PME "state" is fresh?
2) Is it worth my while compiling a debug-enabled FFTW library so I can
inspect the FFT plans to see whether they differ? (See also the sketch
after this list.)
3) I can work around this by constraining the user to do only single
grid points for reciprocal-space "scans". Since the whole thing will be
controlled by shell scripts, the only actual loss from moving the
looping out to the script level is having to read in multiple instances
of identical run input files. That would be an acceptable solution.
4) I can see the PME code is much better-behaved in GROMACS 4.0. Does
this suggest I should finish my port to that version and hope things magically
get better?
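For what it's worth on 2), if the build links against FFTW 3 then
something like the following might let me compare plans between runs
without a debug build of the library. This is only a minimal sketch,
assuming FFTW 3 (fftw_fprint_plan) and that the plan handles are
reachable from the calling code; the helper name is hypothetical:

#include <stdio.h>
#include <fftw3.h>

/* Minimal sketch: write a human-readable description of an FFTW 3 plan
   so that plans from two runs can be diffed. The helper name is
   hypothetical; fftw_fprint_plan is part of the FFTW 3 API. */
static void dump_fft_plan(const char *label, const fftw_plan plan, FILE *fp)
{
    fprintf(fp, "FFT plan '%s':\n", label);
    fftw_fprint_plan(plan, fp);
    fprintf(fp, "\n");
}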
Thanks very much for any input!
Mark
----------------------
Long version:
I am part-way through a project to develop a package of software
intended to allow users to optimize PME parameters in GROMACS. That is,
to determine the combination of rcoulomb, pme_order, fourier_n[xyz], and
(in GROMACS 4) the number of PME-only nodes that gives a known RMS error
in the electrostatic component of the force for lowest compute time.
This strikes me as a useful tool to have available to the community, as
the optimal point can vary with the number of atoms in the system,
force field, machine architecture, number of CPUs, type of interconnect,
and probably the number of goats sacrificed three full moons ago!
My idea is for the user to run some simulation to generate a sample
trajectory, and then use it in a series of mdrun -rerun jobs that do a
grid scan over PME parameter space, measuring the RMS error in the
direct- and reciprocal-space forces by comparison with another rerun
whose PME parameters converge those force components to (or near)
machine precision. (This idea comes from the discussion of accuracy in
Essmann et al., Journal of Chemical Physics 103(19):8577 (1995).)
Having found points
in parameter space that achieve the desired level of approximation, the
user then applies those parameter sets to some test simulations to find
the set that gives the fastest simulations.
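For concreteness, the error measure I have in mind is just the RMS
difference between the trial forces and the near-machine-precision
reference forces, in the spirit of Essmann et al. A minimal sketch,
assuming both runs write forces for the same atoms in the same order
(all names here are illustrative):

#include <math.h>

/* RMS difference between trial and reference force components over
   natoms atoms; arrays are assumed to hold x,y,z per atom. */
static double rms_force_error(const double *f_trial, const double *f_ref,
                              int natoms)
{
    double sum = 0.0;
    int    i;

    for (i = 0; i < 3*natoms; i++)
    {
        double d = f_trial[i] - f_ref[i];
        sum += d*d;
    }
    return sqrt(sum/natoms);
}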
This approach requires being able to calculate the RMS difference in
the force components between a "normal" run and the (quite
resource-intensive) reference run. It seems best to do the comparison "off-line",
which means one would like an efficient way to dump the full-precision
real- and reciprocal-space force components to disk while doing a full
rerun, then possibly modify the PME parameters to go to the next point
in the grid scan, and start the rerun again. Then later a custom
analysis tool can do the RMS comparisons on the components and construct
the error results.
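The dumping side need not be fancy; a minimal sketch of appending one
frame's force components to a raw binary file for the later analysis
tool (the helper is hypothetical, and error handling is omitted):

#include <stdio.h>

/* Append one frame's force components (3*natoms values) to a raw
   binary file for off-line RMS comparison. Hypothetical helper. */
static void dump_frame_forces(FILE *fp, const double *f, int natoms)
{
    fwrite(&natoms, sizeof(natoms), 1, fp);
    fwrite(f, sizeof(double), 3*(size_t)natoms, fp);
}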
I implemented this in 3.3.1 by using a flag to mdrun to select either a
scan of the real- or reciprocal-space parameters. In md.c the contents
of userreal[123] or userint[1234] are then used to define extrema for
the grid scan. For the real-space scan, the Ewald splitting parameter is
defined by the normal rcoulomb in the run input file, but everything
else is done using the rcoulomb value for the grid point. Then one loops
over most of the contents of the mdrunner() function, tweaking things to
just evaluate the electrostatic forces. This approach works well for
both real- and reciprocal-space scans in serial, and for real-space
scans in parallel.
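Schematically, the reciprocal-space scan looks like the following. This
is a sketch of the control flow only, with the GROMACS calls elided and
every name hypothetical:

/* Purely schematic driver for the reciprocal-space grid scan. The
   bounds are taken from the user fields of the run input; every name
   below is hypothetical. */
typedef struct {
    int nk_min, nk_max, nk_step;   /* e.g. from userint1..userint3 */
} t_recip_scan;

static void scan_recip_space(const t_recip_scan *scan)
{
    int nk;

    for (nk = scan->nk_min; nk <= scan->nk_max; nk += scan->nk_step)
    {
        /* 1. set fourier_nx/ny/nz for this grid point               */
        /* 2. (re)initialize PME and the parallel FFT plans          */
        /* 3. rerun the trajectory, dumping force components to disk */
        /* 4. tear down all PME state so the next pass starts fresh  */
    }
}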
However I am having trouble with the reciprocal-space scan in parallel.
The first point of the grid scan (i.e. a full trajectory rerun)
completes successfully, but the second and future points of the scan
produce garbage. The PME code is not very well-behaved - there is global
data held in static variables, and no formal cleanup procedure, since it
would normally be initialized only once. I developed work-arounds for
all of that to allow for re-initialization. I can see that
gmx_parallel_3dfft_init for the second rerun executes correctly: apart
from the FFT plans themselves, which I can't inspect in my debugger, its
results are identical to those of the same call made first in a grid
scan that starts at that second grid point. So the FFT plans *ought* to
be set up correctly.
The charge-spreading to the grid in do_pme is identical (easily checked
in the -debug file using -DDEBUG on pme.c). However, the immediately
subsequent call to gmxfft3D produces wildly different values on the grid
points for the two runs I'm comparing in the debugger. Stepping inside
that, I can see that the difference arises during the actual FFT call.
This doesn't make any sense, as I'm calling gmx_fft_destroy at the end
of the first grid point to throw away the plans generated in the first
invocation of gmx_parallel_3dfft_init and free the memory. The FFT
plans should be generated afresh.
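For reference, the pattern I am trying to reproduce at the FFT level is
simply destroy-then-recreate. A minimal serial FFTW 3 illustration (the
parallel case adds the MPI decomposition on top of this, and the helper
name is hypothetical):

#include <fftw3.h>

/* Destroy an old 3D plan and create a fresh one for new grid
   dimensions. Note that planning with FFTW_MEASURE overwrites the
   contents of the arrays, so the grid must be refilled after planning;
   FFTW_ESTIMATE leaves it untouched. */
static fftw_plan remake_3d_plan(fftw_plan old, fftw_complex *grid,
                                int nx, int ny, int nz)
{
    if (old != NULL)
    {
        fftw_destroy_plan(old);
    }
    return fftw_plan_dft_3d(nx, ny, nz, grid, grid,
                            FFTW_FORWARD, FFTW_ESTIMATE);
}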