[gmx-developers] Water innerloop
Mark.Abraham at anu.edu.au
Fri Mar 3 04:18:25 CET 2006
Erik Lindahl wrote:
>> Presumably the nonbonded list generation will take care of itself.
> Currently the detection of 'water' molecules is performed when setting
> up the forcerecord. If you just modify that you can have any type of
> molecule end up in the water or water-water lists.
>> Otherwise, modifying the general-case algorithms (for various $x, $y,
>> $z) in gmxlib/nonbonded/nb_kernel_$x/nb_kernel$y$z0_$x.s along the
>> lines of the existing specialisations nb_kernel$y$z[1-4]_$x.s for the
>> specific case of TIP3P in CHARMM is the way forward. Obviously the
>> mechanism that chooses which nb_kernel function to call would need to
>> be expanded.
>> Would that be all that's necessary?
> Yes, in principle. However, if you only do it in C or Fortran the non-
> water Gromacs assembly loops will probably still be faster.
> You won't gain quite as much as for normal TIP3P/SPC/TIP4P either,
> since you're performing more calculations per coordinate load/force store.
OK, since I want TIP3P with CHARMM in GROMACS, I've gone ahead and
implemented C versions suitable for TIP3P_CHARMM, and as Erik predicted,
the general assembly language routines are faster.
That required a fair bit of augmenting of the existing machinery. I needed
to define kernel routine water types 5 and 6 for TIP3P_CHARMM - other
atom and TIP3P_CHARMM - TIP3P_CHARMM, which required mucking with a
bunch of enums and struct initializations, altering the algorithm that
recognizes the water types to recognize this new one, making mknb
generate TIP3P_CHARMM kernel routines, and probably some other stuff
that escapes me now. I now plan to add new assembly routines, since I
can now see and understand the differences between nb_kernelxx[0-2] and
nb_kernelxx[5-6] at the C level.
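
A minimal sketch of the kind of recognition step described above, not the
actual GROMACS code. It assumes the only property separating the two water
flavours is whether the hydrogens carry Lennard-Jones parameters. The names
site_t, mol_kind_t and classify_molecule are hypothetical, and the site
order (oxygen first, then two hydrogens) is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* One interaction site: charge plus Lennard-Jones parameters. */
typedef struct {
    double charge;
    double c6;   /* LJ dispersion coefficient */
    double c12;  /* LJ repulsion coefficient  */
} site_t;

/* Hypothetical molecule classes; not the real GROMACS enum. */
typedef enum {
    MOL_GENERIC,       /* no water optimization applies          */
    MOL_TIP3P_PLAIN,   /* three sites, LJ on the oxygen only     */
    MOL_TIP3P_CHARMM   /* three sites, LJ on the hydrogens too   */
} mol_kind_t;

static int has_lj(const site_t *s)
{
    return s->c6 != 0.0 || s->c12 != 0.0;
}

/* Classify a molecule given its sites, assumed ordered O, H, H. */
mol_kind_t classify_molecule(const site_t *s, size_t n)
{
    assert(s != NULL);
    if (n != 3)
        return MOL_GENERIC;
    /* Both hydrogens must be identical for a water kernel to apply. */
    if (s[1].charge != s[2].charge ||
        s[1].c6 != s[2].c6 || s[1].c12 != s[2].c12)
        return MOL_GENERIC;
    /* The oxygen must carry LJ parameters in either water flavour. */
    if (!has_lj(&s[0]))
        return MOL_GENERIC;
    return has_lj(&s[1]) ? MOL_TIP3P_CHARMM : MOL_TIP3P_PLAIN;
}
```

With a classifier of this shape, a molecule whose hydrogen vdW parameters
are zero lands in the plain class, and one with non-zero hydrogen
parameters lands in the new class instead of the general case.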
Now, some timing results for a 21459-atom system with 7031 waters, one
peptide and 11 ions on a Pentium 4 using PME. All numbers are CPU hours
per simulated ns, extrapolated from 100 MD steps of 0.002 ps each.
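
The extrapolation can be sketched as follows. This is an illustration
only: it assumes the step length is 0.002 ps (the usual GROMACS time
unit), and the function name cpu_hours_per_ns is hypothetical.

```c
#include <assert.h>

/* Extrapolate a short benchmark to CPU hours per simulated nanosecond.
   cpu_seconds: CPU time measured for the sample run
   n_steps:     number of MD steps in the sample run
   dt_ps:       step length in picoseconds (assumed unit) */
double cpu_hours_per_ns(double cpu_seconds, long n_steps, double dt_ps)
{
    assert(cpu_seconds >= 0.0 && n_steps > 0 && dt_ps > 0.0);
    double steps_per_ns = 1000.0 / dt_ps;            /* 1 ns = 1000 ps */
    double seconds_per_step = cpu_seconds / (double)n_steps;
    return seconds_per_step * steps_per_ns / 3600.0; /* seconds -> hours */
}
```

For a made-up sample of 36 CPU seconds for 100 steps,
cpu_hours_per_ns(36.0, 100, 0.002) gives 50 CPU hours per ns. The 36
seconds is an illustrative figure, not a measurement from these runs.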
For C routines, using TIP3P_CHARMM with no water optimization, I got
67.264, compared with 58.875 after TIP3P_CHARMM optimization. The same
system with the hydrogen vdW parameters zeroed out in the .itp file
before using grompp uses the expected TIP3P optimization and runs at
49.278. Obviously I got numerical agreement between the first two of
these, and not the third!
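
Why the third run cannot agree numerically can be seen from a sketch of
the vdW part of a water - other atom interaction. This is not the
generated kernel code; the names and the O, H, H site order are
assumptions, and the LJ form c12/r^12 - c6/r^6 is used.

```c
#include <assert.h>

typedef struct { double x, y, z; } vec3;

/* Squared distance between two points. */
static double dist2(vec3 a, vec3 b)
{
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

/* Lennard-Jones energy c12/r^12 - c6/r^6, given r squared. */
static double lj_energy(double c6, double c12, double r2)
{
    assert(r2 > 0.0);
    double rinv6 = 1.0 / (r2 * r2 * r2);
    return c12 * rinv6 * rinv6 - c6 * rinv6;
}

/* vdW energy between one water (sites O, H, H) and one other atom.
   c6_o/c12_o are the oxygen-atom pair parameters and c6_h/c12_h the
   hydrogen-atom pair parameters. A plain TIP3P kernel can skip the two
   hydrogen terms entirely; a TIP3P_CHARMM kernel has to evaluate them. */
double water_atom_vdw(const vec3 w[3], vec3 atom,
                      double c6_o, double c12_o,
                      double c6_h, double c12_h)
{
    double e = lj_energy(c6_o, c12_o, dist2(w[0], atom));
    e += lj_energy(c6_h, c12_h, dist2(w[1], atom));
    e += lj_energy(c6_h, c12_h, dist2(w[2], atom));
    return e;
}
```

Setting c6_h and c12_h to zero makes the two hydrogen terms vanish, which
is what lets the plain TIP3P loops skip them, but it also changes the
energy, so the zeroed system is a different model rather than a faster
route to the same numbers.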
For assembly routines, using TIP3P_CHARMM with no water optimization, I
got 30.250, which is about twice as fast as the optimized TIP3P_CHARMM C
routine. After zeroing the hydrogen vdW, using TIP3P optimized assembly
routines runs at 19.57. Thus, if I do it right, I expect TIP3P_CHARMM
assembly routines to run around the mid-20s. This would be around a
one-sixth saving over the 30.250, so I'm going to spend some more time
doing that in the next few days.
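
The arithmetic behind these comparisons, as a small sketch. The helper
names are hypothetical, and 25.0 below is only a stand-in for "mid-20s",
not a measured value.

```c
#include <assert.h>

/* Fraction of cost saved when going from old_cost to new_cost. */
double fractional_saving(double old_cost, double new_cost)
{
    assert(old_cost > 0.0);
    return (old_cost - new_cost) / old_cost;
}

/* How many times faster new_cost is than old_cost. */
double speedup(double old_cost, double new_cost)
{
    assert(new_cost > 0.0);
    return old_cost / new_cost;
}
```

With the figures above, speedup(58.875, 30.250) is about 1.95, and
fractional_saving(30.250, 25.0) is about 0.17, i.e. roughly one sixth.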
I'm happy to share this stuff when I know it's working right.