[gmx-developers] Water innerloop

Fri Mar 3 04:18:25 CET 2006

Erik Lindahl wrote:
> Hi,
> 
>>
>> Presumably the nonbonded list generation will take care of itself.
>>
> 
> Currently the detection of 'water' molecules is performed when setting 
> up the forcerecord. If you just modify that, you can have any type of 
> molecule end up in the water or water-water lists.
> 
>> Otherwise, modifying the general-case algorithms (for various $x, $y, 
>> $z) in gmxlib/nonbonded/nb_kernel_$x/nb_kernel$y$z0_$x.s along the 
>> lines of the existing specialisations nb_kernel$y$z[1-4]_$x.s for the 
>> specific case of TIP3P in CHARMM is the way forward. Obviously the 
>> mechanism that chooses which nb_kernel function to call would need to 
>> be expanded.
>>
>> Would that be all that's necessary?
> 
> 
> Yes, in principle. However, if you only do it in C or Fortran, the 
> non-water Gromacs assembly loops will probably still be faster.
> 
> You won't gain quite as much as for normal TIP3P/SPC/TIP4P either, 
> since you're performing more calculations per coordinate load/force store.

OK, since I want TIP3P with CHARMM in GROMACS, I've gone ahead and 
implemented C kernel routines suitable for TIP3P_CHARMM, and as Erik 
predicted, the general assembly language routines are still faster.

That required a fair number of additions to the existing machinery. I 
needed to define kernel routine water types 5 and 6 for the 
TIP3P_CHARMM/other-atom and TIP3P_CHARMM/TIP3P_CHARMM interactions. 
That meant changing a bunch of enums and struct initializations, 
altering the algorithm that recognizes the water types so that it 
recognizes the new one, making mknb generate TIP3P_CHARMM kernel 
routines, and probably some other things that escape me now. I now plan 
to add new assembly routines, since I can now see and understand the 
differences between nb_kernelxx[0-2] and nb_kernelxx[5-6] at the C 
level.
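
To illustrate the kind of change to the water-recognition step described 
above, here is a minimal sketch in C. It is not the GROMACS source: the 
enum, the struct and both function names are hypothetical. Only the 
kernel numbers 5 and 6 come from this post; the meaning of 0, 1 and 2 
(general, water/other, water/water) is an assumption based on the kernel 
file naming. The idea is that a three-site water whose hydrogens carry 
vdW parameters cannot use the standard TIP3P kernels and needs its own 
type.

```c
#include <stdbool.h>

/* Hypothetical classification of a candidate 3-site water molecule. */
enum water_kind {
    WATER_NONE = 0,      /* not recognised as a 3-site water             */
    WATER_TIP3P_STD,     /* hydrogens have no vdW: standard TIP3P/SPC    */
    WATER_TIP3P_CHARMM   /* hydrogens have vdW: needs the new kernels    */
};

/* Hypothetical per-atom nonbonded parameters. */
struct atom_params {
    double charge;
    double c6;   /* Lennard-Jones dispersion coefficient */
    double c12;  /* Lennard-Jones repulsion coefficient  */
};

static bool has_vdw(const struct atom_params *a)
{
    return a->c6 != 0.0 || a->c12 != 0.0;
}

/* Atom 0 is taken to be the oxygen, atoms 1 and 2 the hydrogens. */
enum water_kind classify_water(const struct atom_params mol[3])
{
    /* Both hydrogens must be identical for any water kernel to apply. */
    if (mol[1].charge != mol[2].charge ||
        mol[1].c6 != mol[2].c6 || mol[1].c12 != mol[2].c12) {
        return WATER_NONE;
    }
    /* The oxygen must carry the vdW interaction. */
    if (!has_vdw(&mol[0])) {
        return WATER_NONE;
    }
    return has_vdw(&mol[1]) ? WATER_TIP3P_CHARMM : WATER_TIP3P_STD;
}

/* Map a water kind to the trailing digit of the kernel name.
 * 5 and 6 are from the post; 0, 1 and 2 are assumed meanings. */
int kernel_suffix(enum water_kind kind, bool water_water)
{
    switch (kind) {
    case WATER_TIP3P_STD:    return water_water ? 2 : 1;
    case WATER_TIP3P_CHARMM: return water_water ? 6 : 5;
    default:                 return 0;  /* general-case kernel */
    }
}
```

Zeroing the hydrogen vdW parameters in the topology moves a molecule from 
WATER_TIP3P_CHARMM to WATER_TIP3P_STD in this sketch, which matches the 
behaviour reported in the timings below.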

Now, some timing results for a 21459-atom system with 7031 waters, one 
peptide and 11 ions on a Pentium 4 using PME. All numbers are CPU hours 
per simulation ns, extrapolated from 100 MD steps of 0.002 ps.
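
For reference, the extrapolation can be sketched as below, assuming the 
time step is 0.002 ps (2 fs), so that 100 steps cover 0.0002 ns. The 
function name and its parameters are illustrative only.

```c
/* CPU hours per simulated nanosecond, extrapolated from a short run.
 * cpu_seconds: CPU time measured for the benchmark run
 * steps:       number of MD steps in the benchmark (100 here)
 * dt_ps:       time step in picoseconds (assumed to be 0.002) */
double cpu_hours_per_ns(double cpu_seconds, long steps, double dt_ps)
{
    double simulated_ns = (double)steps * dt_ps / 1000.0;
    return (cpu_seconds / 3600.0) / simulated_ns;
}
```

With these assumptions each CPU second spent on the 100 steps 
corresponds to 5000 CPU seconds per ns, so small timing noise in the 
short run is magnified accordingly.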

For C routines, using TIP3P_CHARMM with no water optimization, I got 
67.264, compared with 58.875 after TIP3P_CHARMM optimization. The same 
system with the hydrogen vdW parameters zeroed out in the .itp file 
before using grompp uses the expected TIP3P optimization and runs at 
49.278. Obviously I got numerical agreement between the first two of 
these, and not the third!

For assembly routines, using TIP3P_CHARMM with no water optimization, I 
got 30.250, which is about twice as fast as the optimized TIP3P_CHARMM C 
routine. After zeroing the hydrogen vdW, the TIP3P optimized assembly 
routines run at 19.57. Thus, if I do it right, I expect TIP3P_CHARMM 
assembly routines to run somewhere in the mid-20s. That would be a 
saving of around one-sixth over the 30.250, so I'm going to spend some 
more time doing that in the next few days.
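
The ratios quoted above can be checked with a couple of small helpers. 
This is only an arithmetic sketch; the function names are illustrative, 
and the value 25.0 used in the note after the code is an assumed 
stand-in for "mid-20s", not a measurement.

```c
/* How many times faster the new timing is than the old one. */
double speedup(double old_hours, double new_hours)
{
    return old_hours / new_hours;
}

/* Fraction of the old cost that is saved by the new timing. */
double fractional_saving(double old_hours, double new_hours)
{
    return (old_hours - new_hours) / old_hours;
}
```

speedup(58.875, 30.250) is about 1.95, the "about twice as fast" figure 
for general assembly versus optimized C. fractional_saving(67.264, 
58.875) is about 0.125 for the C optimization, and 
fractional_saving(30.250, 25.0) is about 0.17, close to one-sixth.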

I'm happy to share this stuff when I know it's working right.

Mark