[gmx-developers] Re: GROMACS 4.6 pre-release: new algorithms, parallelization schemes, & GPU acceleration

Tue Mar 6 16:18:42 CET 2012

On 06.03.2012 14:34, Szilárd Páll wrote:
> On Mon, Mar 5, 2012 at 11:19 PM, Mirco Wahab wrote:
>> OK, let's see. I completely removed (from the mdp)
>>     vdw_type
>>     coulombtype
>
> You didn't need to remove those.

Yes, I put them back in order to use reaction-field (RF)
electrostatics, which Marrink also considered and tested
in at least one paper (the "antimicrobial peptides
membrane channel" paper by Rzepiela). But electrostatics
seem to have only a marginal effect in most CG
representations.

>> and *VOILA*, I can't believe this, it *does* work
>> - Win7/x64
>> - Cuda4.1
>> - VS2010/SP1
>> - fftw3f
>> - gsl/gslcblas
>> - libxml2
>
> Why would it not work :)

I find it remarkable that the Gromacs source code
still retains compatibility with the Visual Studio
compilers. It uses almost none of the unix'ish process
management functions that would make porting
impossible. Being able to provide a fully working
Gromacs installation on any student's desktop
computer is a nice thing imho.

>> A first test set of ~ 1 million particles (mostly MARTINI W,
>> ~10,000 lipids) runs on an i7-920 (@3GHz) at about 11.5 GFlops
>> with "*-nt 8*" (15.9 ns/day). On a stock GTX-550 in the same
>> machine (*-nt 1 -nb gpu*), it reaches 14.4 GFlops (23.5 ns/day).
>
> A few things that can improve performance:
> - increase nstcalcenergy, e.g. to 100;
> - try increasing nstlist, 20-30 is generally the best.

Test: rlist = rvdw = rcoulomb = 1.0
(vdw_type=Cut-off,coulombtype=Reaction-Field)

1) nstlist = 20
Determining Verlet buffer for an energy drift of 0.001 kJ/mol/ps at 320K
Set rlist to 1.52 nm, buffer size 0.52 nm

-> ~ 23.7 ns/d

2) nstlist = 10
Determining Verlet buffer for an energy drift of 0.001 kJ/mol/ps at 320K
Set rlist to 1.21 nm, buffer size 0.21 nm

-> ~ 27.4 ns/d

3) nstlist = 5
Determining Verlet buffer for an energy drift of 0.001 kJ/mol/ps at 320K
Set rlist to 1.08 nm, buffer size 0.08 nm

-> ~ 26.1 ns/d

With the new "cutoff-scheme = verlet", rlist seems to be
configured automatically (the rlist = 1.0 from the mdp is
replaced in all three runs), so that it can only be
influenced indirectly, e.g. via nstlist?
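
To put the three runs side by side, here is a small Python
sketch (my own bookkeeping, not anything Gromacs does
internally). It takes the rlist and ns/day values from the
log lines above, computes the buffer as rlist minus the
1.0 nm cut-off, and estimates the relative pair-search
volume under the assumption that it scales as rlist^3.
The function names are made up for illustration.

```python
# Values copied from the three test runs above.
# Assumption: the interaction cut-off is rvdw = rcoulomb = 1.0 nm.
CUTOFF_NM = 1.0

# (nstlist, rlist chosen by mdrun in nm, measured ns/day)
runs = [
    (20, 1.52, 23.7),
    (10, 1.21, 27.4),
    (5, 1.08, 26.1),
]


def buffer_nm(rlist, cutoff=CUTOFF_NM):
    """Verlet buffer: how far the pair list reaches beyond the cut-off."""
    return rlist - cutoff


def relative_pair_volume(rlist, cutoff=CUTOFF_NM):
    """Search-sphere volume relative to the bare cut-off sphere.

    Assumption: the number of listed pairs scales with rlist**3.
    """
    return (rlist / cutoff) ** 3


def summarize(runs):
    """Return (nstlist, buffer, relative volume, ns/day) per run."""
    return [
        (nstlist,
         round(buffer_nm(rlist), 2),
         round(relative_pair_volume(rlist), 2),
         ns_day)
        for nstlist, rlist, ns_day in runs
    ]


def fastest(runs):
    """Return the nstlist value of the run with the highest ns/day."""
    return max(runs, key=lambda run: run[2])[0]


if __name__ == "__main__":
    print("nstlist  buffer/nm  rel.volume  ns/day")
    for nstlist, buf, vol, ns_day in summarize(runs):
        print(f"{nstlist:7d}  {buf:9.2f}  {vol:10.2f}  {ns_day:6.1f}")
    print("fastest: nstlist =", fastest(runs))
```

With these numbers the nstlist = 20 list covers roughly
3.5 times the volume of the bare cut-off sphere, against
about 1.8 for nstlist = 10 and 1.3 for nstlist = 5. That
would at least be consistent with nstlist = 10 being the
fastest of the three on this card, though three data
points are not much to go on.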

These values probably don't mean very much, since the GTX-550
used is a rather slow card. Are there particular architectural
features of the cards that the GPU routines are optimized for?
Maybe the number of SMs (4 in the GTX 550, 16 in the 580), or
other GF-110 vs GF-116 differences? I have a GTX-580 in another
Linux box, maybe I'll pull it out for testing.

Thanks & regards

M.