[gmx-developers] free energies on GPUs?

Igor Leontyev ileontyev at ucdavis.edu
Thu Feb 23 10:37:42 CET 2017

Berk and Mark,
Thanks for your comments. There was no multiple-run issue, because only 
a single job was running. (BTW, I thought that with the default 
"-pinstride 0" mdrun minimizes the number of threads per physical core, 
doesn't it?)

Regarding pme-order=6, that was my confusion. Higher PME accuracy was not 
needed. The manual said: "You might try 6/8/10 when running in parallel 
and simultaneously decrease grid dimension." I thought increasing 
pme-order would take load off the CPU, while in fact it does the 
opposite. Using pme-order=4 gave 60% better performance, resulting in a 
100% speedup on the GPU (vs 50% with pme-order=6).
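
A rough way to see why a higher pme-order loads the CPU more: B-spline 
interpolation of order n touches n points along each axis, so every atom 
is spread to (and gathered from) n^3 grid points in 3D. The Python sketch 
below only counts those points. It is a proxy for the spread+gather work, 
not a timing model, and it ignores the missing SIMD acceleration for 
order 6 mentioned in the quoted reply. The function name stencil_points 
is made up for this illustration.

```python
def stencil_points(pme_order: int) -> int:
    """Grid points touched per atom in 3D for B-spline interpolation
    of the given order (order points along each of the three axes)."""
    return pme_order ** 3


# Compare the per-atom stencil size against pme-order=4.
for order in (4, 5, 6):
    points = stencil_points(order)
    relative = points / stencil_points(4)
    print(f"pme-order={order}: {points} grid points per atom "
          f"({relative:.2f}x the order-4 stencil)")
```

By this count alone, order 6 does more than three times the per-atom 
interpolation work of order 4, before any difference in SIMD support.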

There might still be some compiler/optimization issues. Surprisingly, my 
mdrun binary compiled against fftw-3.3.4 (with AVX optimization) is 20% 
faster than the ones compiled against fftw-3.3.5 and fftw-3.3.6 with AVX2.
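
For comparing two builds like this, one option is to read the ns/day 
figure from the "Performance:" line that mdrun writes at the end of 
md.log. The Python sketch below assumes that two-column layout (ns/day, 
then hour/ns). The helper names and the two log tails are made up for 
illustration and are not measurements from this thread.

```python
import re

# Matches the final "Performance:" line of an md.log; the two columns
# are assumed to be ns/day and hour/ns, in that order.
PERF_RE = re.compile(r"^Performance:\s+([0-9.]+)\s+([0-9.]+)", re.MULTILINE)


def ns_per_day(log_text: str) -> float:
    """Return the ns/day figure from the text of an md.log file."""
    match = PERF_RE.search(log_text)
    if match is None:
        raise ValueError("no Performance line found (did the run finish?)")
    return float(match.group(1))


def percent_faster(a: float, b: float) -> float:
    """How much faster run a is than run b, in percent of b."""
    return (a / b - 1.0) * 100.0


# Made-up log tails for illustration only.
LOG_A = """\
                 (ns/day)    (hour/ns)
Performance:       24.000        1.000
"""
LOG_B = """\
                 (ns/day)    (hour/ns)
Performance:       20.000        1.200
"""

diff = percent_faster(ns_per_day(LOG_A), ns_per_day(LOG_B))
print(f"build A is {diff:.0f}% faster than build B")
```

Both runs should use the same .tpr file and the same mdrun options, so 
that the FFTW build is the only difference between them.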

> Message: 1
> Date: Thu, 23 Feb 2017 01:52:40 +0100
> From: Berk Hess <hess at kth.se>
> To: gmx-developers at gromacs.org
> Subject: Re: [gmx-developers] free energies on GPUs?
> Message-ID: <0133de93-dd40-b89f-b960-e25ace1d3cec at kth.se>
> I don't see anything strange, apart from the multiple run issue Mark
> noticed.
> For performance pme-order=6 is bad. You spend 50% of CPU time in PME
> spread+gather. Order 6 is not SIMD intrinsics accelerated. Using
> pme-order=5 will be about twice as fast. You can reduce the grid spacing
> a bit if you think you need high PME accuracy.
> Cheers,
> Berk
