[gmx-developers] free energies on GPUs?
ileontyev at ucdavis.edu
Thu Feb 23 10:37:42 CET 2017
Berk and Mark,
Thanks for your comments. There was no multiple-run issue, because only
a single job was running. (BTW, I thought that with the default
"-pinstride 0" mdrun minimizes the number of threads per physical core,
doesn't it?)
Regarding pme-order=6, that was my confusion. Higher PME accuracy was not
needed. The manual said: "You might try 6/8/10 when running in parallel
and simultaneously decrease grid dimension." I thought increasing
pme-order would offload work from the CPU, while it actually does the
opposite. Using pme-order=4 gave 60% better performance, resulting in a
100% speedup on the GPU (vs 50% with pme-order=6).
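
A rough sketch of why a higher pme-order loads the CPU more, assuming only the textbook property that a B-spline of order n touches n grid points per dimension. The function name stencil_points is hypothetical and is not part of GROMACS. It counts grid points only and ignores SIMD effects, so the real slowdown for order 6 (which is not SIMD accelerated) may well be larger than this estimate.

```python
# Back-of-the-envelope model of PME spread+gather work per charge.
# Assumption: an order-n B-spline spans n grid points in each of the
# three dimensions, so each charge touches n**3 grid points.

def stencil_points(pme_order: int) -> int:
    """Number of grid points one charge is spread over (and gathered from)."""
    return pme_order ** 3

baseline = stencil_points(4)  # pme-order=4 is used as the reference
for order in (4, 5, 6):
    points = stencil_points(order)
    print(f"pme-order={order}: {points} grid points per charge, "
          f"{points / baseline:.2f}x the order-4 work")
```

By this count alone, order 6 does more than three times the spread+gather work of order 4, before any loss from missing SIMD kernels.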
There might still be some compiler/optimization issues. Surprisingly, my
mdrun binary compiled against fftw-3.3.4 (with AVX optimization) is 20%
faster than those compiled against fftw-3.3.5 and fftw-3.3.6 with AVX2.
> Message: 1
> Date: Thu, 23 Feb 2017 01:52:40 +0100
> From: Berk Hess <hess at kth.se>
> To: gmx-developers at gromacs.org
> Subject: Re: [gmx-developers] free energies on GPUs?
> Message-ID: <0133de93-dd40-b89f-b960-e25ace1d3cec at kth.se>
> Content-Type: text/plain; charset=windows-1252; format=flowed
> I don't see anything strange, apart from the multiple run issue Mark
> For performance pme-order=6 is bad. You spend 50% of CPU time in PME
> spread+gather. Order 6 is not SIMD intrinsics accelerated. Using
> pme-order=5 will be about twice as fast. You can reduce the grid spacing
> a bit if you think you need high PME accuracy.