[gmx-developers] Gromacs FFT
Kent.Knox at amd.com
Wed Dec 10 17:09:57 CET 2008
"Is it correct that the ACML only supports serial FFT so far?"
Our 2D and 3D interfaces are threaded with OpenMP, but our 1D interfaces are not. We currently have no support for MPI-style distributed-memory parallelism.
I've done a naïve printf-style instrumentation of the GROMACS FFT interface and can only see 3D real-to-complex/complex-to-real FFTs being used. For an MPI build of GROMACS, I see that GROMACS itself decomposes 3D FFTs into 2D FFTs and passes those down to the underlying FFT library to finish. I believe that in these instances ACML is threaded appropriately, but please let me know if I am drawing the wrong conclusions. I am basing my observations purely on the d.lzm benchmark.
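To make the decomposition described above concrete: a 3D DFT is separable, so it can be computed as a 2D DFT over (y, z) of every x-slab followed by a 1D DFT along x. The Python sketch below is not GROMACS or ACML code. It uses naive complex O(n²) DFTs (GROMACS uses real-to-complex transforms through a real FFT library), and all function names are hypothetical. In an MPI run, each rank would own some slabs for the first step, and the second step would need an all-to-all transpose.

```python
import cmath
import itertools


def dft1d(seq):
    """Naive O(n^2) forward DFT with kernel exp(-2*pi*i*k*j/n)."""
    n = len(seq)
    return [
        sum(seq[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
        for k in range(n)
    ]


def dft2d(plane):
    """2D DFT of a row-major list of rows: transform each row, then each column."""
    rows = [dft1d(r) for r in plane]
    ny, nz = len(rows), len(rows[0])
    cols = [dft1d([rows[i][j] for i in range(ny)]) for j in range(nz)]
    # cols[j][i] holds output element (i, j); transpose back to row-major.
    return [[cols[j][i] for j in range(nz)] for i in range(ny)]


def dft3d_direct(a):
    """Reference 3D DFT evaluated straight from the definition."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    out = [[[0j] * nz for _ in range(ny)] for _ in range(nx)]
    for kx, ky, kz in itertools.product(range(nx), range(ny), range(nz)):
        out[kx][ky][kz] = sum(
            a[x][y][z]
            * cmath.exp(-2j * cmath.pi * (kx * x / nx + ky * y / ny + kz * z / nz))
            for x, y, z in itertools.product(range(nx), range(ny), range(nz))
        )
    return out


def dft3d_slabs(a):
    """Slab-style 3D DFT: 2D DFT of each x-slab, then 1D DFT along x."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    # Step 1: every slab a[x] is an independent 2D problem
    # (in an MPI run, each rank would transform only the slabs it owns).
    slabs = [dft2d(a[x]) for x in range(nx)]
    # Step 2: finish with 1D transforms along x for every (ky, kz);
    # in an MPI run this step needs an all-to-all transpose first.
    out = [[[0j] * nz for _ in range(ny)] for _ in range(nx)]
    for ky, kz in itertools.product(range(ny), range(nz)):
        line = dft1d([slabs[x][ky][kz] for x in range(nx)])
        for kx in range(nx):
            out[kx][ky][kz] = line[kx]
    return out


if __name__ == "__main__":
    # Small non-cubic test grid (3 x 2 x 4) with arbitrary complex values.
    a = [[[complex(x + 2 * y - z, (x * y * z) % 3) for z in range(4)]
          for y in range(2)] for x in range(3)]
    d, s = dft3d_direct(a), dft3d_slabs(a)
    err = max(abs(d[i][j][k] - s[i][j][k])
              for i in range(3) for j in range(2) for k in range(4))
    print("max |direct - slab| =", err)  # should be at rounding-error level
```

The slab variant only ever calls 2D and 1D transforms, which is why a library whose 2D interface is threaded can still be exercised by a distributed 3D FFT.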
"Do you plan to add a parallel FFT or an extension like the one for the linear algebra routines with AMD ScaLAPACK?"
Currently, AMD has no plans for a parallel FFT extension like ScaLAPACK. Our experience with ScaLAPACK is that the many MPI flavors make such a product very difficult to support. I refer you to a post in the AMD forums:
I will send a follow-up email in a separate thread to ask you some questions, if I may. I lack the domain specific knowledge to truly understand the needs and requirements of this community.
From: roland at rschulz.eu [mailto:roland at rschulz.eu] On Behalf Of Roland Schulz
Sent: Tuesday, December 09, 2008 2:24 PM
To: Discussion list for GROMACS development
Cc: Knox, Kent
Subject: Re: [gmx-developers] Gromacs FFT
Usually the FFT is not a bottleneck for MD when run on one or a few processors. You can increase the FFT load slightly by using a small cut-off (rcoulomb in the mdp file) and a fine grid (fourierspacing in the mdp file). Typically one uses a minimum of rcoulomb 0.8 and fourierspacing of 1.1, but you could decrease fourierspacing further to see the effect on the FFT time.
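A rough way to see why a finer grid raises the FFT share: the number of PME grid points along each box dimension is about the box length divided by fourierspacing, and a 3D FFT over N points costs on the order of N log N. The Python sketch below is a back-of-envelope estimate, not GROMACS's grid-selection code. Rounding up to the next size whose only prime factors are 2, 3, 5 and 7 is an assumption, the function names are hypothetical, and the box and spacing values are illustrative only. It does not model the effect of rcoulomb.

```python
import math


def is_fft_friendly(n, primes=(2, 3, 5, 7)):
    """True if n (>= 1) has no prime factors other than those in `primes`."""
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1


def grid_points(box_length_nm, fourierspacing_nm):
    """Smallest FFT-friendly grid size whose spacing is <= fourierspacing_nm."""
    n = max(1, math.ceil(box_length_nm / fourierspacing_nm))
    while not is_fft_friendly(n):
        n += 1
    return n


def fft_cost_proxy(box_nm, fourierspacing_nm):
    """Grid dimensions and an N*log2(N) cost proxy for one 3D FFT."""
    dims = [grid_points(length, fourierspacing_nm) for length in box_nm]
    total = math.prod(dims)
    return dims, total * math.log2(total)


if __name__ == "__main__":
    box = (5.0, 5.0, 5.0)  # illustrative cubic box, nm
    for spacing in (0.16, 0.12, 0.10):  # illustrative spacings, nm
        dims, cost = fft_cost_proxy(box, spacing)
        print(spacing, dims, round(cost))
```

Halving the spacing roughly multiplies the grid size by eight, so the FFT cost grows much faster than linearly as fourierspacing shrinks.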
The FFT becomes the major bottleneck for parallel runs on more than a few hundred CPUs. I did some work on parallel FFT on Jaguar and Kraken; let me know if you are also interested in parallel FFT. Is it correct that the ACML only supports serial FFT so far? Do you plan to add a parallel FFT or an extension like the one for the linear algebra routines with AMD ScaLAPACK?