[gmx-users] Parallel FFTW performance
lindahl at csb.stanford.edu
Mon Oct 27 00:23:01 CET 2003
The problem is that the Fourier transform is very fast, so when it is
parallelized over a lot of nodes with MPI the communication will kill
We're working on this - the solution will simply be to skip FFTW
parallelization completely. We will use both threads and MPI in
Gromacs, and do all synchronization ourselves. The additional obstacle
is that you don't want to use all nodes in your system for the PME
part. This is trivial to work around with 3-4 nodes, but with 16+ nodes
and domain decomposition it's a pretty complicated problem.
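To make the communication argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not Gromacs or FFTW code. The function name and all hardware numbers (flop rate, message latency, bandwidth) are made-up assumptions chosen only for illustration, and the model simply assumes that each node exchanges data with every other node once per transform.

```python
import math

def fft_step_time(n_grid, n_nodes, flops_per_sec=1e9,
                  latency=50e-6, bandwidth=100e6):
    """Toy estimate (seconds) of compute and communication time for one
    3D FFT on an n_grid^3 grid spread over n_nodes.

    All default hardware numbers are illustrative assumptions,
    not measurements of any real machine.
    """
    points = n_grid ** 3
    # Roughly 5 * N * log2(N) floating-point operations, split evenly.
    compute = 5.0 * points * math.log2(points) / flops_per_sec / n_nodes
    if n_nodes == 1:
        return compute, 0.0
    # Assumed transpose step: every node sends one message to each other node.
    messages = n_nodes - 1
    # 16 bytes per complex double; each node keeps 1/n_nodes of its own data.
    bytes_sent = 16.0 * points / n_nodes * (n_nodes - 1) / n_nodes
    comm = messages * latency + bytes_sent / bandwidth
    return compute, comm

if __name__ == "__main__":
    for nodes in (1, 2, 4, 8, 16):
        compute, comm = fft_step_time(64, nodes)
        print(f"{nodes:2d} nodes: compute {compute * 1e3:7.3f} ms, "
              f"comm {comm * 1e3:7.3f} ms")
```

With these assumed numbers the compute time keeps shrinking as nodes are added, while the communication time does not, so it ends up dominating at higher node counts. That is the reason for wanting only a subset of the nodes to do the PME part.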
On Oct 26, 2003, at 2:18 PM, Dean Johnson wrote:
> On Sun, 2003-10-26 at 15:45, Bing Kim wrote:
>> Hi All!
>> I am sorry this question is not for Gromacs exactly but for FFTW.
>> But this question would arise with Gromacs too.
>> I recently installed FFTW-2.1.5 which can use MPI.
>> It was compiled with gcc-3.3.2 and mpich-184.108.40.206.
>> When I ran a benchmark test program, rfftw_mpi_test, which is located
>> I found that its performance is worse on dual CPUs than on a single CPU.
>> Basically, communication cost should be zero in SMP machine.. that I
>> So.. I wonder if gromacs use rfftw_mpi, how it can get speed up in
>> Please help me understand this thing.
> That is also our experience, not just with Gromacs/FFTW, but also with
> Amber7. We solve that by running two 16x1 models concurrently. The cost
> of 8x2 is only a little more than 16x1.