[gmx-users] Parallel FFTW performance

Erik Lindahl lindahl at csb.stanford.edu
Mon Oct 27 00:23:01 CET 2003


The problem is that the Fourier transform itself is very fast, so when it is 
parallelized over a lot of nodes with MPI the communication will kill 
the performance.
We're working on this - the solution will simply be to skip FFTW 
parallelization completely. We will use both threads and MPI in 
Gromacs, and do all synchronization ourselves. The additional obstacle 
is that you don't want to use all nodes in your system for the PME 
part. This is trivial to work around with 3-4 nodes, but with 16+ nodes 
and domain decomposition it's a pretty complicated problem.
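To make the tradeoff concrete, here is a toy cost model (not Gromacs code; all constants are made-up illustrative values): the compute part of a distributed 3D FFT shrinks with the node count, but the all-to-all transpose adds a bandwidth term plus a per-message latency term that grows with the number of nodes.

```python
import math

def fft_time(n_grid, nodes, t_flop=1e-9, t_word=2e-8, t_lat=5e-5):
    """Rough wall-time estimate for an FFT of n_grid points on `nodes` nodes.

    t_flop: seconds per grid-point operation (assumed constant)
    t_word: seconds per grid point exchanged in the transpose (assumed)
    t_lat:  per-message latency in seconds (assumed)
    """
    # O(N log N) work, divided evenly across nodes.
    compute = n_grid * math.log2(n_grid) * t_flop / nodes
    if nodes == 1:
        return compute
    # Each node holds n_grid/nodes points and must send the fraction
    # (nodes-1)/nodes of them to other nodes during the transpose.
    words = (n_grid / nodes) * (nodes - 1) / nodes
    comm = words * t_word + (nodes - 1) * t_lat
    return compute + comm

# A small 32^3 grid: past a few nodes, latency dominates and the
# parallel transform gets slower than the serial one.
for p in (1, 2, 4, 8, 16):
    print(f"{p:2d} nodes: {fft_time(32**3, p):.2e} s")
```

With these assumed constants the estimated time bottoms out at a handful of nodes and then rises again, which is the behavior described above: for a small, fast transform the MPI communication dominates.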



On Oct 26, 2003, at 2:18 PM, Dean Johnson wrote:

> On Sun, 2003-10-26 at 15:45, Bing Kim wrote:
>> Hi All!
>> I am sorry this question is not exactly about Gromacs but about FFTW.
>> But the question would arise with Gromacs too.
>> I recently installed FFTW-2.1.5 which can use MPI.
>> It was compiled with gcc-3.3.2 and mpich-
>> When I ran a benchmark test program, rfftw_mpi_test, which is located
>> in fftw-2.1.5/mpi,
>> I found that its performance is worse on dual CPUs than on a single CPU.
>> Basically, I expected the communication cost to be zero on an SMP
>> machine.
>> So I wonder, if Gromacs uses rfftw_mpi, how can it get a speedup on
>> multiple processors?
>> Please help me understand this.
> That is also our experience, not just with Gromacs/FFTW, but also with
> Amber7. We work around it by running two 16x1 models concurrently. The
> cost of 8x2 is only a little more than 16x1.
> -- 
> 	-Dean
> _______________________________________________
> gmx-users mailing list
> gmx-users at gromacs.org
> http://www.gromacs.org/mailman/listinfo/gmx-users
> Please don't post (un)subscribe requests to the list. Use the
> www interface or send it to gmx-users-request at gromacs.org.