[gmx-users] MPI_Recv invalid count and system explodes for large but not small parallelization on power6 but not opterons

Mark Abraham Mark.Abraham at anu.edu.au
Wed Mar 4 04:33:19 CET 2009


chris.neale at utoronto.ca wrote:
> Hello,
> 
> I am currently testing a large system on a power6 cluster. I have 
> compiled gromacs 4.0.4 successfully, and it appears to be working fine 
> for <64 "cores" (sic, see later). First, I notice that it runs at 
> approximately half the speed it achieves on some older opterons, which 
> is unfortunate but acceptable. Second, I run into some strange issues 
> at higher core counts. Since there are 32 cores per node with 
> simultaneous multithreading, this yields 64 tasks inside one box, and 
> I realize that these problems could be MPI-related.
> 
> Some background:
> This test system is stable for >100 ns on an opteron, so I am quite 
> confident that the problem is not with my topology or starting 
> structure.
> 
> Compilation with -O2 was successful only when I modified the 
> ./configure file as follows; otherwise I got a stray ')' and a 
> linking error:
> [cneale at tcs-f11n05]$ diff configure.000 configure
> 5052a5053
>> ac_cv_f77_libs="-L/scratch/cneale/exe/fftw-3.1.2_aix/exec/lib -lxlf90 
>> -L/usr/lpp/xlf/lib -lxlopt -lxlf -lxlomp_ser -lpthreads -lm -lc"

Rather than modifying configure, I suggest you use a customized 
configure command line, such as the one described at 
http://wiki.gromacs.org/index.php/GROMACS_on_BlueGene. The resulting 
config.log will also keep a record of what you did.
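
For example, a minimal sketch of such a command line (untested on your 
machine; the xlc/xlf90 compiler choices are my assumption for the AIX 
XL toolchain, and the library list is copied from your diff) could 
pre-seed the cache variable instead of patching the script:

  # CC/F77 below are assumed AIX XL compilers; adjust for your toolchain
  ./configure CC=xlc F77=xlf90 \
      ac_cv_f77_libs="-L/scratch/cneale/exe/fftw-3.1.2_aix/exec/lib -lxlf90 -L/usr/lpp/xlf/lib -lxlopt -lxlf -lxlomp_ser -lpthreads -lm -lc"

Autoconf-generated configure scripts (2.58 and later) accept VAR=VALUE 
arguments like this and record them in config.log, so the build is 
reproducible without any local edits to configure.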

Sorry, I can't help with the scaling issue at high processor counts.

Mark


