[gmx-users] MPI_Recv invalid count and system explodes for large but not small parallelization on power6 but not opterons
Mark Abraham
Mark.Abraham at anu.edu.au
Wed Mar 4 04:33:19 CET 2009
chris.neale at utoronto.ca wrote:
> Hello,
>
> I am currently testing a large system on a power6 cluster. I have
> compiled gromacs 4.0.4 successfully, and it appears to be working fine
> for <64 "cores" (sic, see later). First, I notice that it runs at
> approximately 1/2 the speed that it obtains on some older opterons,
> which is unfortunate but acceptable. Second, I run into some strange
> issues when I have a greater number of cores. Since there are 32 cores
> per node with simultaneous multithreading this yields 64 tasks inside
> one box, and I realize that these problems could be MPI related.
>
> Some background:
> This test system is stable for >100 ns on an opteron, so I am quite
> confident that I do not have a problem with my topology or starting
> structure.
>
> Compilation with -O2 was successful only when I modified the ./configure
> script as follows; otherwise I got a stray ')' and a linking error:
> [cneale at tcs-f11n05]$ diff configure.000 configure
> 5052a5053
>> ac_cv_f77_libs="-L/scratch/cneale/exe/fftw-3.1.2_aix/exec/lib -lxlf90
>> -L/usr/lpp/xlf/lib -lxlopt -lxlf -lxlomp_ser -lpthreads -lm -lc"
Rather than modifying configure, I suggest you use a customized configure
command line, such as the one described at
http://wiki.gromacs.org/index.php/GROMACS_on_BlueGene. The resulting
config.log will also keep a record of exactly what you did.
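For example, with a stock autoconf-generated configure you can usually
supply the Fortran runtime libraries on the command line through the
standard FLIBS variable (the cache variable ac_cv_f77_libs you patched
is derived from it, and a preset FLIBS overrides the detection test).
This is only a sketch, assuming GROMACS 4.0.4's configure honors that
override on your AIX toolchain:

  ./configure FLIBS="-L/scratch/cneale/exe/fftw-3.1.2_aix/exec/lib -lxlf90 \
      -L/usr/lpp/xlf/lib -lxlopt -lxlf -lxlomp_ser -lpthreads -lm -lc"

That way the script itself stays untouched, and the values you chose get
recorded in config.log along with everything else.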
Sorry I can't help with the large-scale parallelization issue itself.
Mark