[gmx-users] MPI_Recv invalid count and system explodes for large but not small parallelization on power6 but not opterons
chris.neale at utoronto.ca
Wed Mar 4 05:32:35 CET 2009
Thanks Roland,

The system has 500,000 atoms. I use PME with a 0.9 nm cut-off for the
coulombic interactions and a 1.4/0.9 nm twin-range cut-off for LJ. The
interconnect is InfiniBand between POWER6 nodes, each of which has 32
cores @ 4.2 GHz with simultaneous multithreading, so I can put 64
"tasks" on each node. The scaling on the POWER6 looks like this
(scaling efficiency in parentheses):

N=1    0.036 ns/day
N=2    0.070 ns/day  (97%)
N=4    0.135 ns/day  (94%)
N=8    0.246 ns/day  (85%)
N=16   0.449 ns/day  (75%)
N=32   0.787 ns/day  (68%)
N=60   0.984 ns/day  (46%)

And I get errors above N=60.
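In .mdp terms, the non-bonded setup described above corresponds to
something like the following fragment (only the relevant options are
shown; this is a sketch of my settings, not the complete input file):

  ; non-bonded settings sketched from the description above
  coulombtype = PME
  rcoulomb    = 0.9     ; 0.9 nm coulombic cut-off
  rlist       = 0.9     ; short-range neighbour-list cut-off
  rvdw        = 1.4     ; 1.4/0.9 nm twin-range LJ cut-off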
Regarding what you want from the log file: are you referring to general
scaling issues for, say, N=32, which does run successfully? Or are you
requesting more log-file information to help me deal with the errors I
get while running 196 or 200 "tasks"?

Thanks so much for your help, Roland,
Chris.

-- original message --
Chris,

Depending on your system size and the interconnect, this might be OK.

Thus you need to give us more information, e.g. how many atoms, how
many ns/day, what interconnect, and whether you use PME.

Also, the messages at the end of md.log may give you some advice on how
to improve performance.

Roland

On Tue, Mar 3, 2009 at 10:48 PM, <chris.neale at utoronto.ca> wrote:
Thanks Mark,

your information is always useful. In this case, the page that you
reference appears to be empty. All I see is "There is currently no text
in this page, you can search for this page title in other pages or edit
this page."

Thanks also for your consideration of the massive scaling issue.

Chris.

chris.neale at utoronto.ca wrote:
Hello,

I am currently testing a large system on a POWER6 cluster. I have
compiled GROMACS 4.0.4 successfully, and it appears to be working fine
for <64 "cores" (sic, see below). First, I notice that it runs at
approximately half the speed it obtains on some older Opterons, which
is unfortunate but acceptable. Second, I run into some strange issues
when I use a larger number of cores. Since there are 32 cores per node
with simultaneous multithreading, this yields 64 tasks inside one box,
and I realize that these problems could be MPI-related.
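For concreteness, a single-node, 64-task launch would look something
like the following with a generic MPI launcher (the launcher and the
-deffnm file prefix here are only illustrative, not the exact commands
I use on this machine):

  # illustrative: 64 MPI tasks on one 32-core / 64-thread node
  mpirun -np 64 mdrun_mpi -deffnm md -v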
Some background:
This test system is stable for >100 ns on an Opteron, so I am quite
confident that I do not have a problem with my topology or starting
structure.

Compilation was successful with -O2 only when I modified the ./configure
file as follows; otherwise I got a stray ')' and a linking error:

[cneale at tcs-f11n05]$ diff configure.000 configure
5052a5053
> ac_cv_f77_libs="-L/scratch/cneale/exe/fftw-3.1.2_aix/exec/lib -lxlf90 -L/usr/lpp/xlf/lib -lxlopt -lxlf -lxlomp_ser -lpthreads -lm -lc"
Rather than modifying configure, I suggest you use a customized
configure command line, such as the one described here:
http://wiki.gromacs.org/index.php/GROMACS_on_BlueGene
The config.log output will have a record of what you did, too.
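For example, something along these lines should have the same effect as
your edit (this is only a sketch: the library list is copied from your
diff, and you would add your usual configure options to the same
command):

  # pass the value as a configure-time variable assignment instead of
  # editing the generated script; append your usual options, e.g. --enable-mpi
  ./configure ac_cv_f77_libs="-L/scratch/cneale/exe/fftw-3.1.2_aix/exec/lib -lxlf90 -L/usr/lpp/xlf/lib -lxlopt -lxlf -lxlomp_ser -lpthreads -lm -lc"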
Sorry I can't help with the massive scaling issue.

Mark