[gmx-developers] SEGV mdrun mpi failure

Mostyn Lewis Mostyn.Lewis at sun.com
Sun Oct 26 02:57:14 CET 2003


Hello,

Well, the CVS Gromacs from Friday still did not prevent a SEGV (signal 11) in
my MPI benchmark. The values at failure were still the same in angles
(bondfree.c), with a bad t2 value in the rvec_inc(fr->fshift[t2],f_k);
statement at line 500. So I poked around in the debugger a little more and,
on instinct, suspected that the value of ak in the statement
ivec_sub(SHIFT_IVEC(g,ak),jt,dt_kj); was wrong. It was 1 greater than
g->end, so the lookup was reading g->ishift[3002], one past the last valid
entry, and getting a bogus vector.

*g ->
maxedge = 9
nnodes = 3002
nbound = 3002
start = 0
end = 3001
negc = 3002

g->ishift[3002] was {0, 108081, 2}
g->ishift[3001] was {1, 1, 0}
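To make the failure mode concrete, here is a small sketch. It assumes that t2 is derived from dt_kj through a shift-index mapping similar to the IVEC2IS macro, with 27 shift vectors (-1..+1 per dimension); the real macro and box counts may differ, and ivec2is, shift_index_ok and ishift_slot_ok are hypothetical names, not Gromacs API. With start = 0 and end = 3001 the valid ishift slots are 0..3001, so 3002 is out of range. Pushing the junk entry {0, 108081, 2} through the mapping (taking jt as zero for simplicity) gives an index far outside the fshift array, which would fit the bad t2 and the SEGV in rvec_inc.

```c
/* Hypothetical sketch: none of these names are Gromacs API. */
typedef int ivec[3];

/* Assumed mapping modeled loosely on IVEC2IS: shifts of -1..+1 per
 * dimension, i.e. 27 shift vectors. The real macro may differ. */
#define D_BOX   1
#define N_BOX   (2 * D_BOX + 1)
#define N_SHIFT (N_BOX * N_BOX * N_BOX)

/* Map a shift vector to a linear index into an fshift-like array. */
static int ivec2is(const ivec v)
{
    return (v[0] + D_BOX) + N_BOX * ((v[1] + D_BOX) + N_BOX * (v[2] + D_BOX));
}

/* 1 if the index can safely address an array of N_SHIFT entries. */
static int shift_index_ok(int is)
{
    return is >= 0 && is < N_SHIFT;
}

/* 1 if atom a has an ishift slot, i.e. start <= a <= end (end inclusive,
 * so the graph holds end - start + 1 entries). */
static int ishift_slot_ok(int start, int end, int a)
{
    return a >= start && a <= end;
}
```

For comparison, the sane entry {1, 1, 0} maps to 17 under this toy mapping, while the junk entry maps to 324274, and ishift_slot_ok(0, 3001, 3002) is false.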

Anyway, to cut a tedious debugging tirade short, I changed angles to test

 if (g && (ak <= g->end)) {

and do_dih_fup to test

 if (g && (l <= g->end)) {

and the benchmark then ran (on up to 16 CPUs). I tried both Linux with
LAM/icc and a SUN 6800 with SUNClusterTools and the SUN Forte compilers.
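The guard in the patch only compares the last atom of each interaction (ak in angles, l in do_dih_fup) with g->end. A slightly more general form of the same idea is sketched below: check every atom of the interaction against both g->start and g->end before any SHIFT_IVEC lookup, and skip the graph-based shift when the check fails, as the patched if does. The names graph_range_t and graph_covers are hypothetical, not Gromacs API, and the struct keeps only the two fields from the debugger dump that matter here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal graph: only the covered atom range matters here. */
typedef struct {
    int start;  /* first atom in the graph */
    int end;    /* last atom in the graph (inclusive) */
} graph_range_t;

/* Returns true only if g exists and every atom index in atoms[0..n-1]
 * lies in [g->start, g->end], so each SHIFT_IVEC-style lookup stays in
 * bounds. A false result means: skip the graph-based shift. */
static bool graph_covers(const graph_range_t *g, const int *atoms, size_t n)
{
    if (g == NULL)
        return false;
    for (size_t i = 0; i < n; i++) {
        if (atoms[i] < g->start || atoms[i] > g->end)
            return false;
    }
    return true;
}
```

For an angle one would pass {ai, aj, ak}, for a dihedral {i, j, k, l}; with the dumped graph (start 0, end 3001) an angle ending at atom 3002 is rejected.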

I don't think this is an MPI problem; it just seems to show up more easily
in this mode.

The benchmark is one of your standard ones, the d.poly-ch2 example.

Is this just a stupid hack or does it have any substance?

Regards,
Mostyn



