[gmx-users] Problem with gromacs-3.3 using PME
Robert Bjornson
rbjornson at gmail.com
Fri Jan 6 16:21:17 CET 2006
Hi,
A couple of weeks ago I sent a message to the list reporting a problem
I was having with gromacs-3.3 when using PME. Gromacs was crashing with a
segmentation fault in pme.c after what I believe was a small number of
timesteps.
The initial configuration was: 8 EM64T Xeon CPUs, running RHEL WS
release 3, using LAM 7.1.1 and the GNU compilers.
I tried varying a number of things:
- compiling with the Intel compilers instead of GNU
- compiling with the Intel 32-bit compilers
- compiling without MPI and running sequentially on EM64T
- compiling without MPI and running sequentially on a 32-bit Xeon
All of these runs failed when using PME, but ran fine using cut-off.
I'll append some debugging info at the end of this email.
I then compiled gromacs-3.2.1, both sequentially and using LAM, and
was able to run the PME input case without problems.
It sure looks to me like a bug that is triggered by PME. What is the
protocol for submitting a bug report? I'd be happy to provide
whatever debugging info would be helpful.
I'm also curious; what am I giving up by using gromacs-3.2.1 rather than 3.3?
I apologize if this is the incorrect list for this message; perhaps it
should have gone to the developers list directly. Please let me know
if that is the case.
Rob Bjornson
<begin debugging info>
Here is a sample stack trace from a sequential run on a 32-bit Xeon:
#0 0x0807e46b in spread_q_bsplines (grid=0x82fcf68, idx=0x424c9008,
charge=0x40c82008, theta=0x828b524, nr=72240, order=6, nnx=0x8330c80,
nny=0x8331138,
nnz=0x83315f0) at pme.c:527
#1 0x080814fd in spread_on_grid (logfile=0x8291068, grid=0x82fcf68,
homenr=72240, pme_order=6, x=0x409be008, charge=0x40c82008,
box=0x82911dc,
bGatherOnly=0, bHaveSplines=0) at pme.c:1180
#2 0x080817da in do_pme (logfile=0x8291068, bVerbose=0, ir=0x82a2ee0,
x=0x409be008, f=0x417ba008, chargeA=0x40c82008, chargeB=0x40cc9008,
box=0x82911dc,
cr=0x8291008, nsb=0x8292400, nrnb=0xbfffcfe0, vir=0x82fc27c,
ewaldcoeff=3.47045946, bFreeEnergy=0, lambda=0, dvdlambda=0xbfffca5c,
bGatherOnly=0)
at pme.c:1276
#3 0x0806a83f in force (fplog=0x8291068, step=25, fr=0x82fc178,
ir=0x82a2ee0, idef=0x8293424, nsb=0x8292400, cr=0x8291008, mcr=0x0,
nrnb=0xbfffcfe0,
grps=0x8291ed8, md=0x8291720, ngener=2, opts=0x82a30bc,
x=0x409be008, f=0x4111a008, epot=0x8291de8, fcd=0x8292318, bVerbose=0,
box=0x82911dc,
lambda=0, graph=0x8291858, excl=0x82a1e2c, bNBFonly=0,
bDoForces=1, mu_tot=0xbfffcb20, bGatherOnly=0, edyn=0xbfffd8d0) at
force.c:1306
#4 0x0808f003 in do_force (fplog=0x8291068, cr=0x8291008, mcr=0x0,
inputrec=0x82a2ee0, nsb=0x8292400, step=25, nrnb=0xbfffcfe0,
top=0x8293420,
grps=0x8291ed8, box=0x82911dc, x=0x409be008, f=0x4111a008,
buf=0x41046008, mdatoms=0x8291720, ener=0x8291de8, fcd=0x8292318,
bVerbose=0, lambda=0,
graph=0x8291858, bStateChanged=1, bNS=0, bNBFonly=0, bDoForces=1,
fr=0x82fc178, mu_tot=0xbfffcfb0, bGatherOnly=0, t=0.0250000004,
field=0x0,
edyn=0xbfffd8d0) at sim_util.c:334
#5 0x08059100 in do_md (log=0x8291068, cr=0x8291008, mcr=0x0,
nfile=25, fnm=0x82840a0, bVerbose=0, bCompact=1, bVsites=0,
vsitecomm=0x0, stepout=10,
inputrec=0x82a2ee0, grps=0x8291ed8, top=0x8293420, ener=0x8291de8,
fcd=0x8292318, state=0x82911d0, vold=0x412c2008, vt=0x411ee008,
f=0x4111a008,
buf=0x41046008, mdatoms=0x8291720, nsb=0x8292400, nrnb=0x82a3188,
graph=0x8291858, edyn=0xbfffd8d0, fr=0x82fc178, repl_ex_nst=0,
repl_ex_seed=-1,
Flags=0) at md.c:622
#6 0x08057dda in mdrunner (cr=0x8291008, mcr=0x0, nfile=25,
fnm=0x82840a0, bVerbose=0, bCompact=1, nDlb=0, nstepout=10,
edyn=0xbfffd8d0, repl_ex_nst=0,
repl_ex_seed=-1, Flags=0) at md.c:227
#7 0x0805ad10 in main (argc=3, argv=0xbfffd984) at mdrun.c:253
Examining things under gdb revealed that one element of the idxptr
array appeared to have been corrupted:
(gdb) print idxptr[0]
$33 = 1062338964
(gdb) print idxptr[1]
$34 = 12
(gdb) print idxptr[2]
$35 = 32
(gdb) print idxptr[-1]
$36 = 24
(gdb) print idxptr[-2]
$37 = 59
(gdb) print nx
$38 = 60
(gdb) print ny
$39 = 60
(gdb) print nz
$40 = 60
Here is the code in question (pme.c). The segmentation fault occurs on
line 527, after xidx was (apparently erroneously) set to a very large
value on line 515. Note that DEBUG wasn't defined for my compilation,
so the range_check calls on lines 519-521 were not compiled in.
510 for(n=0; (n<nr); n++) {
511 qn = charge[n];
512 idxptr = idx[n];
513
514 if (qn != 0) {
515 xidx = idxptr[XX];
516 yidx = idxptr[YY];
517 zidx = idxptr[ZZ];
518 #ifdef DEBUG
519 range_check(xidx,0,nx);
520 range_check(yidx,0,ny);
521 range_check(zidx,0,nz);
522 #endif
523 i0 = ii0+xidx; /* Pointer arithmetic */
524 norder = n*4;
525 norder1 = norder+4;
526
527 i = ii0[xidx];
528 j = jj0[yidx];
529 k = kk0[zidx];
530
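
As an illustration of what the disabled checks on lines 519-521 would
have caught, here is a minimal standalone sketch. It is not the GROMACS
range_check macro: index_in_range and count_bad_indices are hypothetical
helpers, and I am assuming the valid range is the half-open interval
[0, n) for each grid dimension. With the values from the gdb session
above (idxptr = {1062338964, 12, 32}, nx = ny = nz = 60), only the x
component is flagged.

```c
#include <stdio.h>

/* Hypothetical helper (not the GROMACS macro): true when lo <= v < hi. */
static int index_in_range(int v, int lo, int hi)
{
    return v >= lo && v < hi;
}

/* Check the three grid indices of one atom against the grid dimensions.
 * Returns the number of components that fall outside [0, n). */
static int count_bad_indices(const int idx[3], int nx, int ny, int nz)
{
    int dims[3];
    int d, bad = 0;

    dims[0] = nx;
    dims[1] = ny;
    dims[2] = nz;

    for (d = 0; d < 3; d++) {
        if (!index_in_range(idx[d], 0, dims[d])) {
            /* Report the offending component instead of using it
             * as an array subscript. */
            fprintf(stderr, "grid index %d = %d is outside [0,%d)\n",
                    d, idx[d], dims[d]);
            bad++;
        }
    }
    return bad;
}
```

Calling count_bad_indices before the lookups on lines 527-529 would turn
the crash into a diagnostic that names the bad component. It does not
explain how the index was corrupted in the first place.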