[gmx-developers] MPI stall?

Michael Shirts michael.shirts at virginia.edu
Sun Dec 6 16:03:25 CET 2009

Hi, all-

I'm getting a weird MPI stall with the git master repository version.
I compiled with debugging on and double precision, and I'm running on an
8-processor MacPro.
After running for 10 minutes or so parallelized 8 ways, it appears to
stall. I attached a debugger to the threads to see where it's stuck;
the backtrace on the head node was (arguments removed for clarity):

#0  0x907fb29a in write$NOCANCEL$UNIX2003 ()
#1  0x907fb1f2 in _swrite ()
#2  0x907fb11f in __sflush ()
#3  0x907ffcfc in __swbuf ()
#4  0x90838e92 in fputc ()
#5  0x000c2dfd in print_time (out=0xa00c7690, runtime=0xbfffd5e0,
step=44600, ir=0x1017e00, cr=0x9004e0) at sim_util.c:164
#6  0x00019215 in do_md  at md.c:2316
#7  0x00013138 in mdrunner  at md.c:216
#9  0x0001b9cc in main (argc=14, argv=0xbffff3a0) at mdrun.c:518

And for the other nodes:

#0  0x907c536a in swtch_pri ()
#1  0x90832e65 in sched_yield ()
#2  0x00a05515 in mca_pml_ob1_send ()
#3  0x00710445 in MPI_Sendrecv ()
#4  0x00048fe4 in dd_sendrecv_rvec (dd=0x91dc00, ddimind=0,
direction=1, buf_s=0x1034c00, n_s=333, buf_r=0xd22f38, n_r=360) at
#5  0x00029c32 in dd_move_x (dd=0x91dc00, box=0x9260fc, x=0xd21000) at
#6  0x000c3f77 in do_force  at sim_util.c:521
#7  0x00017478 in do_md  at md.c:1794
#8  0x00013138 in mdrunner at md.c:687
#9  0x00011cbb in mdrunner_threads  at md.c:216
#10 0x0001b9cc in main (argc=14, argv=0x9184e0) at mdrun.c:518
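For anyone who wants to compare, one way to grab per-thread backtraces like the ones above from every running process is sketched here. It assumes gdb and pgrep are installed, that you are allowed to attach to the processes, and that the MPI ranks' process names contain "mdrun"; the function name dump_backtraces is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a backtrace of every thread in each running
# process whose name matches the pattern in $1.
# Assumes gdb and pgrep are available (an assumption, not from the post).
dump_backtraces() {
  for pid in $(pgrep "$1"); do
    echo "=== PID $pid ==="
    # -batch makes gdb exit after running the -ex commands;
    # the explicit detach lets the stalled process keep its state.
    gdb -batch -p "$pid" -ex "thread apply all bt" -ex "detach"
  done
}

# Example (process name is an assumption): dump_backtraces mdrun
```

If no process matches, the function prints nothing.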

Any other observations of this?  Has this been seen on other MacPros?
With debugging on?

Michael Shirts
Assistant Professor
Department of Chemical Engineering
University of Virginia
michael.shirts at virginia.edu
