[gmx-developers] MPI stall?
Michael Shirts
michael.shirts at virginia.edu
Sun Dec 6 16:03:25 CET 2009
Hi, all-
I'm getting a weird MPI stall with the git master repository version.
I compiled with debugging on and in double precision, and I'm running on an
8-processor MacPro.
After running for 10 min or so parallelized 8 ways, it appears to
stall. I attached a debugger to the threads to see where it's stuck;
the backtrace on the head node was (some arguments removed for clarity):
#0 0x907fb29a in write$NOCANCEL$UNIX2003 ()
#1 0x907fb1f2 in _swrite ()
#2 0x907fb11f in __sflush ()
#3 0x907ffcfc in __swbuf ()
#4 0x90838e92 in fputc ()
#5 0x000c2dfd in print_time (out=0xa00c7690, runtime=0xbfffd5e0, step=44600, ir=0x1017e00, cr=0x9004e0) at sim_util.c:164
#6 0x00019215 in do_md at md.c:2316
#7 0x00013138 in mdrunner at md.c:216
#9 0x0001b9cc in main (argc=14, argv=0xbffff3a0) at mdrun.c:518
And for the other nodes:
#0 0x907c536a in swtch_pri ()
#1 0x90832e65 in sched_yield ()
#2 0x00a05515 in mca_pml_ob1_send ()
#3 0x00710445 in MPI_Sendrecv ()
#4 0x00048fe4 in dd_sendrecv_rvec (dd=0x91dc00, ddimind=0, direction=1, buf_s=0x1034c00, n_s=333, buf_r=0xd22f38, n_r=360) at domdec_network.c:115
#5 0x00029c32 in dd_move_x (dd=0x91dc00, box=0x9260fc, x=0xd21000) at domdec.c:657
#6 0x000c3f77 in do_force at sim_util.c:521
#7 0x00017478 in do_md at md.c:1794
#8 0x00013138 in mdrunner at md.c:687
#9 0x00011cbb in mdrunner_threads at md.c:216
#10 0x0001b9cc in main (argc=14, argv=0x9184e0) at mdrun.c:518
Has anyone else observed this? Has it been seen on other MacPros?
With debugging on?
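
One possibility that may be worth ruling out (purely a guess; nothing
above confirms it): in the head-node trace, fputc() ends up in write(),
which is what a blocking write looks like when the output stream is a
pipe or terminal whose reader has stopped draining it. The other ranks
would then simply be waiting in MPI_Sendrecv for a head node that never
reaches its own communication call. The sketch below is not GROMACS
code; bytes_until_pipe_blocks() is a hypothetical helper that only
demonstrates the underlying POSIX behaviour: an unread pipe accepts a
finite number of bytes, after which a write can no longer proceed.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, not part of GROMACS or MPI.
 * Fills a pipe that nobody reads from and returns the number of bytes
 * accepted before write() reports that it would block, or -1 on error.
 * With a blocking descriptor (the default for stdio streams) the same
 * situation would leave the caller stuck inside write(). */
long bytes_until_pipe_blocks(void)
{
    int fds[2];
    char chunk[512] = {0};   /* 512 <= PIPE_BUF, so each write is all-or-nothing */
    long total = 0;
    int saved_errno = 0;

    if (pipe(fds) != 0) {
        return -1;
    }

    /* Non-blocking mode turns the would-be stall into an EAGAIN error,
     * so the limit can be observed without hanging this process. */
    int flags = fcntl(fds[1], F_GETFL);
    if (flags == -1 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof chunk);
        if (n > 0) {
            total += n;          /* pipe still had room */
            continue;
        }
        if (n == -1 && errno == EINTR) {
            continue;            /* interrupted by a signal, retry */
        }
        saved_errno = (n == -1) ? errno : 0;
        break;
    }

    close(fds[0]);
    close(fds[1]);

    /* Only report a size if we stopped because the pipe was full. */
    if (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK) {
        return total;
    }
    return -1;
}
```

Calling bytes_until_pipe_blocks() from a small main() and printing the
result gives the pipe capacity on the machine in question. If mdrun's
output is going through a pipe (for example via the MPI launcher),
checking whether the reading side is still consuming it would confirm or
eliminate this explanation.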
Best,
~~~~~~~~~~~~
Michael Shirts
Assistant Professor
Department of Chemical Engineering
University of Virginia
michael.shirts at virginia.edu
(434)-243-1821