[gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb

Fri Jan 31 12:04:19 CET 2014

> There should be line numbers below, and perhaps a bit more information on what's causing the error; at least, that's what I'm hoping for.

Hrm, there wasn't really any more information. I ran it (mdrun -nsteps 1 -ntomp 1) under cuda-gdb, with "set cuda api_failures stop", and got this:
Using 1 MPI thread
Using 1 OpenMP thread

1 GPU detected:
  #0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible

1 GPU auto-selected for this run.
Mapping of GPU to the 1 PP rank in this node: #0

[New Thread 0x7ffff2e3c700 (LWP 9066)]
[Context Create of context 0x68d5e0 on Device 0]
[Launch of CUDA Kernel 1 (memset32_aligned1D<<<(2,1,1),(128,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 2 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 3 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]

Back Off! I just backed up ener.edr to ./#ener.edr.36#
starting mdrun 'RNASE ZF-1A in water'
1 steps,      0.0 ps.
[Launch of CUDA Kernel 4 (memset32_aligned1D<<<(895,1,1),(128,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 5 (k_nbnxn_rf_ener_prune<<<(1203,1,1),(8,8,1)>>>) on Device 0]
Cuda API error detected: cudaStreamSynchronize returned (0x4)
(cuda-gdb) thread apply all bt
Thread 3 (Thread 0x7ffff2e3c700 (LWP 9066)):
#0  0x00007ffff4d3f763 in select () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff37c33d8 in ?? () from /usr/lib/nvidia-current/libcuda.so
#2  0x00007ffff3239b95 in ?? () from /usr/lib/nvidia-current/libcuda.so
#3  0x00007ffff37c5259 in ?? () from /usr/lib/nvidia-current/libcuda.so
#4  0x00007ffff47efe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5  0x00007ffff4d463fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7ffff7fe7780 (LWP 9059)):
#0  0x00007ffff32792e0 in cudbgReportDriverApiError () from /usr/lib/nvidia-current/libcuda.so
#1  0x00007ffff3279862 in ?? () from /usr/lib/nvidia-current/libcuda.so
#2  0x00007ffff329602a in ?? () from /usr/lib/nvidia-current/libcuda.so
#3  0x00007ffff32a32e3 in ?? () from /usr/lib/nvidia-current/libcuda.so
#4  0x00007ffff32ba443 in ?? () from /usr/lib/nvidia-current/libcuda.so
#5  0x00007ffff32c1660 in ?? () from /usr/lib/nvidia-current/libcuda.so
#6  0x00007ffff4a3bf5e in cudaStreamSynchronize () from /usr/local/cuda-5.5/lib64/libcudart.so.5.5
#7  0x00007ffff7581ab5 in nbnxn_cuda_wait_gpu (cu_nb=0x6b8320, nbatom=0xb84d90, flags=1013, aloc=0, e_lj=0xb8ed40, e_el=0xb97630, fshift=0x6b7b50)
    at /home/nztest/src/gromacs-4.6.5/src/mdlib/nbnxn_cuda/nbnxn_cuda.cu:590
#8  0x00007ffff74647a5 in do_force_cutsVERLET (fplog=0x64c020, cr=0x64aa30, inputrec=0x69d7e0, step=0, nrnb=0x6b70c0, wcycle=0x69ebf0, top=0xbc8af0,
    mtop=0x69dc10, groups=0x69dcd0, box=0xbf5698, x=0x763930, hist=0xbf57f0, f=0xc1bee0, vir_force=0x7fffffffae20, mdatoms=0xb8e1c0, enerd=0xb8e940,
    fcd=0x809360, lambda=0x16a5110, graph=0x0, fr=0x6b7460, ic=0xb88360, vsite=0x0, mu_tot=0x7fffffffb060, t=0, field=0x0, ed=0x0, bBornRadii=1, flags=1013)
    at /home/nztest/src/gromacs-4.6.5/src/mdlib/sim_util.c:1359
#9  0x00007ffff7466986 in do_force (fplog=0x64c020, cr=0x64aa30, inputrec=0x69d7e0, step=0, nrnb=0x6b70c0, wcycle=0x69ebf0, top=0xbc8af0, mtop=0x69dc10,
    groups=0x69dcd0, box=0xbf5698, x=0x763930, hist=0xbf57f0, f=0xc1bee0, vir_force=0x7fffffffae20, mdatoms=0xb8e1c0, enerd=0xb8e940, fcd=0x809360,
    lambda=0x16a5110, graph=0x0, fr=0x6b7460, vsite=0x0, mu_tot=0x7fffffffb060, t=0, field=0x0, ed=0x0, bBornRadii=1, flags=1013)
    at /home/nztest/src/gromacs-4.6.5/src/mdlib/sim_util.c:1999
#10 0x000000000042450e in do_md (fplog=0x64c020, cr=0x64aa30, nfile=36, fnm=0x7fffffffc120, oenv=0x64b080, bVerbose=0, bCompact=1, nstglobalcomm=10,
    vsite=0x0, constr=0xb8e2b0, stepout=100, ir=0x69d7e0, top_global=0x69dc10, fcd=0x809360, state_global=0x6b8d00, mdatoms=0xb8e1c0, nrnb=0x6b70c0,
    wcycle=0x69ebf0, ed=0x0, fr=0x6b7460, repl_ex_nst=0, repl_ex_nex=0, repl_ex_seed=-1, membed=0x0, cpt_period=15, max_hours=-1,
    deviceOptions=0x43910c "", Flags=1055744, runtime=0x7fffffffb540) at /home/nztest/src/gromacs-4.6.5/src/kernel/md.c:1178
#11 0x0000000000411b50 in mdrunner (hw_opt=0x7fffffffcd80, fplog=0x64c020, cr=0x64aa30, nfile=36, fnm=0x7fffffffc120, oenv=0x64b080, bVerbose=0,
    bCompact=1, nstglobalcomm=-1, ddxyz=0x7fffffffce90, dd_node_order=1, rdd=0, rconstr=0, dddlb_opt=0x4390e9 "auto", dlb_scale=0.800000012, ddcsx=0x0,
    ddcsy=0x0, ddcsz=0x0, nbpu_opt=0x4390e9 "auto", nsteps_cmdline=1, nstepout=100, resetstep=-1, nmultisim=0, repl_ex_nst=0, repl_ex_nex=0,
    repl_ex_seed=-1, pforce=-1, cpt_period=15, max_hours=-1, deviceOptions=0x43910c "", Flags=1055744)
    at /home/nztest/src/gromacs-4.6.5/src/kernel/runner.c:1700
#12 0x000000000042a57d in cmain (argc=1, argv=0x7fffffffe0e8) at /home/nztest/src/gromacs-4.6.5/src/kernel/mdrun.c:747
#13 0x0000000000431c18 in main (argc=5, argv=0x7fffffffe0e8) at main.c:29

I don't know if it tells you anything useful. If you want anything else from gdb, just let me know.

> One other thing you could try is to set "coulombtype = reaction-field" in the mdp file and regenerate the tpr. These runs will use a different CUDA kernel. This is just a guess; it may not make much difference at all.

Yep, same error. The debug session above was run with reaction-field.
-- 
Anders Ossowicki