[gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb

Szilárd Páll pall.szilard at gmail.com
Fri Jan 31 12:14:09 CET 2014


That's just weird. The "Cuda API error detected" does not sound good -
perhaps it's a sign of a CUDA runtime bug?

I suggest that you try CUDA 5.0 and see if that works.
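Something along these lines should pick up the 5.0 toolkit - the install
path below is just an example, point CUDA_TOOLKIT_ROOT_DIR at wherever
your 5.0 installation actually lives:

    cd gromacs-4.6.5
    mkdir build-cuda50 && cd build-cuda50
    cmake .. -DGMX_GPU=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-5.0
    make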
--
Szilárd


On Fri, Jan 31, 2014 at 12:03 PM, AOWI (Anders Ossowicki)
<AOWI at novozymes.com> wrote:
>> There should be line numbers below - and perhaps a bit more information on what's causing the error - at least that's what I'm hoping for.
>
> Hrm, there wasn't really any more info. I ran it (mdrun -nsteps 1 -ntomp 1) within cuda-gdb (with set cuda api_failures stop) and got this:
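>
> (The cuda-gdb invocation, reconstructed from memory so treat it as
> approximate, was roughly:
>
>     cuda-gdb --args mdrun -nsteps 1 -ntomp 1
>     (cuda-gdb) set cuda api_failures stop
>     (cuda-gdb) run
>
> and the output follows.)
>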
> Using 1 MPI thread
> Using 1 OpenMP thread
>
> 1 GPU detected:
>   #0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
>
> 1 GPU auto-selected for this run.
> Mapping of GPU to the 1 PP rank in this node: #0
>
> [New Thread 0x7ffff2e3c700 (LWP 9066)]
> [Context Create of context 0x68d5e0 on Device 0]
> [Launch of CUDA Kernel 1 (memset32_aligned1D<<<(2,1,1),(128,1,1)>>>) on Device 0]
> [Launch of CUDA Kernel 2 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
> [Launch of CUDA Kernel 3 (memset32_aligned1D<<<(1,1,1),(128,1,1)>>>) on Device 0]
>
> Back Off! I just backed up ener.edr to ./#ener.edr.36#
> starting mdrun 'RNASE ZF-1A in water'
> 1 steps,      0.0 ps.
> [Launch of CUDA Kernel 4 (memset32_aligned1D<<<(895,1,1),(128,1,1)>>>) on Device 0]
> [Launch of CUDA Kernel 5 (k_nbnxn_rf_ener_prune<<<(1203,1,1),(8,8,1)>>>) on Device 0]
> Cuda API error detected: cudaStreamSynchronize returned (0x4)
> (cuda-gdb) thread apply all bt
> Thread 3 (Thread 0x7ffff2e3c700 (LWP 9066)):
> #0  0x00007ffff4d3f763 in select () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007ffff37c33d8 in ?? () from /usr/lib/nvidia-current/libcuda.so
> #2  0x00007ffff3239b95 in ?? () from /usr/lib/nvidia-current/libcuda.so
> #3  0x00007ffff37c5259 in ?? () from /usr/lib/nvidia-current/libcuda.so
> #4  0x00007ffff47efe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007ffff4d463fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 1 (Thread 0x7ffff7fe7780 (LWP 9059)):
> #0  0x00007ffff32792e0 in cudbgReportDriverApiError () from /usr/lib/nvidia-current/libcuda.so
> #1  0x00007ffff3279862 in ?? () from /usr/lib/nvidia-current/libcuda.so
> #2  0x00007ffff329602a in ?? () from /usr/lib/nvidia-current/libcuda.so
> #3  0x00007ffff32a32e3 in ?? () from /usr/lib/nvidia-current/libcuda.so
> #4  0x00007ffff32ba443 in ?? () from /usr/lib/nvidia-current/libcuda.so
> #5  0x00007ffff32c1660 in ?? () from /usr/lib/nvidia-current/libcuda.so
> #6  0x00007ffff4a3bf5e in cudaStreamSynchronize () from /usr/local/cuda-5.5/lib64/libcudart.so.5.5
> #7  0x00007ffff7581ab5 in nbnxn_cuda_wait_gpu (cu_nb=0x6b8320, nbatom=0xb84d90, flags=1013, aloc=0, e_lj=0xb8ed40, e_el=0xb97630, fshift=0x6b7b50)
>     at /home/nztest/src/gromacs-4.6.5/src/mdlib/nbnxn_cuda/nbnxn_cuda.cu:590
> #8  0x00007ffff74647a5 in do_force_cutsVERLET (fplog=0x64c020, cr=0x64aa30, inputrec=0x69d7e0, step=0, nrnb=0x6b70c0, wcycle=0x69ebf0, top=0xbc8af0,
>     mtop=0x69dc10, groups=0x69dcd0, box=0xbf5698, x=0x763930, hist=0xbf57f0, f=0xc1bee0, vir_force=0x7fffffffae20, mdatoms=0xb8e1c0, enerd=0xb8e940,
>     fcd=0x809360, lambda=0x16a5110, graph=0x0, fr=0x6b7460, ic=0xb88360, vsite=0x0, mu_tot=0x7fffffffb060, t=0, field=0x0, ed=0x0, bBornRadii=1, flags=1013)
>     at /home/nztest/src/gromacs-4.6.5/src/mdlib/sim_util.c:1359
> #9  0x00007ffff7466986 in do_force (fplog=0x64c020, cr=0x64aa30, inputrec=0x69d7e0, step=0, nrnb=0x6b70c0, wcycle=0x69ebf0, top=0xbc8af0, mtop=0x69dc10,
>     groups=0x69dcd0, box=0xbf5698, x=0x763930, hist=0xbf57f0, f=0xc1bee0, vir_force=0x7fffffffae20, mdatoms=0xb8e1c0, enerd=0xb8e940, fcd=0x809360,
>     lambda=0x16a5110, graph=0x0, fr=0x6b7460, vsite=0x0, mu_tot=0x7fffffffb060, t=0, field=0x0, ed=0x0, bBornRadii=1, flags=1013)
>     at /home/nztest/src/gromacs-4.6.5/src/mdlib/sim_util.c:1999
> #10 0x000000000042450e in do_md (fplog=0x64c020, cr=0x64aa30, nfile=36, fnm=0x7fffffffc120, oenv=0x64b080, bVerbose=0, bCompact=1, nstglobalcomm=10,
>     vsite=0x0, constr=0xb8e2b0, stepout=100, ir=0x69d7e0, top_global=0x69dc10, fcd=0x809360, state_global=0x6b8d00, mdatoms=0xb8e1c0, nrnb=0x6b70c0,
>     wcycle=0x69ebf0, ed=0x0, fr=0x6b7460, repl_ex_nst=0, repl_ex_nex=0, repl_ex_seed=-1, membed=0x0, cpt_period=15, max_hours=-1,
>     deviceOptions=0x43910c "", Flags=1055744, runtime=0x7fffffffb540) at /home/nztest/src/gromacs-4.6.5/src/kernel/md.c:1178
> #11 0x0000000000411b50 in mdrunner (hw_opt=0x7fffffffcd80, fplog=0x64c020, cr=0x64aa30, nfile=36, fnm=0x7fffffffc120, oenv=0x64b080, bVerbose=0,
>     bCompact=1, nstglobalcomm=-1, ddxyz=0x7fffffffce90, dd_node_order=1, rdd=0, rconstr=0, dddlb_opt=0x4390e9 "auto", dlb_scale=0.800000012, ddcsx=0x0,
>     ddcsy=0x0, ddcsz=0x0, nbpu_opt=0x4390e9 "auto", nsteps_cmdline=1, nstepout=100, resetstep=-1, nmultisim=0, repl_ex_nst=0, repl_ex_nex=0,
>     repl_ex_seed=-1, pforce=-1, cpt_period=15, max_hours=-1, deviceOptions=0x43910c "", Flags=1055744)
>     at /home/nztest/src/gromacs-4.6.5/src/kernel/runner.c:1700
> #12 0x000000000042a57d in cmain (argc=1, argv=0x7fffffffe0e8) at /home/nztest/src/gromacs-4.6.5/src/kernel/mdrun.c:747
> #13 0x0000000000431c18 in main (argc=5, argv=0x7fffffffe0e8) at main.c:29
>
> I don't know if it tells you anything. If you want me to provide something else from gdb, just let me know.
>
>> One other thing you could try is to set "coulombtype = reaction-field" in the mdp file and re-generate the tpr. These runs will use a different CUDA kernel. Just guessing, it may not make much difference at all.
>
> Yep, same error. The above debug session was done with this.
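>
> (For reference, the reaction-field tpr was regenerated roughly like this -
> the file names are just placeholders for the actual inputs:
>
>     ; in the .mdp
>     coulombtype = reaction-field
>
>     grompp -f md.mdp -c conf.gro -p topol.top -o topol.tpr
>
> and then mdrun was re-run as above.)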
> --
> Anders Ossowicki
>

