[gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb

Thu Jan 30 16:20:04 CET 2014

> Well, with a 24k system a single iteration can be done in 2-3 ms, so those 3.3 seconds are mostly initialization plus some number of steps - could be one, ten, or even a hundred.
Sure, but it fails even with -nsteps 1.

> That doesn't tell us much; could you add -g to the CXX flags?
Same thing:
starting mdrun 'RNASE ZF-1A in water'
1 steps,      0.0 ps.
========= Program hit error 4 on CUDA API call to cudaStreamSynchronize
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/nvidia-current/libcuda.so [0x26d660]
=========     Host Frame:/usr/local/cuda-5.5/lib64/libcudart.so.5.5 (cudaStreamSynchronize + 0x15e) [0x36f5e]
=========     Host Frame:/usr/bin/../lib/libmd.so.8 (nbnxn_cuda_wait_gpu + 0x222) [0xd45ab5]
=========     Host Frame:/usr/bin/../lib/libmd.so.8 (do_force_cutsVERLET + 0x1d20) [0xc287a5]
=========     Host Frame:/usr/bin/../lib/libmd.so.8 (do_force + 0x15d) [0xc2a986]
=========     Host Frame:mdrun (do_md + 0x3cd4) [0x2450e]
=========     Host Frame:mdrun (mdrunner + 0x1f14) [0x11b50]
=========     Host Frame:mdrun (cmain + 0x1dee) [0x2a57d]
=========     Host Frame:mdrun (main + 0x20) [0x31c18]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xed) [0x2176d]
=========     Host Frame:mdrun [0x75e9]
=========

>>> - Try running with the GMX_EMULATE_GPU environment variable set. This will run the GPU acceleration code path, but will use CPU kernels (equivalent to the CUDA implementation, but slow).
>> This seems to run correctly.
> Does correctly mean that you've checked the results or that it completed without a crash?
Just the latter.
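
To go beyond "it did not crash", the emulated run could be compared against a plain CPU run. The following is only a rough sketch of how that might look, not something I have run: the file names (topol.tpr, emu.edr, cpu.edr and the log names) are placeholders, the use of -nb cpu as the reference run is my assumption, and I am assuming that setting GMX_EMULATE_GPU to any value is enough to enable emulation.

```shell
#!/bin/sh
# Sketch only: all file names below are hypothetical placeholders.
TPR=topol.tpr

# Run a command with GMX_EMULATE_GPU set for that command only.
# Assumption: the variable only needs to be set; the value 1 is arbitrary.
run_emulated() {
    GMX_EMULATE_GPU=1 "$@"
}

if command -v mdrun >/dev/null 2>&1 && [ -f "$TPR" ]; then
    # GPU code path, but with the CPU emulation kernels.
    run_emulated mdrun -s "$TPR" -nsteps 1 -e emu.edr -g emu.log
    # Reference run with the regular CPU non-bonded kernels.
    mdrun -s "$TPR" -nsteps 1 -nb cpu -e cpu.edr -g cpu.log
    # Report energy terms that differ between the two runs.
    gmxcheck -e emu.edr -e2 cpu.edr
else
    echo "mdrun or $TPR not found; nothing was run"
fi
```

If gmxcheck reports no significant differences in the energy terms, that would suggest the emulated path computes the same forces as the CPU kernels, which would point away from the setup logic shared with the GPU path and towards the CUDA kernels or the driver.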

-- 
Anders Ossowicki