[gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb

Szilárd Páll pall.szilard at gmail.com
Thu Jan 30 12:14:48 CET 2014


That sounds strange.

Does the error happen at the first step? Assuming it does occur within the
first 10 steps, here are a few things to try:
- Run "cuda-memcheck mdrun -nsteps 10".
- Try running with the GMX_EMULATE_GPU environment variable set. This runs
the GPU acceleration code path, but with CPU kernels (equivalent to the
CUDA kernels, only a slow implementation).
- Run with GMX_EMULATE_GPU under valgrind: "GMX_EMULATE_GPU=1 valgrind
mdrun -nsteps 10"
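The three checks can be run back to back with a small script like the one below. This is a minimal sketch, assuming the GROMACS 4.6-era binary is called "mdrun", that "cuda-memcheck" and "valgrind" are on PATH, and that the run input is the default topol.tpr in the working directory. The "run" helper and the DRY_RUN variable are hypothetical conveniences, not part of GROMACS. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: run the three short diagnostics in order.
# Assumptions (not from the original mail): the binary is "mdrun" and is on
# PATH, and the input is the default topol.tpr in the current directory.
# DRY_RUN and run() are hypothetical helpers: with DRY_RUN=1 (the default)
# the commands are only printed; set DRY_RUN=0 to actually execute them.
set -u

NSTEPS=10                    # the error is assumed to appear within the first 10 steps
DRY_RUN="${DRY_RUN:-1}"

# Print a command, then execute it unless DRY_RUN is 1.
# A non-zero exit status is reported instead of aborting the script.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@" || echo "  -> exited with status $?"
    fi
}

# 1. Real CUDA code path under cuda-memcheck (device memory errors).
run cuda-memcheck mdrun -nsteps "$NSTEPS"

# 2. GPU code path, but with the CPU emulation kernels instead of CUDA.
run env GMX_EMULATE_GPU=1 mdrun -nsteps "$NSTEPS"

# 3. Same emulated run under valgrind (host-side memory errors).
run env GMX_EMULATE_GPU=1 valgrind mdrun -nsteps "$NSTEPS"
```

Running it with DRY_RUN=0 executes the checks. If the emulated runs finish cleanly while the real CUDA run fails, that would suggest the problem lies on the GPU side (kernels, driver or hardware) rather than in the rest of the code path.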

Cheers,
--
Szilárd


On Thu, Jan 30, 2014 at 11:47 AM, AOWI (Anders Ossowicki)
<AOWI at novozymes.com> wrote:
> Thanks for your suggestions!
>
>> I would not make any assumptions though, but rather try a few things first:
>> - Does the card pass a memtest (sourceforge.net/projects/cudagpumemtest/)?
> The memtest ran for about an hour with no errors.
>
>> - Does the installation pass the regressiontests?
> No. These four complex tests fail, all with the usual error:
>
> FAILED. Check mdrun.out, md.log files in nbnxn_pme
> FAILED. Check mdrun.out, md.log files in nbnxn_rf
> FAILED. Check mdrun.out, md.log files in nbnxn_rzero
> FAILED. Check mdrun.out, md.log files in nbnxn_vsite
>
> Everything else passes.
>
>> - Is the error reproducible with other inputs?
> Yes, so far anything that has caused Gromacs to engage the GPU has failed. Our own runs, the samples from the Gromacs website, and the four tests above.
>
>> Also note that with the default invocation of mdrun you are attempting to use all cores/hardware threads in your machine (I assume a 2x12-core IVB-E node with HT on).
>
> Yes, two Xeon E5-2697V2 processors. This is a test server for gauging the potential performance gains of GPGPU with our own runs. We'll stick to a proper CPU-GPU ratio for the performance measurements. This was just me trying to pare it down to the simplest invocation.
>
> We have had no trouble using other CUDA-enabled tools on this particular test server. NAMD, for example, works fine.
> --
> Anders Ossowicki
>

