[gmx-users] GMX 2018 regression tests: cufftPlanMany R2C plan failure (error code 5)
Alex
nedomacho at gmail.com
Wed Feb 7 17:18:43 CET 2018
Hi Mark,
Nothing has been installed yet, so the commands were issued from
/build/bin, and I am not sure how informative that mdrun-test output is
(let me know what exact command would make it more useful).
Thank you,
Alex
***
> ./gmx -version
GROMACS version: 2018
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: AVX2_256
FFT library: fftw-3.3.5-fma-sse2-avx-avx2-avx2_128-avx512
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: hwloc-1.11.0
Tracing support: disabled
Built on: 2018-02-06 19:30:36
Built by: smolyan at 647trc-md1 [CMAKE]
Build OS/arch: Linux 4.4.0-112-generic x86_64
Build CPU vendor: Intel
Build CPU brand: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Build CPU family: 6 Model: 79 Stepping: 1
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle
htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse
rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler: /usr/bin/cc GNU 5.4.0
C compiler flags: -march=core-avx2 -O3 -DNDEBUG
-funroll-all-loops -fexcess-precision=fast
C++ compiler: /usr/bin/c++ GNU 5.4.0
C++ compiler flags: -march=core-avx2 -std=c++11 -O3 -DNDEBUG
-funroll-all-loops -fexcess-precision=fast
CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda
compiler driver;Copyright (c) 2005-2017 NVIDIA Corporation;Built on
Fri_Nov__3_21:07:56_CDT_2017;Cuda compilation tools, release 9.1, V9.1.85
CUDA compiler
flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70;-use_fast_math;-D_FORCE_INLINES;;
;-march=core-avx2;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver: 9.10
CUDA runtime: 9.10
> ldd -r ./mdrun-test
linux-vdso.so.1 => (0x00007ffcfcc3e000)
libgromacs.so.3 =>
/home/smolyan/scratch/gmx2018_install_temp/gromacs-2018/build/bin/./../lib/libgromacs.so.3
(0x00007faa58f8f000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
(0x00007faa58d72000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(0x00007faa589f0000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007faa586e7000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
(0x00007faa584d1000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007faa58107000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007faa57f03000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007faa57cfb000)
libcufft.so.9.1 => /usr/local/cuda/lib64/libcufft.so.9.1
(0x00007faa5080e000)
libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5
(0x00007faa505d4000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
(0x00007faa503b2000)
/lib64/ld-linux-x86-64.so.2 (0x00007faa5c1ad000)
libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1
(0x00007faa501a7000)
libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7
(0x00007faa4ff9d000)
On 2/7/2018 5:13 AM, Mark Abraham wrote:
> Hi,
>
> I checked back with the CUDA-facing GROMACS developers. They've run the
> code with 9.1 and believe there's no intrinsic problem within GROMACS.
>
>> So I don't have much to suggest other than rebuilding everything
>> cleanly, as this is an internal, nondescript cuFFT/driver error that is
>> not supposed to happen, especially in mdrun-test with its single input
>> system, and it will prevent him from using -pme gpu.
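To isolate whether the failure is in GROMACS or in the CUDA installation
itself, a standalone plan-creation test may help; this is a minimal
sketch (the 64^3 grid is an arbitrary test size, not what mdrun-test
uses), built with e.g. "nvcc -std=c++11 plantest.cu -lcufft -o plantest":

#include <cstdio>
#include <cufft.h>

int main()
{
    cufftHandle plan;
    int         n[3] = {64, 64, 64}; // arbitrary 3D real grid
    // Single real-to-complex 3D transform with default (packed) data
    // layouts -- the same kind of plan that fails in pme-3dfft.cu
    cufftResult result = cufftPlanMany(&plan, 3, n,
                                       nullptr, 1, 0, // input layout
                                       nullptr, 1, 0, // output layout
                                       CUFFT_R2C, 1);
    std::printf("cufftPlanMany returned %d\n", static_cast<int>(result));
    if (result == CUFFT_SUCCESS)
    {
        cufftDestroy(plan);
    }
    return (result == CUFFT_SUCCESS) ? 0 : 1;
}

If this also returns error code 5 outside of GROMACS, the problem lies in
the CUDA toolkit/driver installation rather than in mdrun.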
>> The only thing PME could do better is to show a more meaningful error
>> message (which would have to be hardcoded anyway, as cuFFT doesn't even
>> have human-readable strings for its error codes).
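For reference, cuFFT reports errors through the cufftResult enum, and in
the CUDA 9.1 headers code 5 is CUFFT_INTERNAL_ERROR. A hardcoded lookup
of the kind described above could be as simple as this sketch (names are
illustrative, not the GROMACS code):

#include <cufft.h>

// cuFFT provides no cufftGetErrorString(), so the mapping from
// cufftResult values (see cufft.h) has to be hardcoded.
static const char* cufftResultToString(cufftResult result)
{
    switch (result)
    {
        case CUFFT_SUCCESS:        return "CUFFT_SUCCESS";
        case CUFFT_INVALID_PLAN:   return "CUFFT_INVALID_PLAN";
        case CUFFT_ALLOC_FAILED:   return "CUFFT_ALLOC_FAILED";
        case CUFFT_INVALID_TYPE:   return "CUFFT_INVALID_TYPE";
        case CUFFT_INVALID_VALUE:  return "CUFFT_INVALID_VALUE";
        case CUFFT_INTERNAL_ERROR: return "CUFFT_INTERNAL_ERROR"; // code 5
        case CUFFT_EXEC_FAILED:    return "CUFFT_EXEC_FAILED";
        case CUFFT_SETUP_FAILED:   return "CUFFT_SETUP_FAILED";
        case CUFFT_INVALID_SIZE:   return "CUFFT_INVALID_SIZE";
        default:                   return "unrecognized cufftResult";
    }
}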
>
> If you could share the output of
> * gmx -version
> * ldd -r mdrun-test
> then perhaps we can find an issue (or at least report it to NVIDIA usefully).
> Ensuring you are using the CUDA driver that came with the CUDA runtime is
> most likely to work smoothly.
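On the driver/runtime point, the two versions can be compared directly
from the CUDA runtime API; a minimal sketch:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int driverVersion = 0, runtimeVersion = 0;
    // Both are encoded as 1000*major + 10*minor, e.g. 9010 for CUDA 9.1
    cudaDriverGetVersion(&driverVersion);
    cudaRuntimeGetVersion(&runtimeVersion);
    std::printf("driver %d, runtime %d\n", driverVersion, runtimeVersion);
    // A driver older than the runtime it serves is a common source of
    // otherwise inexplicable CUDA failures
    return (driverVersion >= runtimeVersion) ? 0 : 1;
}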
>
> Mark
>
> On Tue, Feb 6, 2018 at 9:24 PM Alex <nedomacho at gmail.com> wrote:
>
>> And this is with:
>>> gcc --version
>>> gcc (Ubuntu 5.4.0-6ubuntu1~16.04.6) 5.4.0 20160609
>>
>>
>> On Tue, Feb 6, 2018 at 1:18 PM, Alex <nedomacho at gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I've just built the latest version and regression tests are running. Here
>>> is one error:
>>>
>>> "Program: mdrun-test, version 2018
>>> Source file: src/gromacs/ewald/pme-3dfft.cu (line 56)
>>>
>>> Fatal error:
>>> cufftPlanMany R2C plan failure (error code 5)"
>>>
>>> This is with CUDA 9.1.
>>>
>>> Anything to worry about?
>>>
>>> Thank you,
>>>
>>> Alex
>>>