[gmx-users] Gromacs 2019.4 - cudaStreamSynchronize failed issue

Chenou Zhang czhan178 at asu.edu
Tue Dec 3 23:11:04 CET 2019


Hi,

I've run 30 tests with the -notunepme option. One of them failed with the following
error (still the same *cudaStreamSynchronize failed* error):


```
DD  step 1422999  vol min/aver 0.639  load imb.: force  1.1%  pme mesh/force 1.079
           Step           Time
        1423000     2846.00000

   Energies (kJ/mol)
           Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
    3.79755e+04    1.78943e+05    1.22798e+05    2.83835e+03   -9.19303e+02
          LJ-14     Coulomb-14        LJ (SR)   Coulomb (SR)   Coul. recip.
    2.56547e+04    5.11714e+05    9.77218e+03   -2.07148e+06    8.64504e+03
      Potential    Kinetic En.   Total Energy  Conserved En.    Temperature
    7.64126e+13    4.79398e+05    7.64126e+13    7.64126e+13    3.58009e+02
 Pressure (bar)   Constr. rmsd
   -6.03201e+01    4.56399e-06

-------------------------------------------------------
Program:     gmx mdrun, version 2019.4
Source file: src/gromacs/gpu_utils/cudautils.cuh (line 229)
MPI rank:    2 (out of 8)

Fatal error:
cudaStreamSynchronize failed: an illegal memory access was encountered

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
```
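
Since the illegal access only gets reported at a later cudaStreamSynchronize, my plan
for the next failing run is to wrap a short segment in cuda-memcheck and force
synchronous kernel launches, so the error is attributed to the kernel that actually
faults. A minimal sketch, assuming cuda-memcheck from the CUDA 9.2 module is on PATH;
the -nsteps value and the md_memcheck output name are just my own choices to keep the
(very slow) memcheck run short:

```
# Make kernel launches synchronous so the failure is reported at the
# offending launch rather than at a later cudaStreamSynchronize.
export CUDA_LAUNCH_BLOCKING=1

# Run a short segment under cuda-memcheck (thread-MPI means a single
# process, so one wrapper covers all 8 ranks) and keep the full report.
cuda-memcheck gmx mdrun -v -s md_seed_fixed.tpr -deffnm md_memcheck \
    -ntmpi 8 -ntomp 3 -pin on -nb gpu -pme gpu -pmefft gpu -notunepme \
    -npme 1 -gputasks 00112233 -nsteps 50000 -noappend 2>&1 | tee memcheck.log
```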

Here is the command and the driver info:

```
Command line:
  gmx mdrun -v -s md_seed_fixed.tpr -deffnm md_seed_fixed -ntmpi 8 -pin on -nb gpu -ntomp 3 -pme gpu -pmefft gpu -notunepme -npme 1 -gputasks 00112233 -maxh 2 -cpt 60 -cpi -noappend

GROMACS version:    2019.4
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:        CUDA
SIMD instructions:  AVX2_256
FFT library:        fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
RDTSCP usage:       enabled
TNG support:        enabled
Hwloc support:      hwloc-1.11.2
Tracing support:    disabled
C compiler:         /packages/7x/gcc/gcc-7.3.0/bin/gcc GNU 7.3.0
C compiler flags:   -mavx2 -mfma -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler:       /packages/7x/gcc/gcc-7.3.0/bin/g++ GNU 7.3.0
C++ compiler flags: -mavx2 -mfma -std=c++11 -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
CUDA compiler:      /packages/7x/cuda/9.2.88.1/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2018 NVIDIA Corporation;Built on Wed_Apr_11_23:16:29_CDT_2018;Cuda compilation tools, release 9.2, V9.2.88
CUDA compiler flags: -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70;-use_fast_math;-mavx2;-mfma;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver:        9.20
CUDA runtime:       9.20

Running on 1 node with total 24 cores, 24 logical cores, 4 compatible GPUs
Hardware detected:
  CPU info:
    Vendor: Intel
    Brand:  Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz
    Family: 6   Model: 79   Stepping: 1
    Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
  Hardware topology: Full, with devices
    Sockets, cores, and logical processors:
      Socket  0: [   0] [   1] [   2] [   3] [   4] [   5] [   6] [   7] [   8] [   9] [  10] [  11]
      Socket  1: [  12] [  13] [  14] [  15] [  16] [  17] [  18] [  19] [  20] [  21] [  22] [  23]
    Numa nodes:
      Node  0 (34229563392 bytes mem):   0   1   2   3   4   5   6   7   8   9  10  11
      Node  1 (34359738368 bytes mem):  12  13  14  15  16  17  18  19  20  21  22  23
      Latency:
               0     1
         0  1.00  2.10
         1  2.10  1.00
    Caches:
      L1: 32768 bytes, linesize 64 bytes, assoc. 8, shared 1 ways
      L2: 262144 bytes, linesize 64 bytes, assoc. 8, shared 1 ways
      L3: 31457280 bytes, linesize 64 bytes, assoc. 20, shared 12 ways
    PCI devices:
      0000:01:00.0  Id: 15b3:1007  Class: 0x0200  Numa: 0
      0000:02:00.0  Id: 10de:1b06  Class: 0x0300  Numa: 0
      0000:03:00.0  Id: 10de:1b06  Class: 0x0300  Numa: 0
      0000:00:11.4  Id: 8086:8d62  Class: 0x0106  Numa: 0
      0000:06:00.0  Id: 1a03:2000  Class: 0x0300  Numa: 0
      0000:00:1f.2  Id: 8086:8d02  Class: 0x0106  Numa: 0
      0000:81:00.0  Id: 8086:1521  Class: 0x0200  Numa: 1
      0000:81:00.1  Id: 8086:1521  Class: 0x0200  Numa: 1
      0000:82:00.0  Id: 15b3:1007  Class: 0x0280  Numa: 1
      0000:83:00.0  Id: 10de:1b06  Class: 0x0300  Numa: 1
      0000:84:00.0  Id: 10de:1b06  Class: 0x0300  Numa: 1
  GPU info:
    Number of GPUs detected: 4
    #0: NVIDIA GeForce GTX 1080 Ti, compute cap.: 6.1, ECC:  no, stat: compatible
    #1: NVIDIA GeForce GTX 1080 Ti, compute cap.: 6.1, ECC:  no, stat: compatible
    #2: NVIDIA GeForce GTX 1080 Ti, compute cap.: 6.1, ECC:  no, stat: compatible
    #3: NVIDIA GeForce GTX 1080 Ti, compute cap.: 6.1, ECC:  no, stat: compatible
```
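
The log only reports "CUDA driver: 9.20", which is the CUDA API version rather than
the exact NVIDIA display driver version. To answer the driver question precisely, I
can also query it on the compute node; a minimal sketch, assuming nvidia-smi is
available there:

```
# Report the display driver version plus the detected GPUs, one row per card.
nvidia-smi --query-gpu=index,name,driver_version --format=csv
```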

Note that the simulation ran for about 2.8 ns before crashing, and the potential
energy at the last reported step is anomalously high (~7.6e13 kJ/mol).
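
To see when the potential energy actually started to diverge, rather than just its
value at the crash, I'm planning to pull it out of the energy file. A minimal sketch,
assuming the usual .edr output from -deffnm (with -noappend the real file name will
carry a .partNNNN suffix):

```
# Extract the Potential term as a time series for inspection/plotting.
echo "Potential" | gmx energy -f md_seed_fixed.edr -o potential.xvg

# Quick look at the last frames (time in ps, energy in kJ/mol).
tail -n 20 potential.xvg
```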

On Mon, Dec 2, 2019 at 2:13 PM Mark Abraham <mark.j.abraham at gmail.com>
wrote:

> Hi,
>
> What driver version is reported in the respective log files? Does the error
> persist if mdrun -notunepme is used?
>
> Mark
>
> On Mon., 2 Dec. 2019, 21:18 Chenou Zhang, <czhan178 at asu.edu> wrote:
>
> > Hi Gromacs developers,
> >
> > I'm currently running GROMACS 2019.4 on our university's HPC cluster. To
> > fully utilize the GPU nodes, I followed the notes at
> > http://manual.gromacs.org/documentation/current/user-guide/mdrun-performance.html
> >
> >
> > And here is the command I used for my runs.
> > ```
> > gmx mdrun -v -s $TPR -deffnm md_seed_fixed -ntmpi 8 -pin on -nb gpu -ntomp 3 -pme gpu -pmefft gpu -npme 1 -gputasks 00112233 -maxh $HOURS -cpt 60 -cpi -noappend
> > ```
> >
> > And some of those runs failed with the following error:
> > ```
> > -------------------------------------------------------
> > Program:     gmx mdrun, version 2019.4
> > Source file: src/gromacs/gpu_utils/cudautils.cuh (line 229)
> > MPI rank:    3 (out of 8)
> >
> > Fatal error:
> > cudaStreamSynchronize failed: an illegal memory access was encountered
> >
> > For more information and tips for troubleshooting, please check the GROMACS
> > website at http://www.gromacs.org/Documentation/Errors
> > ```
> >
> > We also had a different error from the Slurm system:
> > ```
> > step 4400: timed with pme grid 96 96 60, coulomb cutoff 1.446: 467.9 M-cycles
> > step 4600: timed with pme grid 96 96 64, coulomb cutoff 1.372: 451.4 M-cycles
> > /var/spool/slurmd/job2321134/slurm_script: line 44: 29866 Segmentation fault      gmx mdrun -v -s $TPR -deffnm md_seed_fixed -ntmpi 8 -pin on -nb gpu -ntomp 3 -pme gpu -pmefft gpu -npme 1 -gputasks 00112233 -maxh $HOURS -cpt 60 -cpi -noappend
> > ```
> >
> > We first thought this could be due to a compiler issue and tried different
> > build settings, as follows:
> > ===test1===
> > <source>
> > module load cuda/9.2.88.1
> > module load gcc/7.3.0
> > . /home/rsexton2/Library/gromacs/2019.4/test1/bin/GMXRC
> > </source>
> > ===test2===
> > <source>
> > module load cuda/9.2.88.1
> > module load gcc/6x
> > . /home/rsexton2/Library/gromacs/2019.4/test2/bin/GMXRC
> > </source>
> > ===test3===
> > <source>
> > module load cuda/9.2.148
> > module load gcc/7.3.0
> > . /home/rsexton2/Library/gromacs/2019.4/test3/bin/GMXRC
> > </source>
> > ===test4===
> > <source>
> > module load cuda/9.2.148
> > module load gcc/6x
> > . /home/rsexton2/Library/gromacs/2019.4/test4/bin/GMXRC
> > </source>
> > ===test5===
> > <source>
> > module load cuda/9.1.85
> > module load gcc/6x
> > . /home/rsexton2/Library/gromacs/2019.4/test5/bin/GMXRC
> > </source>
> > ===test6===
> > <source>
> > module load cuda/9.0.176
> > module load gcc/6x
> > . /home/rsexton2/Library/gromacs/2019.4/test6/bin/GMXRC
> > </source>
> > ===test7===
> > <source>
> > module load cuda/9.2.88.1
> > module load gccgpu/7.4.0
> > . /home/rsexton2/Library/gromacs/2019.4/test7/bin/GMXRC
> > </source>
> >
> > However, we still ended up with the same errors shown above. Does anyone
> > know where the cudaStreamSynchronize call comes in? Or am I using those
> > gmx GPU options incorrectly?
> >
> > Any input will be appreciated!
> >
> > Thanks!
> > Chenou