[gmx-users] Gromacs 2019.4 - cudaStreamSynchronize failed issue

Chenou Zhang czhan178 at asu.edu
Mon Dec 2 23:53:07 CET 2019


For the error:
```
step 4400: timed with pme grid 96 96 60, coulomb cutoff 1.446: 467.9 M-cycles
step 4600: timed with pme grid 96 96 64, coulomb cutoff 1.372: 451.4 M-cycles
/var/spool/slurmd/job2321134/slurm_script: line 44: 29866 Segmentation fault      gmx mdrun -v -s $TPR -deffnm md_seed_fixed -ntmpi 8 -pin on -nb gpu -ntomp 3 -pme gpu -pmefft gpu -npme 1 -gputasks 00112233 -maxh $HOURS -cpt 60 -cpi -noappend
```
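
As a reading aid for the `-gputasks 00112233` option in the command above, here is a minimal sketch that splits the string into one digit per GPU task. It assumes the convention described in the mdrun performance guide, where each digit is the id of the GPU used by one GPU task on the node, in order. `map_gputasks` is a hypothetical helper, not part of GROMACS.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a GROMACS tool): print which GPU id each digit
# of a -gputasks string selects, assuming one digit per GPU task, in order.
map_gputasks() {
    local tasks="$1"
    local i
    for (( i = 0; i < ${#tasks}; i++ )); do
        echo "GPU task $i -> GPU id ${tasks:i:1}"
    done
}

# The string used in the failing command.
map_gputasks 00112233
```

If that convention holds for this build, `-ntmpi 8` with `-npme 1` would give eight GPU tasks (seven nonbonded plus one PME), placed two per GPU on GPUs 0 to 3.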
Here is the driver information from the log file:
```
GROMACS:      gmx mdrun, version 2019.4
Executable:   /home/rsexton2/Library/gromacs/2019.4/test1/bin/gmx
Data prefix:  /home/rsexton2/Library/gromacs/2019.4/test1
Working dir:  /scratch/czhan178/project/NapA-2019.4/gromacs_test_1/test_9
Process ID:   29866
Command line:
  gmx mdrun -v -s md_seed_fixed.tpr -deffnm md_seed_fixed -ntmpi 8 -pin on -nb gpu -ntomp 3 -pme gpu -pmefft gpu -npme 1 -gputasks 00112233 -maxh 2 -cpt 60 -cpi -noappend

GROMACS version:    2019.4
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:        CUDA
SIMD instructions:  AVX2_256
FFT library:        fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
RDTSCP usage:       enabled
TNG support:        enabled
Hwloc support:      hwloc-1.11.2
Tracing support:    disabled
C compiler:         /packages/7x/gcc/gcc-7.3.0/bin/gcc GNU 7.3.0
C compiler flags:    -mavx2 -mfma     -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler:       /packages/7x/gcc/gcc-7.3.0/bin/g++ GNU 7.3.0
C++ compiler flags:  -mavx2 -mfma    -std=c++11   -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
CUDA compiler:      /packages/7x/cuda/9.2.88.1/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2018 NVIDIA Corporation;Built on Wed_Apr_11_23:16:29_CDT_2018;Cuda compilation tools, release 9.2, V9.2.88
CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70;-use_fast_math;;; ;-mavx2;-mfma;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver:        9.20
CUDA runtime:       9.20
```
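
To compare the driver and runtime versions across several runs, a small sketch like the following could pull just those two lines out of each log. `cuda_versions` is a hypothetical helper, and the log file names are an assumption based on `-deffnm md_seed_fixed` with `-noappend`; adjust the glob to whatever your runs actually wrote.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the "CUDA driver:" and "CUDA runtime:" lines
# from one or more mdrun log files. With several files, grep prefixes each
# match with the file name.
cuda_versions() {
    grep -E '^CUDA (driver|runtime):' "$@"
}

# Assumed log names (e.g. md_seed_fixed.part0001.log); uncomment to use:
# cuda_versions md_seed_fixed*.log
```

Running it over the logs of the failing and the successful runs would show quickly whether they report different versions.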

I'll rerun with the -notunepme option and keep you updated.
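
As a sketch of that rerun, the block below assembles the original command with `-notunepme` appended and prints it for inspection. The defaults for `TPR` and `HOURS` are taken from the logged command line above and are only placeholders for the values set in the job script.

```bash
#!/usr/bin/env bash
# Sketch: the original mdrun invocation with PME tuning switched off.
# TPR and HOURS fall back to the values seen in the log if unset.
TPR="${TPR:-md_seed_fixed.tpr}"
HOURS="${HOURS:-2}"

cmd=(gmx mdrun -v -s "$TPR" -deffnm md_seed_fixed
     -ntmpi 8 -ntomp 3 -pin on
     -nb gpu -pme gpu -pmefft gpu -npme 1 -gputasks 00112233
     -maxh "$HOURS" -cpt 60 -cpi -noappend
     -notunepme)

# Print the command only; inside the job script, run it with "${cmd[@]}".
echo "${cmd[*]}"
```

Keeping every other option unchanged means that any difference in behaviour can be attributed to the PME tuning alone.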

Chenou

On Mon, Dec 2, 2019 at 2:13 PM Mark Abraham <mark.j.abraham at gmail.com>
wrote:

> Hi,
>
> What driver version is reported in the respective log files? Does the error
> persist if mdrun -notunepme is used?
>
> Mark
>
> On Mon., 2 Dec. 2019, 21:18 Chenou Zhang, <czhan178 at asu.edu> wrote:
>
> > Hi Gromacs developers,
> >
> > I'm currently running GROMACS 2019.4 on our university's HPC cluster. To
> > fully utilize the GPU nodes, I followed the notes at
> > http://manual.gromacs.org/documentation/current/user-guide/mdrun-performance.html
> >
> > Here is the command I used for my runs:
> > ```
> > gmx mdrun -v -s $TPR -deffnm md_seed_fixed -ntmpi 8 -pin on -nb gpu -ntomp 3 -pme gpu -pmefft gpu -npme 1 -gputasks 00112233 -maxh $HOURS -cpt 60 -cpi -noappend
> > ```
> >
> > Some of those runs fail with the following error:
> > ```
> > -------------------------------------------------------
> > Program:     gmx mdrun, version 2019.4
> > Source file: src/gromacs/gpu_utils/cudautils.cuh (line 229)
> > MPI rank:    3 (out of 8)
> >
> > Fatal error:
> > cudaStreamSynchronize failed: an illegal memory access was encountered
> >
> > For more information and tips for troubleshooting, please check the GROMACS
> > website at http://www.gromacs.org/Documentation/Errors
> > ```
> >
> > We also got a different error from the Slurm system:
> > ```
> > step 4400: timed with pme grid 96 96 60, coulomb cutoff 1.446: 467.9 M-cycles
> > step 4600: timed with pme grid 96 96 64, coulomb cutoff 1.372: 451.4 M-cycles
> > /var/spool/slurmd/job2321134/slurm_script: line 44: 29866 Segmentation fault      gmx mdrun -v -s $TPR -deffnm md_seed_fixed -ntmpi 8 -pin on -nb gpu -ntomp 3 -pme gpu -pmefft gpu -npme 1 -gputasks 00112233 -maxh $HOURS -cpt 60 -cpi -noappend
> > ```
> >
> > We first thought this could be due to a compiler issue and tried different
> > settings, as follows:
> > ===test1===
> > <source>
> > module load cuda/9.2.88.1
> > module load gcc/7.3.0
> > . /home/rsexton2/Library/gromacs/2019.4/test1/bin/GMXRC
> > </source>
> > ===test2===
> > <source>
> > module load cuda/9.2.88.1
> > module load gcc/6x
> > . /home/rsexton2/Library/gromacs/2019.4/test2/bin/GMXRC
> > </source>
> > ===test3===
> > <source>
> > module load cuda/9.2.148
> > module load gcc/7.3.0
> > . /home/rsexton2/Library/gromacs/2019.4/test3/bin/GMXRC
> > </source>
> > ===test4===
> > <source>
> > module load cuda/9.2.148
> > module load gcc/6x
> > . /home/rsexton2/Library/gromacs/2019.4/test4/bin/GMXRC
> > </source>
> > ===test5===
> > <source>
> > module load cuda/9.1.85
> > module load gcc/6x
> > . /home/rsexton2/Library/gromacs/2019.4/test5/bin/GMXRC
> > </source>
> > ===test6===
> > <source>
> > module load cuda/9.0.176
> > module load gcc/6x
> > . /home/rsexton2/Library/gromacs/2019.4/test6/bin/GMXRC
> > </source>
> > ===test7===
> > <source>
> > module load cuda/9.2.88.1
> > module load gccgpu/7.4.0
> > . /home/rsexton2/Library/gromacs/2019.4/test7/bin/GMXRC
> > </source>
> >
> > However, we still ended up with the same errors shown above. Does anyone
> > know where the cudaStreamSynchronize failure comes from? Or am I using
> > those gmx GPU options incorrectly?
> >
> > Any input will be appreciated!
> >
> > Thanks!
> > Chenou
> > --
> > Gromacs Users mailing list
> >
> > * Please search the archive at
> > http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> > posting!
> >
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >
> > * For (un)subscribe requests visit
> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > send a mail to gmx-users-request at gromacs.org.
> >

