[gmx-users] Gromacs 2019.4 - cudaStreamSynchronize failed issue

Mark Abraham mark.j.abraham at gmail.com
Mon Dec 2 22:13:42 CET 2019


Hi,

What driver version is reported in the respective log files? Does the error
persist if mdrun -notunepme is used?
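For reference, the CUDA driver and runtime versions should be reported in the
version/hardware block near the top of each mdrun log, and PME tuning is
switched off with mdrun's -notunepme option. A minimal sketch, reusing the
command from your message below (the grep pattern is only a guess at the exact
log wording):

```
# Look for the reported CUDA driver/runtime versions in the failing runs' logs
# (exact wording may differ between GROMACS versions):
grep -i -E 'cuda (driver|runtime)|nvidia' md_seed_fixed*.log

# Repeat a failing run with PME tuning disabled, everything else unchanged:
gmx mdrun -v -s $TPR -deffnm md_seed_fixed -ntmpi 8 -pin on -nb gpu -ntomp 3 \
    -pme gpu -pmefft gpu -npme 1 -gputasks 00112233 -maxh $HOURS -cpt 60 -cpi \
    -noappend -notunepme
```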

Mark

On Mon., 2 Dec. 2019, 21:18 Chenou Zhang, <czhan178 at asu.edu> wrote:

> Hi Gromacs developers,
>
> I'm currently running GROMACS 2019.4 on our university's HPC cluster. To
> fully utilize the GPU nodes, I followed the notes at
> http://manual.gromacs.org/documentation/current/user-guide/mdrun-performance.html.
>
>
> Here is the command I used for my runs:
> ```
> gmx mdrun -v -s $TPR -deffnm md_seed_fixed -ntmpi 8 -pin on -nb gpu -ntomp
> 3 -pme gpu -pmefft gpu -npme 1 -gputasks 00112233 -maxh $HOURS -cpt 60 -cpi
> -noappend
> ```
>
> Some of those runs fail with the following error:
> ```
> -------------------------------------------------------
>
> Program:     gmx mdrun, version 2019.4
>
> Source file: src/gromacs/gpu_utils/cudautils.cuh (line 229)
>
> MPI rank:    3 (out of 8)
>
>
>
> Fatal error:
>
> cudaStreamSynchronize failed: an illegal memory access was encountered
>
>
>
> For more information and tips for troubleshooting, please check the GROMACS
>
> website at http://www.gromacs.org/Documentation/Errors
> ```
>
> We also got a different error from the Slurm system:
> ```
> step 4400: timed with pme grid 96 96 60, coulomb cutoff 1.446: 467.9 M-cycles
> step 4600: timed with pme grid 96 96 64, coulomb cutoff 1.372: 451.4 M-cycles
> /var/spool/slurmd/job2321134/slurm_script: line 44: 29866 Segmentation fault
> gmx mdrun -v -s $TPR -deffnm md_seed_fixed -ntmpi 8 -pin on -nb gpu -ntomp 3
> -pme gpu -pmefft gpu -npme 1 -gputasks 00112233 -maxh $HOURS -cpt 60 -cpi
> -noappend
> ```
>
> We first thought this could be due to a compiler issue and tried the
> following different build settings:
> ===test1===
> <source>
> module load cuda/9.2.88.1
> module load gcc/7.3.0
> . /home/rsexton2/Library/gromacs/2019.4/test1/bin/GMXRC
> </source>
> ===test2===
> <source>
> module load cuda/9.2.88.1
> module load gcc/6x
> . /home/rsexton2/Library/gromacs/2019.4/test2/bin/GMXRC
> </source>
> ===test3===
> <source>
> module load cuda/9.2.148
> module load gcc/7.3.0
> . /home/rsexton2/Library/gromacs/2019.4/test3/bin/GMXRC
> </source>
> ===test4===
> <source>
> module load cuda/9.2.148
> module load gcc/6x
> . /home/rsexton2/Library/gromacs/2019.4/test4/bin/GMXRC
> </source>
> ===test5===
> <source>
> module load cuda/9.1.85
> module load gcc/6x
> . /home/rsexton2/Library/gromacs/2019.4/test5/bin/GMXRC
> </source>
> ===test6===
> <source>
> module load cuda/9.0.176
> module load gcc/6x
> . /home/rsexton2/Library/gromacs/2019.4/test6/bin/GMXRC
> </source>
> ===test7===
> <source>
> module load cuda/9.2.88.1
> module load gccgpu/7.4.0
> . /home/rsexton2/Library/gromacs/2019.4/test7/bin/GMXRC
> </source>
>
> However, we still ended up with the same errors shown above. Does anyone
> know where the cudaStreamSynchronize failure comes from? Or am I using
> those gmx GPU options incorrectly?
>
> Any input will be appreciated!
>
> Thanks!
> Chenou