[gmx-users] Gromacs 2019.4 - cudaStreamSynchronize failed issue

Chenou Zhang czhan178 at asu.edu
Mon Dec 2 21:17:46 CET 2019


Hi Gromacs developers,

I'm currently running GROMACS 2019.4 on our university's HPC cluster. To
fully utilize the GPU nodes, I followed the notes at
http://manual.gromacs.org/documentation/current/user-guide/mdrun-performance.html.


And here is the command I used for my runs.
```
gmx mdrun -v -s $TPR -deffnm md_seed_fixed -ntmpi 8 -pin on -nb gpu -ntomp 3 \
    -pme gpu -pmefft gpu -npme 1 -gputasks 00112233 -maxh $HOURS -cpt 60 -cpi \
    -noappend
```
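As a sanity check on the GPU mapping in that command, here is a minimal shell sketch. It assumes that each digit of the `-gputasks` string assigns one GPU task to a device ID, and that with `-nb gpu -pme gpu -npme 1` every rank runs exactly one GPU task (seven PP ranks plus one PME rank). That is my reading of the mdrun documentation, not something I have verified in the source. The variable names are illustrative only and are not part of GROMACS.

```shell
# Hypothetical helper, not part of GROMACS: summarize a -gputasks string.
gputasks="00112233"   # value passed to -gputasks in the command above
ntmpi=8               # value passed to -ntmpi in the command above

# Assumption: one GPU task per rank (7 PP ranks + 1 PME rank), so the
# number of digits should equal the number of thread-MPI ranks.
ntasks=${#gputasks}
if [ "$ntasks" -ne "$ntmpi" ]; then
    echo "mismatch: $ntasks GPU tasks for $ntmpi ranks" >&2
fi

# Split the string into one digit per line, then count tasks per GPU id.
summary=$(printf '%s\n' "$gputasks" | fold -w1 | sort | uniq -c |
    awk '{printf "%sgpu%s:%s", (NR > 1 ? " " : ""), $2, $1}')
echo "$summary"
```

Under those assumptions, the string spreads the eight tasks evenly, two per GPU across four devices.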

Some of those runs fail with the following error:
```
-------------------------------------------------------
Program:     gmx mdrun, version 2019.4
Source file: src/gromacs/gpu_utils/cudautils.cuh (line 229)
MPI rank:    3 (out of 8)

Fatal error:
cudaStreamSynchronize failed: an illegal memory access was encountered

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
```

We also got a different error, reported by the Slurm system:
```
step 4400: timed with pme grid 96 96 60, coulomb cutoff 1.446: 467.9 M-cycles
step 4600: timed with pme grid 96 96 64, coulomb cutoff 1.372: 451.4 M-cycles
/var/spool/slurmd/job2321134/slurm_script: line 44: 29866 Segmentation fault      gmx mdrun -v -s $TPR -deffnm md_seed_fixed -ntmpi 8 -pin on -nb gpu -ntomp 3 -pme gpu -pmefft gpu -npme 1 -gputasks 00112233 -maxh $HOURS -cpt 60 -cpi -noappend
```

We first thought this could be due to a compiler issue, so we tried the
following combinations of CUDA and GCC modules:
===test1===
<source>
module load cuda/9.2.88.1
module load gcc/7.3.0
. /home/rsexton2/Library/gromacs/2019.4/test1/bin/GMXRC
</source>
===test2===
<source>
module load cuda/9.2.88.1
module load gcc/6x
. /home/rsexton2/Library/gromacs/2019.4/test2/bin/GMXRC
</source>
===test3===
<source>
module load cuda/9.2.148
module load gcc/7.3.0
. /home/rsexton2/Library/gromacs/2019.4/test3/bin/GMXRC
</source>
===test4===
<source>
module load cuda/9.2.148
module load gcc/6x
. /home/rsexton2/Library/gromacs/2019.4/test4/bin/GMXRC
</source>
===test5===
<source>
module load cuda/9.1.85
module load gcc/6x
. /home/rsexton2/Library/gromacs/2019.4/test5/bin/GMXRC
</source>
===test6===
<source>
module load cuda/9.0.176
module load gcc/6x
. /home/rsexton2/Library/gromacs/2019.4/test6/bin/GMXRC
</source>
===test7===
<source>
module load cuda/9.2.88.1
module load gccgpu/7.4.0
. /home/rsexton2/Library/gromacs/2019.4/test7/bin/GMXRC
</source>

However, we still ended up with the same errors shown above. Does anyone
know where the cudaStreamSynchronize call comes in? Or am I using those
gmx GPU options incorrectly?

Any input will be appreciated!

Thanks!
Chenou

