[gmx-users] Gromacs 2019.4 - cudaStreamSynchronize failed issue
Chenou Zhang
czhan178 at asu.edu
Mon Dec 2 21:17:46 CET 2019
Hi GROMACS developers,
I'm currently running GROMACS 2019.4 on our university's HPC cluster. To
fully utilize the GPU nodes, I followed the notes at
http://manual.gromacs.org/documentation/current/user-guide/mdrun-performance.html.
Here is the command I used for my runs:
```
gmx mdrun -v -s $TPR -deffnm md_seed_fixed -ntmpi 8 -pin on -nb gpu -ntomp 3 \
    -pme gpu -pmefft gpu -npme 1 -gputasks 00112233 -maxh $HOURS -cpt 60 -cpi \
    -noappend
```
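With `-ntmpi 8 -npme 1` there are 7 PP ranks (each with a nonbonded task on the GPU because of `-nb gpu`) and 1 PME rank (with a PME task on the GPU because of `-pme gpu`), so `-gputasks` needs one digit per GPU task, 8 in total; `00112233` puts two tasks on each of GPUs 0 to 3. A minimal sketch of how such a string can be generated, assuming bash and single-digit GPU IDs; `make_gputasks` is a hypothetical helper, not part of GROMACS:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a GROMACS tool): build a -gputasks string that
# spreads N GPU tasks over M GPUs in contiguous blocks, one digit per task.
# Assumes GPU IDs 0..M-1 are single digits (M <= 10).
make_gputasks() {
  local ntasks=$1 ngpus=$2 s="" i
  for (( i = 0; i < ntasks; i++ )); do
    # Task i is assigned to GPU floor(i * ngpus / ntasks).
    s+=$(( i * ngpus / ntasks ))
  done
  printf '%s\n' "$s"
}

# 7 PP ranks + 1 PME rank = 8 GPU tasks, spread over 4 GPUs.
make_gputasks 8 4
```

For the command above this prints `00112233`, matching the string passed to `-gputasks`.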
Some of these runs fail with the following error:
```
-------------------------------------------------------
Program: gmx mdrun, version 2019.4
Source file: src/gromacs/gpu_utils/cudautils.cuh (line 229)
MPI rank: 3 (out of 8)
Fatal error:
cudaStreamSynchronize failed: an illegal memory access was encountered
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
```
We also got a different error from the Slurm system:
```
step 4400: timed with pme grid 96 96 60, coulomb cutoff 1.446: 467.9 M-cycles
step 4600: timed with pme grid 96 96 64, coulomb cutoff 1.372: 451.4 M-cycles
/var/spool/slurmd/job2321134/slurm_script: line 44: 29866 Segmentation fault gmx mdrun -v -s $TPR -deffnm md_seed_fixed -ntmpi 8 -pin on -nb gpu -ntomp 3 -pme gpu -pmefft gpu -npme 1 -gputasks 00112233 -maxh $HOURS -cpt 60 -cpi -noappend
```
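The "Segmentation fault" line is printed by the bash running the Slurm script when the child process (here `gmx mdrun`) is terminated by SIGSEGV; in that case `$?` is 128 + 11 = 139, whereas a program that exits on its own returns its own (usually small) status. A minimal sketch of a wrapper that records which kind of failure each run hit, assuming bash; `run_and_classify` is a hypothetical name, and an ordinary exit code above 128 would be misreported as a signal:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command, then report whether it succeeded,
# exited with a nonzero status, or was killed by a signal (bash reports 128+N).
# Caveat: a program that deliberately exits with a code > 128 is misreported.
run_and_classify() {
  "$@"
  local status=$?
  if (( status == 0 )); then
    echo "ok"
  elif (( status > 128 )); then
    # e.g. 139 = 128 + 11 (SIGSEGV), i.e. a "Segmentation fault"
    echo "killed by signal $(( status - 128 )) ($(kill -l $(( status - 128 ))))"
  else
    echo "exited with status $status"
  fi
  return "$status"
}
```

In the job script, the mdrun line would then be prefixed with `run_and_classify`, so the log states explicitly whether a given run ended in the CUDA fatal error (nonzero exit) or in a segfault.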
We first thought this could be a compiler issue and tried the following
CUDA/GCC combinations, each with its own GROMACS 2019.4 build:
===test1===
<source>
module load cuda/9.2.88.1
module load gcc/7.3.0
. /home/rsexton2/Library/gromacs/2019.4/test1/bin/GMXRC
</source>
===test2===
<source>
module load cuda/9.2.88.1
module load gcc/6x
. /home/rsexton2/Library/gromacs/2019.4/test2/bin/GMXRC
</source>
===test3===
<source>
module load cuda/9.2.148
module load gcc/7.3.0
. /home/rsexton2/Library/gromacs/2019.4/test3/bin/GMXRC
</source>
===test4===
<source>
module load cuda/9.2.148
module load gcc/6x
. /home/rsexton2/Library/gromacs/2019.4/test4/bin/GMXRC
</source>
===test5===
<source>
module load cuda/9.1.85
module load gcc/6x
. /home/rsexton2/Library/gromacs/2019.4/test5/bin/GMXRC
</source>
===test6===
<source>
module load cuda/9.0.176
module load gcc/6x
. /home/rsexton2/Library/gromacs/2019.4/test6/bin/GMXRC
</source>
===test7===
<source>
module load cuda/9.2.88.1
module load gccgpu/7.4.0
. /home/rsexton2/Library/gromacs/2019.4/test7/bin/GMXRC
</source>
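The seven setups above differ only in the CUDA module, the GCC module, and the install directory. A minimal sketch, assuming bash, that keeps this matrix in one table and prints the setup commands for each build, so the same list could drive a batch of otherwise identical test jobs; `print_setups` is a hypothetical name, and the sketch only prints commands, it does not load modules or run mdrun:

```shell
#!/usr/bin/env bash
# The seven test setups as "cuda-module gcc-module build-dir".
# Hypothetical helper: prints the setup commands only, runs nothing.
setups=(
  "cuda/9.2.88.1 gcc/7.3.0 test1"
  "cuda/9.2.88.1 gcc/6x test2"
  "cuda/9.2.148 gcc/7.3.0 test3"
  "cuda/9.2.148 gcc/6x test4"
  "cuda/9.1.85 gcc/6x test5"
  "cuda/9.0.176 gcc/6x test6"
  "cuda/9.2.88.1 gccgpu/7.4.0 test7"
)
prefix=/home/rsexton2/Library/gromacs/2019.4

print_setups() {
  local entry cuda gcc dir
  for entry in "${setups[@]}"; do
    read -r cuda gcc dir <<< "$entry"
    echo "module load $cuda"
    echo "module load $gcc"
    echo ". $prefix/$dir/bin/GMXRC"
  done
}

print_setups
```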
However, we still got the same errors shown above with every combination.
Does anyone know where the cudaStreamSynchronize call comes in? Or am I
using those gmx GPU options incorrectly?
Any input will be appreciated!
Thanks!
Chenou