[gmx-users] Problem with Ryzen, threads and core dumps?
mark.j.abraham at gmail.com
Wed Aug 9 14:42:22 CEST 2017
On Wed, Aug 9, 2017 at 12:04 AM Steffen Graether <graether at uoguelph.ca> wrote:
> I am trying to run a protein/water simulation using a new computer (Ryzen
> 1700X, Nvidia GTX 1070 running Ubuntu 17.04). I compiled GROMACS 2016.3
> with gcc-5 (CUDA 8.0 complained that it only supports up to version 5) and
> -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON, otherwise defaults
> were used. The build passed all make check tests.
Great, that's good news for GROMACS, at least.
> During the production mdrun, I would get core dumps almost all of the time
> (mostly segmentation faults, sometimes illegal instructions, once it
> actually completed the 1 ps run). I also tried different water (TIP4P2005
> and SPCE), but no change. I haven’t tried a different forcefield yet (just
> Amber ff03WS), but feel like that isn’t the problem.
Yeah, the force field does not seem a likely cause of the problem. But do try
your tpr on a different machine.
> Suggestions I found online seem to be mostly about not having compiled gmx
> on the current machine, which I had done. I also tried recompiling with
> SSE4.1, but still got a core dump. Interestingly restricting the program to
> the number of physical cores (-nt 8) lets the run complete more often (3/4
> attempts), but still crash.
OK. That could be consistent with the IRETQ issue mentioned here, which
is probably a different problem from the one recently acknowledged by AMD
(https://phoronix.com/scan.php?page=news_item&px=Ryzen-Segv-Response). I have
not seen anybody report that problem with any computational workload other
than compilation, but there are a few issues around, and even the IRETQ issue
could be triggered by an OS thread interrupting a GROMACS one.
We have one 8-core AMD Ryzen 7 1700X, which did more than 1000 steps of the
first simulation I tried (no GPU, using -ntmpi 4 -ntomp 4, Ubuntu 16.04.2,
with either gcc 5.4 or gcc 4.8.5).
> I would be appreciative of any suggestions people may have for compiling a
> more stable executable.
Unfortunately, the problem seems likely to be unrelated to GROMACS or its
compilation. I have nothing more to suggest than trying your tpr on another
machine to rule out problems with your simulation setup, and, if it runs fine
there, contacting AMD customer care. There may be BIOS updates that address
the problem, but I don't know.
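When comparing machines or thread counts (e.g. the "completes in 3 of 4 attempts with -nt 8" observation above), it can help to rerun the identical command several times and record how each run ended. The helper below is a minimal sketch of that idea, not a GROMACS tool: `tally_runs` is a hypothetical name, and it assumes a POSIX system, where Python's subprocess reports a process killed by a signal as a negative return code (e.g. -11 for SIGSEGV, -4 for SIGILL).

```python
import signal
import subprocess
from collections import Counter


def tally_runs(cmd, attempts):
    """Run `cmd` `attempts` times and count how each run ended.

    Hypothetical helper. On POSIX, subprocess reports death by signal
    as a negative return code, which is mapped back to a signal name.
    """
    outcomes = Counter()
    for _ in range(attempts):
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        code = result.returncode
        if code == 0:
            outcomes["completed"] += 1
        elif code < 0:
            # Killed by a signal: record e.g. "SIGSEGV" or "SIGILL".
            try:
                outcomes[signal.Signals(-code).name] += 1
            except ValueError:
                outcomes[f"signal {-code}"] += 1
        else:
            # Normal exit with a non-zero status (e.g. a fatal error message).
            outcomes[f"exit {code}"] += 1
    return outcomes
```

For example, `tally_runs(["gmx", "mdrun", "-s", "topol.tpr", "-nt", "8"], 4)` would rerun the same tpr four times (the file name is a placeholder), and the resulting counts can be compared against the same call with a different `-nt` value or on another machine. Note that mdrun backs up existing output files on each rerun rather than overwriting them.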