[gmx-users] segfault on Gromacs 4.6.3 (cuda)

Guanglei Cui amber.mail.archive at gmail.com
Tue Sep 10 02:03:11 CEST 2013


Hi Szilard,

Thanks again for getting back to me. You may remember the previous thread I
started about regression test failures with a binary compiled with icc 11.x.
My solution was to fall back to SSE2, and binaries compiled this way pass
all regression tests, including the one with GPU switched on. However, it
is not clear to me whether the GPU code is specifically exercised by the
regression tests.

As I was trying to explain in the original email, the binary works fine on
a node with a proper graphics driver, but crashes on a node where the
graphics driver is older than the CUDA SDK used for compilation. I think
updating the driver would probably enable the GPU part. A pure CPU
calculation with the same binary does not seem to work either. It is not
clear to me whether this is caused by the compiler. Updating gcc to 4.7 or
later is not really simple, since we use CentOS 5.x at the company. Even
CentOS 6.x uses gcc 4.4.x by default.

I've just tested the code with -nb cpu. It still crashes. The binary
compiled without GPU support works as expected and passes all regression
tests. For now, I can keep separate binaries for GPU and CPU applications
until I can get gcc 4.7 or later installed.
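
For reference, here is a minimal sketch of the two CPU-only launches
discussed in this thread. It only prints the command lines (a dry run) so
they can be inspected before being run by hand. The run name comes from my
original post; the MDRUN and DEFFNM variables and the two function names are
hypothetical helpers for illustration, and I am assuming that setting
GMX_DISABLE_GPU_DETECTION to any value (here 1) is enough, since the thread
does not say which value is expected.

```shell
#!/bin/sh
# Dry-run sketch: build the two CPU-only mdrun command lines from this thread.
# MDRUN and DEFFNM are illustrative variables, not GROMACS settings.
MDRUN="${MDRUN:-mdrun}"
DEFFNM="${DEFFNM:-eq2_npt_verlet}"

# Variant 1: leave GPU detection on, but request CPU nonbonded kernels.
cpu_nonbonded_cmd() {
    echo "$MDRUN -deffnm $DEFFNM -nb cpu"
}

# Variant 2: disable GPU detection completely through the environment.
# Assumption: the variable only needs to be set; the value 1 is arbitrary.
gpu_detection_off_cmd() {
    echo "env GMX_DISABLE_GPU_DETECTION=1 $MDRUN -deffnm $DEFFNM"
}

# Print both command lines without executing them.
cpu_nonbonded_cmd
gpu_detection_off_cmd
```

If the second variant runs cleanly where the first one crashes, that would
point at the GPU detection step rather than at the CPU kernels.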

Best regards,
Guanglei


On Mon, Sep 9, 2013 at 4:35 PM, Szilárd Páll <szilard.pall at cbr.su.se> wrote:

> Hi,
>
> First of all, icc 11 is not well tested and there have been reports
> about it compiling broken code. This could explain the crash, but
> you'd need to do a bit more testing to confirm. Regarding the GPU
> detection error: this happens if you use a driver which is incompatible
> with the CUDA runtime (the driver needs at least as high an API version;
> see the last two lines of the mdrun log header), and at the moment some
> such cases are not detected particularly gracefully.
>
> A few things to try:
> - use gcc; 4.7 is as fast as or faster than any icc;
> - run with the "-nb cpu" option; does it still crash?
> - run with GPU detection completely disabled*
> - run the regressiontests; try using CPUs only*
>
> *You can set the GMX_DISABLE_GPU_DETECTION environment variable to
> completely disable the GPU detection.
>
> Cheers,
> --
> Szilárd
>
>
> On Mon, Sep 9, 2013 at 9:52 PM, Guanglei Cui
> <amber.mail.archive at gmail.com> wrote:
> > Dear GMX users,
> >
> > I recently compiled Gromacs 4.6.3 with CUDA (Intel compiler 11.x, SSE2, and
> > CUDA SDK 5.0.35). I was doing a test run with simply 'mdrun -deffnm
> > eq2_npt_verlet' (letting mdrun figure out what to use). I received the
> > error telling me my graphics driver was older than the CUDA SDK, and
> > regular CPU code would be used instead. Then, it crashed with Segmentation
> > Fault. The code runs properly on another node where the graphics driver is
> > more up to date. I wonder if the crashing is somewhat expected, and
> > therefore I should prepare different binaries based on the capabilities of
> > different nodes. Thanks.
> >
> > Best regards,
> > --
> > Guanglei Cui
> > --
> > gmx-users mailing list    gmx-users at gromacs.org
> > http://lists.gromacs.org/mailman/listinfo/gmx-users
> > * Please search the archive at
> > http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> > * Please don't post (un)subscribe requests to the list. Use the
> > www interface or send it to gmx-users-request at gromacs.org.
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists



-- 
Guanglei Cui


