[gmx-users] segfault on Gromacs 4.6.3 (cuda)
Guanglei Cui
amber.mail.archive at gmail.com
Wed Sep 18 15:54:46 CEST 2013
Thanks very much, Szilárd.
Our IT just found out that we had purchased the latest Intel compiler, but
apparently it was never installed. Now I can check whether this happens with
the new compiler. I may or may not follow up with a bug report if I can't
reproduce the behavior.
Regards,
On Sun, Sep 15, 2013 at 1:39 PM, Szilárd Páll <szilard.pall at cbr.su.se> wrote:
> Hi,
>
> On Tue, Sep 10, 2013 at 2:03 AM, Guanglei Cui
> <amber.mail.archive at gmail.com> wrote:
> > Hi Szilard,
> >
> > Thanks again for getting back. You may remember the previous thread I
> > started on regression test failure with icc 11.x compiled binary. Falling
> As it was not referenced, I did not recall your previous mail at the
> time of writing my above reply.
>
> > back to SSE2 is my solution, and binaries compiled this way are able to
> > pass all regression tests, including the one with GPU switched on. However,
> > it is not clear to me if the GPU part is specifically tested in the
> > regression.
>
> In the regression test runs mdrun uses automated selection of CPU or
> GPU - the same way as it would happen if you were doing a standalone
> run. Your question reminds me that we should probably extend this
> behaviour so that when a GPU is present not only the GPU Verlet scheme
> kernels will be used in the testing.
>
> Therefore, my advice is that regression tests on machines with a GPU
> and a GPU-enabled build should, for now, be done in two passes:
> - tests using GPU Verlet kernels: make check;
> - tests using CPU Verlet kernels: CUDA_VISIBLE_DEVICES="" make check
> (or use GMX_DISABLE_GPU_DETECTION in case of detection issues)
>
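A minimal sketch of the two passes described above, written as a POSIX shell
helper. The function name two_pass_check is made up for illustration and is
not part of GROMACS; only the "make check" target and the empty
CUDA_VISIBLE_DEVICES setting come from the advice above.

```shell
#!/bin/sh
# Sketch only: run the same test command twice, first with GPUs visible,
# then with all GPUs hidden from the CUDA runtime.
# two_pass_check is a hypothetical helper name, not a GROMACS tool.
two_pass_check() {
    # Pass 1: GPUs left visible, so the GPU Verlet kernels get exercised.
    echo "== pass 1: GPU visible =="
    "$@" || return 1

    # Pass 2: an empty CUDA_VISIBLE_DEVICES hides every GPU, so the
    # CPU Verlet kernels are used instead.
    echo "== pass 2: GPU hidden =="
    CUDA_VISIBLE_DEVICES="" "$@" || return 1
}

# Typical use, from the GROMACS build directory:
#   two_pass_check make check
```

The helper stops after the first failing pass, so a failure in the GPU pass is
not hidden by a later success on the CPU.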
> >
> > As I was trying to explain in the original email, the binary works fine on
> > a node with a proper graphics driver, but crashes on a node where the
> > graphics driver is older than the CUDA SDK used in compilation. I think
> > updating the driver may potentially enable the GPU part. Pure CPU
>
> Understood; that is what my comment regarding the less-than-graceful
> handling of some GPU detection cases referred to. We'll improve this
> behaviour in one of the upcoming versions.
>
> > calculation with the same binary does not seem to work. It is not clear to
> > me if this is caused by the compiler. It's not really simple to update gcc
> > to 4.7 or greater since we use CentOS 5.x in the company. Even CentOS 6.x
> > uses gcc 4.4.x as default.
> >
> > I've just tested the code with -nb cpu. It still crashes. The binary
>
> Have you tried setting the aforementioned environment variable,
> GMX_DISABLE_GPU_DETECTION?
>
> > compiled without GPU works as expected and passed all regression tests. For
> > now, I can keep separate binaries for GPU and CPU applications before I can
> > get gcc 4.7 or greater installed.
>
> Have you built the correctly functioning mdrun without GPU support on
> the same machine with the same compiler and libraries as your
> problematic GPU-enabled builds? While performance-wise it is far from
> the best choice, AFAIK gcc 4.4 should work OK - at least in our
> automated tests it does.
> Hence, the fact that you are using gcc 4.4 should not result in a
> crash when switching to a CPU-only run.
>
> I would appreciate it if you could open an issue on redmine.gromacs.org,
> describe the behaviour you are seeing, and provide as much of the
> following information as possible:
> - log files produced with the crashing binary;
> - result of running with GMX_DISABLE_GPU_DETECTION;
> - a backtrace of the crash (build with
> CMAKE_BUILD_TYPE=RelWithDebInfo, run in gdb, type "bt" after the crash
> occurs and provide the output) and/or
> - run with mdrun -debug 1 and provide the mdrun.debug output.
>
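One possible way to collect the backtrace requested above without an
interactive gdb session. The helper name collect_backtrace, the log file
name, and the GDB override variable are assumptions for illustration; the
build type, the "bt" command, and "mdrun -debug 1" are taken from the list
above.

```shell
#!/bin/sh
# Sketch only. Build first with debug symbols, e.g. from an empty build
# directory (the source path is a placeholder):
#   cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DGMX_GPU=ON /path/to/gromacs-4.6.3
#   make

# collect_backtrace is a hypothetical helper: it runs a program under gdb
# in batch mode and writes everything, including the backtrace, to a log.
collect_backtrace() {
    log=$1; shift
    # -batch : exit once the listed commands have been processed
    # -ex run: start the program and let it run until it stops or crashes
    # -ex bt : print the backtrace at the point where it stopped
    # GDB can be overridden if gdb is installed under another name.
    "${GDB:-gdb}" -batch -ex run -ex bt --args "$@" > "$log" 2>&1
}

# Typical use (run name taken from the original report):
#   collect_backtrace mdrun_bt.log mdrun -deffnm eq2_npt_verlet
#   mdrun -deffnm eq2_npt_verlet -debug 1    # writes mdrun.debug
```

The resulting log and mdrun.debug can then be attached to the issue together
with the regular mdrun log file.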
> With the above information we should be able to judge what is causing
> the problem.
>
> Cheers,
> --
> Szilárd
>
> > Best regards,
> > Guanglei
> >
> >
> > On Mon, Sep 9, 2013 at 4:35 PM, Szilárd Páll <szilard.pall at cbr.su.se> wrote:
> >
> >> Hi,
> >>
> >> First of all, icc 11 is not well tested and there have been reports
> >> about it compiling broken code. This could explain the crash, but
> >> you'd need to do a bit more testing to confirm. Regarding the GPU
> >> detection error: it occurs if you use a driver which is incompatible
> >> with the CUDA runtime (the driver needs at least as high an API version;
> >> see the last two lines of the mdrun log header), and at the moment some
> >> such cases are not handled particularly gracefully.
> >>
> >> A few things to try:
> >> - use gcc, 4.7 is as fast or faster than any icc;
> >> - run with the "-nb cpu" option; does it still crash?
> >> - run with GPU detection completely disabled*
> >> - run the regressiontests; try using CPUs only*
> >>
> >> *You can set the GMX_DISABLE_GPU_DETECTION environment variable to
> >> completely disable the GPU detection.
> >>
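The runs suggested above could be scripted roughly as follows, so that each
attempt leaves its own output file and exit status. The helper name try_run
and the labels are invented for illustration. Setting
GMX_DISABLE_GPU_DETECTION to 1 is an assumption; the advice above only says
that the variable should be set.

```shell
#!/bin/sh
# Sketch only: run one command per attempt, capture its output, and report
# whether it exited cleanly. try_run is a hypothetical helper name.
try_run() {
    label=$1; shift
    if "$@" > "$label.out" 2>&1; then
        echo "$label: ok"
    else
        # A segmentation fault typically shows up here as status 139.
        echo "$label: failed (exit status $?)"
    fi
}

# Typical use (run name taken from the original report):
#   try_run default   mdrun -deffnm eq2_npt_verlet
#   try_run nb_cpu    mdrun -nb cpu -deffnm eq2_npt_verlet
#   try_run no_detect env GMX_DISABLE_GPU_DETECTION=1 mdrun -deffnm eq2_npt_verlet
```

Comparing the three results shows whether the crash depends on the GPU
detection step or also occurs on a pure CPU run.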
> >> Cheers,
> >> --
> >> Szilárd
> >>
> >>
> >> On Mon, Sep 9, 2013 at 9:52 PM, Guanglei Cui
> >> <amber.mail.archive at gmail.com> wrote:
> >> > Dear GMX users,
> >> >
> >> > I recently compiled Gromacs 4.6.3 with CUDA (Intel compiler 11.x, SSE2,
> >> > and CUDA SDK 5.0.35). I was doing a test run with simply 'mdrun -deffnm
> >> > eq2_npt_verlet' (letting mdrun figure out what to use). I received the
> >> > error telling me my graphics driver was older than the CUDA SDK, and
> >> > regular CPU code would be used instead. Then, it crashed with
> >> > Segmentation Fault. The code runs properly on another node where the
> >> > graphics driver is more up to date. I wonder if the crashing is somewhat
> >> > expected, and therefore I should prepare different binaries based on the
> >> > capabilities of different nodes. Thanks.
> >> >
> >> > Best regards,
> >> > --
> >> > Guanglei Cui
> >> > --
> >> > gmx-users mailing list gmx-users at gromacs.org
> >> > http://lists.gromacs.org/mailman/listinfo/gmx-users
> >> > * Please search the archive at
> >> http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> >> > * Please don't post (un)subscribe requests to the list. Use the
> >> > www interface or send it to gmx-users-request at gromacs.org.
> >> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >>
> >
> >
> >
> > --
> > Guanglei Cui
>
--
Guanglei Cui