[gmx-users] segfault on Gromacs 4.6.3 (cuda)

Szilárd Páll szilard.pall at cbr.su.se
Sun Sep 15 19:39:37 CEST 2013


Hi,

On Tue, Sep 10, 2013 at 2:03 AM, Guanglei Cui
<amber.mail.archive at gmail.com> wrote:
> Hi Szilard,
>
> Thanks again for getting back. You may remember the previous thread I
> started on regression test failure with icc 11.x compiled binary. Falling
As it was not referenced, I did not recall your previous mail at the
time of writing my earlier reply.

> back to SSE2 is my solution, and binaries compiled this way are able to
> pass all regression tests, including the one with GPU switched on. However,
> it is not clear to me if the GPU part is specifically tested in the
> regression.

In the regression test runs mdrun uses automated selection of CPU or
GPU - the same way as it would happen if you were doing a standalone
run. Your question reminds me that we should probably extend this
behaviour so that, when a GPU is present, the tests exercise more than
just the GPU Verlet scheme kernels.

Therefore, my advice is that regression tests on machines with a GPU
and a GPU-enabled build should, for now, be done in two passes:
- tests using GPU Verlet kernels: make check;
- tests using CPU Verlet kernels: CUDA_VISIBLE_DEVICES="" make check
(or use GMX_DISABLE_GPU_DETECTION in case of detection issues)
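
As a rough sketch, the two passes could be wrapped in a small script
run from the build directory. The function names and the DRY_RUN
switch are only illustrative helpers made up for this example, they
are not part of GROMACS:

```shell
#!/bin/sh
# Two-pass regression test sketch; run from the GROMACS build directory.
# DRY_RUN is a hypothetical helper variable: when set to 1, commands are
# printed instead of executed.

run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Pass 1: GPU left visible, so the GPU Verlet kernels are tested.
gpu_pass() {
    run_cmd make check
}

# Pass 2: hide all CUDA devices, so the CPU Verlet kernels are tested.
cpu_pass() {
    run_cmd env CUDA_VISIBLE_DEVICES="" make check
}
```

After sourcing the file, "gpu_pass && cpu_pass" runs both passes in
sequence; with DRY_RUN=1 it only prints what would be executed.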

>
> As I was trying to explain in the original email, the binary works fine on
> a node with proper graphics driver, but crashes on a node where the
> graphics driver is older than the CUDA SDK used in compilation. I think
> updating the driver may potentially enable the GPU part. Pure CPU

I understood; that's what my comment regarding the less-than-graceful
handling of some GPU detection cases referred to. We'll improve this
behaviour in one of the upcoming versions.

> calculation with the same binary seems not working. It is not clear to me
> if this is caused by the compiler. It's not really simple to update the gcc
> to 4.7 or greater since we use CentOS 5.x in the company. Even CentOS 6.x
> uses gcc 4.4.x as default.
>
> I've just tested the code with -nb cpu. It still crashes. The binary

Have you tried setting the aforementioned environment variable,
GMX_DISABLE_GPU_DETECTION?

> compiled without GPU works as expected and passed all regression tests. For
> now, I can keep separate binaries for GPU and CPU applications before I can
> get gcc 4.7 or greater installed.

Have you built the correctly functioning mdrun without GPU support on
the same machine with the same compiler and libraries as your
problematic GPU-enabled builds? While performance-wise it is far from
the best choice, AFAIK gcc 4.4 should work OK - at least in our
automated tests it does.
Hence, the fact that you are using gcc 4.4 should not result in a
crash when switching to a CPU-only run.

I would appreciate it if you could open an issue on redmine.gromacs.org,
describe the behaviour you are seeing, and provide as much of the
following information as possible:
- log files produced with the crashing binary;
- result of running with GMX_DISABLE_GPU_DETECTION;
- a backtrace of the crash (build with
CMAKE_BUILD_TYPE=RelWithDebInfo, run in gdb, type "bt" after the crash
occurs and provide the output) and/or
- run with mdrun -debug 1 and provide the mdrun.debug output.
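
A minimal sketch of how these runs could be scripted is below. MDRUN,
DEFFNM, DRY_RUN and the function names are illustrative assumptions
(the default DEFFNM is simply the run name from your first mail), and
the value 1 for GMX_DISABLE_GPU_DETECTION is an arbitrary choice. The
gdb call uses batch mode rather than typing "bt" interactively:

```shell
#!/bin/sh
# Sketch for gathering the diagnostics listed above.
# MDRUN, DEFFNM and DRY_RUN are hypothetical helper variables, not
# GROMACS settings. With DRY_RUN=1 commands are printed, not executed.
MDRUN=${MDRUN:-mdrun}
DEFFNM=${DEFFNM:-eq2_npt_verlet}

run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# CPU-only run with GPU detection disabled through the environment.
run_without_detection() {
    run_cmd env GMX_DISABLE_GPU_DETECTION=1 "$MDRUN" -nb cpu -deffnm "$DEFFNM"
}

# Run under gdb in batch mode; "bt" prints the backtrace once the
# program has stopped. Use a build configured with
# -DCMAKE_BUILD_TYPE=RelWithDebInfo so the backtrace has symbols.
collect_backtrace() {
    run_cmd gdb -batch -ex run -ex bt --args "$MDRUN" -deffnm "$DEFFNM"
}

# Run with debug output enabled to obtain the mdrun.debug file.
run_with_debug() {
    run_cmd "$MDRUN" -debug 1 -deffnm "$DEFFNM"
}
```

Redirecting the output of collect_backtrace to a file gives something
that can be attached to the issue directly.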

With the above information we should be able to judge what is causing
the problem.

Cheers,
--
Szilárd

> Best regards,
> Guanglei
>
>
> On Mon, Sep 9, 2013 at 4:35 PM, Szilárd Páll <szilard.pall at cbr.su.se> wrote:
>
>> Hi,
>>
>> First of all, icc 11 is not well tested and there have been reports
>> about it compiling broken code. This could explain the crash, but
>> you'd need to do a bit more testing to confirm. Regarding the GPU
>> detection error: it occurs if you use a driver which is incompatible
>> with the CUDA runtime (the driver needs to support at least as high an
>> API version; see the last two lines of the mdrun log header), and at
>> the moment some such cases are not handled particularly gracefully.
>>
>> A few things to try:
>> - use gcc; 4.7 is as fast as or faster than any icc;
>> - run with the "-nb cpu" option; does it still crash?
>> - run with GPU detection completely disabled*
>> - run the regressiontests; try using CPUs only*
>>
>> *You can set the GMX_DISABLE_GPU_DETECTION environment variable to
>> completely disable the GPU detection.
>>
>> Cheers,
>> --
>> Szilárd
>>
>>
>> On Mon, Sep 9, 2013 at 9:52 PM, Guanglei Cui
>> <amber.mail.archive at gmail.com> wrote:
>> > Dear GMX users,
>> >
>> > I recently compiled Gromacs 4.6.3 with CUDA (Intel compiler 11.x, SSE2, and
>> > CUDA SDK 5.0.35). I was doing a test run with simply 'mdrun -deffnm
>> > eq2_npt_verlet' (letting mdrun figure out what to use). I received the
>> > error telling me my graphics driver was older than the CUDA SDK, and
>> > regular CPU code would be used instead. Then, it crashed with Segmentation
>> > Fault. The code runs properly on another node where the graphics driver is
>> > more up to date. I wonder if the crashing is somewhat expected, and
>> > therefore I should prepare different binaries based on the capabilities of
>> > different nodes. Thanks.
>> >
>> > Best regards,
>> > --
>> > Guanglei Cui
>> > --
>> > gmx-users mailing list    gmx-users at gromacs.org
>> > http://lists.gromacs.org/mailman/listinfo/gmx-users
>> > * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
>> > * Please don't post (un)subscribe requests to the list. Use the
>> > www interface or send it to gmx-users-request at gromacs.org.
>> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>
>
>
>
> --
> Guanglei Cui


