[gmx-developers] GROMACS OpenCL on Gallium

Mark Abraham mark.j.abraham at gmail.com
Tue Dec 8 04:27:27 CET 2015


Hi,

I've uploaded a patch that addresses a couple of the issues. The
regressiontests are fine, as I said. I think the segfaults are indeed
coming from a broken version of CUDA (I have updated the OpenCL test config
to try 6.5). I agree that we should probably bump the minimum version of
CUDA for OpenCL and avoid trouble.

The empty-domain test (that I added to cover a hard-to-reproduce bug in our
GPU stream handling) requires two ranks. I used to hard-code this in the
CUDA days, which was OK then but not now with OpenCL needed in Jenkins, so
my patch tries to rely better on the new automated resource assignment, but
Jenkins can be the judge of that. I think we were also mis-managing the
OpenCL version of the code that waited for non-local events before starting
local events - that test case at least did its job (eventually).

Also added some error code strings that we might make more general use of
in future.
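
For illustration only, here is a minimal sketch of what such an error-string
helper could look like. It is not the code from the patch: the name
ocl_error_string is hypothetical, and the status constants are redefined
locally (using the numeric values from the Khronos CL/cl.h header) so that the
example compiles without an OpenCL SDK.

```cpp
#include <string>

// Local stand-ins for a few OpenCL status codes. The numeric values match
// those in the Khronos CL/cl.h header; they are redefined here only so this
// sketch builds without an OpenCL SDK installed.
constexpr int kClSuccess             = 0;
constexpr int kClDeviceNotFound      = -1;
constexpr int kClOutOfResources      = -5;
constexpr int kClOutOfHostMemory     = -6;
constexpr int kClInvalidValue        = -30;
constexpr int kClInvalidCommandQueue = -36;
constexpr int kClInvalidEvent        = -58;

// Hypothetical helper (an assumption, not the function added in the patch):
// maps an OpenCL status code to a readable name for error messages.
std::string ocl_error_string(int status)
{
    switch (status)
    {
        case kClSuccess: return "CL_SUCCESS";
        case kClDeviceNotFound: return "CL_DEVICE_NOT_FOUND";
        case kClOutOfResources: return "CL_OUT_OF_RESOURCES";
        case kClOutOfHostMemory: return "CL_OUT_OF_HOST_MEMORY";
        case kClInvalidValue: return "CL_INVALID_VALUE";
        case kClInvalidCommandQueue: return "CL_INVALID_COMMAND_QUEUE";
        case kClInvalidEvent: return "CL_INVALID_EVENT";
        default: return "unknown OpenCL error " + std::to_string(status);
    }
}
```

A caller would pass the status returned by an OpenCL call and include the
resulting string in the fatal-error message, so that a failing event wait
reports a name rather than a bare number.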

http://jenkins.gromacs.org/job/Gromacs_Gerrit_5_1-test-opencl-slave/15/
https://gerrit.gromacs.org/#/c/5430/

Mark

On Tue, Dec 8, 2015 at 4:36 AM Szilárd Páll <pall.szilard at gmail.com> wrote:

> Hi,
>
> All three segfaults produce a backtrace similar to this:
>
> [...]
> #2  0x00007fcf53618632 in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1
> #3  0x00007fcf58b6fdca in sync_ocl_event (stream=0x7fcf4c820160, ocl_event=0x7fcf4c031380)
>     at /mnt/workspace/Gromacs_Gerrit_5_1-test-opencl-slave/d27c5006/gromacs/src/gromacs/mdlib/nbnxn_ocl/nbnxn_ocl.cpp:331
> #4  0x00007fcf58b70f7d in nbnxn_gpu_launch_cpyback (nb=0x7fcf4c030f40, nbatom=0x7fcf4c022ca0, flags=1015, aloc=0)
>     at /mnt/workspace/Gromacs_Gerrit_5_1-test-opencl-slave/d27c5006/gromacs/src/gromacs/mdlib/nbnxn_ocl/nbnxn_ocl.cpp:952
> #5  0x00007fcf58b65fcc in do_force_cutsVERLET ()
>     at /mnt/workspace/Gromacs_Gerrit_5_1-test-opencl-slave/d27c5006/gromacs/src/gromacs/mdlib/sim_util.cpp:1061
> #6  0x00007fcf58b68e02 in do_force ()
>     at /mnt/workspace/Gromacs_Gerrit_5_1-test-opencl-slave/d27c5006/gromacs/src/gromacs/mdlib/sim_util.cpp:2009
> #7  0x000000000041ac0e in do_md ()
>     at /mnt/workspace/Gromacs_Gerrit_5_1-test-opencl-slave/d27c5006/gromacs/src/programs/mdrun/md.cpp:1078
> #8  0x000000000042835b in mdrunner ()
>     at /mnt/workspace/Gromacs_Gerrit_5_1-test-opencl-slave/d27c5006/gromacs/src/programs/mdrun/runner.cpp:1282
> #9  0x000000000042528e in mdrunner_start_fn (arg=0xb8ddd0)
>     at /mnt/workspace/Gromacs_Gerrit_5_1-test-opencl-slave/d27c5006/gromacs/src/programs/mdrun/runner.cpp:186
> [...]
>
> This could be due to an old CUDA being used. I'll check that, but in any
> case, especially for NVIDIA OpenCL, which we know has been buggy (and as far
> as I know still is), we probably really should not use anything older than
> 7.0 or 7.5.
>
> The other failures on the AMD test machine seem to be caused by the tests
> being called in an incompatible way, although I have the feeling that
> something is off with that too (because tMPI+OpenCL multi-GPU should work,
> I thought).
>
> --
> Szilárd
>
> On Mon, Dec 7, 2015 at 2:46 PM, Vedran Miletić <rivanvx at gmail.com> wrote:
>
>> Szilard, Mark,
>>
>> thanks for looking into this.
>>
>> 2015-12-07 14:29 GMT+01:00 Szilárd Páll <pall.szilard at gmail.com>:
>> > http://jenkins.gromacs.org/job/Gromacs_Gerrit_5_1-test-opencl-slave/14
>>
>> Didn't know we had that one. Very nice.
>>
>> Regards,
>> Vedran
>>
>> --
>> Vedran Miletić
>> http://vedranmileti.ch/
>> --
>> Gromacs Developers mailing list
>>
>> * Please search the archive at
>> http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List before
>> posting!
>>
>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>
>> * For (un)subscribe requests visit
>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
>> or send a mail to gmx-developers-request at gromacs.org.
>>
>