[gmx-users] GMX 2018 regression tests: cufftPlanMany R2C plan failure (error code 5)

Mark Abraham mark.j.abraham at gmail.com
Thu Feb 8 14:40:36 CET 2018


Hi,

On Thu, Feb 8, 2018 at 2:15 PM Alex <nedomacho at gmail.com> wrote:

> Mark and Peter,
>
> Thanks for commenting. I was told that all CUDA tests passed, but I will
> double check on how many of those were actually run. Also, we never
> rebooted the box after the CUDA install, and finally we had a bunch of
> GROMACS (2016.4) jobs running, because we didn't want to interrupt
> a postdoc's work... All of those were with -nb cpu though. Could those
> factors have affected our regression tests?
>

Can't say. You observed timeouts, which could be consistent with drivers or
runtimes getting stuck. However, the other mdrun processes may have set
thread affinity by default, and any process that does that will interfere
with how effectively any others run, such as the tests. Sharing a node is
difficult to do well, and doing anything else on a node running GROMACS
is asking for trouble unless you have manually kept the tasks apart.
Just don't.
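
For the case where sharing really cannot be avoided, the sketch below shows
one way of keeping mdrun tasks apart by giving each job a disjoint block of
logical cores. It is a minimal illustration rather than a recommendation: the
helper name `pinned_mdrun_commands` and the example node size are
hypothetical, it only prints command lines and launches nothing, and it
assumes every job uses the same number of threads with `-pinstride 1`. The
`-nt`, `-pin`, `-pinoffset` and `-pinstride` options are existing mdrun
options.

```python
def pinned_mdrun_commands(n_jobs, threads_per_job, logical_cores):
    """Build mdrun command lines whose pinned threads do not overlap.

    Hypothetical helper for illustration only; it does not run anything.
    """
    if n_jobs < 1 or threads_per_job < 1:
        raise ValueError("n_jobs and threads_per_job must be positive")
    if n_jobs * threads_per_job > logical_cores:
        raise ValueError("not enough logical cores for disjoint pinning")

    commands = []
    for job in range(n_jobs):
        # Each job starts where the previous job's block of cores ends.
        offset = job * threads_per_job
        commands.append(
            f"gmx mdrun -nt {threads_per_job} -pin on "
            f"-pinoffset {offset} -pinstride 1"
        )
    return commands


if __name__ == "__main__":
    # Assumed example: two jobs sharing a node with 12 logical cores.
    for line in pinned_mdrun_commands(2, 6, 12):
        print(line)
```

This only helps when every task on the node is managed this way; a job
started without such options can still land on cores that another job has
pinned.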

Mark


> It will really suck if these are hardware-related...
>
> Thanks,
>
> Alex
>
>
> On 2/8/2018 3:03 AM, Mark Abraham wrote:
> > Hi,
> >
> > Or leftovers of the drivers that are now mismatching. That has caused
> > timeouts for us.
> >
> > Mark
> >
> > On Thu, Feb 8, 2018 at 10:55 AM Peter Kroon <p.c.kroon at rug.nl> wrote:
> >
> >> Hi,
> >>
> >>
> >> with changing failures like this I would start to suspect the hardware
> >> as well. Mark's suggestion of looking at simpler test programs than GMX
> >> is a good one :)
> >>
> >>
> >> Peter
> >>
> >>
> >> On 08-02-18 09:10, Mark Abraham wrote:
> >>> Hi,
> >>>
> >>> That suggests that your new CUDA installation is differently incomplete.
> >>> Do its samples or test programs run?
> >>>
> >>> Mark
> >>>
> >>> On Thu, Feb 8, 2018 at 1:20 AM Alex <nedomacho at gmail.com> wrote:
> >>>
> >>>> Update: we seem to have had a hiccup with an orphan CUDA install, and
> >>>> that was causing issues. After wiping everything off and rebuilding,
> >>>> the errors from the initial post disappeared. However, two tests
> >>>> failed during regression:
> >>>>
> >>>> 95% tests passed, 2 tests failed out of 39
> >>>>
> >>>> Label Time Summary:
> >>>> GTest              = 170.83 sec (33 tests)
> >>>> IntegrationTest    = 125.00 sec (3 tests)
> >>>> MpiTest            =   4.90 sec (3 tests)
> >>>> UnitTest           =  45.83 sec (30 tests)
> >>>>
> >>>> Total Test time (real) = 1225.65 sec
> >>>>
> >>>> The following tests FAILED:
> >>>>     9 - GpuUtilsUnitTests (Timeout)
> >>>>    32 - MdrunTests (Timeout)
> >>>> Errors while running CTest
> >>>> CMakeFiles/run-ctest-nophys.dir/build.make:57: recipe for target
> >>>> 'CMakeFiles/run-ctest-nophys' failed
> >>>> make[3]: *** [CMakeFiles/run-ctest-nophys] Error 8
> >>>> CMakeFiles/Makefile2:1160: recipe for target
> >>>> 'CMakeFiles/run-ctest-nophys.dir/all' failed
> >>>> make[2]: *** [CMakeFiles/run-ctest-nophys.dir/all] Error 2
> >>>> CMakeFiles/Makefile2:971: recipe for target 'CMakeFiles/check.dir/rule'
> >>>> failed
> >>>> make[1]: *** [CMakeFiles/check.dir/rule] Error 2
> >>>> Makefile:546: recipe for target 'check' failed
> >>>> make: *** [check] Error 2
> >>>>
> >>>> Any ideas? I can post the complete log, if needed.
> >>>>
> >>>> Thank you,
> >>>>
> >>>> Alex
> >>>> --
> >>>> Gromacs Users mailing list
> >>>>
> >>>> * Please search the archive at
> >>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> >>>> posting!
> >>>>
> >>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >>>>
> >>>> * For (un)subscribe requests visit
> >>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> >>>> send a mail to gmx-users-request at gromacs.org.
> >>>>
> >>
>

