[gmx-developers] impending changes to Jenkins verification

Mark Abraham mark.j.abraham at gmail.com
Sun Sep 6 21:58:43 CEST 2015


Also, bs_nix1204 is misbehaving. Stefan hopes to try swapping around some
GPUs in case that is the issue.


On Sun, 6 Sep 2015 19:46 Teemu Murtola <teemu.murtola at gmail.com> wrote:

> Hi,
> The situation isn't as simple as that:
>    - There certainly are changes that fail because they have not been
>    rebased, but those are easy to identify from the error messages that get
>    posted back to Gerrit and say that the required build scripts cannot be
>    found.
>    - There are also a few changes that fail on purpose, since they are
>    waiting for the C++11 change to move forward and depend on parts of it
>    (and the new matrix would also fail with the C++11 change anyway).
>    - Additionally, many changes that have been rebased also have one or a
>    few failing builds. Persistently retriggering the failing builds leads to
>    the builds eventually succeeding, but this is not a sustainable situation.
>    The situation isn't exactly new (we've had random failures earlier as well,
>    with similar symptoms), but the volume of these is now much higher. It
>    really looks as though either Jenkins itself (the software, or the
>    hardware/virtualization layer it runs on) is flaky, or our
>    configuration creates far too much peak load somewhere, causing something
>    to time out or otherwise fail (typically the error messages are of
>    the sort "Channel is already closed").
>    - On top of that, there likely is at least one deadlock hiding
>    somewhere in mdrun, since occasionally the mdrun integration tests and/or
>    regression tests may hang for 15 minutes (before Jenkins kills them). It's
>    just a hunch that this deadlock also triggers more easily when there is
>    more load on the system.
> The new (temporary) setup with double jobs on nearly everything puts more
> load on Jenkins, so it might just have tipped the balance beyond a point
> where things were working ~OK (at least, when you didn't upload too many
> changes at the same time). But it would be nice to iron out these issues
> now, instead of just ignoring them in the hope that they go back to
> manageable levels when we reduce the load.
> Just my two cents,
> Teemu
> On Sun, Sep 6, 2015 at 8:20 PM Mark Abraham <mark.j.abraham at gmail.com>
> wrote:
>> It's submitted to master already, so just rebase to HEAD as/when you want.
>> Mark
>> On Sun, 6 Sep 2015 15:25 David van der Spoel <spoel at xray.bmc.uu.se> wrote:
>>> On 02/09/15 23:12, Mark Abraham wrote:
>>> > With 5.1 off the table, we're implementing some much-needed updates to
>>> > the way we handle Jenkins verification of GROMACS.
>>> >
>>> > Teemu's rewritten the scripts we use to implement the various kinds of
>>> > verification jobs, which will let us maintain and extend them in a much
>>> > less ad hoc fashion. Some parts of those scripts will now live in the
>>> > source repository, so that they can change in step with code changes.
>>> > We've already submitted that script to master, so when you rebase
>>> > patches over 0ce920a017, Jenkins will be able to use its new toys.
>>> So is this the reason that most patches fail right now?
>>> Where is this patch in Gerrit? I cannot seem to find it...