[gmx-developers] impending changes to Jenkins verification

Teemu Murtola teemu.murtola at gmail.com
Sun Sep 6 19:46:34 CEST 2015


Hi,

The situation isn't as simple as that:

   - There certainly are changes that fail because they have not been
   rebased, but those are easy to identify from the error messages that get
   posted back to Gerrit and say that the required build scripts cannot be
   found.
   - There are also a few changes that fail on purpose: they are waiting
   for the C++11 change to move forward and require some stuff from there
   (and the new matrix would fail with the C++11 change anyway).
   - Additionally, many changes that have been rebased also have one or a
   few failing builds. Persistently retriggering the failing builds leads to
   the builds eventually succeeding, but this is not a sustainable situation.
   The situation isn't exactly new (we've had random failures earlier as well,
   with similar symptoms), but the volume of these is now much higher. It
   really looks like either Jenkins itself (the software, or the
   hardware/virtualization layer it runs on) is flaky, or our configuration
   creates way too much peak load somewhere, causing something to time out
   or otherwise fail (the error messages are typically of the sort
   "Channel is already closed").
   - On top of that, there likely is at least one deadlock hiding somewhere
   in mdrun, since occasionally the mdrun integration tests and/or regression
   tests may hang for 15 minutes (before Jenkins kills them). It's just a
   hunch that this deadlock also triggers more easily when there is more load
   on the system.
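
To illustrate the triage described in the list above, here is a rough sketch
of how failing builds could be bucketed by their console output. Only the
"Channel is already closed" string comes from the actual error messages; the
other patterns, the category names, and the classify() helper are
hypothetical and would need adjusting to what the logs really contain.

```python
import re

# Ordered list of (category, pattern). Only "Channel is already closed" is a
# message actually seen in the failures; the other patterns are assumptions.
CATEGORIES = [
    # Change not rebased: the required build scripts are missing (assumed wording).
    ("needs-rebase", re.compile(r"build script.*(not found|cannot be found)", re.I)),
    # Jenkins/infrastructure flakiness.
    ("infrastructure", re.compile(r"Channel is already closed")),
    # Test killed after hanging, e.g. a suspected mdrun deadlock (assumed wording).
    ("possible-hang", re.compile(r"timed? ?out", re.I)),
]


def classify(log_text):
    """Return the first matching category name, or "unknown"."""
    for name, pattern in CATEGORIES:
        if pattern.search(log_text):
            return name
    return "unknown"
```

Builds that land in "infrastructure" or "possible-hang" would be the ones
worth retriggering or investigating, while "needs-rebase" only needs a rebase.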

The new (temporary) setup with double jobs on nearly everything puts more
load on Jenkins, so it might just have tipped the balance beyond a point
where things were working ~OK (at least, when you didn't upload too many
changes at the same time). But it would be nice to iron out these issues
now, instead of just ignoring them in the hope that they go back to
manageable levels when we reduce the load.

Just my two cents,
Teemu

On Sun, Sep 6, 2015 at 8:20 PM Mark Abraham <mark.j.abraham at gmail.com>
wrote:

> It's submitted to master already, so just rebase to HEAD as/when you want.
>
> Mark
> On Sun, 6 Sep 2015 15:25 David van der Spoel <spoel at xray.bmc.uu.se> wrote:
>
>> On 02/09/15 23:12, Mark Abraham wrote:
>> > With 5.1 off the table, we're implementing some much-needed updates to
>> > the way we handle Jenkins verification of GROMACS.
>> >
>> > Teemu's rewritten the scripts we use to implement the various kinds of
>> > verification jobs, which will let us maintain and extend in much less
>> > ad-hoc fashion. Some parts of those scripts will now live in the GROMACS
>> > source repository, so that they can change in step with code changes.
>> > We've already submitted that script to master, so when you rebase
>> > patches over 0ce920a017, Jenkins will be able to use its new toys.
>>
>> So is this the reason that most patches fail right now?
>> Where is this patch in gerrit? I can not seem to find it...
>>
>