[gmx-developers] impending changes to Jenkins verification
Teemu Murtola
teemu.murtola at gmail.com
Sun Sep 6 19:46:34 CEST 2015
Hi,
The situation isn't as simple as that:
- There certainly are changes that fail because they have not been
rebased, but those are easy to identify from the error messages that get
posted back to Gerrit and say that the required build scripts cannot be
found.
- There also are a few changes that fail on purpose, since they are
waiting for the C++11 change to move forward, and require some stuff from
there (and the new matrix anyways would also fail with the C++11 change).
- Additionally, many changes that have been rebased also have one or a
few failing builds. Persistently retriggering the failing builds leads to
the builds eventually succeeding, but this is not a sustainable situation.
The situation isn't exactly new (we've had random failures earlier as well,
with similar symptoms), but the volume of these is now much higher. It
really looks as if either Jenkins itself (the software, or the
hardware/virtualization layer it runs on) is flaky, or our
configuration creates far too much peak load somewhere, causing something
to time out or otherwise fail (since the error messages are typically of
the sort "Channel is already closed").
- On top of that, there likely is at least one deadlock hiding somewhere
in mdrun, since occasionally the mdrun integration tests and/or regression
tests may hang for 15 minutes (before Jenkins kills them). It's just a
hunch that this deadlock also triggers more easily when there is more load
on the system.
The new (temporary) setup with double jobs on nearly everything puts more
load on Jenkins, so it might just have tipped the balance beyond a point
where things were working ~OK (at least, when you didn't upload too many
changes at the same time). But it would be nice to iron out these issues
now, instead of ignoring them in the hope that they return to manageable
levels once we reduce the load.
Just my two cents,
Teemu
On Sun, Sep 6, 2015 at 8:20 PM Mark Abraham <mark.j.abraham at gmail.com>
wrote:
> It's submitted to master already, so just rebase to HEAD as/when you want.
>
> Mark
> On Sun, 6 Sep 2015 15:25 David van der Spoel <spoel at xray.bmc.uu.se> wrote:
>
>> On 02/09/15 23:12, Mark Abraham wrote:
>> > With 5.1 off the table, we're implementing some much-needed updates to
>> > the way we handle Jenkins verification of GROMACS.
>> >
>> > Teemu's rewritten the scripts we use to implement the various kinds of
>> > verification jobs, which will let us maintain and extend them in a much less
>> > ad-hoc fashion. Some parts of those scripts will now live in the GROMACS
>> > source repository, so that they can change in step with code changes.
>> > We've already submitted that script to master, so when you rebase
>> > patches over 0ce920a017, Jenkins will be able to use its new toys.
>>
>> So is this the reason that most patches fail right now?
>> Where is this patch in Gerrit? I cannot seem to find it...
>>
>