[gmx-developers] impending changes to Jenkins verification

Mark Abraham mark.j.abraham at gmail.com
Wed Mar 9 15:44:11 CET 2016


Hi,

Long-belated update:

For pre-submit CI testing, our Jenkins machinery is now using the
configuration matrix you can find in the source repository
at admin/builds/pre-submit-matrix.txt. For technical reasons, the job that
is triggered is
http://jenkins.gromacs.org/job/Gromacs_Gerrit_master-matrix-from-repo/,
which manages the actual matrix build
http://jenkins.gromacs.org/job/Gromacs_Gerrit_master_nrwpo/. When builds
fail, Teemu arranged some magic so that the link to the matrix job is what
appears on Jenkins, as you are used to. Things seem to be working smoothly
so far, but yell if you see issues.

That testing matrix has approximately the same coverage as we already had
with http://jenkins.gromacs.org/job/Gromacs_Gerrit_master-new-releng/. The
plan is to be fairly reluctant to add new configurations to it (e.g. we
might add an OpenCL build, or new sanitizers), but to have a wider range of
testing triggered after we accept a patch in Gerrit. Details TBD,
discussion at http://redmine.gromacs.org/issues/1815.

Mark

On Wed, Sep 16, 2015 at 3:06 PM Mark Abraham <mark.j.abraham at gmail.com>
wrote:

> Hi,
>
> Now that the new releng machinery seems useful and stable, I've
> deactivated the automatic Jenkins trigger for most of the old master-branch
> verification job types. The old coverage build is still active, because
> we're still working on the new coverage build, but that's a minor issue.
>
> Various other major improvements are still underway (redmine 1815 and
> other places), but not ready for live testing.
>
> Mark
>
> On Sun, Sep 6, 2015 at 9:59 PM Mark Abraham <mark.j.abraham at gmail.com>
> wrote:
>
>> Hi,
>>
>> Also, bs_nix1204 is misbehaving. Stefan hopes to try swapping around
>> some GPUs in case that is the issue.
>>
>> Mark
>>
>> On Sun, 6 Sep 2015 19:46 Teemu Murtola <teemu.murtola at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> The situation isn't as simple as that:
>>>
>>>    - There certainly are changes that fail because they have not been
>>>    rebased, but those are easy to identify from the error messages that get
>>>    posted back to Gerrit and say that the required build scripts cannot be
>>>    found.
>>>    - There also are a few changes that fail on purpose, since they are
>>>    waiting for the C++11 change to move forward, and require some stuff from
>>>    there (and the new matrix anyways would also fail with the C++11 change).
>>>    - Additionally, many changes that have been rebased also have one or
>>>    a few failing builds. Persistently retriggering the failing builds leads to
>>>    the builds eventually succeeding, but this is not a sustainable situation.
>>>    The situation isn't exactly new (we've had random failures earlier as well,
>>>    with similar symptoms), but the volume of these is now much higher. It
>>>    really looks as if either Jenkins itself (the software, or the
>>>    hardware/virtualization layer it runs on) is flaky, or that our
>>>    configuration creates way too much peak load somewhere, causing something
>>>    to time out or otherwise fail (since typically the error messages are of
>>>    the sort "Channel is already closed").
>>>    - On top of that, there likely is at least one deadlock hiding
>>>    somewhere in mdrun, since occasionally the mdrun integration tests and/or
>>>    regression tests may hang for 15 minutes (before Jenkins kills them). It's
>>>    just a hunch that this deadlock also triggers more easily when there is
>>>    more load on the system.
>>>
>>> The new (temporary) setup with double jobs on nearly everything puts
>>> more load on Jenkins, so it might just have tipped the balance beyond a
>>> point where things were working ~OK (at least, when you didn't upload too
>>> many changes at the same time). But it would be nice to iron out these
>>> issues now, instead of just ignoring them in hopes that things go back to
>>> manageable levels when we reduce the load.
>>>
>>> Just my two cents,
>>> Teemu
>>>
>>> On Sun, Sep 6, 2015 at 8:20 PM Mark Abraham <mark.j.abraham at gmail.com>
>>> wrote:
>>>
>>>> It's submitted to master already, so just rebase to HEAD as/when you
>>>> want.
>>>>
>>>> Mark
>>>>
>>>> On Sun, 6 Sep 2015 15:25 David van der Spoel <spoel at xray.bmc.uu.se>
>>>> wrote:
>>>>
>>>>> On 02/09/15 23:12, Mark Abraham wrote:
>>>>>
>>>>> > With 5.1 off the table, we're implementing some much-needed updates to
>>>>> > the way we handle Jenkins verification of GROMACS.
>>>>> >
>>>>> > Teemu's rewritten the scripts we use to implement the various kinds of
>>>>> > verification jobs, which will let us maintain and extend in much less
>>>>> > ad-hoc fashion. Some parts of those scripts will now live in the GROMACS
>>>>> > source repository, so that they can change in step with code changes.
>>>>> > We've already submitted that script to master, so when you rebase
>>>>> > patches over 0ce920a017, Jenkins will be able to use its new toys.
>>>>>
>>>>> So is this the reason that most patches fail right now?
>>>>> Where is this patch in gerrit? I can not seem to find it...
>>>>>