[gmx-developers] impending changes to Jenkins verification

Mark Abraham mark.j.abraham at gmail.com
Thu Mar 10 17:48:50 CET 2016


Hi,

I'd rather we made some conscious decisions in the context of the actual
current matrices, so that the history is documented and any discussion can
be looked up e.g. on Gerrit, or in comments in the matrix files, or in
releng documentation (once I find time to make the Sphinx update that Teemu's
patch probably needs).

For 5.1, I've been adding whatever configs it takes to the matrix to have
reasonable coverage while minimizing my effort. Any extra machine load will
stop being a problem eventually, because that branch will be tested less
often as time passes, and the infrastructure for the master branch has now
changed completely. That's crude, but otherwise maintaining GROMACS infrastructure
would be its own fulltime job. :-)

Mark

On Thu, Mar 10, 2016 at 5:03 PM Szilárd Páll <pall.szilard at gmail.com> wrote:

> On Wed, Mar 9, 2016 at 3:43 PM, Mark Abraham <mark.j.abraham at gmail.com>
> wrote:
>
>> Hi,
>>
>> Long-belated update:
>>
>> For pre-submit CI testing, our Jenkins machinery is now using the
>> configuration matrix you can find in the source repository
>> at admin/builds/pre-submit-matrix.txt. For technical reasons, the job that
>> is triggered is
>> http://jenkins.gromacs.org/job/Gromacs_Gerrit_master-matrix-from-repo/,
>> which manages the actual matrix build
>> http://jenkins.gromacs.org/job/Gromacs_Gerrit_master_nrwpo/. Teemu has
>> arranged some magic so that when builds fail, the link to the matrix job is
>> what appears on Jenkins, as you are used to. Things seem to be working
>> smoothly so far, but yell if you see issues.
>>
>> That testing matrix provides approximately the same coverage as we already
>> had with http://jenkins.gromacs.org/job/Gromacs_Gerrit_master-new-releng/.
>>
>
> I suggest using /job/Gromacs_Gerrit_5_1 as a reference?
> job/Gromacs_Gerrit_master-new-releng has never had good coverage.
>
> Of course I'm not suggesting re-creating the exact same configs, just to
> make sure relevant configs that may have been added to 5.1 for a good
> reason are not missed (e.g. SSE4.1?). With new requirements and less
> interest in ancient compilers like gcc 4.4 and Intel 12.1, many rows from
> the 5.1 matrix are obviously not relevant.
>
> --
> Szilárd
>
>
>> The plan is to be fairly reluctant to add new configurations to it (e.g.
>> we might add an OpenCL build, or new sanitizers), but to have a wider range
>> of testing triggered after we accept a patch in Gerrit. Details TBD,
>> discussion at http://redmine.gromacs.org/issues/1815.
>>
>> Mark
>>
>> On Wed, Sep 16, 2015 at 3:06 PM Mark Abraham <mark.j.abraham at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Now that the new releng machinery seems useful and stable, I've
>>> deactivated the automatic Jenkins trigger for most of the old master-branch
>>> verification job types. The old coverage build is still active, because
>>> we're still working on the new coverage build, but that's a minor issue.
>>>
>>> Various other major improvements are still underway (redmine 1815 and
>>> other places), but not ready for live testing.
>>>
>>> Mark
>>>
>>> On Sun, Sep 6, 2015 at 9:59 PM Mark Abraham <mark.j.abraham at gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Also, bs_nix1204 is misbehaving. Stefan hopes to try swapping around
>>>> some GPUs in case that is the issue.
>>>>
>>>> Mark
>>>>
>>>> On Sun, 6 Sep 2015 19:46 Teemu Murtola <teemu.murtola at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> The situation isn't as simple as that:
>>>>>
>>>>>    - There certainly are changes that fail because they have not been
>>>>>    rebased, but those are easy to identify from the error messages that get
>>>>>    posted back to Gerrit and say that the required build scripts cannot be
>>>>>    found.
>>>>>    - There also are a few changes that fail on purpose, since they
>>>>>    are waiting for the C++11 change to move forward, and require some stuff
>>>>>    from there (and the new matrix would also fail with the C++11
>>>>>    change anyway).
>>>>>    - Additionally, many changes that have been rebased also have one
>>>>>    or a few failing builds. Persistently retriggering the failing builds leads
>>>>>    to the builds eventually succeeding, but this is not a sustainable
>>>>>    situation. The situation isn't exactly new (we've had random failures
>>>>>    earlier as well, with similar symptoms), but the volume of these is now
>>>>>    much higher. This really looks as if either Jenkins itself (the
>>>>>    software, or the hardware/virtualization layer it runs on) is flaky, or
>>>>>    our configuration creates way too much peak load somewhere, causing
>>>>>    something to time out or otherwise fail (since typically the error messages
>>>>>    are of the sort "Channel is already closed").
>>>>>    - On top of that, there likely is at least one deadlock hiding
>>>>>    somewhere in mdrun, since occasionally the mdrun integration tests and/or
>>>>>    regression tests may hang for 15 minutes (before Jenkins kills them). It's
>>>>>    just a hunch that this deadlock also triggers more easily when there is
>>>>>    more load on the system.
>>>>>
>>>>> The new (temporary) setup with double jobs on nearly everything puts
>>>>> more load on Jenkins, so it might just have tipped the balance beyond a
>>>>> point where things were working ~OK (at least, when you didn't upload too
>>>>> many changes at the same time). But it would be nice to iron out these
>>>>> issues now, instead of just ignoring them in the hope that they return to
>>>>> manageable levels when we reduce the load.
>>>>>
>>>>> Just my two cents,
>>>>> Teemu
>>>>>
>>>>> On Sun, Sep 6, 2015 at 8:20 PM Mark Abraham <mark.j.abraham at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> It's submitted to master already, so just rebase to HEAD as/when you
>>>>>> want.
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>> On Sun, 6 Sep 2015 15:25 David van der Spoel <spoel at xray.bmc.uu.se>
>>>>>> wrote:
>>>>>>
>>>>>>> On 02/09/15 23:12, Mark Abraham wrote:
>>>>>>>
>>>>>>> > With 5.1 off the table, we're implementing some much-needed updates to
>>>>>>> > the way we handle Jenkins verification of GROMACS.
>>>>>>> >
>>>>>>> > Teemu's rewritten the scripts we use to implement the various kinds of
>>>>>>> > verification jobs, which will let us maintain and extend them in a much
>>>>>>> > less ad-hoc fashion. Some parts of those scripts will now live in the
>>>>>>> > GROMACS source repository, so that they can change in step with code
>>>>>>> > changes. We've already submitted that script to master, so when you
>>>>>>> > rebase patches over 0ce920a017, Jenkins will be able to use its new toys.
>>>>>>>
>>>>>>> So is this the reason that most patches fail right now?
>>>>>>> Where is this patch in Gerrit? I cannot seem to find it...
>>>>>>>
>>>>>> --
>>>>> Gromacs Developers mailing list
>>>>>
>>>>> * Please search the archive at
>>>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List
>>>>> before posting!
>>>>>
>>>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>>>>
>>>>> * For (un)subscribe requests visit
>>>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
>>>>> or send a mail to gmx-developers-request at gromacs.org.
>>>>
>>>>