[gmx-developers] Please cancel outdated CI pipelines
Erik Lindahl
erik.lindahl at gmail.com
Mon Sep 28 17:43:34 CEST 2020
Actually, scratch that:
The three single-NVIDIA GPU jobs also occupy 1.5 (read: 2) nodes for 5+
minutes. Thus the NVIDIA jobs in _isolation_ limit our total
theoretical CI throughput to 5 changes per hour, not counting that they
also compete with the AMD OpenCL job for those nodes.
This is the reason the logs of those jobs sometimes spend the first 15-20
minutes waiting for pods to become available.
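
As a rough illustration of the estimate above, here is a minimal sketch of the arithmetic. It assumes the dual-GPU jobs and the single-GPU jobs run as two back-to-back phases that each occupy both NVIDIA nodes, and it reads "5+ minutes" as 6 minutes. The function names and that 6-minute figure are assumptions made for illustration, not measurements from the CI logs.

```python
import math

GPUS_PER_NODE = 2  # from the thread: two devices per node


def nodes_occupied(jobs: int, gpus_per_job: int) -> int:
    """Whole nodes needed to run the given jobs at the same time."""
    return math.ceil(jobs * gpus_per_job / GPUS_PER_NODE)


def max_changes_per_hour(phase_minutes: list[float]) -> float:
    """Upper bound on changes per hour when every change blocks the
    GPU nodes for the sum of the given phase durations."""
    total = sum(phase_minutes)
    if total <= 0:
        raise ValueError("total blocking time must be positive")
    return 60.0 / total


ASSUMED_PHASE_MINUTES = 6.0  # assumption: one reading of "5+ minutes"

print(nodes_occupied(jobs=3, gpus_per_job=1))  # three single-GPU jobs
print(nodes_occupied(jobs=2, gpus_per_job=2))  # two dual-GPU jobs
print(max_changes_per_hour([ASSUMED_PHASE_MINUTES]))      # dual-GPU phase only
print(max_changes_per_hour([ASSUMED_PHASE_MINUTES] * 2))  # both phases
```

Under these assumptions the dual-GPU phase alone caps throughput at about 10 changes per hour, and adding the single-GPU phase halves that to about 5.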
In other words: They have to go ;-)
Cheers,
Erik
On Mon, Sep 28, 2020 at 5:29 PM Erik Lindahl <erik.lindahl at gmail.com> wrote:
> Hi,
>
> Couldn't help but continue looking, so one more observation ;-)
>
> We need to be much more careful about adding jobs requesting GPUs. There
> are presently three different jobs requesting a single NVIDIA device, and
> two different jobs requesting two NVIDIA devices, and each of them needs ~5
> minutes just for the tests.
>
> We only have two nodes with 2 devices each (both AMD and NVIDIA). If each
> of these two nodes has a job running requesting a single GPU, no 2-GPU
> jobs can run. Similarly, the two dual-NVIDIA-GPU jobs block the entire
> available CI infrastructure for 5+ minutes for every single change (meaning
> our throughput is limited to ~10 changes per hour).
>
> It's definitely convenient to be able to run CI tests on two devices, but
> that's the type of job that should finish in less than 30 seconds ;-)
>
> Cheers,
>
> Erik
>
>
>
>
> On Mon, Sep 28, 2020 at 5:15 PM Paul Bauer <paul.bauer.q at gmail.com> wrote:
>
>> Hello,
>>
>> I started to set up tests that can run as jobs after a commit is merged,
>> and we should make sure this code gets in to reduce the stress on the
>> hardware. All the slow jobs can then be moved there.
>>
>> /Paul
>>
>> On Mon, 28 Sep 2020, 17:00 Erik Lindahl, <erik.lindahl at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Good point, but it also shows we have some homework to do. Our new CI
>>> infrastructure was quite expanded (~80 high end cores, 2GB/core, all SSD
>>> disks) - but this only seems to have led all of us to happily add tests
>>> that take more time.
>>>
>>> Looking just briefly at the pipelines, it seems the testing phase is our
>>> main culprit. While it is of course nice to have per-change tests, I don't
>>> think it's sustainable that we need 10+ CPU hours of testing for every typo
>>> fix.
>>>
>>> In particular these tests need attention:
>>>
>>> - gmx-api. There are four of these jobs, and each takes 12-15 minutes on
>>> two cores.
>>>
>>> - TSAN & ASAN. I don't think we can justify using 8 cores for 15-20 min
>>> for each of them.
>>>
>>> - OpenCL, likely related to slow kernel compiles, which gets even worse
>>> when the AMD GPUs become a bottleneck.
>>>
>>>
>>> I also suspect that quite a few tests asking for lots of cores and
>>> memory don't really use all of it (at least not efficiently), but as a
>>> result other CI jobs will have to wait.
>>>
>>> There's also a huge difference in performance between proper unit tests
>>> called on code level vs. the ones that issue commands or even run
>>> simulations.
>>>
>>> This week is not the one to change things, but IMHO we need to get back
>>> to the original model of the CI tests for every change executing FAST. Any
>>> test job that doesn't complete in less than ~3 min on a single core does
>>> not belong among the ones that are run for every change.
>>>
>>> Cheers,
>>>
>>> Erik
>>>
>>>
>>>
>>>
>>> Erik Lindahl <erik.lindahl at scilifelab.se>
>>> Professor of Biophysics
>>> Science for Life Laboratory
>>> Stockholm University & KTH
>>> Office (SciLifeLab): +46 8 524 81567
>>> Cell (Sweden): +46 73 4618050
>>> Cell (US): +1 (650) 924 7674
>>>
>>>
>>>
>>> > On 28 Sep 2020, at 16:23, Eric Irrgang <ericirrgang at gmail.com> wrote:
>>> >
>>> > Hi Devs,
>>> >
>>> > If you push a new commit to a GitLab branch before the pipelines are
>>> finished running for the previous commit, please consider canceling one or
>>> the other set of pipelines.
>>> >
>>> > You can look at the Pipelines tab of the merge request page (or just
>>> go to https://gitlab.com/gromacs/gromacs/-/pipelines). If you have
>>> pushed a new commit, you are presumably only interested in one (of the sets
>>> of) pipelines. Just click the red X to cancel the pipelines you don't need.
>>> >
>>> > If you are pushing to a branch that doesn't have an MR yet, you are
>>> still generating one pipeline for every push, so please use the web
>>> interface to cancel the pipelines that aren't useful to you.
>>> >
>>> > It will really help all of us to get our CI pipelines to run sooner.
>>> >
>>> > Thanks!
>>> > M. Eric Irrgang
>>> > --
>>> > Gromacs Developers mailing list
>>> >
>>> > * Please search the archive at
>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List before
>>> posting!
>>> >
>>> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>> >
>>> > * For (un)subscribe requests visit
>>> >
>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
>>> or send a mail to gmx-developers-request at gromacs.org.
>
>
>
> --
> Erik Lindahl <erik.lindahl at dbb.su.se>
> Professor of Biophysics, Dept. Biochemistry & Biophysics, Stockholm
> University
> Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
>