[gmx-developers] Please cancel outdated CI pipelines

Erik Lindahl erik.lindahl at gmail.com
Mon Sep 28 17:29:19 CEST 2020


Hi,

Couldn't help continuing to look, so here is one more observation ;-)

We need to be much more careful about adding jobs that request GPUs. There
are presently three different jobs requesting a single NVIDIA device and
two different jobs requesting two NVIDIA devices, and each of them needs ~5
minutes just for the tests.

We only have two nodes with 2 devices each (both AMD and NVIDIA). If each
of these two nodes has a job running that requests a single GPU, no 2-GPU
jobs can run. Similarly, the two dual-NVIDIA-GPU jobs block the entire
available CI infrastructure for 5+ minutes for every single change (meaning
our throughput is limited to ~10 changes per hour).
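
The blocking described above is a fragmentation effect. A minimal Python sketch of it, assuming that a CI job always runs on a single node (so a 2-GPU job needs both devices on the same node) and that each of the two nodes has 2 GPUs a job can request; the `Node` class and `fits_on_some_node` helper are hypothetical illustrations, not part of any GitLab runner API:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical CI node model: only tracks how many GPUs are in use."""
    name: str
    total_gpus: int = 2
    busy_gpus: int = 0

    def free_gpus(self) -> int:
        return self.total_gpus - self.busy_gpus


def fits_on_some_node(nodes: list[Node], gpus_needed: int) -> bool:
    """Assumption: a multi-GPU job must get all of its devices on one node."""
    return any(node.free_gpus() >= gpus_needed for node in nodes)


# Scenario from the text: each of the two nodes runs one single-GPU job.
nodes = [Node("node-a", busy_gpus=1), Node("node-b", busy_gpus=1)]

total_free = sum(node.free_gpus() for node in nodes)
print(total_free)                   # 2 GPUs are free in total ...
print(fits_on_some_node(nodes, 2))  # ... but no 2-GPU job can start: False
```

Two free GPUs spread over two nodes are worth less than two free GPUs on one node, which is why single-GPU jobs can starve the dual-GPU ones.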

It's definitely convenient to be able to run CI tests on two devices, but
that's the type of job that should finish in less than 30 seconds ;-)
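
The ~10 changes per hour figure follows from simple arithmetic. A rough upper-bound sketch in Python, assuming every change runs the two dual-NVIDIA-GPU jobs and each of them occupies one whole 2-GPU node for its full duration (queueing, builds and the single-GPU jobs are ignored, so real throughput is lower); `max_changes_per_hour` is a hypothetical helper:

```python
def max_changes_per_hour(job_minutes: float, jobs_per_change: int = 2,
                         nodes: int = 2) -> float:
    """Upper bound on changes per hour when each change runs
    `jobs_per_change` jobs that each occupy one whole node for `job_minutes`."""
    node_minutes_per_change = job_minutes * jobs_per_change
    available_node_minutes_per_hour = nodes * 60
    return available_node_minutes_per_hour / node_minutes_per_change


print(max_changes_per_hour(5.0))  # 12.0 at ~5 min per dual-GPU job
print(max_changes_per_hour(0.5))  # 120.0 at the 30 s target
```

At ~5 minutes per job the ceiling is 12 changes per hour, in line with the ~10 estimate once overhead is included; at the 30 second target it rises tenfold.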

Cheers,

Erik




On Mon, Sep 28, 2020 at 5:15 PM Paul Bauer <paul.bauer.q at gmail.com> wrote:

> Hello,
>
> I have started setting up tests that run as jobs after a commit is merged,
> and we should make sure this code gets in soon to reduce the stress on the
> hardware. All the slow jobs can then be moved there.
>
> /Paul
>
> On Mon, 28 Sep 2020, 17:00 Erik Lindahl, <erik.lindahl at gmail.com> wrote:
>
>> Hi,
>>
>> Good point, but it also shows we have some homework to do. Our new CI
>> infrastructure was expanded considerably (~80 high-end cores, 2 GB/core,
>> all-SSD disks), but this only seems to have led all of us to happily add
>> tests that take more time.
>>
>> Looking just briefly at the pipelines, it seems the testing phase is our
>> main culprit. While it is of course nice to have per-change tests, I don't
>> think it's sustainable that we need 10+ CPU hours of testing for every typo
>> fix.
>>
>> In particular these tests need attention:
>>
>> - gmx-api. Each of these jobs takes 12-15 minutes on two cores, and there
>> are four of them.
>>
>> - TSAN & ASAN. I don't think we can justify using 8 cores for 15-20 min
>> for each of them.
>>
>> - OpenCL, likely because of slow kernel compilation, which gets even
>> worse when the AMD GPUs become a bottleneck.
>>
>>
>> I also suspect that quite a few tests asking for lots of cores and memory
>> don't really use all of it (at least not efficiently), but as a result
>> other CI jobs will have to wait.
>>
>> There's also a huge difference in performance between proper unit tests
>> that call the code directly and tests that issue commands or even run
>> simulations.
>>
>> This week is not the one to change things, but IMHO we need to get back
>> to the original model, in which the CI tests run for every change execute
>> FAST. Any test job that doesn't complete in less than ~3 min on a single
>> core does not belong among the jobs that run for every change.
>>
>> Cheers,
>>
>> Erik
>>
>>
>>
>>
>> Erik Lindahl <erik.lindahl at scilifelab.se>
>> Professor of Biophysics
>> Science for Life Laboratory
>> Stockholm University & KTH
>> Office (SciLifeLab): +46 8 524 81567
>> Cell (Sweden): +46 73 4618050
>> Cell (US): +1 (650) 924 7674
>>
>>
>>
>> > On 28 Sep 2020, at 16:23, Eric Irrgang <ericirrgang at gmail.com> wrote:
>> >
>> > Hi Devs,
>> >
>> > If you push a new commit to a GitLab branch before the pipelines for the
>> > previous commit have finished running, please consider canceling one of
>> > the two sets of pipelines.
>> >
>> > You can do this from the Pipelines tab of the merge request page (or go
>> > directly to https://gitlab.com/gromacs/gromacs/-/pipelines). If you have
>> > pushed a new commit, you are presumably only interested in one set of
>> > pipelines. Just click the red X to cancel the pipelines you don't need.
>> >
>> > If you are pushing to a branch that doesn't have an MR yet, you are still
>> > generating one pipeline for every push, so please use the web interface to
>> > cancel the pipelines that aren't useful to you.
>> >
>> > It will really help all of us to get our CI pipelines to run sooner.
>> >
>> > Thanks!
>> > M. Eric Irrgang
>> > --
>> > Gromacs Developers mailing list
>> >
>> > * Please search the archive at
>> http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List before
>> posting!
>> >
>> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>> >
>> > * For (un)subscribe requests visit
>> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
>> or send a mail to gmx-developers-request at gromacs.org.
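
The manual cancellation described in M. Eric Irrgang's quoted message can also be scripted. A minimal Python sketch, assuming pipeline records shaped like the JSON returned by GitLab's `GET /projects/:id/pipelines` endpoint (`id`, `ref`, `status`) and assuming pipeline IDs increase with every push; the `superseded_pipelines` helper is hypothetical. It only decides which pipelines are redundant (every still-active pipeline on a branch except the newest); each returned ID would then be cancelled with `POST /projects/:id/pipelines/:pipeline_id/cancel`, or with the red X in the web interface.

```python
from collections import defaultdict

# Statuses GitLab reports for pipelines that have not finished yet.
ACTIVE = {"created", "waiting_for_resource", "preparing", "pending", "running"}


def superseded_pipelines(pipelines):
    """Return IDs of active pipelines that are not the newest on their ref.

    Assumes a higher pipeline ID means a more recent push, so every older
    active pipeline on the same ref is redundant.
    """
    newest = defaultdict(int)
    for p in pipelines:
        newest[p["ref"]] = max(newest[p["ref"]], p["id"])
    return sorted(
        p["id"] for p in pipelines
        if p["status"] in ACTIVE and p["id"] != newest[p["ref"]]
    )


# Example: two pushes to my-branch; the first pipeline is still running.
pipelines = [
    {"id": 101, "ref": "my-branch", "status": "running"},
    {"id": 105, "ref": "my-branch", "status": "pending"},
    {"id": 103, "ref": "other-branch", "status": "running"},
    {"id": 99, "ref": "my-branch", "status": "success"},
]
print(superseded_pipelines(pipelines))  # [101]
```

Finished pipelines are never selected, so the sketch only frees runners that are actually busy with outdated work.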



-- 
Erik Lindahl <erik.lindahl at dbb.su.se>
Professor of Biophysics, Dept. Biochemistry & Biophysics, Stockholm
University
Science for Life Laboratory, Box 1031, 17121 Solna, Sweden