[gmx-developers] jenkins killed by core files

Szilárd Páll pall.szilard at gmail.com
Thu Nov 9 17:59:27 CET 2017


Hi,

Earlier today (around 15:00 CET) a change resulted in ~1200 failed
tests and generated a volume of core files that swamped the jenkins
server. We spent far too much time tracking down the issue and
recovering from it and identified some critical issues in the setup
that we believe require changes.

Given that:
- we all are increasingly drafting and developing in gerrit, not even
testing locally,
- we don't have enough space for even the ~1200 core files that a
single failed job could generate (let alone multiple iterations of a
buggy change),
- we're storing uncompressed core files that we rarely if ever look at,
we need to take action and prevent such time-consuming failures.

There are two options I see:
- I'll disable archiving core files (right away so the aforementioned
change won't bomb jenkins again ;)) -- after all, the dev time saved by
having jenkins compile for them can now be spent on occasionally
testing a bit more locally (or on the build slaves if necessary) when
hard-to-track-down bugs cause crashes;

- We have found a jenkins plugin that compresses artifacts; if this
reduces the size of archived data enough, we could try to deploy it
and re-enable core file archival. I have the suspicion that it won't
work due to a bug that prevented us from using it to begin with, but
I'll have to check.
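
If the plugin does not pan out, a rough alternative would be to compress
and cap the cores in the workspace before the archiving step runs. Below
is a minimal sketch of that idea, not something we have deployed: the
function name compress_cores, the "core*" file name pattern and the
max_keep limit are all hypothetical and would need adjusting to how the
build slaves actually name their core files.

```python
import gzip
import shutil
from pathlib import Path


def compress_cores(workspace, pattern="core*", max_keep=10):
    """Gzip at most max_keep core files under workspace and delete the rest.

    Assumption: core files can be recognised by the glob `pattern`;
    adjust it to the naming used on the build machines.
    Returns (kept, dropped): paths of the .gz files written and of the
    uncompressed cores that were removed without being archived.
    """
    # Collect first, so that files created below do not affect the walk.
    cores = sorted(
        p for p in Path(workspace).rglob(pattern)
        if p.is_file() and p.suffix != ".gz"
    )
    kept, dropped = [], []
    for index, core in enumerate(cores):
        if index < max_keep:
            target = core.with_name(core.name + ".gz")
            with open(core, "rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)  # stream, cores can be large
            kept.append(target)
        else:
            dropped.append(core)
        core.unlink()  # the uncompressed core is never left behind
    return kept, dropped
```

Capping the count bounds the worst case (a change that fails ~1200 tests
would still leave only max_keep compressed cores), while the few that
are kept should usually be enough to get a backtrace.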

Cheers,
--
Szilárd

