Hi,
There have been a few updates worth mentioning and explaining to a wider
audience regarding RDO on the ci.centos.org environment.
First, please note that all packages on the five RDO slaves have been
updated to their latest versions; until now, we had not yet updated to
CentOS 7.3.
The rdo-ci-slave01 node (the "promotion" slave) ran into some issues
that took some time to fix: EPEL was enabled and it picked up Python
packages it shouldn't have.
Things seem to be back in order now, but some jobs might have failed in
odd ways; triggering them again should be fine.
Otherwise, all generic WeIRDO jobs are now running on OpenStack
virtual machines provided by the RDO Cloud.
This is done through the "rdo-virtualized" slave tag.
The "rdo-promote-virtualized" tag will be used for the weirdo promote
jobs once we're sure there's no more issues running them on the
promotion slave.
These tags are designed to work with WeIRDO jobs only for the time
being, please contact me if you'd like to run virtualized workloads
from ci.centos.org.
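
If you're curious what these tags look like on the Jenkins side, here's
a minimal sketch using the python-jenkins library that lists which
slaves carry the "rdo-virtualized" tag. The credentials are
placeholders and the details are assumptions for illustration, not a
description of our actual setup:

    import jenkins

    # Placeholder credentials -- substitute your own; illustration only.
    server = jenkins.Jenkins('https://ci.centos.org',
                             username='user', password='token')

    for node in server.get_nodes():
        if node['name'] == 'master':
            continue  # the master node has a special API path
        info = server.get_node_info(node['name'])
        labels = [label['name'] for label in info.get('assignedLabels', [])]
        if 'rdo-virtualized' in labels:
            print(node['name'], 'is tagged:', labels)
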
In total, moving these jobs to the RDO Cloud removes around 35 jobs
(generic WeIRDO jobs and promote WeIRDO jobs combined) from the Duffy
ci.centos.org hardware on a typical day.
I've re-shuffled the capacity around a bit, considering we've now
freed up significant room for bare-metal TripleO jobs.
The slave threads are now as follows:
- rdo-ci-slave01: 12 threads (up from 11), tagged with "rdo-promote"
and "rdo-promote-virtualized"
- rdo-ci-cloudslave01: 6 threads (up from 4), tagged with "rdo"
- rdo-ci-cloudslave02: 6 threads (up from 4), tagged with "rdo"
- rdo-ci-cloudslave03: 8 threads (up from 4), tagged with "rdo-virtualized"
- rdo-ci-cloudslave04: 8 threads (down from 15), tagged with "rdo-virtualized"
There is a specific reason why cloudslave03 and cloudslave04 amount to
16 threads between the two of them: it matches the capacity quota we
have been given on the RDO Cloud.
The threads artificially limit the number of jobs run against the
cloud concurrently, without needing to implement queueing on our end.
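
To illustrate the idea (a sketch of the concept only, not our actual
scheduler): a fixed pool of worker threads naturally caps concurrency,
and anything beyond the cap simply waits for a free thread, so no
separate queueing service is required.

    from concurrent.futures import ThreadPoolExecutor
    import time

    MAX_CONCURRENT = 16  # matches the RDO Cloud quota mentioned above

    def run_job(name):
        # Stand-in for launching a virtualized WeIRDO job on the cloud
        time.sleep(1)
        return name

    # Hypothetical backlog of jobs; only 16 ever run at the same time.
    pending = ['weirdo-job-%d' % i for i in range(40)]
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
        for finished in pool.map(run_job, pending):
            print(finished, 'done')
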
You'll otherwise notice that the net effect for the "rdo" and
"rdo-promote" tags isn't much, at least for the time being: "rdo"
capacity is very much the same (12 threads across cloudslave01 and 02,
versus 4+4+4 before) since I've re-allocated cloudslave03 to
load-balance virtualized jobs.
However, jobs are likely to be faster and more reliable: with fewer
jobs competing for nodes, we're less likely to hit rate-limiting and
have to retry node requests.
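
For context, the retry pattern in question looks roughly like this
(a hypothetical sketch; request_node() is a stand-in, not a real Duffy
API call). Fewer jobs on the hardware means fewer trips through this
loop:

    import random
    import time

    def request_node():
        # Stand-in for a Duffy node request; pretend we sometimes get
        # rate-limited and receive nothing back.
        return 'node-01' if random.random() > 0.3 else None

    def get_node_with_retries(max_attempts=5):
        for attempt in range(max_attempts):
            node = request_node()
            if node is not None:
                return node
            # back off exponentially (with jitter) before retrying
            time.sleep((2 ** attempt) + random.random())
        raise RuntimeError('out of inventory: no node after retries')

    print(get_node_with_retries())
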
I'll monitor the situation over the next few days and bump the numbers
if everything is looking good.
That said, I'd like to hear your feedback: do things look better, and
are we running into "out of inventory" errors less often?
Let me know if you have any questions,
David Moreau Simard
Senior Software Engineer | OpenStack RDO
dmsimard = [irc, github, twitter]