[rdo-list] Updates to RDO slaves and jobs in ci.centos.org

David Moreau Simard dms at redhat.com
Fri Apr 21 00:40:44 UTC 2017


WeIRDO jobs were tested manually on rdo-ci-slave01 (the promote slave),
where the jobs would not run successfully yesterday.

Everything now looks good after untangling yesterday's update issue,
and the WeIRDO promote jobs have been switched over to the RDO Cloud.

I'll be monitoring this closely, but let me know if you see any problems.

David Moreau Simard
Senior Software Engineer | Openstack RDO

dmsimard = [irc, github, twitter]


On Thu, Apr 20, 2017 at 12:26 AM, David Moreau Simard <dms at redhat.com> wrote:
> Hi,
>
> There have been a few RDO-related updates to the ci.centos.org
> environment worth mentioning and explaining to a wider audience.
>
> First, please note that all packages on the five RDO slaves have been
> updated to the latest versions; until now, the slaves had not yet been
> updated to CentOS 7.3.
>
> The rdo-ci-slave01 node (the "promotion" slave) ran into some issues
> that took some time to fix: EPEL was enabled and the node picked up
> Python packages it shouldn't have.
> Things seem to be back in order now, but some jobs might have failed
> in a weird way; triggering them again should be fine.
>
> Otherwise, all generic WeIRDO jobs are now running on OpenStack
> virtual machines provided by the RDO Cloud.
> This is done through the "rdo-virtualized" slave tag.
> The "rdo-promote-virtualized" tag will be used for the WeIRDO promote
> jobs once we're sure there are no more issues running them on the
> promotion slave.
>
> These tags are designed to work only with WeIRDO jobs for the time
> being; please contact me if you'd like to run virtualized workloads
> from ci.centos.org.
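>
> If you're curious which slaves carry a given tag, here's a rough
> sketch of how you could check via the Jenkins API with the
> python-jenkins library (the URL, credentials and the library choice
> are just my assumptions for illustration, not part of our tooling):
>
>     import jenkins  # pip install python-jenkins
>
>     # Placeholder URL/credentials -- substitute real values.
>     server = jenkins.Jenkins('https://ci.centos.org',
>                              username='user', password='api-token')
>
>     for node in server.get_nodes():
>         info = server.get_node_info(node['name'])
>         labels = [l['name'] for l in info.get('assignedLabels', [])]
>         if 'rdo-virtualized' in labels:
>             print(node['name'], info.get('numExecutors'), labels)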
>
> This amounts to around 35 fewer jobs (generic WeIRDO jobs and WeIRDO
> promote jobs combined) running on Duffy-provisioned ci.centos.org
> hardware on a typical day.
>
> I've re-shuffled the capacity a bit, considering we've now freed up
> significant capacity for bare-metal-based TripleO jobs.
> The slave threads are now as follows:
> - rdo-ci-slave01: 12 threads (up from 11), tagged with "rdo-promote"
> and "rdo-promote-virtualized"
> - rdo-ci-cloudslave01: 6 threads (up from 4), tagged with "rdo"
> - rdo-ci-cloudslave02: 6 threads (up from 4), tagged with "rdo"
> - rdo-ci-cloudslave03: 8 threads (up from 4), tagged with "rdo-virtualized"
> - rdo-ci-cloudslave04: 8 threads (down from 15), tagged with "rdo-virtualized"
>
> There is a specific reason why cloudslave03 and cloudslave04 amount to
> 16 threads between the two: it matches the capacity quota we have been
> given on the RDO Cloud.
> The threads artificially limit the number of jobs run against the
> cloud concurrently, without needing to implement queueing on our end.
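>
> To illustrate the idea (just an analogy in Python, not how Jenkins
> itself is implemented): a fixed-size worker pool caps concurrency on
> its own, with no separate queue to maintain.
>
>     import time
>     from concurrent.futures import ThreadPoolExecutor
>
>     def run_cloud_job(name):
>         # stand-in for "provision a VM on the RDO Cloud and run a job"
>         time.sleep(1)
>         return name
>
>     # 16 workers ~= the 16 executor threads on cloudslave03/04: at most
>     # 16 jobs hit the cloud at once, the rest simply wait their turn,
>     # so we stay within quota without implementing queueing ourselves.
>     with ThreadPoolExecutor(max_workers=16) as pool:
>         results = list(pool.map(run_cloud_job,
>                                 ['job-%d' % i for i in range(40)]))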
>
> You'll otherwise notice that the net effect for the "rdo" and
> "rdo-promote" tags isn't much, at least for the time being: the
> capacity is very much the same since I've re-allocated cloudslave03 to
> load-balance virtualized jobs.
> However, jobs are likely to be more reliable and faster now that they
> won't have to retry for nodes, since we're less likely to hit
> rate-limiting.
>
> I'll monitor the situation over the next few days and bump the numbers
> if everything looks good.
> That said, I'd like to hear your feedback on whether things feel
> better and whether we are running into "out of inventory" errors less
> often.
>
> Let me know if you have any questions,
>
> David Moreau Simard
> Senior Software Engineer | Openstack RDO
>
> dmsimard = [irc, github, twitter]



