[rdo-list] Updates to RDO slaves and jobs in ci.centos.org
Alfredo Moralejo Alonso
amoralej at redhat.com
Fri Apr 21 12:08:03 UTC 2017
Yeah, I was only taking into account the time spent running run_test.sh, which
shouldn't be impacted by slowness in rdo-ci-slave01. These are my
findings for job weirdo-master-promote-puppet-openstack-scenario002:
RDO-Cloud: 39mins
n30.dusty: 33mins
n13.pufty: 60mins
n54.cursty: 58mins
I think it's pretty good.
On Fri, Apr 21, 2017 at 1:55 PM, David Moreau Simard <dms at redhat.com> wrote:
> The performance is not great because of "rdo-ci-slave01", on which Ansible
> runs.
>
> We all know that node has performance problems (especially i/o).
> For example, a promote job [1] will take 1 hour and 4 minutes while the
> equivalent generic job [2] (run on a cloudslave) will finish in about 35
> minutes.
>
> I mean, it takes rdo-ci-slave01 more than five (5!) minutes to just
> bootstrap the job (clone weirdo, virtualenv with ara, ansible, shade and
> initialize ara).
> The same thing takes less than 30 seconds on a cloudslave.
>
> [1]:
> https://ci.centos.org/job/weirdo-master-promote-packstack-scenario001/1080/
> [2]:
> https://ci.centos.org/view/rdo/view/weirdo/job/weirdo-generic-packstack-scenario001/515/
>
> David Moreau Simard
> Senior Software Engineer | Openstack RDO
>
> dmsimard = [irc, github, twitter]
>
> On Apr 21, 2017 4:22 AM, "Alfredo Moralejo Alonso" <amoralej at redhat.com>
> wrote:
>>
>> On Fri, Apr 21, 2017 at 2:40 AM, David Moreau Simard <dms at redhat.com>
>> wrote:
>> > WeIRDO jobs were tested manually on the rdo-ci-slave01 (promote slave)
>> > on which the jobs would not run successfully yesterday.
>> >
>> > Everything now looks good after untangling the update issue from
>> > yesterday and WeIRDO promote jobs have been switched to rdo-cloud.
>> >
>>
>> Nice! I've seen weirdo jobs in
>>
>> https://ci.centos.org/view/rdo/view/promotion-pipeline/job/rdo_trunk-promote-master-current-tripleo/44/
>> run in RDO Cloud with pretty good performance; they seem to run
>> slower than jobs running on the dusty servers in ci.centos.org but faster
>> than the rest of the servers.
>>
>> I'll keep an eye on it too to find out if there is any abnormal behavior.
>>
>>
>> > I'll be monitoring this closely but let me know if you see any problems.
>> >
>> > David Moreau Simard
>> > Senior Software Engineer | Openstack RDO
>> >
>> > dmsimard = [irc, github, twitter]
>> >
>> >
>> > On Thu, Apr 20, 2017 at 12:26 AM, David Moreau Simard <dms at redhat.com>
>> > wrote:
>> >> Hi,
>> >>
>> >> There's been a few updates worth mentioning and explaining to a wider
>> >> audience as far as RDO is concerned on the ci.centos.org environment.
>> >>
>> >> First, please note that all packages on the five RDO slaves have been
>> >> updated to the latest version.
>> >> We had not yet updated to 7.3.
>> >>
>> >> The rdo-ci-slave01 node (the "promotion" slave) ran into some issues
>> >> that took some time to fix: EPEL was enabled and it picked up Python
>> >> packages it shouldn't have.
>> >> Things seem to be back in order now but some jobs might have failed in
>> >> a weird way, triggering them again should be fine.
>> >>
>> >> Otherwise, all generic WeIRDO jobs are now running on OpenStack
>> >> virtual machines provided by the RDO Cloud.
>> >> This is provided by using the "rdo-virtualized" slave tags.
>> >> The "rdo-promote-virtualized" tag will be used for the weirdo promote
>> >> jobs once we're sure there are no more issues running them on the
>> >> promotion slave.
>> >>
>> >> These tags are designed to work with WeIRDO jobs only for the time
>> >> being, please contact me if you'd like to run virtualized workloads
>> >> from ci.centos.org.
>> >>
>> >> This amounts to around 35 fewer jobs running on Duffy
>> >> ci.centos.org hardware in total on a typical day (including generic
>> >> weirdo jobs and promote weirdo jobs).
>> >>
>> >> I've re-shuffled the capacity around a bit, considering we've now
>> >> freed significant capacity for bare-metal based TripleO jobs.
>> >> The slave threads are now as follows:
>> >> - rdo-ci-slave01: 12 threads (up from 11), tagged with "rdo-promote"
>> >> and "rdo-promote-virtualized"
>> >> - rdo-ci-cloudslave01: 6 threads (up from 4), tagged with "rdo"
>> >> - rdo-ci-cloudslave02: 6 threads (up from 4), tagged with "rdo"
>> >> - rdo-ci-cloudslave03: 8 threads (up from 4), tagged with
>> >> "rdo-virtualized"
>> >> - rdo-ci-cloudslave04: 8 threads (down from 15), tagged with
>> >> "rdo-virtualized"
>> >>
>> >> There is a specific reason why cloudslave03 and cloudslave04 amount to
>> >> 16 threads between the two: it matches the quota we have been
>> >> given in terms of capacity at RDO Cloud.
>> >> The threads will be used to artificially limit the amount of jobs run
>> >> against the cloud concurrently without needing to implement queueing
>> >> on our end.
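
The executor-count trick described above can be sketched as a semaphore:
concurrency is capped by the number of slots, and any extra jobs simply
block until a slot frees up. The following is an illustrative Python
sketch, not the actual Jenkins configuration; the job body and numbers
are placeholders that mirror the 16-thread quota mentioned above.

```python
import threading
import time

# 8 threads on cloudslave03 + 8 on cloudslave04 = 16, matching the
# RDO Cloud quota. The fixed executor count acts as a semaphore.
EXECUTORS = 16

slots = threading.Semaphore(EXECUTORS)
lock = threading.Lock()
active = 0
peak = 0

def run_job(job_id):
    """Stand-in for a weirdo job; real jobs run for ~30-60 minutes."""
    global active, peak
    with slots:  # blocks here when all 16 executors are busy
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # placeholder for the actual workload
        with lock:
            active -= 1

# Launch more jobs than executors; no queueing service is needed,
# the waiting happens in the semaphore itself.
jobs = [threading.Thread(target=run_job, args=(i,)) for i in range(40)]
for t in jobs:
    t.start()
for t in jobs:
    t.join()

print(peak)  # never exceeds 16
```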
>> >>
>> >> You'll otherwise notice the net effect for the "rdo" and "rdo-promote"
>> >> tags isn't much, at least for the time being; it's very much the same
>> >> since I've re-allocated cloudslave03 to load balance virtualized jobs.
>> >> However, jobs are likely to be more reliable and faster now that they
>> >> won't have to retry for nodes because we're less likely to hit
>> >> rate-limiting.
>> >>
>> >> I'll monitor the situation over the next few days and bump the numbers
>> >> if everything is looking good.
>> >> That said, I'd like to hear about your feedback if you feel things are
>> >> looking better and if we are running into "out of inventory" errors
>> >> less often.
>> >>
>> >> Let me know if you have any questions,
>> >>
>> >> David Moreau Simard
>> >> Senior Software Engineer | Openstack RDO
>> >>
>> >> dmsimard = [irc, github, twitter]