Hello,
Thanks for this, Graeme. As the original reporter of this bug, it would
be good to understand how the CI process works for the stable RDO release.
Since downgrading the Galera version we have progressed to a point
where we now see keystone authentication failures in Step 6 due to the
use of multidomain support.
So I suppose my questions would be: where does the responsibility for
maintaining stable RDO lie, where should bugs be filed, who should we
contact, what CI currently takes place with these packages, and how can
it be improved?
From my point of view, as a rare contributor, mostly a deployer of
OpenStack (and intermittently an operator), it would be good to have
images that can be grabbed via yum, similar to the OSP deployment
mechanism. These would then be the "known good" images; bugs could be
filed against them quickly and we would reduce the huge variation in
the images people use to deploy.
This would in turn feed back up into the creation of OSP images, so I
imagine it would benefit commercial deployments as well.
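For example (purely illustrative, since no such package exists in RDO
today; OSP ships its director images in an RPM along roughly these
lines), grabbing known-good images could be as simple as:

    # hypothetical package name and install path
    sudo yum install -y rdo-overcloud-images-mitaka
    ls /usr/share/rdo-overcloud-images/

Bugs could then be filed against a specific package version rather than
against whatever someone happened to build locally.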
It would then be good to be able to build images if we need to pull in
additional packages or customise (I agree with Graeme's point that
building these is entirely in keeping with Open Source tradition), but
currently we have a situation where there are multiple paths to build
an image and multiple images to download from if something is broken.
The last time there was a thread on stable deployment, a lot of things
got fixed (like buggy introspection, so many thanks for that; it now
works flawlessly), but it would be good to understand who to talk to
when things go wrong.
Currently I'm entirely unsure whether it's the TripleO devs, RDO people,
Red Hat Bugzilla, CentOS bugs, the CentOS Cloud SIG, etc.
If anyone could point me in the right direction I'd be grateful. Thanks
to Graeme for debugging the Galera version issue.
On Fri, 2016-08-12 at 04:36 +0100, Graeme Gillies wrote:
Hi,
I spent the last day or two trying to get to the bottom of the issue
described at [1], which turned out to be because the version of galera
in EPEL is higher than what we have in RDO Mitaka stable, and when it
gets used, mariadb-galera-server fails to start.
In order to understand why EPEL was being pulled in, how to stop it, and
how this seemed to have slipped through CI/testing, I've been trying to
look through and understand the whole state of the image building
process across TripleO, RDO, and our CI.
Unfortunately what I've discovered hasn't been great. It looks like
there are at least three different paths being used to build images.
Apologies if anything below is incorrect; it's incredibly convoluted and
difficult to follow for someone who isn't intimately familiar with it
all (like myself).
1) Using "openstack overcloud image build --all", which I assume is the
method end users are supposed to be using, or at least it's the method
documented in the docs. This uses diskimagebuilder under the hood, but
the logic behind it is in Python (under python-tripleoclient), with a
lot of stuff hardcoded in.
2) Using tripleo.sh, which, while it looks like it calls "openstack
overcloud image build", also has some of its own logic and messes with
things like the ~/.cache/image-create/source-repositories file, which I
believe is how the issue at [1] passed CI in the first place.
3) Using the ansible role ansible-role-tripleo-image-build [2], which
looks like it also uses diskimagebuilder, but through a slightly
different approach: an ansible library that can take an image definition
via yaml (neat!) and then calls diskimagebuilder using
python-tripleo-common as an intermediary. That is a different code path
(though the code itself looks similar) to python-tripleoclient. Rough
invocations of all three paths are sketched below.
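For concreteness, the three entry points look roughly like this (the
tripleo.sh option and the playbook name are from memory/illustrative,
so treat them as assumptions rather than exact commands):

    # 1) End-user CLI path: python-tripleoclient driving diskimagebuilder
    openstack overcloud image build --all

    # 2) CI path: tripleo.sh wraps the CLI but also touches
    #    ~/.cache/image-create/source-repositories (option name approximate)
    ./tripleo.sh --overcloud-images

    # 3) Ansible path: ansible-role-tripleo-image-build feeding yaml image
    #    definitions to diskimagebuilder via python-tripleo-common
    #    (playbook name is hypothetical)
    ansible-playbook -i inventory build-overcloud-images.yml

All three end up at diskimagebuilder, but through different code paths
and with different defaults.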
I feel this issue is hugely important as I believe it is one of the
biggest barriers to having more people adopt RDO/TripleO. Too often
people encounter issues with deploys that are hard to nail down because
we have no real understanding of exactly how they built their images,
nor, as an operator, do I feel I have a clear understanding of what I
get when I use the different options. The bug at [1] is a classic
example of something I should never have hit.
We do have stable images available at [3] (built using method 3);
however, there are a number of problems with just using them:
1) I think it's perfectly reasonable for people to want to build their
own images. It's part of the Open Source philosophy: we want things to
be open and we want to understand how things work, so we can customise,
extend, and troubleshoot ourselves. If your image building process is so
convoluted that you have to say "just use our prebuilt ones", then you
have done something wrong.
2) The images don't get updated (the current Mitaka ones were built in
April).
3) There is actually nowhere on the RDO website, nor the TripleO
website, that references their location. So as a new user, you have
exactly zero chance of finding these images and using them.
I'm not sure what the best process is to start improving this, but it
looks like it's complicated enough and involves enough moving pieces
that a spec against tripleo might be the way to go? I am thinking the
goal would be to move towards everyone having one way, one code path,
for building images with TripleO, that could be utilised by all use
cases out there.
My thinking is the method would take image definitions in a yaml format,
similar to how ansible-role-tripleo-image-build works, and we can just
ship a bunch of different yaml files for all the different image
scenarios people might want, e.g.
/usr/share/tripleo-images/centos-7-x86_64-mitaka-cbs.yaml
/usr/share/tripleo-images/centos-7-x86_64-mitaka-trunk.yaml
/usr/share/tripleo-images/centos-7-x86_64-trunk.yaml
Etc. You could then have a symlink called default.yaml which points to
whatever scenario you wish people to use by default, and the scenario
could be overridden by a command line argument. Basically this is
exactly how mock [4] works, and it has proven to be a nice, clean, easy
to use workflow for people to understand. The added bonus is that if
people wanted to do their own images, they could copy one of the
existing files as a template to start with.
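To make the mock analogy concrete, the layout and workflow might look
something like this (the override flag is purely illustrative, not an
existing option):

    ls -l /usr/share/tripleo-images/
      centos-7-x86_64-mitaka-cbs.yaml
      centos-7-x86_64-mitaka-trunk.yaml
      centos-7-x86_64-trunk.yaml
      default.yaml -> centos-7-x86_64-mitaka-cbs.yaml

    # build using the default scenario
    openstack overcloud image build

    # or pick a specific scenario explicitly (hypothetical flag)
    openstack overcloud image build \
        --image-definition /usr/share/tripleo-images/centos-7-x86_64-trunk.yaml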
If people feel this is worthwhile (and I hope they do), I'm interested
in understanding what the next steps would be to get this to happen.
Regards,
Graeme
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1365884
[2] https://github.com/redhat-openstack/ansible-role-tripleo-image-build
[3] http://buildlogs.centos.org/centos/7/cloud/x86_64/tripleo_images/mitaka/cbs/
[4] https://fedoraproject.org/wiki/Mock
--
Graeme Gillies
Principal Systems Administrator
Openstack Infrastructure
Red Hat Australia
--
Regards,
Christopher Brown
OpenStack Engineer
OCF plc