On Mon, Sep 12, 2016 at 5:54 PM, Wesley Hayutin <whayutin(a)redhat.com> wrote:
On Mon, Sep 12, 2016 at 5:52 PM, David Moreau Simard <dms(a)redhat.com> wrote:
> Can we keep the promotion jobs enabled and just disable the jobs that
> actually upload the images?
>
> We need visibility on ongoing issues, if there are any, and the jobs
> have already been disabled since test day last week.
>
Ya.. that is a better idea.. will adjust them.
Thanks
>
> David Moreau Simard
> Senior Software Engineer | Openstack RDO
>
> dmsimard = [irc, github, twitter]
>
>
> On Mon, Sep 12, 2016 at 5:37 PM, Wesley Hayutin <whayutin(a)redhat.com> wrote:
> > Greetings,
> >
> > I have disabled all the RDO promotion jobs until we have confirmed
> > that images are published directly from the virthost to the ci.centos
> > artifacts server.
> >
> > This work is being led by Matt Young and tested and refined by John
> > Trowbridge and me. If there are any requirements for promotion while
> > this work is being done, we will use the internal pipeline.
> >
> > Thank you
> >
> > _______________________________________________
> > rdo-list mailing list
> > rdo-list(a)redhat.com
> > https://www.redhat.com/mailman/listinfo/rdo-list
> >
> > To unsubscribe: rdo-list-unsubscribe(a)redhat.com
>
Greetings,
TLDR:
There are unique properties to the infra at ci.centos that make it
difficult to work with qcow2 images. Transferring images across the
ci.centos infra can cause instability in the ci.centos infrastructure
itself. The process is slow and difficult to test; however, we now
believe we have resolved the issue.
Details:
I wanted to send an update to the community regarding the promotion status
of RDO via CI.
The issue at hand was syncing and promoting tripleo undercloud and
overcloud images to the ci.centos artifacts server.
Originally these images were synced in two steps: first to the Jenkins
slave and then to the artifacts server.
Syncing the images to the slave was causing instability in the ci.centos
infra and also causing network and filesystem issues on the ci.centos slave.
Quite simply, image syncs were disrupting the infrastructure and causing
failures throughout ci.centos.
Through a series of patches [1-8] we have streamlined the image creation
and promotion process to sync the image only once. The artifacts server
*only* has rsync available; there is no ssh service. It was a complicated
problem to solve, but we believe the required code is now merged and tests
are running.
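
For anyone unfamiliar with pushing to a target that only runs an rsync
daemon, here is a minimal sketch of what a single, direct sync could look
like. It is only an illustration, not the code in the patches: the host
name, module name, destination directory and the helper name
build_rsync_push are all hypothetical. The one real detail it relies on is
rsync's double-colon (host::module) syntax, which talks to an rsync daemon
directly rather than going through a remote shell such as ssh.

```python
import shlex


def build_rsync_push(image_path, host, module, dest_dir=""):
    """Build an rsync command that pushes one file to an rsync daemon.

    The host::module form addresses an rsync daemon directly, so the
    receiving side does not need an ssh service.
    All arguments here are placeholders for illustration.
    """
    dest = f"{host}::{module}/{dest_dir.strip('/')}"
    if not dest.endswith("/"):
        dest += "/"
    # --partial keeps a partly transferred file so a retry can resume,
    # which helps with large qcow2 images.
    return ["rsync", "--archive", "--verbose", "--partial", image_path, dest]


# Hypothetical values; the command is only printed, never executed.
cmd = build_rsync_push(
    "undercloud.qcow2", "artifacts.example.org", "rdo", "images/newton"
)
print(shlex.join(cmd))
```

Run from the virthost, a command of this shape would send the image to the
artifacts server in one hop, without staging a copy on the Jenkins slave.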
Apologies for the outage; however, we had to choose between an outage and
bringing down the ci.centos infra. We chose to take the outage immediately
after newton milestone 3 was promoted and the internal beta was imported.
Thank you and we appreciate your patience.
[1] https://review.gerrithub.io/#/c/290337/
[2] https://review.gerrithub.io/#/c/290344/
[3] https://review.gerrithub.io/#/c/290432/
[4] https://review.gerrithub.io/#/c/290433/
[5] https://review.gerrithub.io/#/c/294590/
[6] https://review.gerrithub.io/#/c/294663/
[7] https://review.gerrithub.io/#/c/294672/
[8] https://review.gerrithub.io/#/c/294694/