[outage][infra] RDO Cloud upgrade
by Javier Pena
Hi,
The RDO Cloud is now going through its major upgrade. This may also impact our CI jobs and most of our infrastructure services; please stay tuned for updates.
As a preventive measure, we have stopped builds on the RDO Trunk server (DLRN). We will work with the RDO Cloud team to ensure services are available again as soon as possible.
Regards,
Javier
[Minutes] RDO Community meeting minutes : 2017-12-13
by Chandan kumar
#rdo: RDO meeting - 2017-12-13
==============================
Meeting started by chandankumar at 15:01:25 UTC. The full logs are
available at
http://eavesdrop.openstack.org/meetings/rdo_meeting___2017_12_13/2017/rdo...
.
Meeting summary
---------------
* Roll Call (chandankumar, 15:01:41)
* Dry-run/test/firedrill of the queens release ? (in retrospect of the
challenges from pike) (chandankumar, 15:05:11)
* ACTION: dmsimard to create a ML thread about testing the process of
shipping a new stable release (dmsimard, 15:21:56)
* Test day TOMORROW (chandankumar, 15:22:51)
* LINK: https://etherpad.openstack.org/p/rdo-queens-m2-cloud
(chandankumar, 15:25:43)
* ICYMI: http://rdoproject.org/newsletter/2017/december/ (Thanks to
mary_grace) (chandankumar, 15:30:26)
* Review "rebase if necessary" strategy in gerrit (e.g rdoinfo)
(chandankumar, 15:32:47)
* LINK:
https://docs.openstack.org/infra/zuul/feature/zuulv3/user/config.html#val...
(pabelanger, 15:44:09)
* chair for next meeting (chandankumar, 15:56:04)
* ACTION: amoralej to chair for next meeting (chandankumar, 15:57:08)
* 27th Dec, 2017 RDO community meeting is cancelled due to shutdown
:-) (chandankumar, 15:57:53)
* openfloor (chandankumar, 15:58:01)
Meeting ended at 15:59:45 UTC.
Action items, by person
-----------------------
* amoralej
* amoralej to chair for next meeting
* dmsimard
* dmsimard to create a ML thread about testing the process of shipping
a new stable release
People present (lines said)
---------------------------
* amoralej (79)
* dmsimard (63)
* chandankumar (43)
* rbowen (24)
* pabelanger (17)
* EmilienM (11)
* openstack (9)
* number80 (6)
* rdogerrit (6)
* jpena (2)
* PagliaccisCloud (1)
Generated by `MeetBot`_ 0.1.4
[outage][infra] RDO Cloud upgrade
by Javier Pena
Hi all,
The RDO Cloud is being upgraded. As a result, we can expect outages in multiple RDO Infra services until the upgrade is finished. For example, nodepool is currently failing to create new VMs, so CI jobs are being delayed.
We are in contact with the RDO Cloud team, and will work with them to make sure all services are up and running after the upgrade.
Regards,
Javier
[Minutes] RDO Office Hour : 2017-12-12
by Chandan kumar
==================================
#rdo: RDO Office Hour - 2017-12-12
==================================
Meeting started by chandankumar at 13:31:42 UTC. The full logs are
available at
http://eavesdrop.openstack.org/meetings/rdo_office_hour___2017_12_12/2017...
.
Meeting summary
---------------
* Roll Call (chandankumar, 13:31:55)
* RDO Package Review Cleanup (chandankumar, 13:34:29)
* LINK: http://bit.ly/rdo_pkg_rvw (chandankumar, 13:35:09)
* LINK: RDO Queens Tracker:
https://bugzilla.redhat.com/show_bug.cgi?id=1486366 (chandankumar,
13:36:23)
* LINK: https://bugzilla.redhat.com/show_bug.cgi?id=1523058 -> closed
(chandankumar, 13:55:13)
* LINK: https://bugzilla.redhat.com/show_bug.cgi?id=1484914 -> closed
(chandankumar, 13:56:05)
* LINK: https://bugzilla.redhat.com/show_bug.cgi?id=1481992 -> closed
(jpena, 13:57:22)
* LINK: https://bugzilla.redhat.com/show_bug.cgi?id=1499144 closing
this one (chandankumar, 13:57:49)
* LINK: https://bugzilla.redhat.com/show_bug.cgi?id=1420931 -> closed
(jpena, 13:58:28)
* LINK: https://bugzilla.redhat.com/show_bug.cgi?id=1499144 -> closed
(chandankumar, 14:00:46)
* LINK: https://bugzilla.redhat.com/show_bug.cgi?id=1406146 -> closed
(jpena, 14:02:40)
* LINK: https://bugzilla.redhat.com/show_bug.cgi?id=1488615 -> closed
(jpena, 14:03:45)
* LINK: https://bugzilla.redhat.com/show_bug.cgi?id=1506884 -> closed
(ykarel, 14:05:01)
* LINK: https://bugzilla.redhat.com/show_bug.cgi?id=1462740 -> closed
(chandankumar, 14:05:03)
* LINK: https://bugzilla.redhat.com/show_bug.cgi?id=1509269 -> closed
(chandankumar, 14:24:00)
* LINK: https://bugzilla.redhat.com/show_bug.cgi?id=1506241 -> closed
(chandankumar, 14:32:46)
Meeting ended at 14:35:16 UTC.
People present (lines said)
---------------------------
* chandankumar (41)
* openstack (29)
* jpena (13)
* weshay (11)
* amoralej (11)
* mrunge (5)
* ykarel (3)
* number80 (3)
* rdogerrit (2)
* jrist (2)
* sdoran (1)
* misc (1)
* dmsimard (1)
* EmilienM (1)
* mhu (1)
Generated by `MeetBot`_ 0.1.4
Thanks,
Chandan Kumar
[outage] New package builds are temporarily paused on trunk.rdoproject.org
by David Moreau Simard
Hi,
There is ongoing maintenance on the primary cloud provider for RDO's
infrastructure, and while the mirrors on trunk.rdoproject.org are *not*
affected by this maintenance, the node responsible for building new packages is.
The intermittent network instability can cause package builds or the mirror
synchronization to fail, which can lead to other problems such as
erroneously missing packages [1].
For the time being, we've resolved the problems caused by this instability
and have paused new builds so that we do not generate any further
inconsistency.
If jobs have failed due to HTTP 403 (Forbidden) or HTTP 404 (Not found)
errors on packages provided by the RDO trunk repositories, please try again.
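As a quick sanity check before retrying, something like the sketch below can confirm that the repository answers again. This is only an illustration; the repo URL is a placeholder, so substitute whatever repository your failed job actually consumes.

    #!/usr/bin/env python3
    """Check that an RDO trunk repo URL answers before re-running a job."""
    import sys
    import urllib.error
    import urllib.request

    # Placeholder URL: point this at the repository your failed job used.
    REPO_URL = "https://trunk.rdoproject.org/centos7/current/delorean.repo"

    try:
        with urllib.request.urlopen(REPO_URL, timeout=10) as resp:
            print(f"{REPO_URL} -> HTTP {resp.status}, safe to retry the job")
    except urllib.error.HTTPError as exc:
        # A 403/404 here means the mirror is still inconsistent; hold off.
        print(f"{REPO_URL} -> HTTP {exc.code}, wait before retrying")
        sys.exit(1)
    except urllib.error.URLError as exc:
        print(f"{REPO_URL} unreachable: {exc.reason}")
        sys.exit(1)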
We will keep track of the maintenance and re-enable package builds once we
are confident in the stability of the network.
Thanks.
[1]: https://bugs.launchpad.net/tripleo/+bug/1737611
David Moreau Simard
Senior Software Engineer | OpenStack RDO
dmsimard = [irc, github, twitter]
Re: [rdo-dev] [rhos-dev] [infra][outage] Nodepool outage on review.rdoproject.org, December 2
by Tristan Cacqueray
On December 3, 2017 9:27 pm, Paul Belanger wrote:
[snip]
> Please reach out to me the next time you restart it, something is seriously
> wrong if we have to keep restarting nodepool every few days.
> At this rate, I would even leave nodepool-launcher in the bad state until we inspect it.
>
> Thanks,
> PB
>
Hello,
nodepoold was stuck again. Before restarting it, I dumped the threads' stack traces, and
it seems like 8 threads were trying to acquire a single lock (futex=0xe41de0):
https://review.rdoproject.org/paste/show/9VnzowfzBogKG4Gw0Kes/
This makes the main loop stuck at
http://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/node...
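For reference, a thread dump like that can also be produced from inside the
process without attaching gdb; the sketch below (standard library only, not
the code nodepool actually uses) logs the current stack of every live thread
so the lock waiters are visible:

    import sys
    import threading
    import traceback

    def dump_thread_stacks(out=sys.stderr):
        """Write the current stack of every live thread."""
        id_to_name = {t.ident: t.name for t in threading.enumerate()}
        for thread_id, frame in sys._current_frames().items():
            name = id_to_name.get(thread_id, "unknown")
            out.write("--- Thread %s (ident %s) ---\n" % (name, thread_id))
            traceback.print_stack(frame, file=out)

    # Optionally wire it to a signal so a stuck daemon can be inspected
    # without restarting it, e.g.:
    # import signal
    # signal.signal(signal.SIGUSR2, lambda signum, frame: dump_thread_stacks())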
I'm not entirely sure what caused this deadlock; the other threads involved
are quite complex:
* kazoo zk_loop
* zmq received
* apscheduler mainloop
* periodicCheck paramiko client connect
* paramiko transport run
* nodepool webapp handle request
Next time, before restarting the process, it would be good to know which
thread is actually holding the lock, using (gdb) py-print, as explained
here:
https://stackoverflow.com/questions/42169768/debug-pythread-acquire-lock-...
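A complementary, pure-Python angle (a sketch only, not something nodepool
does today) is to wrap the suspect lock so its current owner is recorded
and shows up in such a dump:

    import threading

    class OwnedLock:
        """threading.Lock wrapper that remembers which thread holds it.

        Debugging aid only: the suspect lock would have to be patched to
        use this wrapper before the hang occurs.
        """
        def __init__(self):
            self._lock = threading.Lock()
            self.owner = None  # name of the holding thread, or None

        def acquire(self, blocking=True, timeout=-1):
            acquired = self._lock.acquire(blocking, timeout)
            if acquired:
                self.owner = threading.current_thread().name
            return acquired

        def release(self):
            self.owner = None
            self._lock.release()

        def __enter__(self):
            self.acquire()
            return self

        def __exit__(self, exc_type, exc_value, tb):
            self.release()

With that in place, (gdb) py-print or the in-process dump above would show
the owner's thread name next to the eight waiters.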
Paul: any other debug instructions would be appreciated.
Regards,
-Tristan