[rdo-dev] [infra][outage] Nodepool outage on review.rdoproject.org, December 2
Alfredo Moralejo Alonso
amoralej at redhat.com
Sat Dec 2 12:57:08 UTC 2017
On Sat, Dec 2, 2017 at 11:56 AM, Javier Pena <jpena at redhat.com> wrote:
> Hi all,
> We had another nodepool outage this morning. Around 9:00 UTC, amoralej
> noticed that no new jobs were being processed. He restarted nodepool, and I
> helped him later with some stale node cleanup. Nodepool started creating
> VMs successfully around 10:00 UTC.
> On a first look at the logs, we see no new messages after 7:30 (not even
> DEBUG logs), but I was unable to run more troubleshooting steps because the
> service was already restarted.
In case it helps, I was able to run both "nodepool list" and
"nodepool delete <id> --now" (for a couple of instances in delete state)
successfully before restarting nodepool. However, nothing appeared in the
logs and no instances were created for the queued jobs, so I restarted
nodepool-launcher (my understanding is that this has fixed similar
situations in the past) before Javier started working on it.
> We will go through the logs on Monday to investigate what happened during
> the outage.