On Sat, Dec 2, 2017 at 11:56 AM, Javier Pena <jpena(a)redhat.com> wrote:
Hi all,
We had another nodepool outage this morning. Around 9:00 UTC, amoralej
noticed that no new jobs were being processed. He restarted nodepool, and I
helped him later with some stale node cleanup. Nodepool started creating
VMs successfully around 10:00 UTC.
At first glance, the logs show no new messages after 7:30 (not even
DEBUG logs), but I could not troubleshoot further because the service
had already been restarted.
In case it helps, I was able to run both "nodepool list" and
"nodepool delete <id> --now" (for a couple of instances in delete status)
successfully before restarting nodepool. However, nothing appeared in the
logs and no instances were created for the queued jobs, so I restarted
nodepool-launcher (my understanding was that this had fixed similar
situations in the past) before Javier started working on it.
We will go through the logs on Monday to investigate what happened
during the outage.
Regards,
Javier