Hi all,
So, the Ironic task manager is attacking us again:
https://bugzilla.redhat.com/show_bug.cgi?id=1233452
Previously we already had
https://bugzilla.redhat.com/show_bug.cgi?id=1212134
I've implemented retries upstream in ironicclient and backported them to
our packages. Later I had to bump the retry timeout in instack-undercloud
to 1 minute:
https://github.com/rdo-management/instack-undercloud/blob/master/scripts/...
Now we have the same problem in another place and I wonder how to fix
it. I have 2 obvious ideas:
1. Patch ironicclient to have a longer default timeout (2 mins?)
2. Update stackrc to set higher IRONIC_MAX_RETRIES and
IRONIC_RETRY_INTERVAL
I'd prefer the latter, as it does not touch the ironicclient package,
only the undercloud installation tool. It can also be changed more easily
at runtime. WDYT?
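For reference, option 2 would be roughly the following in stackrc. This is
only a sketch: the numbers are made up to land at about 2 minutes and are
not tested, and I'm assuming the total wait is approximately retries times
interval.

```shell
# Hypothetical values, not tested: 24 retries x 5 s is roughly 2 minutes.
export IRONIC_MAX_RETRIES=24
export IRONIC_RETRY_INTERVAL=5

# Approximate worst-case wait in seconds, ignoring the time each request takes.
echo "retry window: $((IRONIC_MAX_RETRIES * IRONIC_RETRY_INTERVAL))s"
```

Since these are plain environment variables, anyone hitting the problem
could also override them in their own shell before re-running a command.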
I wonder what the root cause is as well. I suspect some very slow BMCs
take too much time to do power actions or power syncs. If that guess is
wrong, a longer timeout will only make the problem rarer, not make it go
away.
Thanks,
Dmitry