I want to follow up on the conversation from this review:
https://review.rdoproject.org/r/9846
For reference, we're talking about the way we will do promotions based
on the DLRN API, using a script I called dlrnapi_promoter[1].
Attila, John, Wes, I'm not convinced we need a dedicated machine
for this.
Can we take some time to discuss whether a cron job on a machine is the
right approach in the first place?
I feel there are a lot of different options available but we haven't
had the opportunity to discuss them.
For example, a Jenkins job that would trigger periodically or
on content changes in the DLRN API result pages? I'm sure we
could come up with different ideas.
These are the most obvious other options (for me):
1. Make this promotion logic part of DLRN -- this would have been my
preference, but I didn't see interest in it from Javier or from any of
you when I pitched it a while ago -- probably too late in the design
process of the API. Even the promotion API that we really needed was an
afterthought for DLRN API, so we didn't really cooperate on the design
to begin with. We should do better next time.
2. Run this separately but *on* the DLRN server. After a month, I
couldn't even get approval for Gabriele, Sagi and me to have submit
rights in the 'config' repo[2]. I didn't try to force this -- it seems
our communication and cooperation are not flexible enough right now to
attempt this level of cohabitation just to save the resources of a
single VM. I think it makes sense to run this separately.
3. Run this in a Zuul job that polls constantly: we would use the same
amount of resources as a dedicated machine, so there's no good reason
to do it.
4. Run this based on some triggers: we want to be able to rerun failed
jobs (RDO Phase1 & Phase2) and have the promotion succeed. We will have
a ton of jobs that would trigger these scripts when they finish, and
there's no point in running them after every single job; a time-based
check seems more useful. But if we don't run it after every single job
finishes, it might miss a possible window for promotion.
So in summary, the point of using the DLRN API was to avoid relying on
random places changing, triggers, etc. The source of truth for promotion
should be the DLRN API, and the most straightforward way to check for
results now is polling.
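
To make the polling idea concrete, here is a minimal sketch of a
time-based promotion check. It is not the actual dlrnapi_promoter code
and it does not call the real DLRN API client: the report shape
({"job_id", "success"}) and the callables list_hashes, fetch_reports and
promote are hypothetical placeholders for whatever wraps the API. It
also assumes a hash qualifies once every required job has at least one
successful report, so a rerun of a failed job can still lead to a
promotion.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical report shape: {"job_id": str, "success": bool}.
Report = Dict[str, object]


def can_promote(required_jobs: Iterable[str], reports: Iterable[Report]) -> bool:
    """Return True if every required job has at least one successful report.

    A failed run followed by a successful rerun still counts as passed
    (an assumption of this sketch, not a documented DLRN rule).
    """
    passed = {r["job_id"] for r in reports if r["success"]}
    return set(required_jobs) <= passed


def find_promotable(
    hashes: Iterable[str],
    required_jobs: Iterable[str],
    fetch_reports: Callable[[str], List[Report]],
) -> Optional[str]:
    """Return the first hash, in the order given, that meets the criteria."""
    required = list(required_jobs)
    for commit_hash in hashes:
        if can_promote(required, fetch_reports(commit_hash)):
            return commit_hash
    return None


def poll(
    list_hashes: Callable[[], List[str]],
    required_jobs: Iterable[str],
    fetch_reports: Callable[[str], List[Report]],
    promote: Callable[[str], None],
    interval_seconds: float = 600.0,
    max_rounds: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Check for a promotable hash every interval_seconds.

    Runs forever when max_rounds is None; promote() is expected to be
    idempotent because the same hash can qualify in consecutive rounds.
    """
    required = list(required_jobs)
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        candidate = find_promotable(list_hashes(), required, fetch_reports)
        if candidate is not None:
            promote(candidate)
        rounds += 1
        sleep(interval_seconds)
```

Whether this loop lives in a cron entry, a small daemon or a long-running
job only changes who calls it; the check itself reads nothing but the
reported results, which is the point of treating the API as the source
of truth.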
I wouldn't mind if we eventually integrated this functionality into
DLRN/DLRN API so that, when the promotion conditions are true, it could
trigger jobs. We couldn't trigger anything on the intranet, though --
polling there still makes sense, but we could poll some DLRN page for sure.
This script[1] and VM instance are what we have now because 1) and 2)
did not go through. It would definitely be a lot saner and cheaper,
resource- and maintenance-wise, to do these promotion calculations in
DLRN API and have DLRN trigger some jobs when they are true for a given
hash, but I didn't feel capable of adding this to DLRN; it was simpler
to use the API as it was designed.
I'm happy to help start integrating this into DLRN, but as of now we
should poll the API, and for polling the most reasonable solution is to
run this on a constantly running machine rather than in a long-running
job that does the polling.
Let me know what you think!
Attila
[1] https://github.com/rdo-infra/ci-config/tree/master/ci-scripts/dlrnapi_pro...
[2] https://www.redhat.com/archives/rdo-list/2017-September/msg00008.html