Hi,
Thanks for starting this thread and apologies for not replying sooner.
TL;DR:
By "integration" between the two teams, I mostly mean:
- Let's talk to each other more (specs, discussing what you need, etc)
- Let's do this in the open (this thread is a great start)
- Let's collaborate and maintain a common set of roles and playbooks
to drive the entire infrastructure rather than siloed components
Comments in-line:
I think the questions involve the rdo infra people, especially the
second.
Thanks for involving us, I'm convinced we can find something that'll
be great for everyone.
But in general, I'd like to understand what level of
integration we want between the two groups regarding all the things
related to infrastructure servers.
The way I see things is that the individuals managing and maintaining
the infrastructure are there to support you.
In an ideal world, we have the human and monetary resources to meet
your needs and requirements -- let's say we're not quite there yet, but
increased collaboration between the two teams can only be beneficial.
We're not there to block anyone or anything, we're there to support you
and provide tooling and infrastructure you can rely on. I like to
pretend we have been steadily improving over the past ~2 years given
the circumstances.
1) We spoke about bastion host. We forgot to create a card for it
during planning, and yesterday we failed to agree on the scope of the
task.
Bastion host is a precise security element, with certain rules which have
to be respected for it to be useful. The bastion host is the single
access point to all the other servers on the infrastructure, it's the
only one possessing a public ip, and that means that *EVERYTHING* needs
to get through it: logs exposure, web pages access (sova included), ssh
access.
Your definition of a bastion host doesn't match what we already use today
and what I felt we tried to convey when we mentioned a "bastion host" during
that last meeting.
What may have been described as a bastion host isn't a bastion host or a
jump box in the literal sense -- at least in the way we are currently
using them.
The upstream infrastructure team has a server called
"puppetmaster.openstack.org" which, ironically, no longer runs the puppetmaster
service but runs Ansible playbooks on a cron instead.
These Ansible playbooks end up doing local puppet apply on the different
servers of the infrastructure but let's not focus on that for now.
The general idea is that there is this central "bastion" host that,
yes, has access to everything (by necessity) to deploy things on them
through Ansible.
However, system administrators ("infra-root") have named users with sudo
privileges directly on each node -- for example I can log in as
dmsimard@zuul.openstack.org.
Software Factory has a similar mechanism through the "managesf" node which
also leverages Ansible to deploy and configure the different nodes of the
deployment.
This is what we may have described as a "bastion" host but not in the literal
sense -- I hope this clarifies things.
I don't personally think that a literal jump box or bastion host is necessary
for our needs.
Let's try to keep things as simple as possible to make our lives and maintenance
easier.
While beginning working on this card, yesterday I asked a question to
dmsimard, and it showed me the common roles that are already in place to
setup RDO infra servers. These common roles cover 70-80% of this card
already.
Yes, as mentioned in different places (most recently in this review[1]), we
already have standardized roles to bootstrap standardized things across all of
our servers such as users, ssh keys and monitoring.
These roles should ideally be run on every single server.
The question is: what level of integration we want to achieve between
our two groups ? Can we modify these roles to be a bit more generic so
they can be used in setting up our infrastructure too ?
We are in the process of merging all our playbooks and roles back into
rdo-infra-playbooks [2] which is already integration tested under code
review [3].
The next step would be to formalize the notion of a central host that
continuously deploys each server with the latest code.
The benefit to this is that anyone can submit a change, it will be tested,
reviewed and deployed automatically.
No one /needs/ access to the servers to install them, they'd be automatically
installed and configured through this mechanism.
This can also help provide a certain amount of guarantee that servers have not
been manually tampered with, which can lead to a lot of confusion.
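
To make the central host idea a bit more concrete, here is a minimal Python
sketch of what a cron job on that host could do: refresh a checkout of the
playbooks, then run ansible-playbook against an inventory. The paths, the
site.yml playbook name and the helper functions are placeholders I made up for
illustration, not what we actually run today.

```python
import subprocess
from pathlib import Path

# Hypothetical locations, placeholders for illustration only.
CHECKOUT = Path("/opt/rdo-infra-playbooks")
INVENTORY = CHECKOUT / "hosts"
PLAYBOOK = CHECKOUT / "site.yml"


def build_commands(checkout=CHECKOUT, inventory=INVENTORY,
                   playbook=PLAYBOOK, check=False):
    """Return the commands one deployment run would execute, in order."""
    # Fast-forward the local checkout to the latest merged code.
    update = ["git", "-C", str(checkout), "pull", "--ff-only"]
    deploy = ["ansible-playbook", "-i", str(inventory), str(playbook)]
    if check:
        # --check asks Ansible for a dry run instead of changing the servers.
        deploy.append("--check")
    return [update, deploy]


def run_once(runner=subprocess.run, **kwargs):
    """Run every command in order, stopping at the first failure."""
    for command in build_commands(**kwargs):
        # With subprocess.run, check=True raises on a non-zero exit code.
        runner(command, check=True)
```

A cron entry on the central host would then only need to call run_once(), and
run_once(check=True) gives a dry run that could be used to spot drift without
changing anything.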
Can we for example modify the log server to accept logs from remote
journald in our servers, so we can already use the log server to expose our
promotion logs ?
That said, I think our existing roles are generic enough -- especially the base
and monitoring roles. If you have improvements for these, feel free to submit
them !
For this particular case, the logserver role that exists today [4] is
actually not the one that was used to deploy our current logserver
implementation which is more in line with a WIP patch here [5].
I think we need to work on describing what we need better, agree on a solution
and then go from there.
In this case, our need is to expose the promotion logs -- this can be provided
in a number of ways ranging from our existing logstash setup to simply
doing a "passthrough" of the logs over a local httpd server.
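
As a rough illustration of the "passthrough" end of that range, here's a
minimal Python sketch that serves a log directory read-only over HTTP using
only the standard library. It merely stands in for a real httpd; the directory
path, the port and the function names are placeholders I picked for the
example.

```python
import functools
import http.server
import threading

# Hypothetical directory holding the promotion logs (placeholder).
LOG_DIR = "/var/log/promoter"


def make_log_server(directory=LOG_DIR, host="127.0.0.1", port=8080):
    """Build an HTTP server that exposes `directory` for GET/HEAD requests."""
    # SimpleHTTPRequestHandler only serves files, it never writes them.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer((host, port), handler)


def serve_in_background(server):
    """Start serving in a daemon thread and return that thread."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

Calling make_log_server().serve_forever() would be enough to browse the logs
locally; anything public-facing would presumably sit behind a proper httpd
instead.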
In the upstream infrastructure, these kinds of proposals are discussed as
specs [6]. It gives everyone a chance to give their opinion, which not only
results in a better spec in the end but also helps avoid starting from
bad assumptions.
For example, you might draft a spec to deploy a new component in the
infrastructure but it turns out we already have something exactly like that but
you didn't know about it.
At the end of the day, let's not forget that RDO is an open source
community that anyone can participate in just like the upstream OpenStack
one. You can show up tomorrow in #openstack-infra and start sending and
reviewing patches if you'd like.
The upstream infra-root and config-core teams are more or less volunteers
who have agreed to take on this role regardless of their organization or
affiliation. They do not really have any privileged or closed channels.
Everything is in the open, everyone can contribute.
What I'm trying to say is that if someone from outside of Red Hat comes and
volunteers to help with the management of the RDO infrastructure, I'm definitely
not going to turn that person down.
So the scope here is not so much "integration between two teams" but instead
looking at the opportunities to have everyone collaborating to improve the
infrastructure in general.
Hope that makes sense.
[1]: https://review.rdoproject.org/r/#/c/12777/1/ci-scripts/infra-setup/roles/...
[2]: https://github.com/rdo-infra/rdo-infra-playbooks
[3]: https://review.rdoproject.org/r/#/q/project:rdo-infra/rdo-infra-playbooks
[4]: https://github.com/rdo-infra/ansible-role-logserver
[5]: https://review.rdoproject.org/r/#/c/10814/
[6]: https://github.com/openstack-infra/infra-specs
David Moreau Simard
Senior Software Engineer | OpenStack RDO
dmsimard = [irc, github, twitter]