Recent versions of Kubernetes have begun moving functionality that previously existed in the core project out to separate projects. One such set of functionality is the cloud provider-specific code, which is now handled by the Cloud Controller Manager project. This is well described in the Kubernetes documentation.
In the 4.12 release, we hope to switch OpenShift deployments running on OpenStack clouds from the legacy OpenStack cloud provider to the external OpenStack cloud provider, OpenStack Cloud Controller Manager (OCCM). There are a couple of steps needed to make this happen, one of which is taking user-provided configuration for the legacy cloud provider and mapping it to configuration for the shiny new external cloud provider. This is necessary to ensure any user-provided configuration is retained and the upgrade doesn’t break the deployment. In the case of the OpenStack provider, this configuration is INI-style and thankfully quite similar for both the legacy and external cloud provider implementations.
To handle the migration of configuration in OpenShift deployments, we are relying on the Cluster Cloud Controller Manager Operator (CCCMO). This operator is already responsible for managing the lifecycle of CCM on OpenShift deployments, including configuration of CCM, so naturally it is a good fit for this kind of task. A detailed description of the changes we ultimately made, along with the motivation for them, can be found in this enhancement (the pull request itself is probably quite helpful also, if you read Go). I hope to explain them at a high level here, since the paradigms used are similar to those found in other operators and are being used to manage other complex upgrades, such as the switch from in-tree block storage drivers to Container Storage Interface (CSI) drivers.
How CCCMO generates configuration
The first step in understanding how CCCMO can be used to manage the migration of configuration is to examine how CCCMO sources configuration - specifically user-provided configuration - and uses this to generate the configuration actually used for CCM. Once we understand this, we can decide at what points to hook in and customise or translate this user-provided configuration. We can also use this model in other operators. Thankfully, in the case of CCCMO this sourcing and generation of configuration is pretty simple.
Firstly, the operator attempts to retrieve config from the
openshift-config-managed / kube-cloud-config config map:
    $ oc get cm/kube-cloud-config -n openshift-config-managed -o yaml
    apiVersion: v1
    data:
      cloud.conf: |
        [Global]
        secret-name = openstack-credentials
        secret-namespace = kube-system
        region = regionOne
        [LoadBalancer]
        use-octavia = True
    kind: ConfigMap
    metadata:
      creationTimestamp: "2022-02-25T17:01:58Z"
      name: kube-cloud-config
      namespace: openshift-config-managed
      resourceVersion: "3853"
      uid: c23c14b7-66db-431c-a723-59439f946f80
This can be seen here.
The reason that it searches for this config map specifically is historical:
this is the config map generated by the Cluster Config Operator (CCO),
which is used to configure the legacy cloud provider (among other things). CCO
manipulates user-provided configuration for some cloud providers (specifically
AWS and Azure) so I guess the idea here was to avoid re-implementing this
transformation logic in CCCMO. Everything in the openshift-config-managed
namespace is owned by CCO and is not intended to be modified by a user (in
fact, attempts to modify it will likely be futile and the operator will quickly
erase those changes).
If the lookup of the
openshift-config-managed / kube-cloud-config config map
fails, we attempt to retrieve configuration from the
openshift-config / cloud-provider-config config map:
    $ oc get cm/cloud-provider-config -n openshift-config -o yaml
    apiVersion: v1
    data:
      config: |
        [Global]
        secret-name = openstack-credentials
        secret-namespace = kube-system
        region = regionOne
        [LoadBalancer]
        use-octavia = True
    kind: ConfigMap
    metadata:
      creationTimestamp: "2022-02-25T17:00:15Z"
      name: cloud-provider-config
      namespace: openshift-config
      resourceVersion: "1802"
      uid: 45bda3c8-8866-4aea-92be-921502ff2055
This can be seen here.
Once again, the reason we use this config map is historical and is based on
what CCO uses. While things in the
openshift-config-managed namespace are
not user editable, the
openshift-config namespace is the namespace for
“user-managed” configuration or configuration that things like operators are
not allowed to modify.
If both lookups fail, we error out. However, this is unlikely since the
installer should create the latter.
Assuming one of them does exist, we sync whatever we found to the
openshift-cloud-controller-manager / cloud-conf config map:
    $ oc get cm/cloud-conf -n openshift-cloud-controller-manager -o yaml
    apiVersion: v1
    data:
      cloud.conf: |
        [Global]
        secret-name = openstack-credentials
        secret-namespace = kube-system
        region = regionOne
        [LoadBalancer]
        use-octavia = True
    kind: ConfigMap
    metadata:
      creationTimestamp: "2022-02-25T17:01:08Z"
      name: cloud-conf
      namespace: openshift-cloud-controller-manager
      resourceVersion: "2519"
      uid: cbbeedaf-41ed-41c2-9f37-4885732d3677
This can be seen here.
In this instance, the namespace isn’t actually locked in. It is possible to
run the cluster-controller-manager-operator binary with an argument that
overrides it; by default, the openshift-cloud-controller-manager namespace
shown above is used.
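The lookup order described in this section can be sketched in a few lines. This is only an illustration in Python (the operator itself is written in Go): the cluster is modelled as a plain dictionary, the function names are hypothetical, and treating a config map that exists but lacks the expected key as "not found" is an assumption rather than confirmed operator behaviour.

```python
# Hypothetical stand-in for the cluster: (namespace, name) -> config map data.
MANAGED = ("openshift-config-managed", "kube-cloud-config")
USER = ("openshift-config", "cloud-provider-config")
TARGET = ("openshift-cloud-controller-manager", "cloud-conf")

# Data key used inside each source config map, as in the `oc get` output above.
SOURCE_KEYS = {MANAGED: "cloud.conf", USER: "config"}


def find_cloud_config(config_maps):
    """Return the first cloud config found, preferring the managed config map."""
    for source in (MANAGED, USER):
        data = config_maps.get(source)
        # Assumption: a config map without the expected key counts as missing.
        if data is not None and SOURCE_KEYS[source] in data:
            return data[SOURCE_KEYS[source]]
    raise LookupError("no cloud provider configuration found")


def sync_cloud_config(config_maps):
    """Copy whatever was found into the config map that CCM reads."""
    config_maps[TARGET] = {"cloud.conf": find_cloud_config(config_maps)}
    return config_maps[TARGET]
```

Note that the source config maps use different data keys (`cloud.conf` versus `config`), while the synced result always lands under `cloud.conf`.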
How CCCMO handles upgrades
(for OpenStack clouds on OpenShift 4.11 or later)
So now that we understand how CCCMO sources user-provided configuration and generates the resulting configuration used by Cloud Controller Manager, it’s time to examine how we’ve decided to handle the migration of configuration for legacy cloud providers to configuration suitable for external cloud providers. As noted above, previously CCCMO took user-provided configuration from a config map in one namespace and copied it to a config map in another namespace. It should be pretty obvious that there’s no reason this copy has to be a straightforward copy: we could modify the input config map before we dump it back out. This is of course exactly what we did.
Starting with the upcoming OpenShift 4.11 release, CCCMO provides configuration “transformers”. Transformers simply load configuration provided by users, do some basic validation, and then transform things by dropping options that are no longer relevant, adding options that are now necessary, and renaming or modifying options that have changed between the legacy and external cloud providers. This idea isn’t particularly novel - as noted previously, CCO was already doing something very similar for AWS and Azure - but it works. Annoyingly, these transformers must be cloud-specific since the CCM binary used for each cloud provider expects a radically different configuration file (in the case of the OpenStack cloud provider this is an INI-style configuration file, while Azure expects a YAML-formatted file). As a result, we have only implemented the OpenStack transformer for now. However, in the future we will likely implement additional transformers for at least AWS and Azure since, as noted previously, CCO is already doing some transformation here.
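One way to picture cloud-specific transformers is as a lookup table keyed by platform, with a plain copy as the fallback for platforms that have no transformer yet. The sketch below is a Python illustration only; the names, the platform strings, and the stub transformer are all hypothetical and are not taken from the operator's Go code.

```python
from typing import Callable, Dict

Transformer = Callable[[str], str]


def copy_unchanged(config: str) -> str:
    """Behaviour for platforms without a transformer: a straightforward copy."""
    return config


def openstack_stub(config: str) -> str:
    """Placeholder for an OpenStack transformer; it only tags the output here."""
    return "# transformed for the external OpenStack provider\n" + config


# Hypothetical registry: only OpenStack has a transformer in this sketch.
TRANSFORMERS: Dict[str, Transformer] = {"OpenStack": openstack_stub}


def transform_for_platform(platform: str, config: str) -> str:
    """Pick the platform's transformer, falling back to an unmodified copy."""
    return TRANSFORMERS.get(platform, copy_unchanged)(config)
```

Keeping the fallback as an identity function means platforms without a transformer keep the old copy-only behaviour.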
Specifically, the transformer for OpenStack clouds in CCCMO currently does the following:
- Drops the [Global] secret-name, [Global] secret-namespace, and [Global] kubeconfig-path options, since these aren’t applicable for the external cloud provider (the first two are OpenShift-only modifications). This inline configuration has been replaced by configuration stored in a clouds.yaml file. Speaking of which…
- Adds [Global] options, including [Global] clouds-file, that point the external cloud provider at this file
- Drops the entire [BlockStorage] section since external cloud providers are no longer responsible for anything storage’y (this is now handled by Container Storage Interface (CSI) drivers, including the Manila CSI driver and Cinder CSI driver)
- Adds or sets the [LoadBalancer] use-octavia and [LoadBalancer] enabled options, depending on the specific deployment configuration (i.e. is Kuryr in use?)
All of this can be seen here
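To make the steps above concrete, here is a minimal sketch of such a transformer using Python's standard `configparser` module. It is not the operator's implementation (that is Go). The function name, the `kuryr` flag, and the clouds.yaml path passed in are hypothetical. The exact values written to the `[LoadBalancer]` section are assumptions, and only `clouds-file` is added to `[Global]`, although the real transformer may add more.

```python
import configparser
import io

# Options dropped from [Global]; they do not apply to the external provider.
LEGACY_ONLY_GLOBAL = ("secret-name", "secret-namespace", "kubeconfig-path")


def transform_openstack_config(legacy, clouds_file, kuryr=False):
    """Turn legacy INI-style cloud config into config for the external provider."""
    parser = configparser.ConfigParser(interpolation=None)
    # Basic validation: malformed input raises configparser.Error.
    parser.read_string(legacy)

    if not parser.has_section("Global"):
        parser.add_section("Global")
    for option in LEGACY_ONLY_GLOBAL:
        parser.remove_option("Global", option)
    # Credentials now live in a clouds.yaml file instead of inline options.
    parser.set("Global", "clouds-file", clouds_file)

    # Storage is handled by CSI drivers, so the whole section goes away.
    parser.remove_section("BlockStorage")

    if not parser.has_section("LoadBalancer"):
        parser.add_section("LoadBalancer")
    # Assumption: with Kuryr the load balancer is disabled; otherwise Octavia
    # is switched on. The real values depend on the deployment.
    if kuryr:
        parser.set("LoadBalancer", "enabled", "false")
    else:
        parser.set("LoadBalancer", "use-octavia", "true")

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Calling `transform_openstack_config(legacy_text, "/path/to/clouds.yaml")` on the sample configuration shown earlier would keep `region`, drop the two secret options, and add `clouds-file`; the path here is a placeholder.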
Hopefully this helps shine a little light on how CCCMO (and to a lesser degree,
CCM and CCO) works and operates, at least from an OpenStack perspective. For
most users, none of the above should matter: the OpenShift documentation
describes how configuration of the cloud provider, be it internal or external,
should happen via the
openshift-config / cloud-provider-config config map and
all of this transformation logic should be effectively invisible. However, when
things go wrong, it can be helpful to know in which dark corners to look 😄