Handling the switch to the Cloud Controller Manager (CCM) with OpenShift Operators

Recent versions of Kubernetes have begun moving functionality that previously existed in the core project out to separate projects. One such set of functionality is the cloud provider-specific code, which is now handled by the Cloud Controller Manager project. This is well described in the Kubernetes documentation.

In the 4.12 release, we hope to switch OpenShift deployments running on OpenStack clouds from the legacy OpenStack cloud provider to the external OpenStack cloud provider, OpenStack Cloud Controller Manager (OCCM). There are a couple of steps needed to make this happen, one of which is taking user-provided configuration for the legacy cloud provider and mapping it to configuration for the shiny new external cloud provider. This is necessary to ensure any user-provided configuration is retained and the upgrade doesn’t break the deployment. In the case of the OpenStack provider, this configuration is INI-style and thankfully quite similar for both the legacy and external cloud provider implementations.

To handle the migration of configuration in OpenShift deployments, we are relying on the Cluster Cloud Controller Manager Operator (CCCMO). This operator is already responsible for managing the lifecycle of CCM on OpenShift deployments, including configuration of CCM, so naturally it is a good fit for this kind of task. A detailed description of the changes we ultimately made, along with the motivation for them, can be found in this enhancement (the pull request itself is probably quite helpful also, if you read Go), but I hope to explain them at a high level here since the paradigms used are similar to those found in other operators and are being used to manage other complex upgrades, such as the switch from in-tree block storage drivers to Container Storage Interface (CSI) drivers.

How CCCMO generates configuration

The first step in understanding how CCCMO can be used to manage the migration of configuration is to examine how CCCMO sources configuration - specifically user-provided configuration - and uses this to generate the configuration actually used for CCM. Once we understand this, we can decide at what points to hook in and customise or translate this user-provided configuration. We can also use this model in other operators. Thankfully, in the case of CCCMO this sourcing and generation of configuration is pretty simple.

Firstly, the operator attempts to retrieve config from the openshift-config-managed / kube-cloud-config config map:

$ oc get cm/kube-cloud-config -n openshift-config-managed -o yaml
apiVersion: v1
data:
  cloud.conf: |
    [Global]
    secret-name = openstack-credentials
    secret-namespace = kube-system
    region = regionOne
    [LoadBalancer]
    use-octavia = True
kind: ConfigMap
metadata:
  creationTimestamp: "2022-02-25T17:01:58Z"
  name: kube-cloud-config
  namespace: openshift-config-managed
  resourceVersion: "3853"
  uid: c23c14b7-66db-431c-a723-59439f946f80

This can be seen here. The reason that it searches for this config map specifically is historical: this is the config map generated by the Cluster Config Operator (CCO), which is used to configure the legacy cloud provider (among other things). CCO manipulates user-provided configuration for some cloud providers (specifically AWS and Azure) so I guess the idea here was to avoid re-implementing this transformation logic in CCCMO. Everything in the openshift-config-managed namespace is owned by CCO and is not intended to be modified by a user (in fact, attempts to modify it will likely be futile and the operator will quickly erase those changes).
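For illustration, here is a minimal sketch of this first lookup using client-go. The helper name (getManagedCloudConfig) is hypothetical and the error handling is simplified; the operator's actual code is structured differently. Note that the managed config map stores its data under the cloud.conf key, as seen in the dump above.

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// getManagedCloudConfig fetches the CCO-generated configuration from the
// openshift-config-managed / kube-cloud-config config map.
func getManagedCloudConfig(ctx context.Context, client kubernetes.Interface) (string, error) {
    cm, err := client.CoreV1().ConfigMaps("openshift-config-managed").
        Get(ctx, "kube-cloud-config", metav1.GetOptions{})
    if err != nil {
        return "", err
    }
    return cm.Data["cloud.conf"], nil
}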

If the lookup of the openshift-config-managed / kube-cloud-config config map fails, we attempt to retrieve configuration from the openshift-config / cloud-provider-config config map:

$ oc get cm/cloud-provider-config -n openshift-config -o yaml
apiVersion: v1
data:
  config: |
    [Global]
    secret-name = openstack-credentials
    secret-namespace = kube-system
    region = regionOne
    [LoadBalancer]
    use-octavia = True
kind: ConfigMap
metadata:
  creationTimestamp: "2022-02-25T17:00:15Z"
  name: cloud-provider-config
  namespace: openshift-config
  resourceVersion: "1802"
  uid: 45bda3c8-8866-4aea-92be-921502ff2055

This can be seen here. Once again, the reason we use this config map is historical and is based on what CCO uses. While things in the openshift-config-managed namespace are not user editable, the openshift-config namespace is the namespace for “user-managed” configuration, i.e. configuration that things like operators are not allowed to modify.
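Continuing the sketch from above (and reusing its imports and helpers), the fallback could look something like this. Note that the user-managed config map stores its data under the config key rather than cloud.conf, as the two dumps above show; the helper names are again hypothetical.

import apierrors "k8s.io/apimachinery/pkg/api/errors"

// getUserCloudConfig fetches the user-managed configuration from the
// openshift-config / cloud-provider-config config map.
func getUserCloudConfig(ctx context.Context, client kubernetes.Interface) (string, error) {
    cm, err := client.CoreV1().ConfigMaps("openshift-config").
        Get(ctx, "cloud-provider-config", metav1.GetOptions{})
    if err != nil {
        return "", err
    }
    return cm.Data["config"], nil
}

// resolveCloudConfig prefers the CCO-managed config map and falls back to
// the user-managed one only if the former does not exist.
func resolveCloudConfig(ctx context.Context, client kubernetes.Interface) (string, error) {
    conf, err := getManagedCloudConfig(ctx, client)
    if err == nil {
        return conf, nil
    }
    if !apierrors.IsNotFound(err) {
        return "", err
    }
    return getUserCloudConfig(ctx, client)
}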

If both lookups fail, we error out. However, this is unlikely since the installer should create the openshift-config / cloud-provider-config config map, as seen here. Assuming one of them does exist, we sync whatever we found to the openshift-cloud-controller-manager / cloud-conf config map:

$ oc get cm/cloud-conf -n openshift-cloud-controller-manager -o yaml
apiVersion: v1
data:
  cloud.conf: |
    [Global]
    secret-name = openstack-credentials
    secret-namespace = kube-system
    region = regionOne
    [LoadBalancer]
    use-octavia = True
kind: ConfigMap
metadata:
  creationTimestamp: "2022-02-25T17:01:08Z"
  name: cloud-conf
  namespace: openshift-cloud-controller-manager
  resourceVersion: "2519"
  uid: cbbeedaf-41ed-41c2-9f37-4885732d3677

This can be seen here. In this instance, the namespace isn’t actually locked in: it is possible to configure the cluster-cloud-controller-manager-operator binary with a --namespace argument, and this option defaults to openshift-cloud-controller-manager, as seen here.
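Putting it all together, the sync step might look roughly like the following. This continues the earlier sketches (same imports and assumptions) and is only an approximation: the real operator applies resources through its own sync machinery rather than a bare update-or-create, and the flag wiring shown here is simplified.

import (
    "flag"

    corev1 "k8s.io/api/core/v1"
)

var targetNamespace = flag.String("namespace", "openshift-cloud-controller-manager",
    "namespace in which CCM resources are managed")

// syncCloudConfig writes the resolved configuration to the cloud-conf config
// map, always under the "cloud.conf" key regardless of which source config
// map the data came from.
func syncCloudConfig(ctx context.Context, client kubernetes.Interface, conf string) error {
    cm := &corev1.ConfigMap{
        ObjectMeta: metav1.ObjectMeta{
            Name:      "cloud-conf",
            Namespace: *targetNamespace,
        },
        Data: map[string]string{"cloud.conf": conf},
    }
    _, err := client.CoreV1().ConfigMaps(*targetNamespace).Update(ctx, cm, metav1.UpdateOptions{})
    if apierrors.IsNotFound(err) {
        _, err = client.CoreV1().ConfigMaps(*targetNamespace).Create(ctx, cm, metav1.CreateOptions{})
    }
    return err
}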

How CCCMO handles upgrades

(for OpenStack clouds on OpenShift 4.11 or later)

So now that we understand how CCCMO sources user-provided configuration and generates the resulting configuration used by Cloud Controller Manager, it’s time to examine how we’ve decided to handle the migration of configuration for legacy cloud providers to configuration suitable for external cloud providers. As noted above, CCCMO previously took user-provided configuration from a config map in one namespace and copied it to a config map in another namespace. It should be pretty obvious that there’s no reason this has to be a straightforward copy: we could modify the input config map before we dump it back out. This is of course exactly what we did.

Starting with the upcoming OpenShift 4.11 release, CCCMO provides configuration “transformers”. Transformers simply load configuration provided by users, do some basic validation, and then transform things by dropping options that are no longer relevant, adding options that are now necessary, and renaming or modifying options that have changed between the legacy and external cloud providers. This idea isn’t particularly novel - as noted previously, CCO was already doing something very similar for AWS and Azure - but it works. Annoyingly, these transformers must be cloud-specific since the CCM binary used for each cloud provider expects a radically different configuration file (in the case of the OpenStack cloud provider this is an INI-style configuration file, while Azure expects a YAML-formatted file). As a result, we have only implemented the OpenStack transformer for now. However, in the future we will likely implement additional transformers for at least AWS and Azure since, as noted previously, CCO is already doing some transformation here.
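Conceptually, a transformer is just a function from one configuration blob to another. The sketch below is a simplification: the real signature in CCCMO also receives cluster state (such as the Infrastructure resource), and the registry shown here is hypothetical.

// CloudConfigTransformer rewrites user-provided cloud provider configuration
// into the form expected by the external cloud provider.
type CloudConfigTransformer func(source string) (string, error)

// Hypothetical registry of per-platform transformers; only the OpenStack one
// exists today.
var transformers = map[string]CloudConfigTransformer{
    "OpenStack": TransformOpenStackConfig,
}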

Specifically, the transformer for OpenStack clouds in CCCMO currently does the following:

  • Drops the [Global] secret-name, [Global] secret-namespace, and [Global] kubeconfig-path options, since these aren’t applicable for the external cloud provider (the first two are OpenShift-only modifications). This inline configuration has been replaced by configuration stored in a clouds.yaml file. Speaking of which…

  • Adds the [Global] use-clouds, [Global] clouds-file, and [Global] cloud options.

  • Drops the entire [BlockStorage] section since external cloud providers are no longer responsible for anything storage’y (this is now handled by Container Storage Interface (CSI) drivers, including the Manila CSI driver and Cinder CSI driver)

  • Adds or sets the [LoadBalancer] use-octavia and [LoadBalancer] enabled options, depending on the specific deployment configuration (i.e. is Kuryr in use?)

All of this can be seen here. A rough sketch of these transformations follows below.
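As an illustration only, here is roughly what those four steps could look like using the gopkg.in/ini.v1 parser. This is not the operator’s actual code: the parser choice, the clouds.yaml path, the cloud name, and the Kuryr-dependent handling (elided here) are all assumptions.

import (
    "bytes"

    "gopkg.in/ini.v1"
)

func TransformOpenStackConfig(source string) (string, error) {
    cfg, err := ini.Load([]byte(source))
    if err != nil {
        return "", err
    }

    // Drop options that only made sense for the legacy (in-tree) provider.
    global := cfg.Section("Global")
    for _, key := range []string{"secret-name", "secret-namespace", "kubeconfig-path"} {
        global.DeleteKey(key)
    }

    // Point the external provider at credentials stored in a clouds.yaml
    // file (illustrative path and cloud name).
    global.Key("use-clouds").SetValue("true")
    global.Key("clouds-file").SetValue("/etc/openstack/secret/clouds.yaml")
    global.Key("cloud").SetValue("openstack")

    // Storage is now the business of the CSI drivers, not CCM.
    cfg.DeleteSection("BlockStorage")

    // The external provider requires Octavia; whether the load balancer is
    // enabled depends on the deployment (e.g. is Kuryr in use?), elided here.
    cfg.Section("LoadBalancer").Key("use-octavia").SetValue("true")

    var buf bytes.Buffer
    if _, err := cfg.WriteTo(&buf); err != nil {
        return "", err
    }
    return buf.String(), nil
}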

Summary

Hopefully this helps shine a little light on how CCCMO (and to a lesser degree, CCM and CCO) works, at least from an OpenStack perspective. For most users, none of the above should matter: the OpenShift documentation describes how configuration of the cloud provider, be it internal or external, should happen via the openshift-config / cloud-provider-config config map, and all of this transformation logic should be effectively invisible. However, when things go wrong, it can be helpful to know in which dark corners to look 😄