Availability Zones in OpenStack and OpenShift (Part 1)

After seeing a few too many availability zone-related issues popping up in OpenShift clusters of late, I’ve decided it might make sense to document the situation with OpenStack AZs on OpenShift (and, by extension, Kubernetes). This is the first of two parts. This part provides some background on what AZs are and how you can configure them, while the second part examines how AZs affect OpenShift and Kubernetes components such as the OpenStack Machine API Provider, the OpenStack Cluster API Provider, and the Cinder and Manila CSI drivers.

Background

Both the Compute (Nova) and Block Storage (Cinder) services in OpenStack support the concept of Availability Zones (AZs), and the envisioned use cases are very similar for both. Quoting from the Nova documentation:

Availability Zones are an end-user visible logical abstraction for partitioning a cloud without knowing the physical infrastructure. They can be used to partition a cloud on arbitrary factors, such as location (country, datacenter, rack), network layout and/or power source.

The Nova documentation then goes on to note specifically that the AZ feature provides no HA benefit in and of itself. Whatever benefits exist are entirely down to how the deployment is designed, so an AZ is really just a way to signal something you’ve done in your physical deployment. All of this is equally true of both Nova and Cinder, and in my experience I’ve seen AZs used to demarcate both compute and block storage nodes that sit on different racks or in different datacenters.

Configuring AZs for hosts

As you might expect, Cinder AZs are an attribute of the block storage hosts (i.e. hosts running the cinder-volume service). As discussed later, you can configure a host’s AZ by setting the [DEFAULT] storage_availability_zone configuration option in cinder.conf.

By comparison, Nova’s AZs are not typically configured via nova.conf but are instead attributes of host aggregates, configured by setting the availability_zone metadata key of an aggregate. If a compute host (i.e. a host running the nova-compute service) belongs to a host aggregate with the AZ metadata key set, the host inherits the AZ of that host aggregate. Only when a host doesn’t belong to a host aggregate - or none of the host aggregates it belongs to have AZ metadata set - is this information sourced from elsewhere, namely the [DEFAULT] default_availability_zone config option described below. Unlike Cinder’s config option, this is not intended to differ by host and should be set to the same value across all compute nodes.

Since a host can only belong to one AZ, Nova will prevent you from adding a host to more than one aggregate with AZ metadata set:

❯ openstack aggregate create --zone nova-az1 foo
❯ openstack aggregate create --zone nova-az2 bar
❯ openstack aggregate add host foo stephenfin-devstack
❯ openstack aggregate add host bar stephenfin-devstack
ConflictException: 409: Client Error for url: http://10.0.109.204/compute/v2.1/os-aggregates/13/action, Cannot add host to aggregate 13. Reason: One or more hosts already in availability zone(s) ['nova-az1'].

In addition, if a host has instances on it, Nova will also prevent you from modifying the AZ metadata of an aggregate it already belongs to, since this would break the AZ constraint placed on any of the existing instances:

❯ openstack server show test-server -f value -c OS-EXT-AZ:availability_zone -c OS-EXT-SRV-ATTR:host
nova-az1
stephenfin-devstack
❯ openstack aggregate show foo -f value -c availability_zone -c hosts
nova-az1
['stephenfin-devstack']
❯ openstack aggregate set --zone nova-az2 foo
BadRequestException: 400: Client Error for url: http://10.0.109.204/compute/v2.1/os-aggregates/12, Cannot update aggregate 12. Reason: One or more hosts contain instances in this zone.
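To make the host-side rules in this section a little more concrete, here is a rough Python model of them. It is only a sketch of the behaviour described above, not Nova’s actual code: the Aggregate class and the zones_of, host_az, add_host and set_zone helpers are all hypothetical names invented for illustration, and the sketch assumes that joining a second aggregate with the same AZ is harmless.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Aggregate:
    """Toy stand-in for a host aggregate; not Nova's real data model."""
    name: str
    availability_zone: Optional[str] = None  # the availability_zone metadata key
    hosts: set = field(default_factory=set)


def zones_of(host, aggregates):
    """Collect the AZs of every AZ-bearing aggregate that contains the host."""
    return {
        agg.availability_zone
        for agg in aggregates
        if host in agg.hosts and agg.availability_zone is not None
    }


def host_az(host, aggregates, default_availability_zone="nova"):
    """Return the AZ a compute host ends up in."""
    zones = zones_of(host, aggregates)
    # No AZ-bearing aggregate: fall back to [DEFAULT] default_availability_zone.
    return min(zones) if zones else default_availability_zone


def add_host(aggregate, host, aggregates):
    """Add a host to an aggregate unless that would put it in a second AZ."""
    conflicts = zones_of(host, aggregates) - {aggregate.availability_zone}
    if aggregate.availability_zone is not None and conflicts:
        raise ValueError(f"host already in availability zone(s) {sorted(conflicts)}")
    aggregate.hosts.add(host)


def set_zone(aggregate, new_zone, hosts_with_instances):
    """Change an aggregate's AZ unless a member host is running instances."""
    busy = aggregate.hosts & set(hosts_with_instances)
    if new_zone != aggregate.availability_zone and busy:
        raise ValueError("one or more hosts contain instances in this zone")
    aggregate.availability_zone = new_zone


foo, bar = Aggregate("foo", "nova-az1"), Aggregate("bar", "nova-az2")
add_host(foo, "stephenfin-devstack", [foo, bar])
print(host_az("stephenfin-devstack", [foo, bar]))  # nova-az1
```

Calling add_host(bar, "stephenfin-devstack", [foo, bar]) at this point raises ValueError, which mirrors the 409 response shown earlier.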

Requesting AZs for resources (servers, volumes, volume backups, …)

Nova allows you to specify an AZ when creating an instance (or “server”, in OpenStackClient parlance), while Cinder allows you to specify one when creating a volume, a volume backup, a volume group, or a consistency group (the deprecated predecessor of volume groups). For example, to create an instance with an explicit compute AZ:

openstack server create --availability-zone compute-az1 ...

Likewise, to create a volume and volume backup with an explicit block storage AZ:

openstack volume create --availability-zone volume-az1 ...
openstack volume backup create --availability-zone volume-az2 ...

However, you’ll note that these resource types always have AZ information associated with them, even when an AZ wasn’t specifically requested during creation. This is because, in the absence of specific AZ information, both services default to setting the AZ of the resource to the AZ of the host that the resource was created on. Put another way, if I create instance my-server with no AZ information and it ends up on host my-host, then my-server will inherit the AZ of my-host. Block storage resources work in the same way, meaning volume my-volume will inherit the AZ of the host it is scheduled to.

As a result, there has historically been no way for an end-user to tell whether an AZ was explicitly requested when creating a server. In fact, the only way they would find out is by trying to move the server, since Nova will insist on moving the instance to another host within the same AZ (this wouldn’t happen for a server that wasn’t explicitly created in an AZ). As we’ll touch on in part 2, this has been rather frustrating from an OpenShift or Kubernetes perspective: Kubernetes’ topology feature is a hard requirement, and it does not like us changing the AZ-related labels of Node or Machine objects, which can happen when you migrate the underlying server and the server picks up the AZ of the new host.

Fortunately, the 2024.1 (Caracal) release of OpenStack introduced a new field to the GET /servers/{serverID} response called pinned_availability_zone, which shows the AZ requested during initial instance creation, if set. It’s just a matter of time before we’re able to start consuming this in the various OpenShift and Kubernetes components.
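As a small illustration of how a consumer might use that field, the sketch below inspects an already-decoded server body. The describe_server_az helper is a hypothetical name, and the sketch assumes that an unpinned server reports a null pinned_availability_zone while an older cloud or API version omits the key entirely; check this against your own cloud before relying on it.

```python
def describe_server_az(server):
    """Return (current_az, pinned) for a decoded GET /servers/{serverID} body.

    pinned is True or False when the cloud reports pinned_availability_zone,
    and None when the field is missing and we simply cannot tell.
    """
    current = server.get("OS-EXT-AZ:availability_zone")
    if "pinned_availability_zone" not in server:
        # Assumption: the key is absent on clouds that predate the field.
        return current, None
    # Assumption: a null value means no AZ was requested at creation time.
    return current, server["pinned_availability_zone"] is not None


body = {
    "OS-EXT-AZ:availability_zone": "nova-az1",
    "pinned_availability_zone": None,
}
print(describe_server_az(body))  # ('nova-az1', False)
```

A result of False suggests the AZ shown for the server is merely inherited from its current host and may change after a migration.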

Combining Nova and Cinder’s AZ features

Finally, it’s worth exploring the interplay of the Nova and Cinder AZ features since this will be particularly relevant in part 2. In a Hyperconverged Infrastructure (HCI) deployment, where compute and block storage services run side-by-side on hyperconverged hosts, the compute hosts are the block storage hosts and there is no difference between the AZs. In a non-HCI deployment, this is unlikely to be the case, but that hasn’t prevented people and applications from frequently conflating the two types of AZ, as we will see later. Because this conflation of different AZ types can happen, the general expectation we would have is that one of the following is true:

There is only a single compute AZ and a single block storage AZ, and they have the same name. This is the default configuration if you use “stock” OpenStack: Nova’s default AZ is nova and Cinder helpfully defaults to the same value.
There are multiple compute and block storage AZs, but there is the same number of both and they share the same names. For example, both the compute and block storage services have the following AZs defined: AZ0, AZ1, and AZ2. In this case, users and applications which incorrectly use compute host AZ information to configure the AZ of volumes and related block storage resources will “just work”.
There are multiple compute and block storage AZs, and there is either a different number of each or they have different names. For example, the compute services have the compute-az0 and compute-az1 AZs defined while the block storage services have the volume-az0 and volume-az1 AZs defined. In this case, users and applications must be very careful to explicitly specify a correct AZ when creating volumes and related block storage resources, and must ensure Nova is configured to allow attaching volumes in other AZs (more on this later).
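If you want to work out which of these three situations a given cloud is in, the comparison boils down to set equality on the AZ names. The following is a minimal sketch with a hypothetical classify_az_layout helper; it assumes you have already gathered the two lists of AZ names yourself, and it lumps any naming mismatch (including a single AZ per service with different names) into the third case.

```python
def classify_az_layout(compute_azs, volume_azs):
    """Classify a cloud by how its compute and block storage AZ names line up."""
    compute, volume = set(compute_azs), set(volume_azs)
    if not compute or not volume:
        raise ValueError("each service needs at least one AZ")
    if compute != volume:
        # Different counts or different names: callers must pick volume AZs
        # explicitly and cannot reuse compute AZ names.
        return "mismatched"
    # Identical names: reusing a compute AZ name for a volume "just works".
    return "single shared AZ" if len(compute) == 1 else "multiple shared AZs"


print(classify_az_layout(["nova"], ["nova"]))  # single shared AZ
print(classify_az_layout(["compute-az0", "compute-az1"],
                         ["volume-az0", "volume-az1"]))  # mismatched
```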

The last case above isn’t helped by the fact that neither Nova nor Cinder provides an API to request the correct block storage AZ for a given compute host. To be fair, such an API would likely be rather difficult to implement, given that multiple backends have to be considered. It would be effectively impossible to do automatically, meaning there would still be initial manual configuration required. The closest analog we have for this today is the Volume Type AZ feature, which allows you to indicate the AZs that can be used when creating a volume with a given volume type (so that e.g. a particular block storage backend that is only available to one rack can’t be requested by volumes hosted by block storage services running on another rack). As the docs for that indicate, this configuration is entirely deployment specific and therefore totally manual.

Wrap up

That concludes part 1 of this OpenShift-centric examination of OpenStack Availability Zones. In this part we focused almost exclusively on OpenStack itself, looking at what AZs are, how they’re configured and used, and the various issues people are likely to encounter along the way. In part 2 we’re going to turn our focus to how OpenStack AZs are consumed and represented by OpenShift components when an OpenShift cluster is deployed on an OpenStack cloud. Stay tuned!

Reference

Configuration

Since this feature exists across two services, there are two sets of configuration options to be concerned with.

As of the 2023.1 (Antelope) release, Nova has three relevant configuration options:

[DEFAULT] default_availability_zone defines the default AZ of each compute host, which can be changed by adding the host to a host aggregate and setting the special availability_zone metadata property as described in the nova docs. This option defaults to nova and, as noted in the nova docs, you should never explicitly request this default AZ when creating new instances, since it will prevent both the migration of instances between hosts in different AZs (which is allowed by default if the AZ was unset during initial creation) and the identification of hosts that are missing AZ information. You have been warned.
[DEFAULT] default_schedule_zone defines the default AZ that should be assigned to an instance on creation. If this is unset, the instance will be assigned an implicit AZ of the host it lands on. You might want to use this if you want the majority of instances to go into a “generic” AZ while special instances go into specific AZs.
[cinder] cross_az_attach determines whether volumes are allowed to be attached to an instance if the compute AZ of the instance’s host differs from the volume’s block storage AZ. It also determines whether volumes created when creating a boot-from-volume server have an explicit AZ associated with them or not. This defaults to true, and with good reason, given the aforementioned caveats around conflating compute and block storage AZs and the need for the two to be identical otherwise.

There is also the [DEFAULT] internal_service_availability_zone configuration option, but this has no real impact for end-users.
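The way the Nova options above interact can be summarised in a few lines of Python. This is a deliberately simplified model of the behaviour described in this list rather than anything lifted from Nova: instance_az and can_attach are hypothetical helpers, and the real scheduler chooses a host inside the requested AZ rather than computing an AZ after the fact.

```python
def instance_az(requested_az, default_schedule_zone, landing_host_az):
    """Return the AZ an instance reports after creation.

    An explicit request wins, then [DEFAULT] default_schedule_zone, and
    only then the (implicit) AZ of the host the instance lands on.
    """
    return requested_az or default_schedule_zone or landing_host_az


def can_attach(server_az, volume_az, cross_az_attach=True):
    """Model the [cinder] cross_az_attach check for a volume attachment."""
    # With cross_az_attach enabled the AZ names are not compared at all.
    return cross_az_attach or server_az == volume_az


az = instance_az(None, None, "compute-az0")
print(az)                                                        # compute-az0
print(can_attach(az, "volume-az0"))                              # True
print(can_attach(az, "volume-az0", cross_az_attach=False))       # False
```

The last line shows why disabling cross_az_attach only makes sense when compute and block storage AZs share names.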

As of the 2023.1 (Antelope) release, Cinder has four relevant configuration options:

[DEFAULT] storage_availability_zone defines the default AZ of the block storage host. This defaults to nova and can be overridden on a per-backend basis using [foo] backend_availability_zone. Speaking of which…
[foo] backend_availability_zone defines the default AZ for a specific backend of the block storage host. foo should be the name of the volume backend, as defined in [DEFAULT] enabled_backends.
[DEFAULT] default_availability_zone defines the default AZ that should be assigned to a volume on creation. If this is unset, the volume will be assigned the AZ of the host it lands on (which in turn defaults to [DEFAULT] storage_availability_zone, per above).
[DEFAULT] allow_availability_zone_fallback allows you to ignore a request for an invalid block storage AZ and instead fall back to the default AZ defined in [DEFAULT] default_availability_zone. This defaults to false, though to be honest true is probably a sensible value for configurations where e.g. there are multiple compute AZs and a single volume AZ.
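The four Cinder options can be modelled in the same spirit. Again, this is only a sketch of the rules as described in this list: backend_az, volume_az and InvalidAvailabilityZone are hypothetical names, and the final fallback from default_availability_zone to storage_availability_zone reflects my understanding of the option’s help text rather than anything stated above.

```python
class InvalidAvailabilityZone(Exception):
    """Raised when a requested block storage AZ does not exist."""


def backend_az(storage_availability_zone="nova", backend_availability_zone=None):
    """Return the AZ of one backend: the per-backend override wins if set."""
    return backend_availability_zone or storage_availability_zone


def volume_az(requested_az, known_azs, *, default_availability_zone=None,
              storage_availability_zone="nova",
              allow_availability_zone_fallback=False):
    """Return the AZ to create a volume in, or None to inherit the host's AZ."""
    if requested_az is None:
        # No request: use the configured default, if there is one.
        return default_availability_zone
    if requested_az in known_azs:
        return requested_az
    if allow_availability_zone_fallback:
        # Assumption: fall back to default_availability_zone, then to
        # storage_availability_zone when no default is configured.
        return default_availability_zone or storage_availability_zone
    raise InvalidAvailabilityZone(requested_az)


# Multiple compute AZs but a single volume AZ, with fallback enabled.
print(volume_az("compute-az1", {"nova"},
                allow_availability_zone_fallback=True))  # nova
```

With the fallback left at its default of false, the same call raises InvalidAvailabilityZone instead.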

Usage

Again, since this feature exists across two services, there are two sets of resource types to be concerned with.

To configure the AZ of a compute host, you configure AZ information for a host aggregate and then add the host to this aggregate.

❯ openstack aggregate create --zone nova-az1 foo
❯ openstack aggregate add host foo stephenfin-devstack

Once this is done, you can request the AZ when creating an instance (or “server”):

❯ openstack server create --availability-zone nova-az1 ...

On the other hand, the AZ of a storage host is set in configuration, and there’s no API method to configure it. You can still request it when creating a volume, just as you would when creating a server:

openstack volume create --availability-zone volume-az1 ...

Or when creating a volume backup:

openstack volume backup create --availability-zone volume-az2 ...

Other API libraries like gophercloud also expose these attributes and allow them to be configured, but we won’t go into that here.