It’s an often overlooked fact that OpenStack has two wholly different mechanisms for provisioning block devices for instances. While most people are aware of the Block Storage service, Cinder, not everyone is aware of or gives much thought to the various types of block devices or “ephemeral” storage that the Compute service, Nova, is able to provide. This article provides a high-level overview and comparison of these two mechanisms, including examples of how you can use them and some (hopefully) interesting asides.
Types of block storage
Broadly speaking, OpenStack has two types of block storage: storage provisioned by Nova, and storage provisioned by Cinder. Somewhat confusingly, these are often referred to as ephemeral storage and persistent storage, respectively. These names likely stem from the fact that Nova-provisioned storage is associated with an individual instance and has a lifecycle tied to that of the instance itself, meaning if you delete the instance then any Nova-provisioned storage associated with that instance is also deleted. By comparison, Cinder-provisioned storage can “persist” after an instance is deleted and may be re-attached to other instances (or, in some cases, even attached to multiple instances at the same time).
When and where you can use either type of storage depends on the type of disk you want to create, something we’ll get
to in a moment. However, to call these storage types ephemeral and persistent invites confusion on multiple fronts.
Firstly, there is the fact that Nova already has its own separate concept of “ephemeral disks”, which we cover below.
More pressingly, calling Nova-provisioned storage ephemeral suggests it is somehow unsafe or unreliable. In reality,
both Nova and Cinder provide a level of configurability about where or how the underlying data for the block devices are
stored. Cinder achieves this by being pluggable and supporting a wide variety of drivers, which can be found
here. Cinder supports backends like LVM and Ceph, as well as a large variety of other proprietary and
open source backends. For Nova, this is determined by the virt driver in use, the value of the [compute] use_cow_images
configuration option, and optionally one or more virt driver-specific configuration options. If using
the libvirt driver then the following options are all relevant:
[compute] use_cow_image
,[compute] force_raw_images
,[libvirt] images_types
(and mechanism-specific options),- and
[DEFAULT] instances_path
In a default DevStack deployment, block devices will be stored as qcow2 image files in the path indicated by [DEFAULT] instances_path (/opt/stack/data/nova/instances) on the compute host that the instance is located on.
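As a quick illustration of this layout (the instance UUID below is a placeholder, and the file names are specific to the libvirt driver), you can inspect an instance’s backing files directly on the compute host:
# the directory is named after the instance UUID; the root disk is typically a file called 'disk'
❯ ls /opt/stack/data/nova/instances/${INSTANCE_UUID}/
❯ qemu-img info /opt/stack/data/nova/instances/${INSTANCE_UUID}/disk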
When using this default configuration, a migrated instance will need its block storage files migrated too, and if the host dies then you will lose any of these Nova-provisioned block devices with it. However, this is only one potential configuration: you can also choose to store the images in Ceph ([libvirt] images_type = rbd, with [libvirt] images_rbd_pool and [libvirt] images_rbd_ceph_conf set to relevant values). Alternatively, you can choose to use any of the other local storage mechanisms with a suitable replication or backup strategy, whether that’s LVM ([libvirt] images_type = lvm and [libvirt] images_volume_group set to a relevant volume group (VG)) plus backups, or one of the file-based mechanisms with the directory indicated by [DEFAULT] instances_path placed on a network-attached filesystem. Suffice it to say that, with correct configuration, your data should never be at risk regardless of where it’s placed.
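As a rough sketch of that Ceph case (the pool name here is an assumption rather than a default), the relevant nova.conf excerpt on each compute node might look something like this:
# nova.conf (compute node) - Ceph-backed Nova-provisioned storage
[libvirt]
images_type = rbd
images_rbd_pool = vms
images_rbd_ceph_conf = /etc/ceph/ceph.conf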
Types of disk
Now that we’re aware of how the block storage is actually stored, let’s look at how these block devices are exposed and used in OpenStack. OpenStack typically refers to block devices as either disks or volumes. I don’t have a good explanation for when you’d use one or the other, and my own mental shortcut is to call any block device attached to an instance a disk, unless that block device is a non-root device provisioned by Cinder in which case it’s a volume. With that clarified (🙃), let’s take a look at the first disk type, root disks.
Root disks
This is the one most people will be familiar with and it’s the only disk that’s absolutely required. As the name
suggests, the root disk is where /
is mounted. By default, the root disk will be provisioned by Nova and its size will
be configured in GB using the disk
property of the flavor used to create the instance. For example, by looking at the
m1.small
flavor on a local DevStack deployment I can see disk
set to 20
or 20GB:
❯ openstack flavor show m1.small -f value -c disk
20
If we create an instance using this flavor (and no other block device-related configuration, obviously), we will get a root disk that is 20GB in size.
❯ openstack server create \
--flavor m1.small --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
test-server
We can confirm this by SSH’ing into the machine:
❯ openstack server add floating ip test-server ${FIP}
❯ openstack server ssh test-server -- -l cirros
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
vda 252:0 0 20G 0 disk
|-vda1 252:1 0 20G 0 part /
`-vda15 252:15 0 8M 0 part
We can also opt to use a Cinder volume for the root disk (which some people will now call a “root volume” - see previous
“clarification” 😅). If you’re using OpenStackClient (OSC), this can be accomplished using the --boot-from-volume
option:
❯ openstack server create \
--flavor m1.small --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
--boot-from-volume 5 \
test-server
When you do this, Nova will create the volume for you by proxying the request through to Cinder. You can confirm this
using the openstack volume list
command:
❯ openstack volume list
+--------------------------------------+------+--------+------+--------------------------------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+------+--------+------+--------------------------------------+
| 68584677-0d53-4c52-8d1c-5600c96768a1 | | in-use | 5 | Attached to test-server on /dev/vda |
+--------------------------------------+------+--------+------+--------------------------------------+
And, like above, you can confirm the disk size in the guest:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
vda 252:0 0 5G 0 disk
|-vda1 252:1 0 5G 0 part /
`-vda15 252:15 0 8M 0 part
Finally, you can choose to use an existing volume as the root disk. To do this, use the --volume
option:
❯ openstack server create \
--flavor m1.small --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
--volume test-volume \
test-server
This volume must be bootable, and you would likely use this to re-create a deleted instance using its root volume.
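The usual way to get a bootable volume is to create it from an image; a minimal sketch of that (the name and size here are arbitrary) looks like this:
❯ openstack volume create --image cirros-0.6.2-x86_64-disk --size 5 test-volume
# the volume's 'bootable' field should now report 'true'
❯ openstack volume show test-volume -f value -c bootable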
Ephemeral disks
Ephemeral disks are so called because they are associated with a single instance and only exist for the lifetime of that
instance. Therefore, as you may guess, these disks can only be provided by Nova. Like root disks, the size of the
ephemeral disk is configured in GB via a flavor property, OS-FLV-EXT-DATA:ephemeral
. None of the flavors provided in a
default DevStack install provide for ephemeral storage, so to demonstrate this we need to create our own flavor which
we’re going to call m1.ephemeral
:
❯ openstack flavor create --id 99 --vcpus 1 --ram 512 --disk 20 --ephemeral 10 m1.ephemeral
Once created, we can look at the flavor and ensure that the OS-FLV-EXT-DATA:ephemeral
property has been set to the
relevant size (in this case, 10
or 10GB):
❯ openstack flavor show m1.ephemeral -f value -c 'OS-FLV-EXT-DATA:ephemeral'
10
If we create an instance using this flavor, we will get a root disk that is 20GB in size and a new, additional disk that is 10GB in size:
❯ openstack server create \
--flavor m1.ephemeral --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
test-server
Once again, we can confirm this by SSH’ing into the machine:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
vda 252:0 0 20G 0 disk
|-vda1 252:1 0 20G 0 part /
`-vda15 252:15 0 8M 0 part
vdb 252:16 0 10G 0 disk /mnt
You’ll note that the disk is already mounted for us. This isn’t actually OpenStack doing this for us. Rather, it’s part of cloud-init, which is included in the Cirros images. If you were using an image that didn’t include cloud-init or were to attach multiple ephemeral disks, you’d need to handle this mounting yourself.
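For example, mounting an additional ephemeral disk by hand might look something like this (the device name and mount point are assumptions and will vary):
$ sudo mkdir /mnt2 # a mount point for the extra ephemeral disk
$ sudo mount /dev/vdc /mnt2 # Nova will typically have formatted the disk already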
You may also note that there’s an --ephemeral
option available for the openstack server create
command. This option
allows you to change both the layout of the ephemeral storage and the filesystem used on them. For example, say that
instead of having a single 10GB disk, I wanted to have an 8GB disk formatted as ext4 and a 2GB disk formatted as XFS, I
could invoke OSC like so:
❯ openstack server create \
--flavor m1.ephemeral --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
--ephemeral size=8,format=ext4 --ephemeral size=2,format=xfs \
test-server
We can see that these are available in the machine using lsblk
again and confirm their filesystem types using blkid
:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
vda 252:0 0 20G 0 disk
|-vda1 252:1 0 20G 0 part /
`-vda15 252:15 0 8M 0 part
vdb 252:16 0 8G 0 disk /mnt
vdc 252:32 0 2G 0 disk
$ blkid
/dev/vdb: LABEL="ephemeral0" UUID="a34015c1-be25-4757-8d17-cc82285b12b2" BLOCK_SIZE="4096" TYPE="ext4"
/dev/vdc: LABEL="ephemeral1" UUID="0cf5d85b-0357-44fc-9549-6f71720af8ba" BLOCK_SIZE="512" TYPE="xfs"
/dev/vda15: SEC_TYPE="msdos" UUID="AE31-5342" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="05ecb6a9-6cb3-4697-b819-25c6cee5bd9a"
/dev/vda1: LABEL="cirros-rootfs" UUID="f1511162-06fb-4482-9dab-9a0c76633fb2" BLOCK_SIZE="4096" TYPE="ext3" PARTUUID="df2f017d-edcc-4371-81c7-fcfa0c5c1b09"
When configuring ephemeral disks this way, the total size of all disks does not have to add up to the size indicated by the flavor’s OS-FLV-EXT-DATA:ephemeral property (meaning you could choose to configure only e.g. the 2GB volume above) but it cannot exceed the size indicated in the flavor. For example, if I repeat the above command but use a single 20GB disk (i.e. 10GB larger than the 10GB value specified for the m1.ephemeral flavor we created previously), the request will fail:
❯ openstack server create \
--flavor m1.ephemeral --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
--ephemeral size=20 \
test-server
BadRequestException: 400: Client Error for url: http://10.0.108.50/compute/v2.1/servers, Ephemeral disks requested are
larger than the instance type allows. If no size is given in one block device mapping, flavor ephemeral size will be
used.
Finally, as we noted at the top, ephemeral disks can only be provisioned by Nova. If you attempt to create such a disk using destination_type=volume, what you have is no longer an ephemeral disk (in the Nova sense of the term) but rather a volume. Which is a nice segue to…
Volumes
Volumes are not a disk type but rather the name given to the primary resource type provided by Cinder. As noted previously, a Cinder volume can be used as the backing device for the root disk, but it can also be used to provide additional disks to the instance (but not ephemeral disks - an ephemeral disk ceases to be an ephemeral disk once it’s no longer provisioned by Nova). For example, say we wanted to use a standard Nova-provisioned block device for our root disk but attach two additional volumes provisioned by Cinder, we could run:
❯ vol_a_id=$(openstack volume create --size 5 test-volume-a -f value -c id)
❯ vol_b_id=$(openstack volume create --size 5 test-volume-b -f value -c id)
❯ openstack server create \
--flavor m1.small --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
--block-device uuid=${vol_a_id},source_type=volume \
--block-device uuid=${vol_b_id},source_type=volume \
test-server
We can examine these in the guest:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
vda 252:0 0 20G 0 disk
|-vda1 252:1 0 20G 0 part /
`-vda15 252:15 0 8M 0 part
vdb 252:16 0 5G 0 disk
vdc 252:32 0 5G 0 disk
Instead of pre-creating the volumes with Cinder, we could also let Nova do this for us. This avoids extra calls to Cinder (at least from us - Nova simply makes the calls on our behalf), at the expense of some fine-grained configuration for the volumes. We could modify the above command like so:
❯ openstack server create \
--flavor m1.small --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
--block-device source_type=blank,destination_type=volume,volume_size=5 \
--block-device source_type=blank,destination_type=volume,volume_size=5 \
test-server
Unlike block devices provisioned by Nova, additional Cinder volumes can also be attached to an existing instance:
❯ openstack volume create --size 5 test-volume-c
❯ openstack server add volume test-server test-volume-c
Likewise, you can remove devices as long as they’re not used for the root disk:
❯ openstack server remove volume test-server test-volume-c
Finally, Cinder volumes can be attached to multiple instances at the same time. As discussed in the Cinder documentation, this requires a backend driver that supports multi-attach and the creation of a special volume type with the multiattach property set to <is> True. Fortunately, the reference LVM driver configured by default in a DevStack deployment supports this, so we can demonstrate it by creating the volume type, volume, and instances:
❯ openstack volume type create --multiattach multiattach
❯ vol_id=$(openstack volume create --size 5 --type multiattach test-volume -f value -c id)
❯ openstack server create \
--flavor m1.small --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
--block-device uuid=${vol_id},source_type=volume \
test-server-a
❯ openstack server create \
--flavor m1.small --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
--block-device uuid=${vol_id},source_type=volume \
test-server-b
If we log in to these two instances and mount the volume (after partitioning and formatting it), we’ll be able to create a file in one and see it appear in the other:
❯ openstack server ssh test-server-a -- -l cirros
# create partitions using fdisk since parted is not available in cirros images
$ sudo fdisk /dev/vdb
$ sudo mkfs.ext4 /dev/vdb1
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
vda 252:0 0 20G 0 disk
|-vda1 252:1 0 20G 0 part /
`-vda15 252:15 0 8M 0 part
vdb 252:16 0 5G 0 disk
`-vdb1 252:17 0 5G 0 part
$ mkdir test
$ sudo mount /dev/vdb1 test
$ sudo touch test/foo
$ exit
❯ openstack server ssh test-server-b -- -l cirros
$ mkdir test
$ sudo mount /dev/vdb1 test # you may need to reboot to pick up the changes to this disk for this to work
$ ls test/
foo
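You can also confirm the shared attachment from outside the guests; both servers should appear in the volume’s attachment list:
❯ openstack volume show test-volume -f value -c status
❯ openstack volume show test-volume -f value -c attachments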
Swap disks
The final type of disk is the swap disk. Swap disks are very like ephemeral disks, in that they’re exclusively managed
by Nova and configured via a flavor property called swap
. Unlike the disk
and OS-FLV-EXT-DATA:ephemeral
properties
though, the swap
property is a size in MB, not GB. Once again, the default DevStack configuration does not include
flavors with swap enabled so we need to configure these ourselves. Let’s do that, creating a flavor called m1.swap with 1024 MB of swap:
❯ openstack flavor create --id 98 --vcpus 1 --ram 512 --disk 20 --swap 1024 m1.swap
Once created, we can look at the flavor and ensure that the swap
property has been set to the relevant size (in this
case, 1024
or 1024MB):
❯ openstack flavor show m1.swap -f value -c 'swap'
1024
If we create an instance using this flavor, we will get a root disk that is 20GB in size and an additional disk that is 1024MB in size and formatted as a swap disk:
❯ openstack server create \
--flavor m1.swap --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
test-server
Inspecting the guest itself, we see the disk present and that it has an fstype of swap
, as expected:
$ sudo lsblk -f
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
vda
|-vda1 ext3 cirros-rootfs f1511162-06fb-4482-9dab-9a0c76633fb2 18.4G 0% /
`-vda15 vfat AE31-5342
vdb swap 6f7adada-a19f-47e1-915c-20570535a619
$ cat /proc/meminfo | grep -i swap
SwapCached: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
However, unlike ephemeral disks, cloud-init does not enable swap on this additional disk for us. If you want that, you’ll need to do so manually:
$ sudo swapon /dev/vdb
$ sudo lsblk -f
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
vda
|-vda1 ext3 cirros-rootfs f1511162-06fb-4482-9dab-9a0c76633fb2 18.4G 0% /
`-vda15 vfat AE31-5342
vdb swap 6f7adada-a19f-47e1-915c-20570535a619
$ cat /proc/meminfo | grep -i swap
SwapCached: 0 kB
SwapTotal: 1048572 kB
SwapFree: 1048572 kB
Like ephemeral disks, there’s also a --swap
option for the openstack server create
command that you can use to
override the default swap size. Once again you cannot exceed the total size given in the flavor, but you also can’t
specify the option multiple times to divide the swap into different disks. For example, if we wanted to configure a
smaller swap disk size of, say, 512MB, we could do:
❯ openstack server create \
--flavor m1.swap --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
--swap 512 \
test-server
Attempting to use something larger than the size indicated in the swap
property of the flavor will result in an error:
❯ openstack server create \
--flavor m1.swap --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
--swap 2048 \
test-server
BadRequestException: 400: Client Error for url: http://10.0.108.50/compute/v2.1/servers, Swap drive requested is larger
than instance type allows.
Block Device Mappings (BDMs)
We now arrive at the final piece of the block storage puzzle: Block Device Mappings, or BDMs. BDMs are how Nova describes the mapping of block devices to instances. They are specified during instance creation using the block_device_mapping_v2 field and are objects with a number of well-known fields.
Both the Nova user docs and the Nova API Reference docs provide a good overview of these fields, but there are a few in particular worth discussing here in light of the above.
- destination_type: This field, which has been referenced earlier in this article, determines which service manages the block device and therefore where the block device resides. This value can be either local (meaning managed by Nova) or volume (meaning managed by Cinder).
- source_type: This indicates the source of the block device. This value can be one of blank, image, snapshot, or volume, and the docs describe all of these in detail. We’ll explore some different use cases for these shortly.
- boot_index: This is an integer value that indicates the order that the VM will use when attempting to boot from disks. When attaching a volume as a root disk (i.e. boot from volume), you will use a boot_index of 0 to indicate that the guest OS should boot from that volume. A concrete example using several of these fields is sketched just after this list.
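As a rough sketch (this particular combination isn’t one of the examples below), you can spell several of these fields out explicitly via the --block-device option, here attaching a blank 5GB Cinder volume that is not bootable (boot_index=-1) and is deleted along with the instance (delete_on_termination=true):
❯ openstack server create \
--flavor m1.small --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
--block-device source_type=blank,destination_type=volume,volume_size=5,boot_index=-1,delete_on_termination=true \
test-server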
With that knowledge in hand, we can look at a few of the previous examples from the perspective of the BDMs used
during instance creation. We’ll do this using the openstack server create
command with the --debug
flag, which
allows us to see the raw requests and responses issued to the server.
First, let’s look at root disks. If we create a “standard” instance using a local root disk, then the block_device_mapping_v2 field can be (and, if using OSC, will be) omitted entirely:
❯ openstack server create \
--flavor m1.small --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
--debug \
test-server
{
"server":
{
// ...
"flavorRef": "2",
"imageRef": "9acb21b3-0516-459a-9b0a-6357e66ff74a"
// ...
}
}
This is because Nova automatically creates the BDM for us, making it unnecessary to specify one. If we want to use boot from volume instead, placing our root disk on a volume, we will need to specify a BDM:
❯ openstack server create \
--flavor m1.small --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
--boot-from-volume 5 --debug \
test-server
{
"server":
{
// ...
"flavorRef": "2",
"block_device_mapping_v2": [
{
"uuid": "9acb21b3-0516-459a-9b0a-6357e66ff74a",
"boot_index": 0,
"source_type": "image",
"destination_type": "volume",
"volume_size": 5
}
],
"imageRef": ""
}
}
(9acb21b3-0516-459a-9b0a-6357e66ff74a
is the ID of the cirros-0.6.2-x86_64-disk
image we created the server with,
and it’s passed via the uuid
field of the BDM rather than via the imageRef
field)
Next, let’s look at ephemeral disks. As with root disks, you can simply use a relevant flavor and omit the
block_device_mapping_v2
field and Nova will automatically do the right thing. If you wanted to divide the ephemeral
disk into multiple devices though, you’d use a different variation of this field. For example, our prior example of an
8GB ext4 ephemeral disk and a 2GB XFS disk would look like so:
❯ openstack server create \
--flavor m1.ephemeral --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
--ephemeral size=8,format=ext4 --ephemeral size=2,format=xfs \
test-server
{
"server":
{
// ...
"flavorRef": "99",
"block_device_mapping_v2": [
{
"uuid": "9acb21b3-0516-459a-9b0a-6357e66ff74a",
"boot_index": 0,
"source_type": "image",
"destination_type": "local",
"delete_on_termination": true
},
{
"boot_index": -1,
"source_type": "blank",
"destination_type": "local",
"delete_on_termination": true,
"volume_size": "8",
"guest_format": "ext4"
},
{
"boot_index": -1,
"source_type": "blank",
"destination_type": "local",
"delete_on_termination": true,
"volume_size": "2",
"guest_format": "xfs"
}
],
"imageRef": "9acb21b3-0516-459a-9b0a-6357e66ff74a"
}
}
Yes, things are starting to get lengthy…
We’ll skip over volumes for a moment and move on to swap disks. Yet again, if you don’t want to do anything bar use the
swap configured in the flavor then you can omit the block_device_mapping_v2
field, but you’ll need to specify it if
you wish to use a different size:
❯ openstack server create \
--flavor m1.swap --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
--swap 512 \
test-server
{
"server": {
// ...
"flavorRef": "98",
"block_device_mapping_v2": [
{
"uuid": "9acb21b3-0516-459a-9b0a-6357e66ff74a",
"boot_index": 0,
"source_type": "image",
"destination_type": "local",
"delete_on_termination": true
},
{
"boot_index": -1,
"source_type": "blank",
"destination_type": "local",
"guest_format": "swap",
"volume_size": 512,
"delete_on_termination": true
}
],
"imageRef": "9acb21b3-0516-459a-9b0a-6357e66ff74a"
}
}
And finally there are volumes. As discussed above, there is a huge variety of combinations available here and we’re only going to look at a few of those we previously discussed. Firstly, let’s look at our example of using two pre-created volumes:
❯ vol_a_id=$(openstack volume create --size 5 test-volume-a -f value -c id)
❯ vol_b_id=$(openstack volume create --size 5 test-volume-b -f value -c id)
❯ openstack server create \
--flavor m1.small --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
--block-device uuid=${vol_a_id},source_type=volume \
--block-device uuid=${vol_b_id},source_type=volume \
test-server
{
"server":
{
// ...
"flavorRef": "2",
"block_device_mapping_v2":
[
{
"uuid": "9acb21b3-0516-459a-9b0a-6357e66ff74a",
"boot_index": 0,
"source_type": "image",
"destination_type": "local",
"delete_on_termination": true
},
{
"uuid": "668d9851-b878-4fe5-a244-67585c963a09",
"source_type": "volume",
"destination_type": "volume"
},
{
"uuid": "ed7c9976-9fe6-4ada-93cb-4e60bc2a0a05",
"source_type": "volume",
"destination_type": "volume"
}
],
"imageRef": "9acb21b3-0516-459a-9b0a-6357e66ff74a"
}
}
And now compare this to our example of letting Nova create the volumes for us:
❯ openstack server create \
--flavor m1.small --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
--block-device source_type=blank,destination_type=volume,volume_size=5 \
--block-device source_type=blank,destination_type=volume,volume_size=5 \
test-server
{
"server":
{
// ...
"flavorRef": "2",
"block_device_mapping_v2":
[
{
"uuid": "9acb21b3-0516-459a-9b0a-6357e66ff74a",
"boot_index": 0,
"source_type": "image",
"destination_type": "local",
"delete_on_termination": true
},
{
"source_type": "blank",
"destination_type": "volume",
"volume_size": "5"
},
{
"source_type": "blank",
"destination_type": "volume",
"volume_size": "5"
}
],
"imageRef": "9acb21b3-0516-459a-9b0a-6357e66ff74a"
}
}
There are loads of other combinations available here, from creating volumes from snapshots to tagging created volumes, but this should be sufficient to demonstrate the general “feel” of BDMs.
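As one final rough sketch (the snapshot name is made up for illustration), the snapshot case uses source_type=snapshot to have Cinder build a new volume from an existing volume snapshot:
❯ snap_id=$(openstack volume snapshot create --volume test-volume-a test-snapshot -f value -c id)
❯ openstack server create \
--flavor m1.small --image cirros-0.6.2-x86_64-disk --network private --key-name test-key \
--block-device uuid=${snap_id},source_type=snapshot,destination_type=volume,volume_size=5 \
test-server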
Final thoughts
The main takeaway from this article should be that if you want to understand block devices in OpenStack, you’d be well served by understanding BDMs. While they are terse, they expose the most important aspects of block devices in OpenStack, such as their differing sources and the variety of ways they can be configured. Does that mean this article is written backwards and should have opened with a piece on BDMs? Probably, but what fun would that be?