Comparing Nova Database Migrations

Image by ruchindra / Unsplash

One of the goals for the Wallaby release of OpenStack Nova is to compact many of the database migrations that have been slowly building up since the Icehouse release some 6½ years ago. We used to do this regularly but stopped based on operator feedback suggesting it made upgrades harder than necessary. However, things have changed since then and the amount of database migrations in each release has slowed considerably. In fact, the latest release, Victoria, contained no database migrations whatsoever. This change, coupled with the fact that we’re still using the effectively dead sqlalchemy-migrate library rather than something like alembic means we now have good reason to kick off the compaction.

Below are my notes on this exercises, which demonstrate how to use the current migrations without using nova-manage and everything it entails. This should allow people to test the changes we’re making locally and might even help people interested in writing their own migrations in the future.

Overview

When compressing migrations, the expectation is that the before and after of the migration should be identical. Nova doesn’t have any built-in tests to do this (why would it) so we’re going to do this manually. The steps to do this are effectively:

  1. Create a new empty database.
  2. Apply migrations N to M from current master, where N is the current base migration and M is the migration you wish to compact up to.
  3. Dump the schema for the database.
  4. Drop and recreate the database, then apply the compaction patch.
  5. Apply the new base migration.
  6. Dump the schema for the database.
  7. Compare the schemas from steps 3 and 6, looking for any serious changes.

These steps are implementation differently depending on the RDBMS in use, so sample steps for both SQLite and MySQL are provided below. These already assume you have your system configured for nova development and can successfully run unit tests using tox. You should also have SQLite and MySQL packages installed. For all cases, we’re going to use a virtualenv to ensure the required dependencies are installed so do that first:

$ cd nova
$ virtualenv .venv
$ source .venv/bin/activate
$ pip install \
    -c https://releases.openstack.org/constraints/upper/master \
    -r requirements.txt -r test-requirements.txt
$ pip install -e .

You should also navigate to the migration repository in nova, to avoid having to manually specify this for the various sqlalchemy-migrate commands.

$ cd nova/db/sqlalchemy/migrate_repo

SQLite

Since SQLite databases are stored as a single file, there is no additional setup necessary. SQLite also provides a handy tool to compare databases, sqldiff, which is packaged on Fedora 32 at least. Install that first:

$ sudo dnf install sqlite-tools

With that installed, we can get right to generating two databases that we can then compare using this sqldiff tool. It would of course be possible to dump the schemas and compare those, but doing so means we’d have to filter out things like simple ordering changes that SQLite seems prone to.

Generating the database using sqlalchemy-migrate is a two steps process. We first need to mark our database as version controlled, which will create the necessary version table. The name of this version table is configurable in the migrate.cfg configuration file found in your migration repository and is called migrate_version in nova. The version_control command will create this table and initialize it for version N, which is the base migration you wish to test. For example:

$ python manage.py version_control --database 'sqlite:///nova_before.db' --version 215

Once this is configured, apply all migrations up to M. You can use the upgrade command for this. For example:

$ python manage.py upgrade --database 'sqlite:///nova_before.db' --version 216

Once we have a database instance using the before schema, we can apply the code change and repeat the steps above, this time creating a nova_after.db database. For example:

$ python manage.py version_control --database 'sqlite:///nova_after.db' --version 215
$ python manage.py upgrade --database 'sqlite:///nova_after.db' --version 216

Now, with these two databases to hand, we an diff them to ensure nothing has changed:

$ sqldiff --schema nova_before.db nova_after.db

MySQL

The steps for MySQL are quite similar to those for SQLite, but we do need to do some pre-work - namely, creating a suitable database and user - before we can get to validating schemas. Do this now. For example:

$ mysql
MariaDB [(none)]> CREATE DATABASE nova_before;
MariaDB [(none)]> GRANT ALL PRIVILEGES ON nova_before.* TO 'nova'@'localhost' IDENTIFIED BY 'password';
MariaDB [(none)]> quit;

With this created, the steps are similar to those for SQLite. Once again, we will use the version_control and upgrade management commands:

$ python manage.py version_control \
    --database 'mysql+pymysql://nova:password@localhost/nova_before' \
    --version 215
$ python manage.py upgrade \
    --database 'mysql+pymysql://nova:password@localhost/nova_before' \
    --version 216

However, unlike SQLite, we have a specific tool available to dump the DB schemas - mysqldump. Use that to dump the before schema:

$ mysqldump --no-data --skip-comments -u nova -ppassword nova_before > nova_before.sql

Once we have a database instance using the before schema, we can apply the code change and repeat the steps above, this time creating a nova_after database. For example:

$ mysql
MariaDB [(none)]> CREATE DATABASE nova_after;
MariaDB [(none)]> GRANT ALL PRIVILEGES ON nova_after.* TO 'nova'@'localhost' IDENTIFIED BY 'password';
MariaDB [(none)]> quit;
$ python manage.py version_control --database 'mysql+pymysql://nova:password@localhost/nova_after' --version 215
$ python manage.py upgrade --database 'mysql+pymysql://nova:password@localhost/nova_after' --version 216
$ mysqldump --no-data --skip-comments -u nova -ppassword nova_after > nova_after.sql

Finally, diff the two schemas to identify any differences:

$ diff nova_before.sql nova_after.sql

PostgreSQL

PostgreSQL exists in a funny space in nova, where it’s technically supported but not many people use it. Nonetheless, there are PostgreSQL-specific migrations in-tree so we must test them.

Once again, this is quite similar to the above, though Postgres’ user model makes things a little more complicated. You need to create a user to run the operations as, but you can’t simply create this as Postgres defaults to the ident auth (local UNIX user) scheme rather than password (or, more specifically, md5). You need to configure pg_hba.conf to enable DB-specific users and create e.g. a nova user with a password password.

Once the user is created, you can create the DB using the postgres tool:

$ sudo -u postgres dropdb --if-exists nova_before
$ sudo -u postgres createdb --owner=nova nova_before

With the DB created, the steps to generate the schema dump are once again quite similar to SQLite and MySQL. Using version_control and upgrade management commands once more:

$ python manage.py version_control \
    --database 'postgresql://nova:password@localhost/nova_before' \
    --version 215
$ python manage.py upgrade \
    --database 'postgresql://nova:password@localhost/nova_before' \
    --version 216

As with MySQL, Postgres comes with a specific tool for dumping the DB schemas - pg_dump. Use that to dump the before schema:

$ pg_dump postgresql://nova:password@localhost/nova_before \
    --schema-only > nova_before.sql

Once we have a database instance using the before schema, we can apply the code change and repeat the steps above, this time creating a nova_after database. For example:

$ sudo -u postgres dropdb --if-exists nova_after
$ sudo -u postgres createdb --owner=nova nova_after
$ python manage.py version_control \
    --database 'postgresql://nova:password@localhost/nova_after' \
    --version 215
$ python manage.py upgrade \
    --database 'postgresql://nova:password@localhost/nova_after' \
    --version 216
$ pg_dump postgresql://nova:password@localhost/nova_after \
    --schema-only > nova_after.sql

Finally, diff the two schemas to identify any differences:

$ diff nova_before.sql nova_after.sql

Example

The below script can be used to diff the database. It should be placed in the root directory of the nova repo and executed from there. It relies on the py36 tox virtualenv, which you probably have on your local system already (if not, tox -e py36 will create it). It also uses a single database for each backend, rather than the separate nova_before and nova_after databases.

comments powered by Disqus