SQLAlchemy Relationships Without Foreign Keys (And How to Replace 'backref' With 'back_populates')

Image by alinnnaaaa / Unsplash

When implementing relationships in SQLAlchemy models, you will generally need to cross-reference one model from another, ideally in a bidirectional manner. Historically, the mechanism to do this has been the relationship.backref keyword. However, as noted in the SQLAlchemy documentation, this keyword should now be considered legacy and the relationship.back_populates migration preferred instead. There are a couple of reasons for this. Quoting from the docs:

The relationship.backref keyword should be considered legacy, and use of relationship.back_populates with explicit relationship() constructs should be preferred. Using individual relationship() constructs provides advantages including that both ORM mapped classes will include their attributes up front as the class is constructed, rather than as a deferred step, and configuration is more straightforward as all arguments are explicit. New PEP 484 features in SQLAlchemy 2.0 also take advantage of attributes being explicitly present in source code rather than using dynamic attribute generation.

The SQLAlchemy docs include a guide on migrating tables, (as well as a great overview on relationships in general), however, this guide assumes you are using foreign key constraints in your models and using the auto-incrementing primary key field as your reference. To be honest, you probably should be using foreign keys since without these you have no way to enforce referential integrity but say you can’t use them or don’t want to, how does one migrate? The below examples should hopefully guide you in migrating these.

One-to-Many

The one-to-many pattern is the simpler one to migrate. Consider the following example:

import uuid

import sqlalchemy as sa
from sqlalchemy import orm


BASE = orm.declarative_base()


class User(BASE):
    __tablename__ = 'user'

    id = sa.Column(sa.Integer, primary_key=True)
    uuid = sa.Column(sa.String(36), nullable=False)
    name = sa.Column(sa.String)


class Address(BASE):
    __tablename__ = 'addresses'
    __tableargs__ = (
        sa.Index('addresses_user_uuid_idx', 'user_uuid'),
    )

    id = sa.Column(sa.Integer, primary_key=True)
    user_uuid = sa.Column(sa.String(36), nullable=False)

    user = orm.relationship(
        'User',
        backref='addresses',
        primaryjoin='Address.user_uuid == User.uuid',
        foreign_keys=user_uuid,
    )


def main():
    engine = sa.create_engine('sqlite://')
    session = orm.sessionmaker(bind=engine)()
    BASE.metadata.create_all(engine)

    user = User(name='John Doe', uuid=str(uuid.uuid4()))

    session.add(user)
    session.commit()

    address = Address(user_uuid=user.uuid)
    session.add(address)
    session.commit()

    print('# Users')
    for user in session.query(User).all():
        print(f'User: name={user.name}')
    print()

    print('# Addresses')
    for address in session.query(Address).all():
        print(f'Address: user={address.user_uuid}')
    print()


if __name__ == '__main__':
    main()

Before looking at migrating from backref to back_populates, it’s worth understanding what’s going on here. You’ll note that we have a relationship from the Address model back to the User model. This has the backref keyword set on it, allowing you to reference related addresses from instances of User, however it also has primaryjoin and foreign_keys keywords set. These are necessary because the user_uuid column does not have a foreign key constraint present. You can confirm this by inspecting the schemas generated by SQLAlchemy.

❯ python
>>> import test_o2m  # name of module saved locally
>>> from sqlalchemy import schema
>>> print(schema.CreateTable(test_o2m.User.__table__))

CREATE TABLE "user" (
        id INTEGER NOT NULL,
        uuid VARCHAR(36) NOT NULL,
        name VARCHAR,
        PRIMARY KEY (id)
)


>>> print(schema.CreateTable(test_o2m.Address.__table__))

CREATE TABLE addresses (
        id INTEGER NOT NULL,
        user_uuid VARCHAR(36) NOT NULL,
        PRIMARY KEY (id)
)

Because we don’t have a foreign key present, SQLAlchemy isn’t able to determine the join condition. This necessitates the addition of the primaryjoin and foreign_keys keywords to guide SQLAlchemy on resolving these relations. Removing these keywords will yield an error message.

sqlalchemy.exc.NoForeignKeysError: Can't find any foreign key relationships between 'addresses' and 'user'.

The only other unusual thing here is that we’re using a field other than id to reference things, namely a uuid field on both models. There’s no real reason to do this here but this was the case in the models that I had to migrate and it makes things a little more complicated and “real-world’y”.

Now that we understand what’s going, it’s time to look at switching from backref to back_populates. It turns out to be pretty simple for one-to-many models like this.

import uuid

import sqlalchemy as sa
from sqlalchemy import orm


BASE = orm.declarative_base()


class User(BASE):
    __tablename__ = 'user'

    id = sa.Column(sa.Integer, primary_key=True)
    uuid = sa.Column(sa.String(36), nullable=False)
    name = sa.Column(sa.String)

    addresses = orm.relationship(
        'Address',
        primaryjoin='User.uuid == Address.user_uuid',
        back_populates='user',
        foreign_keys='Address.user_uuid',
    )


class Address(BASE):
    __tablename__ = 'addresses'
    __tableargs__ = (
        sa.Index('addresses_user_uuid_idx', 'user_uuid'),
    )

    id = sa.Column(sa.Integer, primary_key=True)
    user_uuid = sa.Column(sa.String(36), nullable=False)

    user = orm.relationship(
        'User',
        primaryjoin='Address.user_uuid == User.uuid',
        back_populates='addresses',
        foreign_keys=user_uuid,
    )


def main():
    engine = sa.create_engine('sqlite://')
    session = orm.sessionmaker(bind=engine)()
    BASE.metadata.create_all(engine)

    user = User(name='John Doe', uuid=str(uuid.uuid4()))

    session.add(user)
    session.commit()

    address = Address(user_uuid=user.uuid)
    session.add(address)
    session.commit()

    print('# Users')
    for user in session.query(User).all():
        print(f'User: name={user.name}')
    print()

    print('# Addresses')
    for address in session.query(Address).all():
        print(f'Address: user={address.user_uuid}')
    print()


if __name__ == '__main__':
    main()

Looking at the diff for these two files, we see the following:

diff --git test_o2m.py test_o2m.py
index 4f23221..f865045 100644
--- test_o2m.py
+++ test_o2m.py
@@ -14,6 +14,13 @@ class User(BASE):
     uuid = sa.Column(sa.String(36), nullable=False)
     name = sa.Column(sa.String)

+    addresses = orm.relationship(
+        'Address',
+        primaryjoin='User.uuid == Address.user_uuid',
+        back_populates='user',
+        foreign_keys='Address.user_uuid',
+    )
+

 class Address(BASE):
     __tablename__ = 'addresses'
@@ -26,8 +33,8 @@ class Address(BASE):

     user = orm.relationship(
         'User',
-        backref='addresses',
         primaryjoin='Address.user_uuid == User.uuid',
+        back_populates='addresses',
         foreign_keys=user_uuid,
     )

So for the Address model, we have simply replaced the backref keyword with back_populates. For the User model, we have added a new relationship. This is mostly the inverse of the relationship on the Address model and only the foreign_key argument points to the same field (a foreign field in the Address model case).

Overall, this isn’t a difficult migration. Migrating many-to-many models could be a little trickier though. Let’s see.

Many-to-Many

Now consider the following example:

import uuid

import sqlalchemy as sa
from sqlalchemy import orm


BASE = orm.declarative_base()


association_table = sa.Table(
    'association',
    BASE.metadata,
    sa.Column('parent_uuid', sa.String(36), primary_key=True),
    sa.Column('child_uuid', sa.String(36), primary_key=True),
)


class Parent(BASE):
    __tablename__ = 'parents'
    __table_args__ = (
        sa.Index('parents_uuid_idx', 'uuid', unique=True),
    )

    id = sa.Column(sa.Integer, primary_key=True)
    uuid = sa.Column(sa.String(36), nullable=False)
    name = sa.Column(sa.String)


class Child(BASE):
    __tablename__ = 'children'
    __table_args__ = (
        sa.Index('children_uuid_idx', 'uuid', unique=True),
    )

    id = sa.Column(sa.Integer, primary_key=True)
    uuid = sa.Column(sa.String(36), nullable=False)
    name = sa.Column(sa.String)

    parents = orm.relationship(
        'Parent',
        secondary='association',
        primaryjoin='Child.uuid == association.c.child_uuid',
        secondaryjoin='association.c.parent_uuid == Parent.uuid',
        backref='children',
    )


def main():
    engine = sa.create_engine('sqlite://')
    session = orm.sessionmaker(bind=engine)()
    BASE.metadata.create_all(engine)

    parent = Parent(name='John Doe', uuid=str(uuid.uuid4()))
    session.add(parent)
    session.commit()

    child = Child(name='Jimmy Doe', uuid=str(uuid.uuid4()))
    session.add(child)
    session.commit()

    parent.children.append(child)

    print('# Parents')
    for parent in session.query(Parent).all():
        children = [x.name for x in parent.children]
        print(f'Parent: name={parent.name}, children={children}')
    print()

    print('# Children')
    for child in session.query(Child).all():
        parents = [x.name for x in child.parents]
        print(f'Child: name={child.name}, parents={parents}')
    print()


if __name__ == '__main__':
    main()

Again, there are a couple of things to consider here. Firstly, since this is a many-to-many relationship, we need a so-called join table (also known as a association table in SQLAlchemy or a through table in Django). The necessitates the use of the secondary keyword to indicate the association table. Next up, once again we do not have a foreign key constraint present meaning we need to inform SQLAlchemy how the join should be managed. The use of the association table means we must specify not only primaryjoin but also secondaryjoin: the former describes the relationship from Child to the association table, while the latter describes the relationship from the association table to Parent. Finally, once again we’re using a uuid field for referencing to makes things a little more like the real world.

Now to switch from backref to back_populates once again.

import uuid

import sqlalchemy as sa
from sqlalchemy import orm


BASE = orm.declarative_base()


association_table = sa.Table(
    'association',
    BASE.metadata,
    sa.Column('parent_uuid', sa.String(36), primary_key=True),
    sa.Column('child_uuid', sa.String(36), primary_key=True),
)


class Parent(BASE):
    __tablename__ = 'parents'
    __table_args__ = (
        sa.Index('parents_uuid_idx', 'uuid', unique=True),
    )

    id = sa.Column(sa.Integer, primary_key=True)
    uuid = sa.Column(sa.String(36), nullable=False)
    name = sa.Column(sa.String)

    children = orm.relationship(
        'Child',
        secondary='association',
        primaryjoin='Parent.uuid == association.c.parent_uuid',
        secondaryjoin='association.c.child_uuid == Child.uuid',
        back_populates='parents',
    )


class Child(BASE):
    __tablename__ = 'children'
    __table_args__ = (
        sa.Index('children_uuid_idx', 'uuid', unique=True),
    )

    id = sa.Column(sa.Integer, primary_key=True)
    uuid = sa.Column(sa.String(36), nullable=False)
    name = sa.Column(sa.String)

    parents = orm.relationship(
        'Parent',
        secondary='association',
        primaryjoin='Child.uuid == association.c.child_uuid',
        secondaryjoin='association.c.parent_uuid == Parent.uuid',
        back_populates='children',
    )


def main():
    engine = sa.create_engine('sqlite://')
    session = orm.sessionmaker(bind=engine)()
    BASE.metadata.create_all(engine)

    parent = Parent(name='John Doe', uuid=str(uuid.uuid4()))
    session.add(parent)
    session.commit()

    child = Child(name='Jimmy Doe', uuid=str(uuid.uuid4()))
    session.add(child)
    session.commit()

    parent.children.append(child)

    print('# Parents')
    for parent in session.query(Parent).all():
        children = [x.name for x in parent.children]
        print(f'Parent: name={parent.name}, children={children}')
    print()

    print('# Children')
    for child in session.query(Child).all():
        parents = [x.name for x in child.parents]
        print(f'Child: name={child.name}, parents={parents}')
    print()


if __name__ == '__main__':
    main()

Looking at the diff for these two files, we see the following:

diff --git test_m2m.py test_m2m.py
index 948942f..d1dc0d4 100644
--- test_m2m.py
+++ test_m2m.py
@@ -25,6 +25,14 @@ class Parent(BASE):
     uuid = sa.Column(sa.String(36), nullable=False)
     name = sa.Column(sa.String)

+    children = orm.relationship(
+        'Child',
+        secondary='association',
+        primaryjoin='Parent.uuid == association.c.parent_uuid',
+        secondaryjoin='association.c.child_uuid == Child.uuid',
+        back_populates='parents',
+    )
+

 class Child(BASE):
     __tablename__ = 'children'
@@ -41,7 +49,7 @@ class Child(BASE):
         secondary='association',
         primaryjoin='Child.uuid == association.c.child_uuid',
         secondaryjoin='association.c.parent_uuid == Parent.uuid',
-        backref='children',
+        back_populates='children',
     )

This is nearly identical to the changes required for migrating the one-to-many models. Once again, we simply replace backref with back_populates on the relationship that is defined. We then define a new relationship on the Parent model which has an inverse of the primary and secondary joins given for the Child model relationship.

Summary

So that’s how you migrate from backref to back_populates on models without foreign key constraints present. This likely isn’t broadly useful but hopefully it will help someone that encounters this.

The code for this blog post can be found on GitHub.

comments powered by Disqus