When implementing relationships in SQLAlchemy models, you will generally need
to cross-reference one model from another, ideally in a bidirectional manner.
Historically, the mechanism to do this has been the relationship.backref
keyword. However, as noted in the SQLAlchemy
documentation, this keyword should now be considered legacy
and the relationship.back_populates
migration preferred instead. There are a
couple of reasons for this. Quoting from the docs:
The
relationship.backref
keyword should be considered legacy, and use ofrelationship.back_populates
with explicitrelationship()
constructs should be preferred. Using individualrelationship()
constructs provides advantages including that both ORM mapped classes will include their attributes up front as the class is constructed, rather than as a deferred step, and configuration is more straightforward as all arguments are explicit. New PEP 484 features in SQLAlchemy 2.0 also take advantage of attributes being explicitly present in source code rather than using dynamic attribute generation.
The SQLAlchemy docs include a guide on migrating tables, (as well as a great overview on relationships in general), however, this guide assumes you are using foreign key constraints in your models and using the auto-incrementing primary key field as your reference. To be honest, you probably should be using foreign keys since without these you have no way to enforce referential integrity but say you can’t use them or don’t want to, how does one migrate? The below examples should hopefully guide you in migrating these.
One-to-Many
The one-to-many pattern is the simpler one to migrate. Consider the following example:
import uuid
import sqlalchemy as sa
from sqlalchemy import orm
BASE = orm.declarative_base()
class User(BASE):
__tablename__ = 'user'
id = sa.Column(sa.Integer, primary_key=True)
uuid = sa.Column(sa.String(36), nullable=False)
name = sa.Column(sa.String)
class Address(BASE):
__tablename__ = 'addresses'
__tableargs__ = (
sa.Index('addresses_user_uuid_idx', 'user_uuid'),
)
id = sa.Column(sa.Integer, primary_key=True)
user_uuid = sa.Column(sa.String(36), nullable=False)
user = orm.relationship(
'User',
backref='addresses',
primaryjoin='Address.user_uuid == User.uuid',
foreign_keys=user_uuid,
)
def main():
engine = sa.create_engine('sqlite://')
session = orm.sessionmaker(bind=engine)()
BASE.metadata.create_all(engine)
user = User(name='John Doe', uuid=str(uuid.uuid4()))
session.add(user)
session.commit()
address = Address(user_uuid=user.uuid)
session.add(address)
session.commit()
print('# Users')
for user in session.query(User).all():
print(f'User: name={user.name}')
print()
print('# Addresses')
for address in session.query(Address).all():
print(f'Address: user={address.user_uuid}')
print()
if __name__ == '__main__':
main()
Before looking at migrating from backref
to back_populates
, it’s worth
understanding what’s going on here. You’ll note that we have a relationship
from the Address
model back to the User
model. This has the backref
keyword set on it, allowing you to reference related addresses from instances
of User
, however it also has primaryjoin
and foreign_keys
keywords set.
These are necessary because the user_uuid
column does not have a foreign key
constraint present. You can confirm this by inspecting the schemas generated by
SQLAlchemy.
❯ python
>>> import test_o2m # name of module saved locally
>>> from sqlalchemy import schema
>>> print(schema.CreateTable(test_o2m.User.__table__))
CREATE TABLE "user" (
id INTEGER NOT NULL,
uuid VARCHAR(36) NOT NULL,
name VARCHAR,
PRIMARY KEY (id)
)
>>> print(schema.CreateTable(test_o2m.Address.__table__))
CREATE TABLE addresses (
id INTEGER NOT NULL,
user_uuid VARCHAR(36) NOT NULL,
PRIMARY KEY (id)
)
Because we don’t have a foreign key present, SQLAlchemy isn’t able to determine
the join condition. This necessitates the addition of the primaryjoin
and
foreign_keys
keywords to guide SQLAlchemy on resolving these relations.
Removing these keywords will yield an error message.
sqlalchemy.exc.NoForeignKeysError: Can't find any foreign key relationships between 'addresses' and 'user'.
The only other unusual thing here is that we’re using a field other than id
to reference things, namely a uuid
field on both models. There’s no real
reason to do this here but this was the case in the models that I had to
migrate and it makes things a little more complicated and “real-world’y”.
Now that we understand what’s going, it’s time to look at switching from
backref
to back_populates
. It turns out to be pretty simple for one-to-many
models like this.
import uuid
import sqlalchemy as sa
from sqlalchemy import orm
BASE = orm.declarative_base()
class User(BASE):
__tablename__ = 'user'
id = sa.Column(sa.Integer, primary_key=True)
uuid = sa.Column(sa.String(36), nullable=False)
name = sa.Column(sa.String)
addresses = orm.relationship(
'Address',
primaryjoin='User.uuid == Address.user_uuid',
back_populates='user',
foreign_keys='Address.user_uuid',
)
class Address(BASE):
__tablename__ = 'addresses'
__tableargs__ = (
sa.Index('addresses_user_uuid_idx', 'user_uuid'),
)
id = sa.Column(sa.Integer, primary_key=True)
user_uuid = sa.Column(sa.String(36), nullable=False)
user = orm.relationship(
'User',
primaryjoin='Address.user_uuid == User.uuid',
back_populates='addresses',
foreign_keys=user_uuid,
)
def main():
engine = sa.create_engine('sqlite://')
session = orm.sessionmaker(bind=engine)()
BASE.metadata.create_all(engine)
user = User(name='John Doe', uuid=str(uuid.uuid4()))
session.add(user)
session.commit()
address = Address(user_uuid=user.uuid)
session.add(address)
session.commit()
print('# Users')
for user in session.query(User).all():
print(f'User: name={user.name}')
print()
print('# Addresses')
for address in session.query(Address).all():
print(f'Address: user={address.user_uuid}')
print()
if __name__ == '__main__':
main()
Looking at the diff for these two files, we see the following:
diff --git test_o2m.py test_o2m.py
index 4f23221..f865045 100644
--- test_o2m.py
+++ test_o2m.py
@@ -14,6 +14,13 @@ class User(BASE):
uuid = sa.Column(sa.String(36), nullable=False)
name = sa.Column(sa.String)
+ addresses = orm.relationship(
+ 'Address',
+ primaryjoin='User.uuid == Address.user_uuid',
+ back_populates='user',
+ foreign_keys='Address.user_uuid',
+ )
+
class Address(BASE):
__tablename__ = 'addresses'
@@ -26,8 +33,8 @@ class Address(BASE):
user = orm.relationship(
'User',
- backref='addresses',
primaryjoin='Address.user_uuid == User.uuid',
+ back_populates='addresses',
foreign_keys=user_uuid,
)
So for the Address
model, we have simply replaced the backref
keyword with
back_populates
. For the User
model, we have added a new relationship. This
is mostly the inverse of the relationship on the Address
model and only the
foreign_key
argument points to the same field (a foreign field in the
Address
model case).
Overall, this isn’t a difficult migration. Migrating many-to-many models could be a little trickier though. Let’s see.
Many-to-Many
Now consider the following example:
import uuid
import sqlalchemy as sa
from sqlalchemy import orm
BASE = orm.declarative_base()
association_table = sa.Table(
'association',
BASE.metadata,
sa.Column('parent_uuid', sa.String(36), primary_key=True),
sa.Column('child_uuid', sa.String(36), primary_key=True),
)
class Parent(BASE):
__tablename__ = 'parents'
__table_args__ = (
sa.Index('parents_uuid_idx', 'uuid', unique=True),
)
id = sa.Column(sa.Integer, primary_key=True)
uuid = sa.Column(sa.String(36), nullable=False)
name = sa.Column(sa.String)
class Child(BASE):
__tablename__ = 'children'
__table_args__ = (
sa.Index('children_uuid_idx', 'uuid', unique=True),
)
id = sa.Column(sa.Integer, primary_key=True)
uuid = sa.Column(sa.String(36), nullable=False)
name = sa.Column(sa.String)
parents = orm.relationship(
'Parent',
secondary='association',
primaryjoin='Child.uuid == association.c.child_uuid',
secondaryjoin='association.c.parent_uuid == Parent.uuid',
backref='children',
)
def main():
engine = sa.create_engine('sqlite://')
session = orm.sessionmaker(bind=engine)()
BASE.metadata.create_all(engine)
parent = Parent(name='John Doe', uuid=str(uuid.uuid4()))
session.add(parent)
session.commit()
child = Child(name='Jimmy Doe', uuid=str(uuid.uuid4()))
session.add(child)
session.commit()
parent.children.append(child)
print('# Parents')
for parent in session.query(Parent).all():
children = [x.name for x in parent.children]
print(f'Parent: name={parent.name}, children={children}')
print()
print('# Children')
for child in session.query(Child).all():
parents = [x.name for x in child.parents]
print(f'Child: name={child.name}, parents={parents}')
print()
if __name__ == '__main__':
main()
Again, there are a couple of things to consider here. Firstly, since this is a
many-to-many relationship, we need a so-called join table (also known as a
association table in SQLAlchemy or a through table in Django). The
necessitates the use of the secondary
keyword to indicate the association
table. Next up, once again we do not have a foreign key constraint present
meaning we need to inform SQLAlchemy how the join should be managed. The use of
the association table means we must specify not only primaryjoin
but also
secondaryjoin
: the former describes the relationship from Child
to the
association table, while the latter describes the relationship from the
association table to Parent
. Finally, once again we’re using a uuid
field
for referencing to makes things a little more like the real world.
Now to switch from backref
to back_populates
once again.
import uuid
import sqlalchemy as sa
from sqlalchemy import orm
BASE = orm.declarative_base()
association_table = sa.Table(
'association',
BASE.metadata,
sa.Column('parent_uuid', sa.String(36), primary_key=True),
sa.Column('child_uuid', sa.String(36), primary_key=True),
)
class Parent(BASE):
__tablename__ = 'parents'
__table_args__ = (
sa.Index('parents_uuid_idx', 'uuid', unique=True),
)
id = sa.Column(sa.Integer, primary_key=True)
uuid = sa.Column(sa.String(36), nullable=False)
name = sa.Column(sa.String)
children = orm.relationship(
'Child',
secondary='association',
primaryjoin='Parent.uuid == association.c.parent_uuid',
secondaryjoin='association.c.child_uuid == Child.uuid',
back_populates='parents',
)
class Child(BASE):
__tablename__ = 'children'
__table_args__ = (
sa.Index('children_uuid_idx', 'uuid', unique=True),
)
id = sa.Column(sa.Integer, primary_key=True)
uuid = sa.Column(sa.String(36), nullable=False)
name = sa.Column(sa.String)
parents = orm.relationship(
'Parent',
secondary='association',
primaryjoin='Child.uuid == association.c.child_uuid',
secondaryjoin='association.c.parent_uuid == Parent.uuid',
back_populates='children',
)
def main():
engine = sa.create_engine('sqlite://')
session = orm.sessionmaker(bind=engine)()
BASE.metadata.create_all(engine)
parent = Parent(name='John Doe', uuid=str(uuid.uuid4()))
session.add(parent)
session.commit()
child = Child(name='Jimmy Doe', uuid=str(uuid.uuid4()))
session.add(child)
session.commit()
parent.children.append(child)
print('# Parents')
for parent in session.query(Parent).all():
children = [x.name for x in parent.children]
print(f'Parent: name={parent.name}, children={children}')
print()
print('# Children')
for child in session.query(Child).all():
parents = [x.name for x in child.parents]
print(f'Child: name={child.name}, parents={parents}')
print()
if __name__ == '__main__':
main()
Looking at the diff for these two files, we see the following:
diff --git test_m2m.py test_m2m.py
index 948942f..d1dc0d4 100644
--- test_m2m.py
+++ test_m2m.py
@@ -25,6 +25,14 @@ class Parent(BASE):
uuid = sa.Column(sa.String(36), nullable=False)
name = sa.Column(sa.String)
+ children = orm.relationship(
+ 'Child',
+ secondary='association',
+ primaryjoin='Parent.uuid == association.c.parent_uuid',
+ secondaryjoin='association.c.child_uuid == Child.uuid',
+ back_populates='parents',
+ )
+
class Child(BASE):
__tablename__ = 'children'
@@ -41,7 +49,7 @@ class Child(BASE):
secondary='association',
primaryjoin='Child.uuid == association.c.child_uuid',
secondaryjoin='association.c.parent_uuid == Parent.uuid',
- backref='children',
+ back_populates='children',
)
This is nearly identical to the changes required for migrating the one-to-many
models. Once again, we simply replace backref
with back_populates
on the
relationship that is defined. We then define a new relationship
on the
Parent
model which has an inverse of the primary and secondary joins given
for the Child
model relationship.
Summary
So that’s how you migrate from backref
to back_populates
on models without
foreign key constraints present. This likely isn’t broadly useful but hopefully
it will help someone that encounters this.
The code for this blog post can be found on GitHub.