
UUID Primary Keys make a lot of sense in modern applications, and UUIDs have become a popular way for developers to publicly identify objects in their applications.
Benefits include:
- They obscure the identifier, making it virtually impossible for attackers to guess IDs.
- They allow Foreign Key references to be similarly obscured.
- They allow for horizontal partitioning without key collision or rekeying concerns.
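To put a rough number on "virtually impossible to guess", here's a small sketch. It assumes the IDs are random version-4 UUIDs from Python's uuid.uuid4(), which fix 6 of their 128 bits for version and variant and leave 122 random bits. The variable names are mine, for illustration only.

```python
import uuid

# A version-4 UUID reserves 6 bits for version and variant; the other 122 are random.
RANDOM_BITS = 122
possible_values = 2 ** RANDOM_BITS

sample = uuid.uuid4()
print("sample id:", sample, "(version", str(sample.version) + ")")
print(f"possible v4 UUIDs: {possible_values:.3e}")

# By contrast, the neighbours of an auto-increment id are trivial to guess.
known_id = 1041
print("guessable neighbours:", known_id - 1, known_id + 1)
```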
These are all compelling reasons to opt for UUID keys. However, there are also some substantial drawbacks to using UUIDs as primary keys.
Drawbacks include:
- At scale, they can cause serious insert performance problems: the primary key is a clustered index, so rows are stored in key order, and randomly generated UUIDs land all over that order, forcing the B-tree to be rearranged on insert.
- The same B-tree rearranging causes disk fragmentation, which degrades the database's overall I/O performance over time.
- There's no quick "sort by id" chronology, so the "latest" items have to be found using timestamps, which are typically slower to sort on than numeric ids.
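The insert-ordering problem can be illustrated without a database. The sketch below is a toy model, not a benchmark: a sorted Python list stands in for the ordered keys of a clustered index, and the helper count_mid_index_inserts (a name I made up) counts how many inserts land somewhere other than the end. The UUIDs are built from a seeded random generator so the run is repeatable.

```python
import bisect
import random
import uuid


def count_mid_index_inserts(keys):
    """Count inserts that land before the current end of a sorted key list."""
    index = []  # stands in for the ordered keys of a clustered index
    mid_inserts = 0
    for key in keys:
        position = bisect.bisect_left(index, key)
        if position < len(index):
            mid_inserts += 1  # a real B-tree would have to touch an interior page
        index.insert(position, key)
    return mid_inserts


rng = random.Random(42)  # fixed seed, only to make the example repeatable
random_uuids = [
    str(uuid.UUID(int=rng.getrandbits(128), version=4)) for _ in range(1000)
]
sequential_ids = list(range(1, 1001))

print("sequential ids:", count_mid_index_inserts(sequential_ids))
print("random UUIDs:  ", count_mid_index_inserts(random_uuids))
```

Sequential ids always append to the end of the index, so their count is zero, while nearly every random UUID lands somewhere in the middle.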
Fortunately, these drawbacks can be avoided by implementing the Pseudo Primary Key (Pseudo-PK) model, as explained below.
How to use Pseudo-PK UUIDs in your database
There are a few tweaks to a UUID-keyed model that will help in avoiding the above drawbacks. I call this the Pseudo-PK paradigm:
- Make the UUID a unique key called "id", but not the primary key of the table.
- Use an auto-incremented primary key called "pkid".
- Associate foreign keys with the UUID "id" field, not the pkid.
This design will evade all of the issues with the UUID as a primary key, while achieving all of the benefits.
The database design will look something like this, in MySQL / Aurora DDL:
    CREATE TABLE `site` (
      `pkid` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      `id` CHAR(32) NOT NULL,
      -- Other fields
      UNIQUE KEY `site_pseudo_pk` (`id`)
    );

    CREATE TABLE `page` (
      `pkid` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      `id` CHAR(32) NOT NULL,
      `site_id` CHAR(32) NOT NULL,
      -- Other fields
      UNIQUE KEY `page_pseudo_pk` (`id`),
      CONSTRAINT `page_site_id` FOREIGN KEY (`site_id`) REFERENCES `site` (`id`)
    );
The one trade-off is that looking up an individual row by UUID will be a little slower than looking it up by integer, but that's unavoidable in any design that uses a UUID key.
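If you want to poke at this schema without a MySQL server, here's a minimal sketch using Python's built-in sqlite3 module as a stand-in. That substitution is my assumption, made only so the example runs anywhere: SQLite's column types and AUTOINCREMENT syntax differ from the MySQL DDL above, and it only enforces foreign keys once the pragma is switched on. The table and column names mirror the ones above.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript(
    """
    CREATE TABLE site (
        pkid INTEGER PRIMARY KEY AUTOINCREMENT,
        id   TEXT NOT NULL UNIQUE
    );
    CREATE TABLE page (
        pkid    INTEGER PRIMARY KEY AUTOINCREMENT,
        id      TEXT NOT NULL UNIQUE,
        site_id TEXT NOT NULL REFERENCES site (id)
    );
    """
)

site_id = uuid.uuid4().hex
conn.execute("INSERT INTO site (id) VALUES (?)", (site_id,))

page_ids = [uuid.uuid4().hex for _ in range(3)]
for page_id in page_ids:
    conn.execute(
        "INSERT INTO page (id, site_id) VALUES (?, ?)", (page_id, site_id)
    )

# The "latest" row comes from the integer key; no timestamp column is needed.
latest_page_id = conn.execute(
    "SELECT id FROM page ORDER BY pkid DESC LIMIT 1"
).fetchone()[0]

# A page pointing at an unknown site UUID is rejected by the foreign key.
try:
    conn.execute(
        "INSERT INTO page (id, site_id) VALUES (?, ?)",
        (uuid.uuid4().hex, uuid.uuid4().hex),
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print("latest page is the last one inserted:", latest_page_id == page_ids[-1])
print("orphan page rejected:", orphan_rejected)
```

Foreign keys are enforced against the UUID column, while ordering by pkid still gives you insertion order.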
Note regarding char(32): Django's MySQL UUIDField by default strips out hyphens on insert into the database, truncating UUID values to 32 characters. If your preference is to maintain hyphenation, this would be something you would want to override. There are better ways to store a UUID in MySQL (e.g. as a BINARY(16), using UUID_TO_BIN and BIN_TO_UUID to change representations), but that's a discussion extending beyond the scope of this article.
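For reference, here is how those representations compare in plain Python, using only the standard uuid module. The sample value is arbitrary, and nothing here is specific to Django or MySQL.

```python
import uuid

u = uuid.UUID("12345678-1234-5678-1234-567812345678")

hyphenated = str(u)  # 36 characters, hyphens included
compact = u.hex      # 32 characters, the form that fits CHAR(32)
packed = u.bytes     # 16 raw bytes, the size of a BINARY(16) column

print(len(hyphenated), len(compact), len(packed))  # 36 32 16

# All three forms round-trip to the same UUID.
same = uuid.UUID(hyphenated) == uuid.UUID(hex=compact) == uuid.UUID(bytes=packed)
print("round-trips match:", same)
```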
UUID Pseudo-PKs in Django
There are two simple steps required to get the above architecture working in Django with Django Rest Framework.
- Update your models
- Update your viewsets
1. Updates required to Django Models
Django doesn't make this pattern easy to implement. By default it adds and manages an auto-incrementing "id" primary key for you, which keeps things simple and steers developers away from poorly designed databases.
So we first have to implement a new base UUIDModel class that will override Django's built-in functionality.
Our abstract UUIDModel will look something like this:
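    import uuid

    from django.db import models


    class UUIDModel(models.Model):
        # Integer surrogate key: this stays the real primary key.
        pkid = models.BigAutoField(primary_key=True, editable=False)
        # Public identifier: unique, but not the primary key.
        id = models.UUIDField(default=uuid.uuid4, editable=False, unique=True)

        class Meta:
            abstract = True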
Once we have that in place, we can then build out our models for Site and Page:
    class Site(UUIDModel):
        name = models.CharField(max_length=255)


    class Page(UUIDModel):
        # to_field makes the foreign key reference Site.id (the UUID), not Site.pkid.
        site = models.ForeignKey(Site, to_field='id', on_delete=models.CASCADE)
Caveat: One pitfall you'll have to avoid is using the pk reference for model lookups, because this will point to pkid rather than id.
So instead of:
page = Page.objects.get(pk=page_id)
You'll now need to do:
page = Page.objects.get(id=page_id)
It's important to note that all ForeignKey functionality will remain unchanged, e.g. the following will all work as expected:
    site = Site.objects.first()
    for page in site.page_set.all():
        print(page.site.name, '-', page.id)
2. Updates required to DRF ModelViewSets
ModelViewSet implementations require one simple tweak to tell them to reference your new id Pseudo-PK field instead of the pkid Primary Key on lookups.
Here's your tweak, implemented in a new base ViewSet for you to extend:
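    from rest_framework import viewsets


    class UUIDModelViewSet(viewsets.ModelViewSet):
        # Look objects up by the UUID "id" field instead of the default "pk".
        lookup_field = 'id'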
Then, you can simply extend this base class for your views:
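    # DRF 3 needs queryset and serializer_class on each viewset.
    # SiteSerializer and PageSerializer are placeholders for your own serializers.
    class SiteViewSet(UUIDModelViewSet):
        queryset = Site.objects.all()
        serializer_class = SiteSerializer


    class PageViewSet(UUIDModelViewSet):
        queryset = Page.objects.all()
        serializer_class = PageSerializer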
Once this is done, you're all set to begin wiring things up and get to coding!
Optional: updates in APIViews and Routes
To keep with the paradigm of UUID-based URL identifiers and "id" as a naming convention internally, it will be useful to retrieve IDs from your custom API endpoints using a path parameterized with the <id> param, e.g.:
    # re_path (from django.urls) replaces the older url() helper, which was removed in Django 4.0.
    re_path(r'^api/v1/my_endpoint/(?P<id>.*)$', MyEndpointView.as_view())
And your corresponding view will look like this:
    from rest_framework import views


    class MyEndpointView(views.APIView):
        def post(self, request, *args, **kwargs):
            id = kwargs.get('id')
            # Handle post

        def get(self, request, *args, **kwargs):
            id = kwargs.get('id')
            # Handle get
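One thing to watch with that route: the (?P<id>.*) group accepts any string, not just UUIDs, so you may want to validate the captured value before querying with it. Here's a rough sketch of that check using only the standard library. The helpers parse_uuid_or_none and extract_id are hypothetical names of my own, and the regex is copied from the route above so that it can be exercised outside Django.

```python
import re
import uuid

# Same shape as the route above: the named group "id" captures the rest of the path.
ENDPOINT_PATTERN = re.compile(r"^api/v1/my_endpoint/(?P<id>.*)$")


def parse_uuid_or_none(raw_id):
    """Return a uuid.UUID for a well-formed value, or None otherwise."""
    try:
        return uuid.UUID(raw_id)
    except ValueError:
        return None


def extract_id(path):
    """Pull the id out of a request path and validate it as a UUID."""
    match = ENDPOINT_PATTERN.match(path)
    if match is None:
        return None
    return parse_uuid_or_none(match.group("id"))


print(extract_id("api/v1/my_endpoint/12345678-1234-5678-1234-567812345678"))
print(extract_id("api/v1/my_endpoint/not-a-uuid"))
```

In a view, a None result would be the cue to return a 404 or 400 rather than hitting the database.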
And that's all you need to know about that.
That's all, folks!
Those are all the steps required to implement UUID Pseudo-Primary-Keys in Django and DRF.
Best wishes building your fancy new UUID-enabled app!