Serializable transactions with Django and PostgreSQL
In this blog post we explore a real-world race condition and how we can elegantly use SERIALIZABLE transactions with Django and PostgreSQL to avoid this issue.
PostgreSQL's transaction isolation level SERIALIZABLE can provide much stronger guarantees than the default READ COMMITTED level. For certain types of applications, SERIALIZABLE transactions can provide a much cleaner and more performant solution than some of the alternatives (exclusive locks, advisory locks, etc.).
Scenario
A lot of applications have a need to do conditional updates/inserts. Most of them follow a pattern like this:
- Select some data
- Make some decisions based on the data
- Insert/update some rows
In-between selecting the data and performing the actual insert/update the data might have changed. It might have changed in such a way that the course of action would be different. A standard READ COMMITTED transaction doesn't help prevent a race condition in that case. Consider the following example:
from django.db import transaction
from django.db.models import Sum


def balance(user_id: int) -> int:
    return Ledger.objects.filter(user_id=user_id).aggregate(balance=Sum("amount"))["balance"]


def withdraw(user_id: int, amount: int) -> None:
    with transaction.atomic():
        if balance(user_id) >= amount:
            Ledger.objects.create(user_id=user_id, amount=-amount)
        else:
            raise ValueError("Insufficient balance")
This can easily lead to the user spending money they do not have (overdraft). Two or more transactions can run concurrently, both determine the user has sufficient balance and both insert a row to record the withdrawal. The default READ COMMITTED isolation level does not prevent overdraft here.
Aside from explicit locking, a SERIALIZABLE transaction can help prevent overdraft here. It guarantees that the result of concurrent transactions is the same as if they had run one after another. If two concurrent transactions touching the same user_id would violate that guarantee, PostgreSQL aborts one of them and commits the other. When a transaction is aborted, the application can retry it. Upon retrying, the application would notice that the user does not have sufficient balance.
Creating a serializable transaction
Solution 1: Change the default isolation level for a connection
There are plenty of articles describing how to create serializable transactions with Django. They all instruct you to create a dedicated database connection for which the default transaction isolation level is SERIALIZABLE. For example, the most popular Stack Overflow answer suggests doing this:
import psycopg2.extensions

DATABASES = {
    'default': {
        # ...
        'OPTIONS': {
            'isolation_level': psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE,
        },
    },
}
Unless you want your entire application to use serializable transactions, the only other option would seem to be to create a dedicated connection with the default transaction isolation level set to SERIALIZABLE and use that when you need a serializable transaction.
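A minimal sketch of what that dedicated connection could look like; the "serializable" alias name is made up for illustration and the actual connection settings are elided:

import psycopg2.extensions

DATABASES = {
    "default": {
        # ... your normal connection settings ...
    },
    "serializable": {
        # Same connection settings as "default", but every transaction
        # on this alias defaults to the SERIALIZABLE isolation level.
        # ...
        "OPTIONS": {
            "isolation_level": psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE,
        },
    },
}

# Elsewhere, open the transaction on the dedicated connection. Note that
# ORM queries inside the block must also be routed to the same alias,
# for example with Ledger.objects.using("serializable").
from django.db import transaction

with transaction.atomic(using="serializable"):
    ...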
Connections don't come cheap in PostgreSQL. Even with a connection pooler such as PgBouncer configured to keep a small pool of connections dedicated to serializable transactions, this can be costly. Unless you're running a significant number of serializable transactions, you would end up with a few connections that are idle most of the time: a waste of precious connection slots.
Solution 2: Change isolation level for a single transaction
The alternative that nobody seems to talk about is to change the isolation level at the start of your transaction, affecting just that transaction. This can be done regardless of what the default transaction isolation level is for your connection and is guaranteed to only affect your transaction.
The only catch is that the transaction must be the top-level transaction. Nested transaction.atomic blocks in Django create savepoints, not real transactions. It is not possible to change the isolation level of a transaction after a query has been executed in the transaction. We can use the durable option on transaction.atomic to raise a RuntimeError if the transaction.atomic block is nested in another transaction.atomic block.
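As a quick illustration of what the durable flag (available since Django 3.2) protects against:

from django.db import transaction

with transaction.atomic():
    # Raises RuntimeError: a durable block must be the outermost atomic
    # block, so we can't accidentally end up inside a savepoint.
    with transaction.atomic(durable=True):
        ...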
If we put this all together, with just a few lines we can create a SERIALIZABLE transaction without changing the way we connect to the database:
from django.db import transaction, connection

with transaction.atomic(durable=True):
    with connection.cursor() as cursor:
        cursor.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;")

    # do work here over the same connection
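If you need this in more than one place, it could be wrapped in a small helper. This is just a sketch; the serializable() name is made up:

from contextlib import contextmanager

from django.db import connection, transaction


@contextmanager
def serializable():
    # Open a top-level transaction and switch it to SERIALIZABLE
    # before any other query runs in it.
    with transaction.atomic(durable=True):
        with connection.cursor() as cursor:
            cursor.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;")
        yield


# Usage:
with serializable():
    withdraw(user_id=1, amount=100)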
Catching serialization failures
If two transactions conflict, one of them will be aborted and we'll have to handle it. Your application should expect serialization failures and handle them gracefully, either by retrying the transaction or by abandoning it. What to do here is highly dependent on your use case. For a financial transaction, it might be safer to confirm the retry with the initiating user first.
Django raises an OperationalError when a serialization failure occurs. Unfortunately, OperationalError is also raised for all sorts of other errors. To catch a serialization failure specifically we have to look at the cause of the exception. Django wraps errors raised by the underlying client library and sets the original error as the cause of the exception. Django uses psycopg2 or psycopg (3) as the underlying client library, which raises psycopg2.errors.SerializationFailure or psycopg.errors.SerializationFailure respectively.
We can use this to determine if we're dealing with a serialization error:
import psycopg2
from django.db import OperationalError

try:
    ...  # <serializable transaction here>
except OperationalError as e:
    if isinstance(e.__cause__, psycopg2.errors.SerializationFailure):
        # retry the transaction
        return
    raise  # not a serialization error, re-raise
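Combined with a retry, this could look roughly like the sketch below. The withdraw_with_retry name and the max_retries value are illustrative, and the example only checks for the psycopg2 exception:

import psycopg2
from django.db import OperationalError, connection, transaction


def withdraw_with_retry(user_id: int, amount: int, max_retries: int = 3) -> None:
    for _ in range(max_retries):
        try:
            with transaction.atomic(durable=True):
                with connection.cursor() as cursor:
                    cursor.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;")
                withdraw(user_id=user_id, amount=amount)
            return
        except OperationalError as e:
            if not isinstance(e.__cause__, psycopg2.errors.SerializationFailure):
                raise  # not a serialization failure, re-raise
            # serialization failure: loop around and retry the transaction
    raise RuntimeError("Transaction kept being aborted with serialization failures")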
Confirming serializable transactions work
Given the example we started with, we can write a small test that confirms that without a SERIALIZABLE transaction we would overdraft if two transactions commit at the same time:
def test_ledger_race_condition():
    Ledger.objects.create(user_id=1, amount=500)

    # Use an event to control the order in which the transactions
    # will commit and make sure they commit at the same time.
    waiter = threading.Event()

    def _concurrent_transaction():
        with transaction.atomic():
            withdraw(user_id=1, amount=500)
            waiter.set()

    # Start a transaction in a separate thread
    thread = threading.Thread(target=_concurrent_transaction)
    thread.start()

    # Start another transaction and signal the threaded
    # one to commit at the same time.
    with transaction.atomic():
        withdraw(user_id=1, amount=500)
        waiter.wait()

    # Wait for the concurrent transaction to finish/commit
    thread.join()

    # Confirm that we withdrew more money than the user had
    assert balance(user_id=1) == -500
Now that we've confirmed that our ledger functions are vulnerable to a race condition that causes overdraft, we can write another test to confirm that adding SERIALIZABLE transactions avoids the race condition:
def test_ledger_concurrency_safe():
    Ledger.objects.create(user_id=1, amount=500)

    # Use an event to control the order in which the transactions
    # will commit and make sure they commit at the same time.
    waiter = threading.Event()

    def _concurrent_transaction():
        with transaction.atomic():
            with connection.cursor() as cursor:
                cursor.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
            withdraw(user_id=1, amount=500)
            waiter.set()

    # Start a transaction in a separate thread
    thread = threading.Thread(target=_concurrent_transaction)
    thread.start()

    # Start another transaction and signal the threaded
    # one to commit at the same time.
    with TestCase.assertRaises(None, OperationalError) as e:
        with transaction.atomic():
            with connection.cursor() as cursor:
                cursor.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
            withdraw(user_id=1, amount=500)
            waiter.wait()

    # Assert that the transaction was aborted due to a serialization failure
    assert isinstance(e.exception.__cause__, psycopg2.errors.SerializationFailure)

    # Wait for the concurrent transaction to finish/commit
    thread.join()

    # Confirm there was no overdraft
    assert balance(user_id=1) == 0
One of the transactions would be aborted by PostgreSQL with a serialization failure error. The final balance is correct and we did not overdraft.
Performance impact
One of the first sentences in the PostgreSQL documentation on serializable transactions might lead you to believe that serializable transactions will be terrible for performance:
This level emulates serial transaction execution for all committed transactions; as if transactions had been executed one after another, serially, rather than concurrently.
There are a few factors to take into account when interpreting this statement:
- Serializable transactions obtain predicate locks (SIReadLock). Other serializable transactions do NOT wait for these to be released; instead, they're aborted. We can't end up in a situation where two serializable transactions are waiting for each other to complete, so throughput is not reduced by lock waits.
- Predicate locks can work with your indexes. This means that only a subset of the rows is locked. Other serializable transactions touching other rows are unaffected and can still execute concurrently.
In our example, assuming user_id is covered by an index, the predicate lock would only cover rows for the same user. Serializable transactions for other users can still safely execute concurrently.
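The Ledger model isn't shown in this post, but based on the queries above it might look roughly like this; the field types are assumptions:

from django.db import models


class Ledger(models.Model):
    # An index on user_id lets the predicate lock cover only the
    # rows belonging to a single user.
    user_id = models.IntegerField(db_index=True)
    amount = models.IntegerField()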
Debugging & confirming predicate locks
It is important that you confirm that the predicate locks your application ends up acquiring cover only a subset of the rows. If the SELECT queries in your transaction select more rows than needed or don't hit any indexes, they might end up locking more rows than you expected. As a result, more transactions than necessary would end up being aborted and throughput suffers.
You can confirm what locks your transaction is acquiring by inspecting the pg_locks view. A query such as this can help confirm your transaction is obtaining the smallest possible predicate lock:
SELECT
    relation::regclass,
    mode,
    locktype,
    page,
    tuple
FROM
    pg_locks
WHERE
    relation = 'mytable'::regclass
    AND mode = 'SIReadLock';
Ideally you see a SIReadLock with locktype=page or locktype=tuple. This indicates that a single page or tuple was locked instead of the entire table.
Not seeing page/tuple locks
If you only see locktype=relation in your tests even though you have proper indexes set up, you might want to try disabling sequential scans during your tests (SET enable_seqscan = OFF;). This is likely to happen when you have a very small table and PostgreSQL's query planner determines that a sequential scan is faster than using your index. This also affects the type of SIReadLock PostgreSQL ends up acquiring.
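In a test, assuming you are already inside the transaction under inspection, that could be as simple as:

from django.db import connection

with connection.cursor() as cursor:
    # Nudge the planner towards the index so the predicate lock ends up
    # at page/tuple level rather than relation level (tests only).
    cursor.execute("SET enable_seqscan = OFF;")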