Serializable transactions with Django and PostgreSQL
In this blog post we explore a real-world race condition and how we can elegantly use SERIALIZABLE transactions with Django and PostgreSQL to avoid this issue.
PostgreSQL's transaction isolation level SERIALIZABLE can provide much stronger guarantees than the default READ COMMITTED level. For certain types of applications, SERIALIZABLE transactions can provide a much cleaner and more performant solution than some of the alternatives (exclusive locks, advisory locks, etc.).
Scenario
A lot of applications have a need to do conditional updates/inserts. Most of them follow a pattern like this:
- Select some data
- Make some decisions based on the data
- Insert/update some rows
In-between selecting the data and performing the actual insert/update the data might have changed. It might have changed in such a way that the course of action would be different. A standard READ COMMITTED transaction doesn't help prevent a race condition in that case. Consider the following example:
from django.db import transaction
from django.db.models import Sum


def balance(user_id: int) -> int:
    return Ledger.objects.filter(user_id=user_id).aggregate(balance=Sum("amount"))["balance"]


def withdraw(user_id: int, amount: int) -> None:
    with transaction.atomic():
        if balance(user_id) >= amount:
            Ledger.objects.create(user_id=user_id, amount=-amount)
        else:
            raise ValueError("Insufficient balance")
This can easily lead to the user spending money they do not have (overdraft). Two or more transactions can run concurrently, both determine the user has sufficient balance and both insert a row to record the withdrawal. The default READ COMMITTED isolation level does not prevent overdraft here.
Aside from explicit locking, a SERIALIZABLE transaction can help prevent overdraft here. It guarantees that the result of concurrent transactions is the same as if they had run one after another. If two concurrent transactions touching the same user_id would violate that guarantee, PostgreSQL aborts one of them and commits the other. When a transaction is aborted, the application can retry it. Upon retrying, the application would notice that the user does not have sufficient balance.
Creating a serializable transaction
Solution 1: Change the default isolation level for a connection
There are plenty of articles describing how to create serializable transactions with Django. They all instruct you to create a dedicated database connection for which the default transaction isolation level is SERIALIZABLE. For example, the most popular Stack Overflow answer suggests doing this:
import psycopg2.extensions

DATABASES = {
    'default': {
        # ...
        'OPTIONS': {
            'isolation_level': psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE,
        },
    },
}
Unless you want your entire application to use serializable transactions, the only other option would seem to be to create a dedicated connection with the default transaction isolation level set to SERIALIZABLE and use that when you need a serializable transaction.
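A minimal sketch of what that dedicated connection could look like; the "serializable" alias name is made up for illustration and the actual connection settings are elided:

import psycopg2.extensions

DATABASES = {
    "default": {
        # ... your normal connection settings ...
    },
    "serializable": {
        # Same connection settings as "default", but every transaction
        # on this alias defaults to the SERIALIZABLE isolation level.
        # ...
        "OPTIONS": {
            "isolation_level": psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE,
        },
    },
}

# Elsewhere, open the transaction on the dedicated connection. Note that
# ORM queries inside the block must also be routed to the same alias,
# for example with Ledger.objects.using("serializable").
from django.db import transaction

with transaction.atomic(using="serializable"):
    ...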
Connections don't come cheap in PostgreSQL. Even with a connection pooler such as PgBouncer configured to keep a small pool of connections dedicated to serializable transactions, this can be costly. Unless you're running a significant number of serializable transactions, you would end up with a few connections that are idle most of the time: a waste of precious connection slots.
Solution 2: Change isolation level for a single transaction
The alternative that nobody seems to talk about is to change the isolation level at the start of your transaction, affecting just that transaction. This can be done regardless of what the default transaction isolation level is for your connection and is guaranteed to only affect your transaction.
The only catch is that the transaction must be the top-level transaction. Nested transaction.atomic blocks in Django create savepoints, not real transactions. It is not possible to change the isolation level of a transaction after a query has been executed in the transaction. We can use the durable option on transaction.atomic to raise a RuntimeError if the transaction.atomic block is nested in another transaction.atomic block.
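As a quick illustration of what the durable flag (available since Django 3.2) protects against:

from django.db import transaction

with transaction.atomic():
    # Raises RuntimeError: a durable block must be the outermost atomic
    # block, so we can't accidentally end up inside a savepoint.
    with transaction.atomic(durable=True):
        ...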
If we put this all together, with just a few lines we can create a SERIALIZABLE transaction without changing the way we connect to the database:
from django.db import transaction, connection

with transaction.atomic(durable=True):
    with connection.cursor() as cursor:
        cursor.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;")

    # do work here over the same connection
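If you need this in more than one place, it could be wrapped in a small helper. This is just a sketch; the serializable() name is made up:

from contextlib import contextmanager

from django.db import connection, transaction


@contextmanager
def serializable():
    # Open a top-level transaction and switch it to SERIALIZABLE
    # before any other query runs in it.
    with transaction.atomic(durable=True):
        with connection.cursor() as cursor:
            cursor.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;")
        yield


# Usage:
with serializable():
    withdraw(user_id=1, amount=100)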
Catching serialization failures
If two transactions conflict, one of them will be aborted and we'll have to handle it. Your application should expect serialization failures and handle them gracefully, either by retrying the transaction or by abandoning it. What to do here is highly dependent on your use case. For a financial transaction, it might be safer to confirm the retry with the initiating user first.
Django raises an OperationalError when a serialization failure occurs. Unfortunately, OperationalError is also raised for all sorts of other errors. To catch a serialization failure specifically we have to look at the cause of the exception. Django wraps errors raised by the underlying client library and sets the original error as the cause of the exception. Django uses psycopg2 or psycopg (3) as the underlying client library, which raises psycopg2.errors.SerializationFailure or psycopg.errors.SerializationFailure respectively.
We can use this to determine if we're dealing with a serialization error:
import psycopg2
from django.db import OperationalError

try:
    ...  # <serializable transaction here>
except OperationalError as e:
    if isinstance(e.__cause__, psycopg2.errors.SerializationFailure):
        # retry the transaction
        return
    raise  # not a serialization error, re-raise
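Combined with a retry, this could look roughly like the sketch below. The withdraw_with_retry name and the max_retries value are illustrative, and the example only checks for the psycopg2 exception:

import psycopg2
from django.db import OperationalError, connection, transaction


def withdraw_with_retry(user_id: int, amount: int, max_retries: int = 3) -> None:
    for _ in range(max_retries):
        try:
            with transaction.atomic(durable=True):
                with connection.cursor() as cursor:
                    cursor.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;")
                withdraw(user_id=user_id, amount=amount)
            return
        except OperationalError as e:
            if not isinstance(e.__cause__, psycopg2.errors.SerializationFailure):
                raise  # not a serialization failure, re-raise
            # serialization failure: loop around and retry the transaction
    raise RuntimeError("Transaction kept being aborted with serialization failures")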
Confirming serializable transactions work
Given the example we started with, we can write a small test that confirms that without a SERIALIZABLE transaction we would overdraft if two transactions commit at the same time:
def test_ledger_race_condition():
    Ledger.objects.create(user_id=1, amount=500)

    # Use an event to control the order in which the transactions
    # will commit and make sure they commit at the same time.
    waiter = threading.Event()

    def _concurrent_transaction():
        with transaction.atomic():
            withdraw(user_id=1, amount=500)
            waiter.set()

    # Start a transaction in a separate thread
    thread = threading.Thread(target=_concurrent_transaction)
    thread.start()

    # Start another transaction and signal the threaded
    # one to commit at the same time.
    with transaction.atomic():
        withdraw(user_id=1, amount=500)
        waiter.wait()

    # Wait for the concurrent transaction to finish/commit
    thread.join()

    # Confirm that we withdrew more money than the user had
    assert balance(user_id=1) == -500
Now that we've confirmed that our ledger functions are vulnerable to a race condition that causes overdraft, we can write another test to confirm that adding SERIALIZABLE transactions avoids the race condition:
def test_ledger_concurrency_safe():
    Ledger.objects.create(user_id=1, amount=500)

    # Use an event to control the order in which the transactions
    # will commit and make sure they commit at the same time.
    waiter = threading.Event()

    def _concurrent_transaction():
        with transaction.atomic():
            with connection.cursor() as cursor:
                cursor.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
            withdraw(user_id=1, amount=500)
            waiter.set()

    # Start a transaction in a separate thread
    thread = threading.Thread(target=_concurrent_transaction)
    thread.start()

    # Start another transaction and signal the threaded
    # one to commit at the same time.
    with TestCase.assertRaises(None, OperationalError) as e:
        with transaction.atomic():
            with connection.cursor() as cursor:
                cursor.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
            withdraw(user_id=1, amount=500)
            waiter.wait()

    # Assert that the transaction was aborted due to a serialization failure
    assert isinstance(e.exception.__cause__, psycopg2.errors.SerializationFailure)

    # Wait for the concurrent transaction to finish/commit
    thread.join()

    # Confirm there was no overdraft
    assert balance(user_id=1) == 0
One of the transactions would be aborted by PostgreSQL with a serialization failure error. The final balance is correct and we did not overdraft.
Performance impact
One of the first sentences in the PostgreSQL documentation on serializable transactions might lead you to believe that serializable transactions will be terrible for performance:
This level emulates serial transaction execution for all committed transactions; as if transactions had been executed one after another, serially, rather than concurrently.
There are a few factors to take into account when interpreting this statement:
- Serializable transactions obtain predicate locks (SIReadLock). Other serializable transactions do NOT wait for these to be released; instead, they're aborted. We can't end up in a situation where two serializable transactions are waiting for each other to complete, so throughput is not reduced by lock waits.
- Predicate locks can work with your indexes. This means that only a subset of the rows is locked. Other serializable transactions touching other rows are unaffected and can still execute concurrently.
In our example, assuming user_id is covered by an index, the predicate lock would only cover rows for the same user. Serializable transactions for other users can still safely execute concurrently.
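The Ledger model isn't shown in this post, but based on the queries above it might look roughly like this; the field types are assumptions:

from django.db import models


class Ledger(models.Model):
    # An index on user_id lets the predicate lock cover only the
    # rows belonging to a single user.
    user_id = models.IntegerField(db_index=True)
    amount = models.IntegerField()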
Debugging & confirming predicate locks
It is important that you confirm that the predicate locks your application ends up acquiring cover only a subset of the rows. If the SELECT queries in your transaction select more rows than needed or don't hit any indexes, they might end up locking more rows than you expected. As a result, more transactions than necessary would end up being aborted and throughput suffers.
You can confirm what locks your transaction is acquiring by inspecting the pg_locks view. A query such as this can help confirm your transaction is obtaining the smallest possible predicate lock:
SELECT
    relation::regclass,
    mode,
    locktype,
    page,
    tuple
FROM
    pg_locks
WHERE
    relation = 'mytable'::regclass
    AND mode = 'SIReadLock';
Ideally you see a SIReadLock with locktype=page or locktype=tuple. This indicates that a single page or tuple was locked instead of the entire table.
Not seeing page/tuple locks
If you only see locktype=relation in your tests even though you have proper indexes set up, you might want to try disabling sequential scans during your tests (SET enable_seqscan = OFF;). This is likely to happen when you have a very small table and PostgreSQL's query planner determines that a sequential scan is faster than using your index. This also affects the type of SIReadLock PostgreSQL ends up acquiring.
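In a test, assuming you are already inside the transaction under inspection, that could be as simple as:

from django.db import connection

with connection.cursor() as cursor:
    # Nudge the planner towards the index so the predicate lock ends up
    # at page/tuple level rather than relation level (tests only).
    cursor.execute("SET enable_seqscan = OFF;")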