Transactions Are Not Locks

One thing I wish I had understood better earlier on in my experience with PostgreSQL is how transactions and locks can be used together to provide serializable logic.

An easy way to illustrate this is with a simple bank account system. Suppose we create an accounts table and populate it like this:

create table accounts (
  name text primary key,
  balance int not null
);
insert into accounts (name, balance) values ('A', 10), ('B', 0);

Now we have two bank accounts, A with a balance of $10, and B with a balance of $0.

In order to be a useful bank, we want to be able to move money between accounts. In pseudocode, that might look something like:

function moveMoney(from, to, amount):
  # Start a transaction.
  txn = db.begin()
  # Update the balances.
  txn.execute('update accounts set balance = balance - $amount where name = $from')
  txn.execute('update accounts set balance = balance + $amount where name = $to')
  # Commit the transaction.
  txn.commit()

We use a transaction here to make sure that either both updates succeed, or both updates fail. In other words, we want to avoid the situation where money is deducted from A but never deposited to B.
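As a concrete sketch of that all-or-nothing behavior, here it is reproduced in Python with the standard library's sqlite3 module standing in for PostgreSQL (the `fail_midway` flag, which simulates a crash between the two updates, is of course contrived for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table accounts (name text primary key, balance int not null)")
conn.executemany("insert into accounts values (?, ?)", [("A", 10), ("B", 0)])
conn.commit()

def move_money(conn, frm, to, amount, fail_midway=False):
    """Run both updates in one transaction; roll back if anything goes wrong."""
    try:
        conn.execute("update accounts set balance = balance - ? where name = ?",
                     (amount, frm))
        if fail_midway:
            raise RuntimeError("simulated crash between the two updates")
        conn.execute("update accounts set balance = balance + ? where name = ?",
                     (amount, to))
        conn.commit()
    except Exception:
        conn.rollback()   # the deduction from `frm` is undone too
        raise

try:
    move_money(conn, "A", "B", 10, fail_midway=True)
except RuntimeError:
    pass

print(dict(conn.execute("select name, balance from accounts")))
# {'A': 10, 'B': 0} -- money was never deducted without being deposited
```

Because the crash happened before `commit()`, the rollback discards the deduction along with everything else in the transaction.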

There’s another situation that we might want to avoid in our bank too: we might want a rule that account balances can never be negative. To enforce this rule, we can update our moveMoney function:

function moveMoney(from, to, amount):
  # Moving a negative amount of money from A to B is equivalent to moving the
  # corresponding positive amount from B to A.
  if amount < 0:
    moveMoney(to, from, -1*amount)
    return

  # Start a transaction so that all of our queries/updates succeed or fail as a
  # unit.
  txn = db.begin()

  # Make sure the $from account has a balance of at least $amount.
  currBalance = txn.query('select balance from accounts where name = $from')
  if currBalance < amount:
    txn.rollback()
    throw exception

  # Move the money as before.
  txn.execute('update accounts set balance = balance - $amount where name = $from')
  txn.execute('update accounts set balance = balance + $amount where name = $to')

  # Commit the transaction.
  txn.commit()

But there’s a problem with this! Using a transaction only ensures that all of the writes succeed or fail together; it does not guarantee that all of the statements in the transaction execute “at the same time” (i.e., by default, transactions are not serializable).

Preventing concurrency bugs

Let’s simulate two different actors calling moveMoney('A', 'B', 10) concurrently, again with A having an initial balance of $10 and B having $0:

Actor 1                                                       Actor 2
-------                                                       -------
begin
select balance from accounts where name = 'A'
                                                              begin
                                                              select balance from accounts where name = 'A'
update accounts set balance = balance - 10 where name = 'A'
update accounts set balance = balance + 10 where name = 'B'
commit
                                                              update accounts set balance = balance - 10 where name = 'A'
                                                              update accounts set balance = balance + 10 where name = 'B'
                                                              commit

Now, if we check the account balances, we can see a problem:

postgres=# select * from accounts ;
 name | balance
------+---------
 A    |     -10
 B    |      20

Both actors read the initial balance as $10, and therefore allowed the operations to proceed. The transaction is ensuring that $10 is deducted from A if and only if $10 is deposited into B, but two transactions can still be reading and making decisions based on the same data concurrently.

(PostgreSQL by default does not allow two transactions to write the same data concurrently; after Actor 1 updates A’s balance, Actor 2 isn’t able to update A’s balance until after the first transaction is committed or rolled back.)
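The interleaving above can be replayed deterministically against a plain in-memory dict (no database at all) to show why both transfers slip through:

```python
# Both "transactions" read A's balance before either one writes.
balances = {"A": 10, "B": 0}

actor1_saw = balances["A"]   # reads 10
actor2_saw = balances["A"]   # also reads 10

# Both balance checks pass, so both transfers go ahead.
if actor1_saw >= 10:
    balances["A"] -= 10; balances["B"] += 10   # actor 1 commits
if actor2_saw >= 10:
    balances["A"] -= 10; balances["B"] += 10   # actor 2 commits

print(balances)
# {'A': -10, 'B': 20} -- the same overdraft the database ended up with
```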

Check constraints

There are a few ways we can fix this. One way would be to add a check constraint:

alter table accounts add constraint nonnegative_balance check (balance >= 0);

With this constraint, Actor 2’s update will fail because the constraint would be violated. In fact, we would no longer even need to check the previous balance in our application code at all, because the database itself would ensure no account’s balance ever goes below zero.
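SQLite supports the same kind of check constraint, so the database-enforced rule can be sketched in Python with sqlite3 as a stand-in for PostgreSQL (the violation surfaces as sqlite3.IntegrityError rather than a PostgreSQL check_violation error):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table accounts (
      name text primary key,
      balance int not null check (balance >= 0)
    )
""")
conn.executemany("insert into accounts values (?, ?)", [("A", 10), ("B", 0)])
conn.commit()

# The first transfer is fine: A ends at 0, which satisfies the constraint.
conn.execute("update accounts set balance = balance - 10 where name = 'A'")
conn.execute("update accounts set balance = balance + 10 where name = 'B'")
conn.commit()

# The second transfer would push A to -10; the database itself rejects it,
# with no balance check needed in application code.
try:
    conn.execute("update accounts set balance = balance - 10 where name = 'A'")
except sqlite3.IntegrityError:
    conn.rollback()
    print("transfer rejected by the constraint")

print(dict(conn.execute("select name, balance from accounts")))
# {'A': 0, 'B': 10}
```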

Table locks

Another approach would be to use a lock. Before we start reading or writing data from the accounts table, we can use a lock to ensure that our transaction has exclusive access to that table until we roll back or commit:

 begin;
+lock table accounts;
 select balance from accounts where name = 'A';
 update accounts set balance = balance - 10 where name = 'A';
 update accounts set balance = balance + 10 where name = 'B';
 commit;

The lock table accounts statement will block until no other transaction holds a lock on the accounts table, and once acquired it prevents every other transaction from accessing the table until ours is committed or rolled back.
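An in-process analogue of the table lock is a single mutex that every transfer must hold. This is only a sketch of the "one transfer at a time" behavior (the real thing is the SQL above), but it shows how serializing the transfers makes the balance check safe:

```python
import threading

balances = {"A": 10, "B": 0}
table_lock = threading.Lock()        # plays the role of `lock table accounts`
errors = []

def move_money(frm, to, amount):
    with table_lock:                 # exclusive access to every account
        if balances[frm] < amount:
            raise ValueError("insufficient funds")
        balances[frm] -= amount
        balances[to] += amount

def attempt():
    try:
        move_money("A", "B", 10)
    except ValueError as exc:
        errors.append(exc)

# Two concurrent actors both try to move A's entire balance to B.
threads = [threading.Thread(target=attempt) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()

print(balances, f"({len(errors)} transfer rejected)")
# {'A': 0, 'B': 10} (1 transfer rejected)
```

Whichever actor wins the race, exactly one transfer succeeds and the other sees the updated balance and is rejected.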

Row locks

Locking the entire accounts table is an effective way to prevent overdrawing an account, but it also needlessly slows down our banking program. If someone is trying to move money from A to B while someone else is trying to move money from B to C, the second person’s transaction won’t be able to start until the first transaction completes, even though they’re touching different accounts.

Luckily, rather than acquiring a lock on the entire table, we can just acquire a lock on the row that we’re deducting money from. To do this, we can use for update at the end of our select statement:

select balance from accounts where name = 'A' for update;

Now, other transactions won’t be able to update this row, or lock it with their own select … for update, until our transaction is committed or rolled back. Plain reads are unaffected: in PostgreSQL, ordinary selects never block on writers. (Note that for update is only useful inside an explicit transaction, because the row lock is released as soon as the transaction ends.)
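The finer granularity can be modeled in-process with one mutex per account, taking only the locks for the rows involved. One caveat in this sketch: the locks are acquired in a fixed sorted order to rule out deadlock between opposite transfers, whereas PostgreSQL instead detects deadlocks and aborts one of the transactions:

```python
import threading

balances = {"A": 10, "B": 0, "C": 5}
row_locks = {name: threading.Lock() for name in balances}
errors = []

def move_money(frm, to, amount):
    # Lock only the two accounts involved, always in sorted order so that two
    # opposite transfers (A->B and B->A) can't deadlock each other.
    first, second = sorted([frm, to])
    with row_locks[first], row_locks[second]:
        if balances[frm] < amount:
            raise ValueError("insufficient funds")
        balances[frm] -= amount
        balances[to] += amount

def attempt(frm, to, amount):
    try:
        move_money(frm, to, amount)
    except ValueError as exc:
        errors.append(exc)

# Two actors race to drain A, while a third moves C's money independently --
# it never has to wait on A's lock the way it would under a table lock.
threads = [threading.Thread(target=attempt, args=a)
           for a in [("A", "B", 10), ("A", "B", 10), ("C", "B", 5)]]
for t in threads: t.start()
for t in threads: t.join()

print(balances, f"({len(errors)} transfer rejected)")
# {'A': 0, 'B': 15, 'C': 0} (1 transfer rejected)
```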

Transaction isolation levels

One other way to ensure that we don’t overdraw an account is to change the isolation level of the transaction:

begin transaction isolation level serializable;
select balance from accounts where name = 'A';
update accounts set balance = balance - 10 where name = 'A';
update accounts set balance = balance + 10 where name = 'B';
commit;

The PostgreSQL manual has a good description of serializable:

If a pattern of reads and writes among concurrent serializable transactions would create a situation which could not have occurred for any serial (one-at-a-time) execution of those transactions, one of them will be rolled back with a serialization_failure error.

Where a row or table lock would make the second transaction wait to read the balance until the first had committed, with isolation level serializable the second transaction’s update would instead fail with an error: “could not serialize access due to concurrent update.”
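One way to picture this behavior is as an optimistic scheme: read freely, but at commit time abort any transaction whose inputs have since changed. Here is a deliberately simplified in-process model of that idea, with a per-row version counter standing in for the conflict detection (hypothetical, and not how PostgreSQL actually implements serializable isolation):

```python
balances = {"A": 10, "B": 0}
versions = {"A": 0, "B": 0}   # bumped on every committed write to a row

class SerializationFailure(Exception):
    pass

def read_balance(name):
    """Read phase: return the balance along with the version it was read at."""
    return balances[name], versions[name]

def commit_transfer(frm, to, amount, seen_version):
    """Commit phase: abort if the source row changed since it was read."""
    if versions[frm] != seen_version:
        raise SerializationFailure("could not serialize access")
    balances[frm] -= amount
    balances[to] += amount
    versions[frm] += 1
    versions[to] += 1

# Replay the earlier interleaving: both actors read before either commits.
bal1, v1 = read_balance("A")
bal2, v2 = read_balance("A")
assert bal1 >= 10 and bal2 >= 10        # both balance checks pass

commit_transfer("A", "B", 10, v1)       # actor 1 commits cleanly
try:
    commit_transfer("A", "B", 10, v2)   # actor 2's snapshot is now stale
except SerializationFailure as exc:
    print("actor 2:", exc)

print(balances)
# {'A': 0, 'B': 10} -- actor 2 gets an error instead of overdrawing A
```

On a serialization failure the usual response is simply to retry the whole transaction from the top, which this time would read the new balance and reject the transfer.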

There’s a good explanation of the “serializable” consistency model—and how it differs from other models—on the Jepsen site.

