Transactions and Isolation Levels

• By Garren Smith

All SQL databases and some NoSQL databases support transactions. A transaction groups multiple operations so that they either all succeed or all fail, which helps keep the data in a consistent state.

For example, suppose a banking application needs to transfer money from one account to another. This requires two SQL operations: one to decrease the balance of one account and another to increase the balance of the other. If these are not run in a transaction and one operation succeeds while the other fails, the data is left inconsistent: one account holds the wrong amount of money. Wrapping the operations in a transaction means that if either operation fails, both accounts are reset to the values they had before the transaction started, keeping the data consistent.
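As a minimal sketch, assuming an accounts table with id and balance columns (both names are made up for this example), the transfer could look like:

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- if either UPDATE fails, the application issues ROLLBACK instead of COMMIT
-- and both balances are left exactly as they were before BEGIN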

Where transactions get confusing, and are often used incorrectly, is in choosing the correct isolation level for a transaction. An isolation level defines how isolated a transaction is from other transactions running at the same time. Higher isolation levels incur a performance penalty because the database has to work harder to keep transactions isolated. The standard isolation levels are:

Read Uncommitted

No isolation at all: a transaction can read updates and changes made by any other transaction before that transaction has completed. This is the weakest isolation level and should not be used, as it leads to dirty reads and, in databases that allow them, dirty writes.
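For example, here is a dirty read sketched in MySQL syntax (PostgreSQL accepts the Read Uncommitted setting but actually behaves as Read Committed; the value 40 is made up). T1 starts an update but does not commit:

BEGIN;
UPDATE Users SET age = 40 WHERE name = 'Eddie';

T2, running at Read Uncommitted, can already see the uncommitted change:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN;
SELECT age FROM Users WHERE name = 'Eddie';
-- outputs 40, even though T1 might still roll back
COMMIT;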

Read Committed

Read Committed is the next level and means that a transaction will only read data that has been committed. It is PostgreSQL's default isolation level and provides better isolation than Read Uncommitted. However, it can still cause complications, specifically non-repeatable reads. Let's look at a very simplified example to demonstrate some unexpected behaviour. We have two transactions running at the same time.

T1 starts with:

BEGIN;
SELECT age FROM Users WHERE name = 'Eddie';
-- outputs 30

Then T2 runs to completion:

BEGIN;
UPDATE Users SET age = 35 WHERE name = 'Eddie';
COMMIT;

T1 continues:

SELECT age FROM Users WHERE name = 'Eddie';
-- outputs 35
COMMIT;

T1 runs its first select statement, then T2 completes its update and commits before T1 runs its second select. Because T2 has committed, T1's second select sees the newer age value. In a simple case like this, that is probably not a big issue, but in queries and systems where you need a consistent, predictable value throughout a transaction, it will cause unexpected behaviour.

The other problem is that when developing an application on your local machine, this will most likely never surface; it is only exposed once the application is in production and under higher load.

Repeatable Reads

Repeatable Reads is where we start getting into safer territory. It is the default isolation level for MySQL. Repeatable Reads normally works by establishing a snapshot of the database on the transaction's first read; that snapshot is then used throughout the life of the transaction, so every read query yields the same results.
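Re-running the earlier example at Repeatable Read shows the difference (a sketch using PostgreSQL syntax; MySQL instead sets the level with SET TRANSACTION before BEGIN). T1 starts with:

BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT age FROM Users WHERE name = 'Eddie';
-- outputs 30

T2 runs to completion as before, committing age = 35. T1 continues:

SELECT age FROM Users WHERE name = 'Eddie';
-- still outputs 30: T1 reads from the snapshot taken at its first read
COMMIT;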

However, because Repeatable Reads is stricter, the database will throw transaction conflict errors if it detects that rows updated in the current transaction have also been changed by another transaction that committed after the current transaction started. Each application has to handle those errors; in most cases, the solution is to retry the transaction.
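For instance, sketching the conflict in PostgreSQL (the error text is what current PostgreSQL versions report for this case):

BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT age FROM Users WHERE name = 'Eddie';

-- meanwhile, another transaction updates Eddie's row and commits

UPDATE Users SET age = 36 WHERE name = 'Eddie';
-- ERROR: could not serialize access due to concurrent update

At that point the transaction is aborted, and the application should roll back and retry it.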

Serializable

Serializable is the strictest isolation level: the database acts as if each transaction were run one at a time. MySQL does this by upgrading plain SELECT queries to SELECT ... FOR SHARE, which adds row-level shared locks for each query. PostgreSQL instead adds extra monitoring to make sure no transaction behaves in a way that is inconsistent with all current transactions running one at a time.
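Opting in looks the same as before (PostgreSQL syntax; as with Repeatable Read conflicts, a failed transaction must be retried):

BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT age FROM Users WHERE name = 'Eddie';
UPDATE Users SET age = age + 1 WHERE name = 'Eddie';
COMMIT;
-- if PostgreSQL's monitoring finds the result inconsistent with some serial
-- ordering of the concurrent transactions, the COMMIT (or an earlier
-- statement) fails with a serialization error and the transaction is retried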

Read and Write Anomalies

Using an incorrect isolation level can lead to read and write anomalies. The book Designing Data-Intensive Applications goes into great detail about them and is one reason [[Why I own two copies of Designing Data-Intensive Applications]].