All SQL databases and some NoSQL databases support transactions. A transaction groups multiple operations so that they either all succeed or all fail, which keeps the data in a consistent state.

For example, suppose a banking application needs to transfer money from one account to another. This requires two SQL operations: one to decrease the balance of the first account and another to increase the balance of the second. If these are not run in a transaction and one operation succeeds while the other fails, the data is inconsistent: one account ends up with the wrong balance. Wrapping the operations in a transaction means that if one operation fails, both accounts are reset to the values they had before the transaction started, keeping the data in a consistent state.

Where transactions get confusing, and are used incorrectly, is in setting the correct isolation level for a transaction. An isolation level defines how _isolated_ a transaction is from other transactions running at the same time. Higher isolation levels incur a performance penalty because the database has to work harder to keep transactions isolated.

The standard isolation levels are:

### Read Uncommitted

No isolation at all: a transaction can read updates and changes made by any other transaction before that transaction has completed. This is the weakest isolation level and should not be used, as it leads to dirty reads and dirty writes.

### Read Committed

Read Committed is the next level and means that a transaction will only read data that has been committed. This is PostgreSQL's default isolation level. It provides better isolation than Read Uncommitted, but it can still cause complications, specifically _non-repeatable reads_.

Let's look at a very simplified example to demonstrate some unexpected behaviour. We have two transactions running at the same time.
**T1** starts with:

```sql
BEGIN;
SELECT age FROM Users WHERE name = 'Eddie'; -- outputs 30
```

Then **T2** runs to completion:

```sql
BEGIN;
UPDATE Users SET age = 35 WHERE name = 'Eddie';
COMMIT;
```

**T1** continues:

```sql
SELECT age FROM Users WHERE name = 'Eddie'; -- outputs 35
COMMIT;
```

**T1** runs its first select statement, then **T2** completes its update and commits before **T1** runs its second select. Because **T2** has committed, **T1's** second select returns the newer `age` value.

In simple select cases like the one above, this is most likely not a big issue. But for queries and systems where you need a consistent, predictable value throughout a transaction, it will cause unexpected problems. The other problem is that when developing an application on your local machine this will most likely never show up, and it will only be exposed once the application is in production and under higher load.

### Repeatable Reads

Repeatable Reads is where we start getting into safer territory. This is the default isolation level for MySQL. Repeatable Reads normally works by establishing a snapshot of the database on the first read. That snapshot is then used throughout the life of the transaction, meaning that every query will always yield the same results.

However, because **Repeatable Reads** is stricter, the database will throw transaction conflict errors if it detects that values read or updated in the current transaction have been changed by another transaction that committed after the current transaction started. Each application has to handle those errors; in most cases, the solution is to retry the transaction.

### Serializable

**Serializable** is the strictest isolation level: the database acts as if each transaction were run one at a time. MySQL does this by upgrading all `SELECT` queries to `SELECT ... FOR SHARE`, which adds row-level shared locks for each query.
Postgres adds extra monitoring to make sure each transaction does not exhibit any behaviour that would be inconsistent with all current transactions running one at a time.

### Read and Write Anomalies

Using an incorrect isolation level will lead to read and write anomalies. The book Designing Data-Intensive Applications goes into great detail about them, and is a reason [[Why I own two copies of Designing Data-Intensive Applications]].
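The bank-transfer example from the introduction can be sketched with Python's standard-library `sqlite3` module, which rolls back the whole transaction if any statement inside it fails. This is a minimal illustration, not production code; the `accounts` table and the insufficient-funds check are made up for the example:

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; roll back on any failure."""
    try:
        # `with conn:` opens a transaction, commits on success,
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            balance = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()[0]
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
    except Exception:
        return False  # both updates were rolled back together
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

transfer(conn, "alice", "bob", 30)    # succeeds: balances become 70 / 80
transfer(conn, "alice", "bob", 1000)  # fails: both balances left unchanged
```

If the second update or the balance check fails, the first update is undone as well, so neither account ends up with an incorrect amount.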