Hands on with PostgreSQL transactions


Life would have been a lot easier without concurrency or parallelism.

Everything working sequentially. Cars moving one by one on a single-lane road, no overtaking, no accidents. Sounds simple and straightforward.

But it is a tradeoff: running everything sequentially is not feasible today, when the goal is to move more cars in and out (high throughput) while reducing travel time (low latency). On the road, this is easily solved by adding lanes.

Similarly, in technical terms: if your API takes 1s per request, then in 24 hours you can serve only 86,400 (24 * 60 * 60) requests. Serving in sequential fashion is not an option when you are building for the current generation, with 640,000 daily active internet users.

So you need to handle requests concurrently/in parallel. At the server level, this is easily achieved with threads (or an equivalent). But how will you handle it at the database layer?

This boils down to a simple money-transfer problem: two API calls are initiated simultaneously, each transferring 50 bucks from A to B. Both calls run in parallel in different threads, both read A's balance as 60 bucks, both pass the validation checks, and both perform the transfer, eventually leading to an undesired result.
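A minimal sketch of this lost-update race in SQL (the accounts table with name and balance columns is hypothetical, for illustration only):

-- session 1: read A's balance
select balance from accounts where name = 'A';  -- returns 60

-- session 2: reads the same balance at the same time
select balance from accounts where name = 'A';  -- also returns 60

-- both sessions see 60 >= 50, so both pass the validation check

-- session 1: debit A, credit B
update accounts set balance = 60 - 50 where name = 'A';
update accounts set balance = balance + 50 where name = 'B';

-- session 2: does the same, overwriting session 1's debit
update accounts set balance = 60 - 50 where name = 'A';
update accounts set balance = balance + 50 where name = 'B';

-- result: A was debited only once (balance 10), but B was credited twice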

One simple way to solve this is to queue requests and execute them one by one. But that would drastically impact your throughput. On second thought, why would you block another person from transferring money while A and B are transacting? At that same moment, C can transfer to D (or vice versa) without causing any problem.

So you want to execute requests concurrently/in parallel, as long as those requests don't involve the same person.

This is where database transactions come into the picture.

Note: The scope of this blog is only limited to PostgreSQL transactions.

PostgreSQL transactions

Basically, there are four situations that can lead to data corruption: dirty read, nonrepeatable read, phantom read, and serialization anomaly:

- Dirty read: a transaction reads data written by a concurrent uncommitted transaction.
- Nonrepeatable read: a transaction re-reads data it has previously read and finds it has been modified by another transaction that committed since the initial read.
- Phantom read: a transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set has changed due to another recently committed transaction.
- Serialization anomaly: the result of successfully committing a group of transactions is inconsistent with all possible orderings of running them one at a time.

And to handle these four scenarios, there are four isolation levels: Read Uncommitted, Read Committed, Repeatable Read, and Serializable.

Thanks to PostgreSQL, dirty reads are taken care of automatically: at every isolation level, you won't be able to read an uncommitted change made by another concurrent transaction.

Let’s go through each isolation level one by one.

Note: Throughout this blog, the examples use a relation whose schema looks like this:
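A minimal reconstruction, inferred from the queries used later in this post (the table name test and the columns id and count appear in those queries; the sample rows are assumptions):

create table test (
    id    int,
    count int
);

insert into test values (1, 100), (2, 200), (3, 300), (4, 400);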

Serializable transaction

This is one of the simplest isolation levels to reason about. It is exactly like what we discussed above: cars running in a single lane.

Or in other words, one transaction is allowed at a time.
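A minimal sketch of this experiment, assuming the test table above and two concurrent psql sessions:

-- session 1
begin transaction isolation level serializable;
update test set count = 100 where id = 1;

-- session 2
begin transaction isolation level serializable;
update test set count = 200 where id = 1;  -- blocks, waiting for session 1

-- session 1
commit;

-- session 2 wakes up and its blocked update fails with the error below;
-- if session 1 had rolled back instead, session 2's update would have proceeded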

As you can see, the later transaction (session 2) is waiting for the first (session 1) to either commit or roll back. When the first one commits, the second fails with:

ERROR: could not serialize access due to concurrent update

In the above two cases, we were trying to update the same tuple. Now let's try to update different tuples in two different Serializable transactions.
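A sketch of this experiment (the explicit selects are my addition: they force a read/write dependency between the two transactions; without them, whether the conflict is detected can depend on whether PostgreSQL scans the whole table or uses an index):

-- session 1
begin transaction isolation level serializable;
select * from test;                        -- reads every row
update test set count = 111 where id = 1;

-- session 2
begin transaction isolation level serializable;
select * from test;                        -- also reads every row
update test set count = 222 where id = 2;  -- different tuple: no blocking

-- session 1
commit;                                    -- succeeds

-- session 2
commit;
-- ERROR: could not serialize access due to read/write dependencies among transactions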

Although we were able to update different tuples in the two transactions, we were not able to commit both of them.

Conclusion: as the name implies, the effect is as if only one transaction runs at a time. When concurrent Serializable transactions touch overlapping data, the later one will not be able to commit.

Read committed transaction

Read Committed is the default isolation level in PostgreSQL. It guarantees that you will never read an uncommitted change made by another transaction.

Let's take an example.
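A sketch of the experiment, assuming the test table above and covering both a commit and a rollback:

-- session 1
begin;  -- read committed is the default
update test set count = 999 where id = 1;
select count from test where id = 1;  -- returns 999 inside this transaction

-- session 2
select count from test where id = 1;  -- still returns the old value: the change is uncommitted

-- case 1: session 1 commits, and session 2 now sees 999
commit;

-- case 2: if session 1 had rolled back instead,
-- session 2 would have kept seeing the old value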

In both the above cases, changes made in a transaction are visible only within that transaction until they are committed. Thus there is no chance of a dirty read.

Now let’s try to update the same tuple in two different transactions.
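A sketch of the experiment, with two transactions and one bare query all targeting the same tuple:

-- session 1
begin;
update test set count = 100 where id = 4;

-- session 2
begin;
update test set count = 200 where id = 4;   -- blocks, waiting for session 1

-- session 3 (no explicit transaction)
update test set count = 3033 where id = 4;  -- also blocks, queued behind session 2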

Notice that all the later updates are waiting for the first one to complete, either commit or rollback.

The second transaction failed with an error:

ERROR: could not serialize access due to concurrent update

but the updated value is 3033, which means the query without any transaction got executed and overrode the first transaction's update. (Note that this particular error can only be raised when the waiting transaction runs at Repeatable Read or Serializable; a plain Read Committed transaction would simply re-read the latest committed row and apply its update on top.)

In such cases, using a delta update helps; just tweak your query to:

update test set count = count + 3033 where id = 4;

This way, changes are not overridden.
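A sketch of why the delta form is safe at Read Committed: when the blocked update is finally allowed to proceed, it re-evaluates its expression against the latest committed version of the row.

-- session 1
begin;
update test set count = count + 100 where id = 4;

-- session 2 (blocks until session 1 finishes)
update test set count = count + 3033 where id = 4;

-- session 1
commit;

-- session 2's update now re-reads the committed row,
-- so both increments survive: count has grown by 100 + 3033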

After rolling back the first transaction, the next one (the second) got executed. The third one is still pending; since it runs without an explicit transaction, it will execute anyway once the second transaction completes (either commits or rolls back).

Note: here, queries on the same tuple are executed in first-in, first-out (FIFO) fashion.

Now let's try to update different tuples in different transactions.
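The sketch for this case:

-- session 1
begin;
update test set count = 101 where id = 1;

-- session 2
begin;
update test set count = 202 where id = 2;  -- different tuple: no waiting

-- both sessions
commit;  -- both succeed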

Oh, all updates are successful.

Conclusion:

- No dirty reads: uncommitted changes from other transactions are never visible.
- Concurrent updates to the same tuple are serialized: later updates wait in FIFO order and can silently override earlier values unless you use delta updates.
- Updates to different tuples run in parallel without any waiting.

Repeatable read transaction

It is stricter than Read Committed. In addition to everything Read Committed offers, Repeatable Read also guarantees that whatever you have read will not change for the duration of the transaction.
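A sketch of the experiment (the initial value 100 is assumed):

-- session 1
begin transaction isolation level repeatable read;
select count from test where id = 1;  -- returns 100

-- session 2 (autocommits)
update test set count = 500 where id = 1;

-- session 1
select count from test where id = 1;  -- still returns 100: the snapshot is fixed

-- session 1 now tries to update the row changed by session 2
update test set count = 600 where id = 1;
-- fails with the error below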

As you can see in the above example, the output of the select queries stays constant even after the data is changed by another committed transaction.

But when you try to update based on that outdated data, it throws an error:

ERROR: could not serialize access due to concurrent update

Conclusion:

- Everything Read Committed guarantees, plus: data you have already read stays the same for the rest of the transaction, even if other transactions commit changes to it.
- Updating a row that another transaction has changed since your snapshot was taken fails with a serialization error, so the application must be prepared to retry.


This was all about handling transactions in PostgreSQL. Thanks for reading. I hope to write more posts in the Hands on series.

Note: In this blog, concurrency and parallelism are used interchangeably, though they are different. The aim is just to differentiate them from sequential execution.

Huge thanks to @ronakjc for pairing with me on this!

Vishwesh Jainkuniya
