Using Postgres as a Message Queue
These days, message queues have given me a lot to think about. While their utility is undeniable, the most important architectural question is: which messaging service/backend do you use?
If we don’t get into the managed vs. self-hosted debate, the impulsive answer is “Redis!”. It’s simple, it’s ridiculously fast, it’s mature, and it’s more popular than energy drinks. Other answers are RabbitMQ, SQS, ZeroMQ, your favorite higher-level abstraction library (BullMQ, etc.), and so forth.
While all the answers are good, I think any messaging/queue backend must provide a few characteristics if we’re to rely on it for creating stable systems:
- Exclusive locks: This one isn’t hard to imagine. If we read a message from the messaging backend, it should either become visible only to the process that just read it, or be removed from the queue and handed over to that process atomically. We definitely can’t have race conditions (multiple consumers reading the same message), as handling the resulting duplicates would complicate everything downstream.
- Acknowledgements: If a consumer reads (or worse, plucks) a message from the queue and dies in the process, the message will be lost. Therefore, the messaging system must have some mechanism in place for making sure delivery was successful. This is where the use case for Redis fails in my opinion: Redis data structures such as sorted sets make the atomic hand-off very attractive, but a popped member is gone the moment it leaves the server, so a crashing consumer loses it for good.
- Persistence/durability: Servers getting restarted or crashing without warning is just a fact of life. We can’t build a backend on the naive assumption that we’ll never need to restart the server. As such, volatile messaging systems that store everything in system memory will not work. Even at the cost of slowness, I’d take a solution that offers 100% durability.
If you start thinking like this, Redis is the most likely candidate for rejection. Sure, you can tune its durability settings, but nothing can be done about the acknowledgement part. SQS seems like a good idea but simulating it locally and getting sucked into the AWS ecosystem (SNS, at the very least) are real concerns. RabbitMQ seems like a very nice thing but there are two problems: it’s a push-based system (which isn’t that bad in itself) and the conceptual complexity seems ridiculous (way too many abstractions for the simple job of messaging!).
I’d still choose RabbitMQ over anything else and pay the price for complexity, but my current use case at work is making me lean towards Postgres! Yes, a very nice messaging system with all the characteristics listed above can be designed in Postgres. Let’s see how:
- Storage format: Raw message data can easily be stored as JSON, serialized strings, or even binary formats. All we need is a table with columns `message_id`, `key`, `payload`, and `status`.
- Persistence: It’s a freaking RDBMS, so I don’t need to convince anybody about persistence!
- Exclusive access: Multiple consumers can be made to `SELECT * ... LIMIT 1` from the table by using exclusive locks (`SELECT ... FOR UPDATE`).
- Durability: A single consumer will wrap the entire `SELECT`-and-`UPDATE` cycle in a transaction, ensuring that the data either gets processed successfully or is made available again for other consumers.
- Fast searching: This concern almost made me drop an RDBMS for this use case. After all, once the table gets very large, searching for unprocessed messages won’t be helped much by an ordinary index (since the `status` column has only a handful of unique values) and will cause a full table scan. BUT – and this is something I came across yesterday – a partial index can be built to cover only the unprocessed messages, making data access ridiculously fast!
Of course, everything in the end depends on the use case and the details, but I think such a system makes for a highly reliable messaging backend for workloads where fewer than 50-100 messages are processed per second and a delay of 100ms isn’t life-or-death. Another huge positive outcome is that nobody will have to learn another library or introduce another point of failure into the system.
I’m yet to take a final call on this, but over the next 2-3 months I will build such a messaging system and put it in production. Whether all goes well or not, time will tell, but I intend to record my experience here.