Hi, we run ZeroMQ in a push/pull pattern. As traffic grows, we are seeing more cases where the service is unavailable or stuck. We want to:
* Not lose messages during service outages
* Safely restart the service without losing messages (ZeroMQ seems to require manually closing the socket in the receiver before a restart)
Do you have experience with this setup with ZeroMQ? Would you suggest RabbitMQ or Amazon SQS (we are in AWS setup) instead? Something else?
Thank you for your time
Both would do the trick, but there are some nuances. We work with both.
From the sound of it, your main focus is "not losing messages". In that case, I would go with RabbitMQ with a high availability policy (ha-mode=all) and a main/retry/error queue pattern.
Push messages to an exchange, which sends them to the main queue. If an error occurs, push the failed message to the retry exchange, which forwards it to the retry queue. Give the retry queue an x-message-ttl and set the main exchange as its dead-letter exchange, so that messages whose TTL expires flow back to the main queue for another attempt. If a message has been retried several times, push it to the error exchange, where it can remain until someone has time to look at it.
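The main/retry/error topology described above could be declared roughly like this. This is only a sketch: the exchange and queue names, the fanout exchange type, the 30-second TTL and the retry limit of 3 are all assumptions for illustration, and `channel` is expected to be something like a pika channel (the three `*_declare`/`queue_bind` calls follow pika's signatures).

```python
from typing import Any, Dict

# Hypothetical names; each exchange feeds one queue of the same name.
MAIN, RETRY, ERROR = "main", "retry", "error"


def build_topology(retry_ttl_ms: int = 30_000) -> Dict[str, Dict[str, Any]]:
    """Return the queue arguments for the main, retry and error queues."""
    return {
        MAIN: {},
        RETRY: {
            # Messages wait here for retry_ttl_ms, then get dead-lettered
            # to the main exchange, which routes them to the main queue.
            "x-message-ttl": retry_ttl_ms,
            "x-dead-letter-exchange": MAIN,
        },
        ERROR: {},  # no TTL: messages stay until someone inspects them
    }


def declare(channel, retry_ttl_ms: int = 30_000) -> None:
    """Declare durable exchanges and queues on a pika-style channel."""
    for name, args in build_topology(retry_ttl_ms).items():
        # Fanout is an assumption made here so routing keys do not matter.
        channel.exchange_declare(exchange=name, exchange_type="fanout", durable=True)
        channel.queue_declare(queue=name, durable=True, arguments=args)
        channel.queue_bind(queue=name, exchange=name)


def next_exchange(retry_count: int, max_retries: int = 3) -> str:
    """Pick where a failed message goes, given how often it was retried."""
    return RETRY if retry_count < max_retries else ERROR
```

The consumer would need to track the retry count itself, for example in a message header, and publish the failed message to `next_exchange(count)` before acknowledging the original.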
This is a very useful and resilient pattern that allows you to never lose messages. With the high availability policy, you make sure that if one of your RabbitMQ nodes dies, another can take over, because messages are already mirrored to it.
This pattern is not really possible with SQS, because SQS is a lot more focused on throughput and scaling. Combined with SNS it can do interesting things like deduplication of messages. That said, one thing core to its design is that messages have a maximum retention time (at most 14 days). The idea is that a message that has sat in an SQS queue for a long time no longer serves a purpose, so it gets removed rather than tying up listener resources. You can also set up a DLQ here, but a DLQ does not hold onto messages forever either. Since you seem to depend on messages surviving at all costs, I would suggest that the scaling/throughput benefit of SQS does not outweigh its different approach to message lifetime.
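To make the retention limit concrete, here is a small sketch of the queue attributes you would set if you went with SQS anyway. `MessageRetentionPeriod` and `RedrivePolicy` are real SQS queue attributes; the helper name, the default of 5 receives and the ARN in the usage note are assumptions for illustration.

```python
import json
from typing import Dict

MIN_RETENTION_SECONDS = 60
MAX_RETENTION_SECONDS = 14 * 24 * 60 * 60  # SQS upper limit: 14 days


def queue_attributes(
    dlq_arn: str,
    max_receive_count: int = 5,
    retention_seconds: int = MAX_RETENTION_SECONDS,
) -> Dict[str, str]:
    """Build the Attributes dict for a queue with a dead-letter queue.

    SQS expects attribute values as strings; RedrivePolicy is a JSON string.
    """
    if not MIN_RETENTION_SECONDS <= retention_seconds <= MAX_RETENTION_SECONDS:
        raise ValueError("retention must be between 60 seconds and 14 days")
    return {
        "MessageRetentionPeriod": str(retention_seconds),
        "RedrivePolicy": json.dumps(
            {
                # After max_receive_count failed receives, SQS moves the
                # message to the queue identified by this ARN.
                "deadLetterTargetArn": dlq_arn,
                "maxReceiveCount": max_receive_count,
            }
        ),
    }
```

With boto3 this dict would be passed as `Attributes=` to `create_queue` or `set_queue_attributes`. Note that the DLQ is itself an SQS queue with its own retention period, so something still has to drain it before messages expire.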
ZeroMQ is fast, but you need to build reliability yourself. There are a number of patterns described in the ZeroMQ guide. I have used RabbitMQ before, which gives you a lot of functionality out of the box. You can probably use the worker queues example from the tutorial; it can also persist messages in the queue.
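As a rough idea of what a worker in that style looks like, here is a sketch of a consumer that only acknowledges a message after it has been handled, so a crash or restart mid-message causes redelivery instead of loss. It assumes a pika-style channel is passed in; the queue name and the `handle` function are hypothetical.

```python
from typing import Callable

QUEUE = "task_queue"  # hypothetical queue name


def make_callback(handle: Callable[[bytes], None]):
    """Wrap a handler so the message is acked only after it succeeds."""

    def on_message(ch, method, properties, body):
        try:
            handle(body)
        except Exception:
            # Put the message back on the queue. A message that always
            # fails would loop forever, so a real setup needs a retry cap.
            ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
        else:
            ch.basic_ack(delivery_tag=method.delivery_tag)

    return on_message


def start_worker(channel, handle: Callable[[bytes], None]) -> None:
    """Consume from a durable queue with manual acknowledgements."""
    # durable=True lets the queue definition survive a broker restart.
    channel.queue_declare(queue=QUEUE, durable=True)
    # Hand each worker one unacknowledged message at a time.
    channel.basic_qos(prefetch_count=1)
    channel.basic_consume(
        queue=QUEUE,
        on_message_callback=make_callback(handle),
        auto_ack=False,
    )
    channel.start_consuming()
```

For messages themselves to survive a broker restart, the publisher also has to mark them persistent, which in pika means publishing with `pika.BasicProperties(delivery_mode=2)`.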
I haven't used Amazon SQS before. Another tool you could use is Kafka.
Amazon seems to offer a number of messaging solutions. The simplest is Amazon SQS, yes. Another is Amazon MQ if you wish to have a hosted RabbitMQ or ActiveMQ messaging platform that is compatible with JMS/AMQP. You will want to measure your needs against the constraints of your messaging scenarios. If you have a small team and you're already on AWS, a quick working solution would be to use the ready-to-go solutions from AWS. My guess is that it would cost more to hire (an) engineer(s) to build/maintain your messaging queue than to use the service-at-scale solutions of AWS. If you have a non-global messaging system you could also consider deploying your own small cluster using RabbitMQ as suggested by Shishir. Another excellent solution might be NATS.io which has a strong community, impressive performance, and is backed by CNCF.
I admit there are a lot of options. If you wish to own and grow your messaging in house, then hire a team and start building. ZMQ is flexible, but you will need to write the persistence layer and adapt the clustering to your needs. Kafka is a resilient and distributed solution, but it requires an operations team to maintain it and handle load balancing. RabbitMQ seems to be the de facto choice for getting up and running, but it can eventually run into clustering and scaling issues. On the other hand, the giants offer ready-to-go messaging solutions that end up costing you little, but they risk vendor lock-in if you're not careful.
I suppose I ultimately vote for taking the AWS solution -- but check the SLA and performance criteria of your service.