What is Celery and what are its top alternatives?
Top Alternatives to Celery
RabbitMQ gives your applications a common platform to send and receive messages, and your messages a safe place to live until received. ...
Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design. ...
Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command lines utilities makes performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress and troubleshoot issues when needed. ...
Cucumber is a tool that supports Behaviour-Driven Development (BDD) - a software development process that aims to enhance software quality and reduce maintenance costs. ...
- Amazon SQS
Transmit any volume of data, at any level of throughput, without losing messages or requiring other services to be always available. With SQS, you can offload the administrative burden of operating and scaling a highly available messaging cluster, while paying a low price for only what you use. ...
Apache ActiveMQ is fast, supports many Cross Language Clients and Protocols, comes with easy to use Enterprise Integration Patterns and many advanced features while fully supporting JMS 1.1 and J2EE 1.4. Apache ActiveMQ is released under the Apache 2.0 License. ...
- Azure Service Bus
It is a cloud messaging system for connecting apps and devices across public and private clouds. You can depend on it when you need highly-reliable cloud messaging service between applications and services, even when one or more is offline. ...
It was designed as an extremely lightweight publish/subscribe messaging transport. It is useful for connections with remote locations where a small code footprint is required and/or network bandwidth is at a premium. ...
Celery alternatives & related posts
- It's fast and it works with good metrics/monitoring234
- Ease of configuration80
- I like the admin interface59
- Easy to set-up and start with50
- Standard protocols18
- Intuitive work through python18
- Written primarily in Erlang10
- Simply superb8
- Completeness of messaging patterns6
- Scales to 1 million messages per second3
- Supports AMQP2
- Better than most traditional queue based message broker2
- Supports MQTT2
- Clear documentation with different scripting language1
- Great ui1
- Inubit Integration1
- Better routing system1
- High performance1
- Runs on Open Telecom Platform1
- Delayed messages1
- Too complicated cluster/HA config and management9
- Needs Erlang runtime. Need ops good with Erlang runtime6
- Configuration must be done first, not by your code5
related RabbitMQ posts
As Sentry runs throughout the day, there are about 50 different offline tasks that we execute—anything from “process this event, pretty please” to “send all of these cool people some emails.” There are some that we execute once a day and some that execute thousands per second.
Managing this variety requires a reliably high-throughput message-passing technology. We use Celery's RabbitMQ implementation, and we stumbled upon a great feature called Federation that allows us to partition our task queue across any number of RabbitMQ servers and gives us the confidence that, if any single server gets backlogged, others will pitch in and distribute some of the backlogged tasks to their consumers.
Hi, I am building an enhanced web-conferencing app that will have a voice/video call, live chats, live notifications, live discussions, screen sharing, etc features. Ref: Zoom.
I need advise finalizing the tech stack for this app. I am considering below tech stack:
- Frontend: React
- Backend: Node.js
- Database: MongoDB
- IAAS: #AWS
- Containers & Orchestration: Docker / Kubernetes
- DevOps: GitLab, Terraform
- Brokers: Redis / RabbitMQ
I need advice at the platform level as to what could be considered to support concurrent video streaming seamlessly.
Also, please suggest what could be a better tech stack for my app?
#SAAS #VideoConferencing #WebAndVideoConferencing #zoom #stack
- Open source18
- Written in Scala and java. Runs on JVM11
- Message broker + Streaming system8
- Avro schema integration4
- Suport Multiple clients3
- Partioned, replayable log2
- Simple publisher / multi-subscriber model1
- Extremely good parallelism constructs1
- Non-Java clients are second-class citizens32
- Needs Zookeeper29
- Operational difficulties9
- Terrible Packaging4
related Kafka posts
The algorithms and data infrastructure at Stitch Fix is housed in #AWS. Data acquisition is split between events flowing through Kafka, and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3 based data warehouse. Apache Spark on Yarn is our tool of choice for data movement and #ETL. Because our storage layer (s3) is decoupled from our processing layer, we are able to scale our compute environment very elastically. We have several semi-permanent, autoscaling Yarn clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for adhoc queries and dashboards.
Beyond data movement and ETL, most #ML centric jobs (e.g. model training and execution) run in a similarly elastic environment as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see https://github.com/stitchfix/flotilla-os).
At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated systems. That requires serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan provides our data scientists the ability to quickly productionize those models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn), by automatically packaging them as Docker containers and deploying to Amazon ECS. This provides our data scientist a one-click method of getting from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.
For more info:
- Our Algorithms Tour: https://algorithms-tour.stitchfix.com/
- Our blog: https://multithreaded.stitchfix.com/blog/
- Careers: https://multithreaded.stitchfix.com/careers/
#DataScience #DataStack #Data
As we've evolved or added additional infrastructure to our stack, we've biased towards managed services. Most new backing stores are Amazon RDS instances now. We do use self-managed PostgreSQL with TimescaleDB for time-series data—this is made HA with the use of Patroni and Consul.
We also use managed Amazon ElastiCache instances instead of spinning up Amazon EC2 instances to run Redis workloads, as well as shifting to Amazon Kinesis instead of Kafka.
- Task Dependency Management14
- Beautiful UI12
- Cluster of workers12
- Open source6
- Complex workflows5
- Good api3
- Apache project3
- Custom operators3
- Observability is not great when the DAGs exceed 2502
- Running it on kubernetes cluster relatively complex2
- Open source - provides minimum or no support2
- Logical separation of DAGs is not straight forward1
related Airflow posts
I am looking for an open-source scheduler tool with cross-functional application dependencies. Some of the tasks I am looking to schedule are as follows:
- Trigger Matillion ETL loads
- Trigger Attunity Replication tasks that have downstream ETL loads
- Trigger Golden gate Replication Tasks
- Shell scripts, wrappers, file watchers
- Event-driven schedules
I have used Airflow in the past, and I know we need to create DAGs for each pipeline. I am not familiar with Jenkins, but I know it works with configuration without much underlying code. I want to evaluate both and appreciate any advise
I am working on a project that grabs a set of input data from AWS S3, pre-processes and divvies it up, spins up 10K batch containers to process the divvied data in parallel on AWS Batch, post-aggregates the data, and pushes it to S3.
I already have software patterns from other projects for Airflow + Batch but have not dealt with the scaling factors of 10k parallel tasks. Airflow is nice since I can look at which tasks failed and retry a task after debugging. But dealing with that many tasks on one Airflow EC2 instance seems like a barrier. Another option would be to have one task that kicks off the 10k containers and monitors it from there.
I have no experience with AWS Step Functions but have heard it's AWS's Airflow. There looks to be plenty of patterns online for Step Functions + Batch. Do Step Functions seem like a good path to check out for my use case? Do you get the same insights on failing jobs / ability to retry tasks as you do with Airflow?
- Simple Syntax20
- Simple usage8
- Huge community5
- Nice report3
related Cucumber posts
With this structure, we're able to combine the automation efforts of each team member into a centralized repository while also providing new relevant metrics to business owners.
- Easy to use, reliable62
- Low cost40
- Doesn't need to maintain it14
- It is Serverless8
- Has a max message size (currently 256K)4
- Triggers Lambda3
- Easy to configure with Terraform3
- Delayed delivery upto 15 mins only3
- Delayed delivery upto 12 hours3
- JMS compliant1
- Support for retry and dead letter queue1
- Has a max message size (currently 256K)2
- Difficult to configure2
- Has a maximum 15 minutes of delayed messages only1
related Amazon SQS posts
We are in the process of building a modern content platform to deliver our content through various channels. We decided to go with Microservices architecture as we wanted scale. Microservice architecture style is an approach to developing an application as a suite of small independently deployable services built around specific business capabilities. You can gain modularity, extensive parallelism and cost-effective scaling by deploying services across many distributed servers. Microservices modularity facilitates independent updates/deployments, and helps to avoid single point of failure, which can help prevent large-scale outages. We also decided to use Event Driven Architecture pattern which is a popular distributed asynchronous architecture pattern used to produce highly scalable applications. The event-driven architecture is made up of highly decoupled, single-purpose event processing components that asynchronously receive and process events.
To build our #Backend capabilities we decided to use the following: 1. #Microservices - Java with Spring Boot , Node.js with ExpressJS and Python with Flask 2. #Eventsourcingframework - Amazon Kinesis , Amazon Kinesis Firehose , Amazon SNS , Amazon SQS, AWS Lambda 3. #Data - Amazon RDS , Amazon DynamoDB , Amazon S3 , MongoDB Atlas
To build #Webapps we decided to use Angular 2 with RxJS
#Devops - GitHub , Travis CI , Terraform , Docker , Serverless
In order to accurately measure & track user behaviour on our platform we moved over quickly from the initial solution using Google Analytics to a custom-built one due to resource & pricing concerns we had.
While this does sound complicated, it’s as easy as clients sending JSON blobs of events to Amazon Kinesis from where we use AWS Lambda & Amazon SQS to batch and process incoming events and then ingest them into Google BigQuery. Once events are stored in BigQuery (which usually only takes a second from the time the client sends the data until it’s available), we can use almost-standard-SQL to simply query for data while Google makes sure that, even with terabytes of data being scanned, query times stay in the range of seconds rather than hours. Before ingesting their data into the pipeline, our mobile clients are aggregating events internally and, once a certain threshold is reached or the app is going to the background, sending the events as a JSON blob into the stream.
In the past we had workers running that continuously read from the stream and would validate and post-process the data and then enqueue them for other workers to write them to BigQuery. We went ahead and implemented the Lambda-based approach in such a way that Lambda functions would automatically be triggered for incoming records, pre-aggregate events, and write them back to SQS, from which we then read them, and persist the events to BigQuery. While this approach had a couple of bumps on the road, like re-triggering functions asynchronously to keep up with the stream and proper batch sizes, we finally managed to get it running in a reliable way and are very happy with this solution today.
#ServerlessTaskProcessing #GeneralAnalytics #RealTimeDataProcessing #BigDataAsAService
- Easy to use18
- Open source14
- JMS compliant10
- High Availability6
- Distributed Network of brokers3
- Support XA (distributed transactions)3
- Docker delievery1
- Highly configurable1
- ONLY Vertically Scalable1
- Low resilience to exceptions and interruptions1
- Difficult to scale1
related ActiveMQ posts
I want to choose Message Queue with the following features - Highly Available, Distributed, Scalable, Monitoring. I have RabbitMQ, ActiveMQ, Kafka and Apache RocketMQ in mind. But I am confused which one to choose.
I use ActiveMQ because RabbitMQ have stopped giving the support for AMQP 1.0 or above version and the earlier version of AMQP doesn't give the functionality to support OAuth.
If OAuth is not required and we can go with AMQP 0.9 then i still recommend rabbitMq.
- Easy Integration with .Net4
- Cloud Native2
- Use while high messaging need1
- Limited features in Basic tier1
- Skills can only be used in Azure - vendor lock-in1
- Lacking in JMS support1
- Observability of messages in the queue is lacking1
related Azure Service Bus posts
Want to get the differences in features and enhancement, pros and cons, and also how to Migrate from IBM MQ to Azure Service Bus.
- Varying levels of Quality of Service to fit a range of3
- Lightweight with a relatively small data footprint2
- Very easy to configure and use with open source tools2
- Easy to configure in an unsecure manner1
related MQTT posts
Kindly suggest the best tool for generating 10Mn+ concurrent user load. The tool must support MQTT traffic, REST API, support to interfaces such as Kafka, websockets, persistence HTTP connection, auth type support to assess the support /coverage.
The tool can be integrated into CI pipelines like Azure Pipelines, GitHub, and Jenkins.
For the com part, depending of more details not provided, i'd use SSE, OR i'd run either Mosquitto or RabbitMQ running on Amazon EC2 instances and leverage MQTT or amqp 's subscribe/publish features with my users running mqtt or amqp clients (tcp or websockets) somehow. (publisher too.. you don't say how and who gets to update the document(s).
I find "a ton of end users", depending on how you define a ton (1k users ;) ?) and how frequent document updates are, that can mean a ton of ressources, can't cut it at some point, even using SSE
how many, how big, how persistant do the document(s) have to be ? Db-wise,can't say for lack of details and context, yeah could also be Redis, any RDBMS or nosql or even static json files stored on an Amazon S3 bucket .. anything really