Airflow logo

Airflow

A platform to programmatically author, schedule, and monitor data pipelines, by Airbnb

What is Airflow?

Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.
Airflow is a tool in the Workflow Manager category of a tech stack.
Airflow is an open source tool. Here’s a link to Airflow's open source repository on GitHub.
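To make the DAG idea concrete, here is a minimal sketch of how a pipeline is authored as Python code. The DAG id, schedule, and task commands are illustrative only, and it assumes Airflow 2.4+:

```python
# Minimal illustrative DAG: extract -> transform -> load, run daily.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def transform():
    # Placeholder for real transformation logic.
    print("transforming data")


with DAG(
    dag_id="example_pipeline",        # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                # Airflow 2.4+ name for schedule_interval
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load = BashOperator(task_id="load", bash_command="echo loading")

    # The scheduler runs these tasks on the workers in dependency order.
    extract >> transform_task >> load
```

The `>>` operator declares the task dependencies; the web UI then renders this graph and its run history.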

Who uses Airflow?

Companies
351 companies reportedly use Airflow in their tech stacks, including Airbnb, Slack, and Robinhood.

Developers
1289 developers on StackShare have stated that they use Airflow.

Airflow Integrations

PlanetScaleDB, Dagster, Amazon Managed Workflows for Apache Airflow, Mage.ai, and Atlan are some of the popular tools that integrate with Airflow. Here's a list of all 15 tools that integrate with Airflow.
Pros of Airflow
  • Features (53)
  • Task Dependency Management (14)
  • Beautiful UI (12)
  • Cluster of workers (12)
  • Extensibility (10)
  • Open source (6)
  • Complex workflows (5)
  • Python (5)
  • Good API (3)
  • Apache project (3)
  • Custom operators (3)
  • Dashboard (2)
Decisions about Airflow

Here are some stack decisions, common use cases and reviews by companies and developers who chose Airflow in their tech stack.

Balaji Krishnamoorthy
Project Manager-GCP at Cognizant · 3 upvotes · 22.5K views
Needs advice on Airflow and Git

Hi team, I am a beginner to Airflow and Git. Please advise on tools and documentation to get up to speed. DAGs are the primary need for my current role, along with Git actions.


We are a young start-up with 2 developers and a team in India looking to choose our next ETL tool. We have a few processes in Azure Data Factory but are looking to switch to a better platform. We were debating between Trifacta and Airflow, or even staying with Azure Data Factory. The use case will be to feed data to front-end APIs.

Needs advice on Airflow and Kafka

We're looking to do a project for a company that has incoming data from 2 sources, namely MongoDB and MySQL. We need to combine data from these 2 sources and sync it in real time to PostgreSQL, ideally about 600,000 records per day. Which tool would be better for this use case: Airflow or Kafka?
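For context on the Airflow side of this question, the usual approach is not true streaming but a frequently scheduled incremental sync. The sketch below assumes hypothetical connection ids and table names and omits the actual extract-and-combine logic:

```python
# Micro-batch sync sketch: a short DAG that runs every few minutes and upserts
# the combined records into PostgreSQL. All names here are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


def sync_batch():
    # Placeholder: pull rows changed since the last run from MongoDB and MySQL
    # (e.g. with pymongo and the MySQL provider's hook), combine/normalize them
    # in Python, then write the merged records into PostgreSQL.
    combined_rows = []  # result of the (omitted) extract-and-combine step
    if combined_rows:
        PostgresHook(postgres_conn_id="reporting_db").insert_rows(
            table="combined_records", rows=combined_rows
        )


with DAG(
    dag_id="mongo_mysql_to_postgres_sync",
    start_date=datetime(2024, 1, 1),
    schedule=timedelta(minutes=5),  # ~600k rows/day is roughly 2k rows per 5-minute run
    catchup=False,
) as dag:
    PythonOperator(task_id="sync_batch", python_callable=sync_batch)
```

At this volume a scheduled micro-batch is usually enough; Kafka becomes more attractive when per-event latency genuinely matters.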

Matheus Moreira
Backend Engineer at IntuitiveCare · 5 upvotes · 243.4K views

We have some lambdas we need to orchestrate to get our workflow going. In the past, we attempted to use Airflow as the orchestrator, but the need to coordinate the tasks in a database generates overhead that we cannot afford. For our use case, there are hundreds of inputs per minute, and we need to scale to support all of them and have an efficient way to analyze them later. The ideal product would be AWS Step Functions, since it can manage our load demand gracefully, but it is too expensive and we cannot afford it. So, I would like to get alternatives for an orchestrator that does not need a complex backend, can manage hundreds of inputs per minute, and is not too expensive.

Needs advice on Airflow and AWS Step Functions

I am working on a project that grabs a set of input data from AWS S3, pre-processes and divvies it up, spins up 10K batch containers to process the divvied data in parallel on AWS Batch, post-aggregates the data, and pushes it to S3.

I already have software patterns from other projects for Airflow + Batch but have not dealt with the scaling factors of 10k parallel tasks. Airflow is nice since I can look at which tasks failed and retry a task after debugging. But dealing with that many tasks on one Airflow EC2 instance seems like a barrier. Another option would be to have one task that kicks off the 10k containers and monitors it from there.

I have no experience with AWS Step Functions but have heard it described as AWS's Airflow. There seem to be plenty of patterns online for Step Functions + Batch. Does Step Functions seem like a good path to check out for my use case? Do you get the same insight into failing jobs and the same ability to retry tasks as you do with Airflow?
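As a rough illustration of the "one task kicks off the 10k containers and monitors it from there" option mentioned above, the sketch below submits a single AWS Batch array job from one Airflow task via boto3 and polls it to completion. The queue and job definition names are placeholders, not a tested setup:

```python
import time
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator


def submit_and_monitor():
    batch = boto3.client("batch")
    # One array job fans out into 10,000 child containers; each child can read
    # its shard index from the AWS_BATCH_JOB_ARRAY_INDEX environment variable.
    response = batch.submit_job(
        jobName="process-shards",
        jobQueue="my-job-queue",            # hypothetical queue name
        jobDefinition="my-job-definition",  # hypothetical job definition
        arrayProperties={"size": 10_000},
    )
    job_id = response["jobId"]

    # Poll the parent job until it reaches a terminal state, so success or
    # failure of the Airflow task reflects the whole fan-out.
    while True:
        status = batch.describe_jobs(jobs=[job_id])["jobs"][0]["status"]
        if status == "SUCCEEDED":
            return
        if status == "FAILED":
            raise RuntimeError(f"Batch array job {job_id} failed")
        time.sleep(60)


with DAG(
    dag_id="batch_fanout_sketch",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    PythonOperator(task_id="run_batch_jobs", python_callable=submit_and_monitor)
```

This keeps the Airflow metadata database to a handful of task instances instead of 10k, at the cost of pushing per-shard retry visibility down into AWS Batch.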

Needs advice on Airflow and Apache NiFi

I am looking for the best tool to orchestrate #ETL workflows in non-Hadoop environments, mainly for regression testing use cases. Would Airflow or Apache NiFi be a good fit for this purpose?

For example, I want to run an Informatica ETL job and then run an SQL task as a dependency, followed by another task from Jira. What tool is best suited to set up such a pipeline?
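For the Airflow side of this question, a chain like Informatica, then SQL, then Jira maps directly onto task dependencies. The sketch below uses a placeholder command for the Informatica trigger, a hypothetical database connection id, and a stub for the Jira call rather than real integrations:

```python
# Sketch of an Informatica -> SQL -> Jira dependency chain in Airflow.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator


def create_jira_ticket():
    # Placeholder: call the Jira REST API (or the Atlassian provider) here.
    pass


with DAG(
    dag_id="regression_test_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    informatica_job = BashOperator(
        task_id="run_informatica_workflow",
        # Placeholder; in practice this might shell out to Informatica's CLI or an API.
        bash_command="echo 'trigger Informatica workflow here'",
    )
    sql_check = SQLExecuteQueryOperator(
        task_id="run_sql_task",
        conn_id="my_database",          # hypothetical Airflow connection
        sql="SELECT COUNT(*) FROM staging.results;",
    )
    jira_task = PythonOperator(task_id="update_jira", python_callable=create_jira_ticket)

    informatica_job >> sql_check >> jira_task
```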



Airflow's Features

  • Dynamic: Airflow pipelines are configuration as code (Python), allowing for dynamic pipeline generation. You can write code that instantiates pipelines dynamically (see the sketch below).
  • Extensible: Easily define your own operators and executors, and extend the library so that it fits the level of abstraction that suits your environment.
  • Elegant: Airflow pipelines are lean and explicit. Parameterizing your scripts is built into the core of Airflow using the powerful Jinja templating engine.
  • Scalable: Airflow has a modular architecture and uses a message queue to orchestrate an arbitrary number of workers. Airflow is ready to scale to infinity.
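As referenced in the Dynamic bullet above, here is a small sketch of configuration as code: several DAGs generated from a plain Python list, with a built-in Jinja template variable used in a task parameter. The table list, schedule, and command are made up for illustration:

```python
# Generate one export DAG per table from a plain Python list.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

TABLES = ["users", "orders", "events"]  # hypothetical config driving DAG generation

for table in TABLES:
    with DAG(
        dag_id=f"export_{table}",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        BashOperator(
            task_id="export",
            # {{ ds }} is a built-in Jinja template variable: the run's logical date.
            bash_command=f"echo exporting {table} for {{{{ ds }}}}",
        )
        # Expose each generated DAG at module level so the scheduler picks it up.
        globals()[dag.dag_id] = dag
```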

Airflow Alternatives & Comparisons

What are some alternatives to Airflow?
Luigi
It is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Apache NiFi
An easy to use, powerful, and reliable system to process and distribute data. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
Jenkins
In a nutshell, Jenkins CI is the leading open-source continuous integration server. Built with Java, it provides over 300 plugins to support building and testing virtually any project.
AWS Step Functions
AWS Step Functions makes it easy to coordinate the components of distributed applications and microservices using visual workflows. Building applications from individual components that each perform a discrete function lets you scale and change applications quickly.
Pachyderm
Pachyderm is an open source MapReduce engine that uses Docker containers for distributed computations.

Airflow's Followers
2725 developers follow Airflow to keep up with related blogs and decisions.