Hadoop

Needs advice on Airflow and Apache NiFi

I am looking for the best tool to orchestrate #ETL workflows in non-Hadoop environments, mainly for regression testing use cases. Would Airflow or Apache NiFi be a good fit for this purpose?

For example, I want to run an Informatica ETL job and then run an SQL task as a dependency, followed by another task from Jira. What tool is best suited to set up such a pipeline?

4 upvotes·367.2K views
Replies (2)
Recommends
Airflow

I have been using Airflow for more than two years and haven't considered moving to another platform. As for your requirements, Airflow fits them well:

  1. Dependency management. Airflow models a workflow as a DAG (Directed Acyclic Graph). You define the tasks and declare which task depends on which, and Airflow runs them in order and does not run a task if its parent task fails.
  2. Integrations. The Airflow community has implemented integrations with various cloud services, Hadoop, and Spark, as well as Jira. There is no built-in integration for Informatica, but you can run your own service as an Airflow task to handle all Informatica-related operations.
  3. Monitoring. Jobs and pipelines are easy to find, monitor, and manage because Airflow provides a consolidated UI.
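To make the dependency point concrete, here is a minimal sketch of the behaviour described above: run tasks in dependency order and do not run a task whose parent failed. It deliberately uses only the Python standard library (`graphlib`, Python 3.9+) and is not Airflow's API. The function `run_pipeline` and the three task functions are hypothetical placeholders for the Informatica job, the SQL check, and the Jira step from the question.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, upstream):
    """Run tasks in dependency order, skipping tasks whose parents did not succeed.

    tasks:    dict mapping task name -> zero-argument callable
    upstream: dict mapping task name -> set of task names it depends on
    Returns a dict mapping task name -> "success", "failed", or "skipped".
    """
    status = {}
    # static_order() yields a task only after all of its upstream tasks.
    for name in TopologicalSorter(upstream).static_order():
        parents = upstream.get(name, ())
        if any(status[parent] != "success" for parent in parents):
            status[name] = "skipped"  # a parent failed or was itself skipped
            continue
        try:
            tasks[name]()
            status[name] = "success"
        except Exception:
            status[name] = "failed"
    return status


# Hypothetical stand-ins for the real work (assumptions, not real integrations).
def run_informatica_job():
    print("Informatica job triggered")


def run_sql_check():
    # Fails on purpose to show how the failure affects downstream tasks.
    raise RuntimeError("regression check failed")


def update_jira():
    print("Jira issue updated")


example_status = run_pipeline(
    tasks={
        "run_informatica_job": run_informatica_job,
        "run_sql_check": run_sql_check,
        "update_jira": update_jira,
    },
    upstream={
        "run_sql_check": {"run_informatica_job"},
        "update_jira": {"run_sql_check"},
    },
)
print(example_status)
```

Because the SQL step fails here, the Jira step is never called. In Airflow itself you would declare the same chain on operators inside a DAG, and with the default trigger rule the scheduler leaves downstream tasks of a failed task unexecuted in much the same way.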
5 upvotes·18.3K views
Sales Executive at Astronomer
Recommends
Airflow

Hey Sathya! With Airflow, you can create custom hooks and operators to trigger various types of jobs. Some may already exist for Informatica, but I am not sure. Happy to connect and discuss further if you are interested: josh@astronomer.io

18.2K views
Needs advice on Hadoop, MarkLogic, and Snowflake

We are a property and casualty insurance company and currently use MarkLogic and Hadoop for our raw data lake. We are trying to figure out how Snowflake fits into the picture. Does anybody have good suggestions or best practices for when to use each platform and what data to store in MarkLogic versus Snowflake versus Hadoop? Or are all three of these platforms redundant with one another?

4 upvotes·88K views