Azure Data Factory

What is Azure Data Factory?

It is a service designed to allow developers to integrate disparate data sources. It is a platform, somewhat like SSIS in the cloud, for managing data that lives both on-premises and in the cloud.

Azure Data Factory is a tool in the API Tools category of a tech stack.

Key Features

Real-Time Integration, Parallel Processing, Data Chunker, Data Masking, Proactive Monitoring, Big Data Processing

Azure Data Factory Discussions

Discover why developers choose Azure Data Factory. Read real-world technical decisions and stack choices from the StackShare community.

Andres Crucetta

Dec 19, 2022

Needs advice on

Azure Data Factory

Trifacta

Airflow

We are a young start-up with 2 developers and a team in India looking to choose our next ETL tool. We have a few processes in Azure Data Factory but are looking to switch to a better platform. We were debating Trifacta and Airflow. Or even staying with Azure Data Factory. The use case will be to feed data to front-end APIs.

kew44

Nov 10, 2022

Needs advice on

Amazon S3

AWS Glue

Azure Data Factory

Trying to establish a data lake (or maybe puddle) for my org's Data Sharing project. The idea is that outside partners would send cuts of their PHI data, regardless of format/variables/systems, to our Data Team, who would then harmonize the data, create data marts, and eventually use it for something. End-to-end, I'm envisioning:

Ingestion -> Secure, role-based, self-service portal for users to upload data (1a. bonus points if it can perform basic validations/masking)
Storage -> Amazon S3 seems like the cheapest. We probably won't need anything very big, even at full capacity. Our current storage is a secure Box folder that has ~4GB with several batches of test data, code, presentations, and planning docs.
Data Catalog -> AWS Glue? Azure Data Factory? Snowplow? Is the main difference basically the vendor? We also will have Data Dictionaries/Codebooks from submitters. Where would they fit in?
Partitions -> I've seen Cassandra and YARN mentioned, but have no experience with either.
Processing -> We want to use SAS if at all possible. What will work with SAS code?
Pipeline/Automation -> The check-in and verification processes that have been outlined are rather involved. Some sort of automated messaging or approval workflow would be nice.
I have very little guidance on what a "Data Mart" should look like, so I'm going with the idea that it would be another "experimental" partition. Unless there's an actual mart-building paradigm I've missed?
An end user might use the catalog to pull certain de-identified data sets from the marts. Again, role-based access and a self-service GUI would be preferable. I'm the only full-time tech person on this project, but I'm mostly an OOP, HTML, JavaScript, and some SQL programmer. Most of this is out of my repertoire. I've done a lot of research, but I can't be an effective evangelist without hands-on experience. Since we're starting a new year of our grant, they've finally decided to let me try some stuff out. Any pointers would be appreciated!

Azure Data Factory Alternatives & Comparisons

Apache Camel

Apache Spark

Splunk

Apache Flink

Amazon Athena

Apache Hive
