Alternatives to AWS Data Pipeline

AWS Glue, Airflow, AWS Step Functions, Apache NiFi, and AWS Batch are the most popular alternatives and competitors to AWS Data Pipeline.
95
395
1

What is AWS Data Pipeline and what are its top alternatives?

AWS Data Pipeline is a web service that provides a simple management system for data-driven workflows. Using AWS Data Pipeline, you define a pipeline composed of the “data sources” that contain your data, the “activities” or business logic such as EMR jobs or SQL queries, and the “schedule” on which your business logic executes. For example, you could define a job that, every hour, runs an Amazon Elastic MapReduce (Amazon EMR)–based analysis on that hour’s Amazon Simple Storage Service (Amazon S3) log data, loads the results into a relational database for future lookup, and then automatically sends you a daily summary email.
AWS Data Pipeline is a tool in the Data Transfer category of a tech stack.
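
The three parts of a pipeline definition described above (data sources, activities, and a schedule) can be sketched with plain Python data structures. This is only an illustration of the concept: the class names, fields, and the example bucket path are hypothetical and are not the AWS Data Pipeline API or its definition format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class DataSource:
    """A named place where data lives (hypothetical model)."""
    name: str
    location: str


@dataclass
class Activity:
    """A unit of business logic that reads one source and writes another."""
    name: str
    reads: str
    writes: str


@dataclass
class Pipeline:
    """Data sources + activities + a schedule, mirroring the description above."""
    period: timedelta
    sources: Dict[str, DataSource] = field(default_factory=dict)
    activities: List[Activity] = field(default_factory=list)

    def add_source(self, source: DataSource) -> None:
        self.sources[source.name] = source

    def add_activity(self, activity: Activity) -> None:
        # Reject activities that refer to a source that was never declared.
        for ref in (activity.reads, activity.writes):
            if ref not in self.sources:
                raise KeyError(f"unknown data source: {ref}")
        self.activities.append(activity)

    def run_times(self, start: datetime, count: int) -> List[datetime]:
        # The schedule: one run every `period`, starting at `start`.
        return [start + i * self.period for i in range(count)]


def build_hourly_log_pipeline() -> Pipeline:
    """Hourly log analysis followed by a load into a database (illustrative)."""
    pipeline = Pipeline(period=timedelta(hours=1))
    pipeline.add_source(DataSource("s3_logs", "s3://example-bucket/logs/"))  # assumed path
    pipeline.add_source(DataSource("hourly_stats", "s3://example-bucket/stats/"))  # assumed path
    pipeline.add_source(DataSource("reporting_db", "relational database"))
    pipeline.add_activity(Activity("analyze_logs", reads="s3_logs", writes="hourly_stats"))
    pipeline.add_activity(Activity("load_results", reads="hourly_stats", writes="reporting_db"))
    return pipeline
```

Calling `build_hourly_log_pipeline().run_times(datetime(2024, 1, 1), 3)` yields three run times one hour apart, which is the "schedule" part of the definition; the real service adds provisioning, retries, and notifications on top of this basic shape.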

Top Alternatives to AWS Data Pipeline

  • AWS Glue

    A fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. ...

  • Airflow

    Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command-line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed. ...

  • AWS Step Functions

    AWS Step Functions makes it easy to coordinate the components of distributed applications and microservices using visual workflows. Building applications from individual components that each perform a discrete function lets you scale and change applications quickly. ...

  • Apache NiFi

    An easy-to-use, powerful, and reliable system to process and distribute data. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. ...

  • AWS Batch

    It enables developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS. It dynamically provisions the optimal quantity and type of compute resources (e.g., CPU or memory optimized instances) based on the volume and specific resource requirements of the batch jobs submitted. ...

  • Azure Data Factory

    It is a service designed to allow developers to integrate disparate data sources. It is a platform somewhat like SSIS in the cloud to manage the data you have both on-prem and in the cloud. ...

  • JavaScript

    JavaScript is best known as the scripting language for Web pages, but it is also used in many non-browser environments, such as Node.js or Apache CouchDB. It is a prototype-based, multi-paradigm scripting language that is dynamic and supports object-oriented, imperative, and functional programming styles. ...

  • Git

    Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. ...
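
Several of the tools listed above (Airflow and Apache NiFi in particular) describe work as a directed graph of tasks that run only after their dependencies have finished. The sketch below shows that idea using only Python's standard library; it is not Airflow's or NiFi's API, and the task names are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def parallel_batches(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into batches; every task in a batch can run in parallel.

    `dependencies` maps each task to the set of tasks it must wait for.
    Raises graphlib.CycleError if the graph is not acyclic.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # validates the graph and detects cycles
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose upstream tasks are done
        batches.append(ready)
        sorter.done(*ready)  # mark them finished so downstream tasks unlock
    return batches


# Hypothetical workflow: two extracts feed one transform, which feeds a load.
example_dag = {
    "extract_orders": set(),
    "extract_users": set(),
    "transform": {"extract_orders", "extract_users"},
    "load": {"transform"},
}
example_batches = parallel_batches(example_dag)
```

A scheduler built on this idea would hand each batch to its workers and move on once they report completion; real orchestrators add retries, scheduling intervals, and persistence of task state.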

AWS Data Pipeline alternatives & related posts

AWS Glue

448
806
9
Fully managed extract, transform, and load (ETL) service
PROS OF AWS GLUE
  • 9
    Managed Hive Metastore

    related AWS Glue posts

    Will Dataflow be the right replacement for AWS Glue? Are there any unforeseen exceptions, such as certain proprietary transformations, the connector ecosystem, or data quality and data cleansing not being supported in Google Cloud Dataflow?

    Also, how about Google Cloud Data Fusion as a replacement in terms of no-code/low-code? (Since basic use cases in Glue are supported through the UI, CDF may be the right choice in that case.)

    What would be the best choice?

    Pardha Saradhi
    Technical Lead at Incred Financial Solutions · 6 upvotes · 101.5K views

    Hi,

    We are currently storing the data in Amazon S3 using Apache Parquet format. We are using Presto to query the data from S3 and catalog it using AWS Glue catalog. We have Metabase sitting on top of Presto, where our reports are present. Currently, Presto is becoming too costly for us, and we are looking for alternatives to it while keeping as much of the remaining setup (S3, Metabase) as possible. Please suggest alternative approaches.

    Airflow

    1.6K
    2.7K
    126
    A platform to programmatically author, schedule and monitor data pipelines, by Airbnb
    PROS OF AIRFLOW
    • 51
      Features
    • 14
      Task Dependency Management
    • 12
      Beautiful UI
    • 12
      Cluster of workers
    • 10
      Extensibility
    • 6
      Open source
    • 5
      Complex workflows
    • 5
      Python
    • 3
      Good api
    • 3
      Apache project
    • 3
      Custom operators
    • 2
      Dashboard
    CONS OF AIRFLOW
    • 2
      Observability is not great when the DAGs exceed 250
    • 2
      Running it on a Kubernetes cluster is relatively complex
    • 2
      Open source - provides minimum or no support
    • 1
      Logical separation of DAGs is not straightforward

    related Airflow posts

    Data science and engineering teams at Lyft maintain several big data pipelines that serve as the foundation for various types of analysis throughout the business.

    Apache Airflow sits at the center of this big data infrastructure, allowing users to “programmatically author, schedule, and monitor data pipelines.” Airflow is an open source tool, and “Lyft is the very first Airflow adopter in production since the project was open sourced around three years ago.”

    There are several key components of the architecture. A web UI allows users to view the status of their queries, along with an audit trail of any modifications to the query. A metadata database stores things like job status and task instance status. A multi-process scheduler handles job requests and triggers the executor to execute those tasks.

    Airflow supports several executors, though Lyft uses CeleryExecutor to scale task execution in production. Airflow is deployed to three Amazon Auto Scaling Groups, each associated with a Celery queue.

    Audit logs supplied to the web UI are powered by the existing Airflow audit logs as well as Flask signal.

    Datadog, StatsD, Grafana, and PagerDuty are all used to monitor the Airflow system.


    We are a young start-up with 2 developers and a team in India looking to choose our next ETL tool. We have a few processes in Azure Data Factory but are looking to switch to a better platform. We were debating Trifacta and Airflow. Or even staying with Azure Data Factory. The use case will be to feed data to front-end APIs.

    AWS Step Functions

    228
    380
    31
    Build Distributed Applications Using Visual Workflows
    PROS OF AWS STEP FUNCTIONS
    • 7
      Integration with other services
    • 5
      Easily Accessible via AWS Console
    • 5
      Complex workflows
    • 5
      Pricing
    • 3
      Scalability
    • 3
      Workflow Processing
    • 3
      High Availability

      related AWS Step Functions posts

      Shared insights on AWS Step Functions and Airflow

      I am working on a project that grabs a set of input data from AWS S3, pre-processes and divvies it up, spins up 10K batch containers to process the divvied data in parallel on AWS Batch, post-aggregates the data, and pushes it to S3.

      I already have software patterns from other projects for Airflow + Batch but have not dealt with the scaling factors of 10k parallel tasks. Airflow is nice since I can look at which tasks failed and retry a task after debugging. But dealing with that many tasks on one Airflow EC2 instance seems like a barrier. Another option would be to have one task that kicks off the 10k containers and monitors it from there.

      I have no experience with AWS Step Functions but have heard it's AWS's Airflow. There looks to be plenty of patterns online for Step Functions + Batch. Do Step Functions seem like a good path to check out for my use case? Do you get the same insights on failing jobs / ability to retry tasks as you do with Airflow?
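
The "one task that kicks off the 10k containers and monitors it from there" option in the post above boils down to splitting the input into chunks, running each chunk, and tracking which ones failed so they can be retried. A minimal sketch of that bookkeeping follows; the function names are hypothetical, and `process` stands in for whatever actually submits work (it is not an AWS Batch, Step Functions, or Airflow call).

```python
from typing import Callable, Dict, List, Sequence


def split_into_chunks(items: Sequence, chunk_size: int) -> List[list]:
    """Divide the input into consecutive chunks of at most `chunk_size` items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(items[i:i + chunk_size]) for i in range(0, len(items), chunk_size)]


def run_with_retries(
    chunks: List[list],
    process: Callable[[list], None],
    max_attempts: int = 3,
) -> Dict[int, str]:
    """Run `process` on every chunk, retrying failures up to `max_attempts` times.

    Returns a status per chunk index so failed chunks can be inspected later.
    """
    status: Dict[int, str] = {}
    for index, chunk in enumerate(chunks):
        status[index] = "FAILED"
        for _ in range(max_attempts):
            try:
                process(chunk)
            except Exception:
                continue  # try this chunk again
            status[index] = "SUCCEEDED"
            break
    return status
```

With 10,000 inputs and a chunk size of 500 this produces 20 chunks, so the orchestrator tracks 20 units of work instead of 10,000 individual tasks. Whether the per-chunk status gives enough insight into failures, compared with per-task visibility, is the trade-off the question raises.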

      Matheus Moreira
      Backend Engineer at IntuitiveCare · 5 upvotes · 238K views
      Shared insights on AWS Step Functions and Airflow

      We have some Lambdas we need to orchestrate to get our workflow going. In the past, we attempted to use Airflow as the orchestrator, but the need to coordinate the tasks in a database generates an overhead that we cannot afford. For our use case, there are hundreds of inputs per minute, and we need to scale to support all the inputs and have an efficient way to analyze them later. The ideal product would be AWS Step Functions, since it can handle our load gracefully, but it is too expensive for us. So, I would like suggestions for an orchestrator that does not need a complex backend, can manage hundreds of inputs per minute, and is not too expensive.

      Apache NiFi

      338
      681
      65
      A reliable system to process and distribute data
      PROS OF APACHE NIFI
      • 17
        Visual Data Flows using Directed Acyclic Graphs (DAGs)
      • 8
        Free (Open Source)
      • 7
        Simple-to-use
      • 5
        Scalable horizontally as well as vertically
      • 5
        Reactive with back-pressure
      • 4
        Fast prototyping
      • 3
        Bi-directional channels
      • 3
        End-to-end security between all nodes
      • 2
        Built-in graphical user interface
      • 2
        Can handle messages up to gigabytes in size
      • 2
        Data provenance
      • 1
        Lots of documentation
      • 1
        Hbase support
      • 1
        Support for custom Processor in Java
      • 1
        Hive support
      • 1
        Kudu support
      • 1
        Slack integration
      • 1
        Lot of articles
      CONS OF APACHE NIFI
      • 2
        HA support is not full-fledged
      • 2
        Memory-intensive
      • 1
        Kkk

      related Apache NiFi posts

      John Calandra
      Data Manager at The Garrett Group · 8 upvotes · 358K views

      There is a question coming... I am using Oracle VirtualBox to spawn 3 Ubuntu Linux virtual machines (VMs). VM1 is being used as a data lake, just a place to store flat files. VM2 hosts Apache NiFi. VM3 hosts PostgreSQL. I have built a NiFi pipeline that reads flat files on VM1, pipes the data over, and inserts it into the PostgreSQL database. I left this setup alone for a while, and then something hiccupped on VM3, and I had to rebuild it. Now I cannot make a remote connection to PostgreSQL on VM3. I was using pgAdmin3 on VM3, but it kept throwing errors. I found out it went end-of-life in 2018 and uninstalled it. pgAdmin4 is out, but for some reason, I cannot get the APT utility to find/install it. I am trying to figure out the pgAdmin4 install problem and am looking for a good alternative to pgAdmin4 that I can use to diagnose the remote database connection problem. Does anyone have any suggestions? Thanks in advance.


      I am looking for the best tool to orchestrate #ETL workflows in non-Hadoop environments, mainly for regression testing use cases. Would Airflow or Apache NiFi be a good fit for this purpose?

      For example, I want to run an Informatica ETL job and then run an SQL task as a dependency, followed by another task from Jira. What tool is best suited to set up such a pipeline?

      AWS Batch

      87
      247
      6
      Fully Managed Batch Processing at Any Scale
      PROS OF AWS BATCH
      • 3
        Containerized
      • 3
        Scalable
      CONS OF AWS BATCH
      • 3
        More overhead than Lambda
      • 1
        Image management

      Azure Data Factory

      240
      471
      0
      Hybrid data integration service that simplifies ETL at scale

          related Azure Data Factory posts

          Trying to establish a data lake (or maybe puddle) for my org's Data Sharing project. The idea is that outside partners would send cuts of their PHI data, regardless of format/variables/systems, to our Data Team, who would then harmonize the data, create data marts, and eventually use it for something. End-to-end, I'm envisioning:

          1. Ingestion -> Secure, role-based, self-service portal for users to upload data (1a. bonus points if it can perform basic validations/masking)
          2. Storage -> Amazon S3 seems like the cheapest. We probably won't need very much, even at full capacity. Our current storage is a secure Box folder that has ~4GB with several batches of test data, code, presentations, and planning docs.
          3. Data Catalog -> AWS Glue? Azure Data Factory? Snowplow? Is the main difference basically based on the vendor? We will also have Data Dictionaries/Codebooks from submitters. Where would they fit in?
          4. Partitions -> I've seen Cassandra and YARN mentioned, but have no experience with either.
          5. Processing -> We want to use SAS if at all possible. What will work with SAS code?
          6. Pipeline/Automation -> The check-in and verification processes that have been outlined are rather involved. Some sort of automated messaging or approval workflow would be nice.
          7. I have very little guidance on what a "Data Mart" should look like, so I'm going with the idea that it would be another "experimental" partition. Unless there's an actual mart-building paradigm I've missed?
          8. An end user might use the catalog to pull certain de-identified data sets from the marts. Again, role-based access and a self-service GUI would be preferable.

          I'm the only full-time tech person on this project, but I'm mostly an OOP, HTML, JavaScript, and some SQL programmer. Most of this is out of my repertoire. I've done a lot of research, but I can't be an effective evangelist without hands-on experience. Since we're starting a new year of our grant, they've finally decided to let me try some stuff out. Any pointers would be appreciated!

          JavaScript

          349.5K
          266.2K
          8.1K
          Lightweight, interpreted, object-oriented language with first-class functions
          PROS OF JAVASCRIPT
          • 1.7K
            Can be used on frontend/backend
          • 1.5K
            It's everywhere
          • 1.2K
            Lots of great frameworks
          • 896
            Fast
          • 745
            Lightweight
          • 425
            Flexible
          • 392
            You can't get a device today that doesn't run js
          • 286
            Non-blocking i/o
          • 236
            Ubiquitousness
          • 191
            Expressive
          • 55
            Extended functionality to web pages
          • 49
            Relatively easy language
          • 46
            Executed on the client side
          • 30
            Relatively fast to the end user
          • 25
            Pure JavaScript
          • 21
            Functional programming
          • 15
            Async
          • 13
            Full-stack
          • 12
            Setup is easy
          • 12
            It's everywhere
          • 11
            JavaScript is the New PHP
          • 11
            Because I love functions
          • 10
            Like it or not, JS is part of the web standard
          • 9
            Can be used in backend, frontend and DB
          • 9
            Expansive community
          • 9
            Future Language of The Web
          • 9
            Easy
          • 8
            No need to use PHP
          • 8
            For the good parts
          • 8
            Can be used both as frontend and backend as well
          • 8
            Everyone uses it
          • 8
            Most Popular Language in the World
          • 8
            Easy to hire developers
          • 7
            Love-hate relationship
          • 7
            Powerful
          • 7
            Photoshop has 3 JS runtimes built in
          • 7
            Evolution of C
          • 7
            Popularized Class-Less Architecture & Lambdas
          • 7
            Agile, packages simple to use
          • 7
            Supports lambdas and closures
          • 6
            1.6K Can be used on frontend/backend
          • 6
            It's fun
          • 6
            Hard not to use
          • 6
            Nice
          • 6
            Client-side JS uses the visitor's CPU to save Server Res
          • 6
            Versatile
          • 6
            It lets me use Babel & TypeScript
          • 6
            Easy to make something
          • 6
            It's fun and fast
          • 6
            Can be used on frontend/backend/Mobile/create PRO Ui
          • 5
            Function expressions are useful for callbacks
          • 5
            What to add
          • 5
            Client processing
          • 5
            Everywhere
          • 5
            Scope manipulation
          • 5
            Stockholm Syndrome
          • 5
            Promise relationship
          • 5
            Clojurescript
          • 4
            Because it is so simple and lightweight
          • 4
            Only Programming language on browser
          • 1
            Hard to learn
          • 1
            Test
          • 1
            Test2
          • 1
            Easy to understand
          • 1
            Not the best
          • 1
            Easy to learn
          • 1
            Subskill #4
          • 0
            Hard 彤
          CONS OF JAVASCRIPT
          • 22
            A constant moving target, too much churn
          • 20
            Horribly inconsistent
          • 15
            JavaScript is the New PHP
          • 9
            No ability to monitor memory utilization
          • 8
            Shows Zero output in case of ANY error
          • 7
            Thinks strange results are better than errors
          • 6
            Can be ugly
          • 3
            No GitHub
          • 2
            Slow

          related JavaScript posts

          Zach Holman

          Oof. I have truly hated JavaScript for a long time. Like, for over twenty years now. Like, since the Clinton administration. It's always been a nightmare to deal with all of the aspects of that silly language.

          But wowza, things have changed. Tooling is just way, way better. I'm primarily web-oriented, and using React and Apollo together the past few years really opened my eyes to building rich apps. And I deeply apologize for using the phrase rich apps; I don't think I've ever said such Enterprisey words before.

          But yeah, things are different now. I still love Rails, and still use it for a lot of apps I build. But it's that silly rich apps phrase that's the problem. Users have way more comprehensive expectations than they did even five years ago, and the JS community does a good job at building tools and tech that tackle the problems of making heavy, complicated UI and frontend work.

          Obviously there's a lot of things happening here, so just saying "JavaScript isn't terrible" might encompass a huge amount of libraries and frameworks. But if you're like me, yeah, give things another shot- I'm somehow not hating on JavaScript anymore and... gulp... I kinda love it.

          Conor Myhrvold
          Tech Brand Mgr, Office of CTO at Uber · 44 upvotes · 9.6M views

          How Uber developed the open source, end-to-end distributed tracing system Jaeger, now a CNCF project:

          Distributed tracing is quickly becoming a must-have component in the tools that organizations use to monitor their complex, microservice-based architectures. At Uber, our open source distributed tracing system Jaeger saw large-scale internal adoption throughout 2016, integrated into hundreds of microservices and now recording thousands of traces every second.

          Here is the story of how we got here, from investigating off-the-shelf solutions like Zipkin, to why we switched from pull to push architecture, and how distributed tracing will continue to evolve:

          https://eng.uber.com/distributed-tracing/

          (GitHub Pages: https://www.jaegertracing.io/, GitHub: https://github.com/jaegertracing/jaeger)

          Bindings/Operator: Python, Java, Node.js, Go, C++, Kubernetes, JavaScript, OpenShift, C#, Apache Spark

          Git

          288.5K
          173.5K
          6.6K
          Fast, scalable, distributed revision control system
          PROS OF GIT
          • 1.4K
            Distributed version control system
          • 1.1K
            Efficient branching and merging
          • 959
            Fast
          • 845
            Open source
          • 726
            Better than svn
          • 368
            Great command-line application
          • 306
            Simple
          • 291
            Free
          • 232
            Easy to use
          • 222
            Does not require server
          • 27
            Distributed
          • 22
            Small & Fast
          • 18
            Feature based workflow
          • 15
            Staging Area
          • 13
            Most widespread VCS
          • 11
            Role-based codelines
          • 11
            Disposable Experimentation
          • 7
            Frictionless Context Switching
          • 6
            Data Assurance
          • 5
            Efficient
          • 4
            Just awesome
          • 3
            GitHub integration
          • 3
            Easy branching and merging
          • 2
            Compatible
          • 2
            Flexible
          • 2
            Possible to lose history and commits
          • 1
            Rebase supported natively; reflog; access to plumbing
          • 1
            Light
          • 1
            Team Integration
          • 1
            Fast, scalable, distributed revision control system
          • 1
            Easy
          • 1
            Flexible, easy, Safe, and fast
          • 1
            CLI is great, but the GUI tools are awesome
          • 1
            It's what you do
          • 0
            Phinx
          CONS OF GIT
          • 16
            Hard to learn
          • 11
            Inconsistent command line interface
          • 9
            Easy to lose uncommitted work
          • 7
            Worst documentation ever possibly made
          • 5
            Awful merge handling
          • 3
            Nonexistent preventive security flows
          • 3
            Rebase hell
          • 2
            When --force is disabled, cannot rebase
          • 2
            Ironically even die-hard supporters screw up badly
          • 1
            Doesn't scale for big data

          related Git posts

          Simon Reymann
          Senior Fullstack Developer at QUANTUSflow Software GmbH · 30 upvotes · 9M views

          Our whole DevOps stack consists of the following tools:

          • GitHub (incl. GitHub Pages/Markdown for Documentation, GettingStarted and HowTo's) as collaborative review and code management tool
          • Git as the underlying revision control system
          • SourceTree as Git GUI
          • Visual Studio Code as IDE
          • CircleCI for continuous integration (automating the development process)
          • Prettier / TSLint / ESLint as code linter
          • SonarQube as quality gate
          • Docker as container management (incl. Docker Compose for multi-container application management)
          • VirtualBox for operating system simulation tests
          • Kubernetes as cluster management for docker containers
          • Heroku for deploying in test environments
          • nginx as web server (preferably used as facade server in production environment)
          • SSLMate (using OpenSSL) for certificate management
          • Amazon EC2 (incl. Amazon S3) for deploying in stage (production-like) and production environments
          • PostgreSQL as preferred database system
          • Redis as preferred in-memory database/store (great for caching)

          The main reason we have chosen Kubernetes over Docker Swarm is related to the following artifacts:

          • Key features: Easy and flexible installation, Clear dashboard, Great scaling operations, Monitoring is an integral part, Great load balancing concepts, Monitors the condition and ensures compensation in the event of failure.
          • Applications: An application can be deployed using a combination of pods, deployments, and services (or micro-services).
          • Functionality: Kubernetes has a complex installation and setup process, but it is not as limited as Docker Swarm.
          • Monitoring: It supports multiple versions of logging and monitoring when the services are deployed within the cluster (Elasticsearch/Kibana (ELK), Heapster/Grafana, Sysdig cloud integration).
          • Scalability: All-in-one framework for distributed systems.
          • Other Benefits: Kubernetes is backed by the Cloud Native Computing Foundation (CNCF), huge community among container orchestration tools, it is an open source and modular tool that works with any OS.
          Tymoteusz Paul
          Devops guy at X20X Development LTD · 23 upvotes · 8M views

          Often enough I have to explain my way of going about setting up a CI/CD pipeline with multiple deployment platforms. Since I am a bit tired of yapping the same thing every single time, I've decided to write it up, share it with the world this way, and send people to read it instead ;). I will explain it using a "live example" of how Rome got built, assuming that the current methodology consists only of a readme.md and wishes of good luck (as it usually is ;)).

          It always starts with an app, whatever it may be, and reading the readmes available while Vagrant and VirtualBox are installing and updating. Following that is the first hurdle to get over: convert all the instructions/scripts into Ansible playbook(s), stopping only when a clean vagrant up or vagrant reload gives us a fully working environment. As our Vagrant environment is now functional, it's time to break it! This is the moment to look for how things can be done better (too rigid/too loose versioning? Sloppy environment setup?) and replace them with the right way to do stuff, one that won't bite us in the backside. This is the point, and the best opportunity, to upcycle the existing way of doing the dev environment to produce a proper, production-grade product.

          I should probably digress here for a moment and explain why. I firmly believe that the way you deploy production is the same way you should deploy development, short of a few debugging-friendly settings. This way you avoid the discrepancy between how production works and how development works, which almost always causes major pains in the back of the neck, and with the use of proper tools it should mean no more work for the developers. That's why we start with Vagrant, as developer boxes should be as easy as vagrant up, but the meat of our product lies in Ansible, which will do most of the work and can be applied to almost anything: AWS, bare metal, Docker, LXC, in open net, behind VPN - you name it.

          We must also give proper consideration to monitoring and logging hoovering at this point. My generic answer here is to grab Elasticsearch, Kibana, and Logstash. While for different use cases there may be better solutions, this one is well battle-tested, performs reasonably, and is very easy to scale both vertically (within some limits) and horizontally. Logstash rules are easy to write and are well supported in maintenance through Ansible, which, as I've mentioned earlier, is at the very core of things, and creating triggers/reports and alerts based on Elastic and Kibana is generally a breeze, including some quite complex aggregations.

          If we are happy with the state of the Ansible setup, it's time to move on and put all those roles and playbooks to work. Namely, we need something to manage our CI/CD pipelines. For me, the choice is obvious: TeamCity. It's modern, robust and, unlike most of the light-weight alternatives, it's transparent. What I mean by that is that it doesn't tell you how to do things, and doesn't limit your ways to deploy, or test, or package for that matter. Instead, it provides a developer-friendly and rich playground for your pipelines. You can do much the same with Jenkins, but it has quite a dated look and feel, while also missing some key functionality that must be brought in via plugins (like a quality REST API, which comes built in with TeamCity). It also comes with all the common, handy plugins like Slack or Apache Maven integration.

          The exact flow between CI and CD varies too greatly from one application to another to describe, so I will outline a few rules that guide me in it:

          1. Make build steps as small as possible. This way when something breaks, we know exactly where, without needing to dig and root around.
          2. All security credentials besides the development environment must be sourced from individual Vault instances. Keys to those containers should exist only on the CI/CD box and be accessible by a few people (the fewer the better). This is pretty self-explanatory, as anything besides dev may contain sensitive data and, at times, be public-facing. Because of that, appropriate security must be present. TeamCity shines in this department with excellent secrets management.
          3. Every part of the build chain shall consume and produce artifacts. If it creates nothing, it likely shouldn't be its own build. This way if any issue shows up with any environment or version, all a developer has to do is grab the appropriate artifacts to reproduce the issue locally.
          4. Deployment builds should be directly tied to specific Git branches/tags. This enables much easier tracking of what caused an issue, including automated identifying and tagging of the author (nothing like automated regression testing!).

          Speaking of deployments, I generally try to keep it simple but also with a close eye on the wallet. Because of that, I am more than happy with AWS or another cloud provider, but I also constantly peek at the loads and ask whether we get value for what we are paying. Often enough the pattern of use is not constantly erratic, but rather has a firm baseline which could be migrated away from the cloud and onto bare-metal boxes. That is another area where this approach strongly triumphs over the common Docker and CircleCI setup, where you are very much tied to cloud providers and getting out is expensive. Here, to embrace bare-metal hosting, all you need is the help of some container-based self-hosting software; my personal preference is Proxmox and LXC. Following that, all you must write are Ansible scripts to manage the Proxmox hardware, in a similar way as you do for Amazon EC2 (Ansible supports both well), and you are good to go. One does not exclude the other, quite the opposite, as they can live in great synergy and cut your costs dramatically (the heavier your base load, the bigger the savings) while providing production-grade resiliency.
