Alternatives to Azure HDInsight

Amazon EMR, Azure Databricks, Hadoop, Azure Machine Learning, and Azure Data Factory are the most popular alternatives and competitors to Azure HDInsight.

What is Azure HDInsight and what are its top alternatives?

Azure HDInsight is a fully managed cloud-based service from Microsoft that provides Apache Hadoop and Apache Spark clusters. It allows users to process big data workloads in a cost-effective and scalable manner. Key features include support for various big data frameworks, integration with other Azure services, enterprise-grade security, and easy scalability. However, limitations include high costs for large workloads and potential complexity in managing different big data frameworks simultaneously.

  1. Amazon EMR: Amazon EMR is a cloud-based big data platform that utilizes various open-source tools such as Apache Spark, Hadoop, and Hive. Key features include easy setup, integration with other AWS services, and cost-effectiveness. Pros include seamless integration with AWS services, while cons include potential complexity for users not familiar with AWS.

  2. Google Cloud Dataproc: Google Cloud Dataproc is a managed Apache Spark and Hadoop service that runs on Google Cloud Platform. Key features include easy cluster management, autoscaling, and integration with other Google Cloud services. Pros include seamless integration with Google Cloud Platform, while cons include potential higher costs compared to other alternatives.

  3. Cloudera Distribution for Hadoop (CDH): CDH is a distribution of Apache Hadoop and related projects from Cloudera. Key features include comprehensive data management capabilities, enterprise-grade security, and support for various big data frameworks. Pros include extensive support and documentation, while cons include potential higher costs for enterprise deployments.

  4. MapR: MapR is a converged data platform that integrates Hadoop, Spark, and other big data frameworks. Key features include high performance, enterprise-grade reliability, and global data consistency. Pros include faster performance compared to other alternatives, while cons include potential higher costs for large-scale deployments.

  5. IBM BigInsights: IBM BigInsights is an enterprise-grade Hadoop distribution with additional analytics capabilities. Key features include advanced analytics tools, integration with IBM Watson services, and enterprise-grade security. Pros include seamless integration with the IBM ecosystem, while cons include potential higher costs for smaller deployments.

  6. Hortonworks Data Platform (HDP): HDP is an open-source distribution of Apache Hadoop from Hortonworks. Key features include comprehensive data management tools, enterprise-grade security, and support for various big data frameworks. Pros include its open-source nature, while cons include potential complexity in managing different components.

  7. Databricks: Databricks is a unified data analytics platform that leverages Apache Spark for big data processing. Key features include collaborative notebooks, automated cluster management, and integration with various data sources. Pros include ease of use for data scientists, while cons include potential higher costs for large-scale deployments.

  8. Qubole: Qubole is a cloud-native data platform that simplifies big data processing using Apache Spark, Hadoop, and Presto. Key features include self-service analytics, auto-scaling, and cost optimization. Pros include ease of use for data analysts, while cons include potential limitations in customization compared to other alternatives.

  9. Snowflake: Snowflake is a cloud data platform that offers a data warehouse-as-a-service solution for analytics. Key features include instant elasticity, built-in security, and support for structured and semi-structured data. Pros include easy scalability for varying workloads, while cons include potential limitations for unstructured data processing.

  10. Apache Flink: Apache Flink is an open-source stream processing framework that can also be used for batch processing. Key features include low-latency processing, fault tolerance, and support for event time processing. Pros include high throughput and low latency, while cons include potential complexity in setting up and managing Flink clusters.

Top Alternatives to Azure HDInsight

  • Amazon EMR

    It is used in a variety of applications, including log analysis, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics. ...

  • Azure Databricks

    Accelerate big data analytics and artificial intelligence (AI) solutions with Azure Databricks, a fast, easy and collaborative Apache Spark–based analytics service. ...

  • Hadoop

    The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. ...

  • Azure Machine Learning

    Azure Machine Learning is a fully-managed cloud service that enables data scientists and developers to efficiently embed predictive analytics into their applications, helping organizations use massive data sets and bring all the benefits of the cloud to machine learning. ...

  • Azure Data Factory

    It is a service designed to allow developers to integrate disparate data sources. It is a platform somewhat like SSIS in the cloud to manage the data you have both on-prem and in the cloud. ...

  • Databricks

    Databricks Unified Analytics Platform, from the original creators of Apache Spark™, unifies data science and engineering across the Machine Learning lifecycle from data preparation to experimentation and deployment of ML applications. ...

  • JavaScript

    JavaScript is most known as the scripting language for Web pages, but used in many non-browser environments as well such as node.js or Apache CouchDB. It is a prototype-based, multi-paradigm scripting language that is dynamic,and supports object-oriented, imperative, and functional programming styles. ...

  • Git

    Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. ...

Azure HDInsight alternatives & related posts

Amazon EMR

Distribute your data and processing across Amazon EC2 instances using Hadoop
PROS OF AMAZON EMR
  • 15
    On demand processing power
  • 12
    Don't need to maintain Hadoop Cluster yourself
  • 7
    Hadoop Tools
  • 6
    Elastic
  • 4
    Backed by Amazon
  • 3
    Flexible
  • 3
    Economic - pay as you go, easy to use CLI and SDKs
  • 2
    Don't need a dedicated Ops group
  • 1
    Massive data handling
  • 1
    Great support

    related Amazon EMR posts

    I have to build a data processing application with an Apache Beam stack and Apache Flink runner on an Amazon EMR cluster. I saw some instability with the process and EMR clusters that keep going down. Here, the Apache Beam application gets inputs from Kafka and sends the accumulative data streams to another Kafka topic. Any advice on how to make the process more stable?

    Shared insights on AWS Glue and Amazon EMR

    I used AWS Glue because I thought it was worth all the hype in Fall 2018. However, you had to use Python 2.7 with no pandas support, and cold starts lasted as long as 15 minutes. Also, setting up a dev environment for iterative development was nearly impossible at the time.

    It was a terrible experience for me. I recommend using Amazon EMR instead. Even talking with a friend who works at Amazon, they use EMR instead of Glue for internal Spark workloads. Just because a company makes something doesn't mean they use that something :/

    Azure Databricks

    Fast, easy, and collaborative Apache Spark–based analytics service

        Hadoop

        Open-source software for reliable, scalable, distributed computing
        PROS OF HADOOP
        • 39
          Great ecosystem
        • 11
          One stack to rule them all
        • 4
          Great load balancer
        • 1
          Amazon aws
        • 1
          Java syntax

          related Hadoop posts

          Shared insights on Kafka and Hadoop

          The early data ingestion pipeline at Pinterest used Kafka as the central message transporter, with the app servers writing messages directly to Kafka, which then uploaded log files to S3.

          For databases, a custom Hadoop streamer pulled database data and wrote it to S3.

          Challenges cited for this infrastructure included high operational overhead, as well as potential data loss occurring when Kafka broker outages led to an overflow of in-memory message buffering.
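
The data-loss failure mode described above can be sketched in a few lines of Python. This is a hypothetical model, not Pinterest's pipeline or the Kafka client: the class `BufferedProducer`, its `max_buffered` limit, and the `broker_up` flag are illustrative assumptions. It shows how a bounded in-memory buffer silently evicts the oldest messages once an outage outlasts its capacity.

```python
from collections import deque


class BufferedProducer:
    """Hypothetical producer that buffers messages in memory while the broker is down."""

    def __init__(self, max_buffered):
        # A deque with maxlen discards its oldest item when a new one is appended to a full buffer.
        self.buffer = deque(maxlen=max_buffered)
        self.delivered = []    # stands in for messages that reached the broker
        self.dropped = 0       # messages lost to buffer overflow
        self.broker_up = True  # assumption: toggled by hand to simulate an outage

    def flush(self):
        # Deliver everything that was buffered during the outage, oldest first.
        while self.buffer:
            self.delivered.append(self.buffer.popleft())

    def send(self, message):
        if self.broker_up:
            self.flush()
            self.delivered.append(message)
            return
        if len(self.buffer) == self.buffer.maxlen:
            # The append below evicts the oldest buffered message, so count it as lost.
            self.dropped += 1
        self.buffer.append(message)


producer = BufferedProducer(max_buffered=3)
producer.send("m1")

producer.broker_up = False          # outage begins
for i in range(2, 8):               # m2..m7 arrive while the broker is down
    producer.send(f"m{i}")

producer.broker_up = True           # outage ends
producer.send("m8")

print(producer.delivered)
print(producer.dropped)
```

In this run the buffer holds three messages while six arrive during the outage, so the three oldest are lost. Common mitigations include blocking the sender, spilling to disk, or surfacing an error instead of dropping silently.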

          Conor Myhrvold
          Tech Brand Mgr, Office of CTO at Uber · 7 upvotes · 2.9M views

          Why we built Marmaray, an open source generic data ingestion and dispersal framework and library for Apache Hadoop:

          Built and designed by our Hadoop Platform team, Marmaray is a plug-in-based framework built on top of the Hadoop ecosystem. Users can add support to ingest data from any source and disperse it to any sink using Apache Spark. The name, Marmaray, comes from a tunnel in Turkey connecting Europe and Asia. Similarly, we envisioned Marmaray within Uber as a pipeline connecting data from any source to any sink depending on customer preference:

          https://eng.uber.com/marmaray-hadoop-ingestion-open-source/

          (Direct GitHub repo: https://github.com/uber/marmaray; related tools: Kafka, Kafka Manager)

          Azure Machine Learning

          A fully-managed cloud service for predictive analytics

              Azure Data Factory

              Hybrid data integration service that simplifies ETL at scale

                  related Azure Data Factory posts

                  I'm trying to establish a data lake (or maybe a puddle) for my org's Data Sharing project. The idea is that outside partners would send cuts of their PHI data, regardless of format/variables/systems, to our Data Team, who would then harmonize the data, create data marts, and eventually use it for something. End-to-end, I'm envisioning:

                  1. Ingestion -> Secure, role-based, self-service portal for users to upload data (1a. bonus points if it can perform basic validations/masking)
                  2. Storage -> Amazon S3 seems like the cheapest. We probably won't need much space, even at full capacity. Our current storage is a secure Box folder that has ~4GB with several batches of test data, code, presentations, and planning docs.
                  3. Data Catalog -> AWS Glue? Azure Data Factory? Snowplow? Is the main difference basically based on the vendor? We will also have Data Dictionaries/Codebooks from submitters. Where would they fit in?
                  4. Partitions -> I've seen Cassandra and YARN mentioned, but have no experience with either
                  5. Processing -> We want to use SAS if at all possible. What will work with SAS code?
                  6. Pipeline/Automation -> The check-in and verification processes that have been outlined are rather involved. Some sort of automated messaging or approval workflow would be nice
                  7. I have very little guidance on what a "Data Mart" should look like, so I'm going with the idea that it would be another "experimental" partition. Unless there's an actual mart-building paradigm I've missed?
                  8. An end user might use the catalog to pull certain de-identified data sets from the marts. Again, role-based access and a self-service GUI would be preferable.

                  I'm the only full-time tech person on this project, but I'm mostly an OOP, HTML, JavaScript, and some SQL programmer. Most of this is out of my repertoire. I've done a lot of research, but I can't be an effective evangelist without hands-on experience. Since we're starting a new year of our grant, they've finally decided to let me try some stuff out. Any pointers would be appreciated!

                  We are a young start-up with 2 developers and a team in India looking to choose our next ETL tool. We have a few processes in Azure Data Factory but are looking to switch to a better platform. We were debating Trifacta and Airflow. Or even staying with Azure Data Factory. The use case will be to feed data to front-end APIs.

                  Databricks

                  A unified analytics platform, powered by Apache Spark
                  PROS OF DATABRICKS
                  • 1
                    Best Performances on large datasets
                  • 1
                    True lakehouse architecture
                  • 1
                    Scalability
                  • 1
                    Databricks doesn't get access to your data
                  • 1
                    Usage Based Billing
                  • 1
                    Security
                  • 1
                    Data stays in your cloud account
                  • 1
                    Multicloud

                    related Databricks posts

                    Jan Vlnas
                    Developer Advocate at Superface · 5 upvotes · 330.9K views

                    From my point of view, OpenRefine and Apache Hive serve completely different purposes. OpenRefine is intended for interactive cleaning of messy data locally. You could work with its libraries to use some of OpenRefine's features as part of your data pipeline (there are pointers in the FAQ), but OpenRefine in general is intended for single-user local operation.

                    I can't recommend a particular alternative without better understanding of your use case. But if you are looking for an interactive tool to work with big data at scale, take a look at notebook environments like Jupyter, Databricks, or Deepnote. If you are building a data processing pipeline, consider also Apache Spark.

                    Edit: Fixed references from Hadoop to Hive, which is actually closer to Spark.

                    Vamshi Krishna
                    Data Engineer at Tata Consultancy Services · 4 upvotes · 243.7K views

                    I have to collect different data from multiple sources and store them in a single cloud location. Then perform cleaning and transforming using PySpark, and push the end results to other applications like reporting tools, etc. What would be the best solution? I can only think of Azure Data Factory + Databricks. Are there any alternatives to #AWS services + Databricks?

                    JavaScript

                    Lightweight, interpreted, object-oriented language with first-class functions
                    PROS OF JAVASCRIPT
                    • 1.7K
                      Can be used on frontend/backend
                    • 1.5K
                      It's everywhere
                    • 1.2K
                      Lots of great frameworks
                    • 896
                      Fast
                    • 745
                      Light weight
                    • 425
                      Flexible
                    • 392
                      You can't get a device today that doesn't run js
                    • 286
                      Non-blocking i/o
                    • 236
                      Ubiquitousness
                    • 191
                      Expressive
                    • 55
                      Extended functionality to web pages
                    • 49
                      Relatively easy language
                    • 46
                      Executed on the client side
                    • 30
                      Relatively fast to the end user
                    • 25
                      Pure Javascript
                    • 21
                      Functional programming
                    • 15
                      Async
                    • 13
                      Full-stack
                    • 12
                      Setup is easy
                    • 12
                      Its everywhere
                    • 11
                      JavaScript is the New PHP
                    • 11
                      Because I love functions
                    • 10
                      Like it or not, JS is part of the web standard
                    • 9
                      Can be used in backend, frontend and DB
                    • 9
                      Expansive community
                    • 9
                      Future Language of The Web
                    • 9
                      Easy
                    • 8
                      No need to use PHP
                    • 8
                      For the good parts
                    • 8
                      Can be used both as frontend and backend as well
                    • 8
                      Everyone use it
                    • 8
                      Most Popular Language in the World
                    • 8
                      Easy to hire developers
                    • 7
                      Love-hate relationship
                    • 7
                      Powerful
                    • 7
                      Photoshop has 3 JS runtimes built in
                    • 7
                      Evolution of C
                    • 7
                      Popularized Class-Less Architecture & Lambdas
                    • 7
                      Agile, packages simple to use
                    • 7
                      Supports lambdas and closures
                    • 6
                      1.6K Can be used on frontend/backend
                    • 6
                      It's fun
                    • 6
                      Hard not to use
                    • 6
                      Nice
                    • 6
                      Client side JS uses the visitors CPU to save Server Res
                    • 6
                      Versatile
                    • 6
                      It lets me use Babel & TypeScript
                    • 6
                      Easy to make something
                    • 6
                      Its fun and fast
                    • 6
                      Can be used on frontend/backend/Mobile/create PRO Ui
                    • 5
                      Function expressions are useful for callbacks
                    • 5
                      What to add
                    • 5
                      Client processing
                    • 5
                      Everywhere
                    • 5
                      Scope manipulation
                    • 5
                      Stockholm Syndrome
                    • 5
                      Promise relationship
                    • 5
                      Clojurescript
                    • 4
                      Because it is so simple and lightweight
                    • 4
                      Only Programming language on browser
                    • 1
                      Hard to learn
                    • 1
                      Test
                    • 1
                      Test2
                    • 1
                      Easy to understand
                    • 1
                      Not the best
                    • 1
                      Easy to learn
                    • 1
                      Subskill #4
                    • 0
                      Hard 彤
                    CONS OF JAVASCRIPT
                    • 22
                      A constant moving target, too much churn
                    • 20
                      Horribly inconsistent
                    • 15
                      Javascript is the New PHP
                    • 9
                      No ability to monitor memory utilization
                    • 8
                      Shows Zero output in case of ANY error
                    • 7
                      Thinks strange results are better than errors
                    • 6
                      Can be ugly
                    • 3
                      No GitHub
                    • 2
                      Slow

                    related JavaScript posts

                    Zach Holman

                    Oof. I have truly hated JavaScript for a long time. Like, for over twenty years now. Like, since the Clinton administration. It's always been a nightmare to deal with all of the aspects of that silly language.

                    But wowza, things have changed. Tooling is just way, way better. I'm primarily web-oriented, and using React and Apollo together the past few years really opened my eyes to building rich apps. And I deeply apologize for using the phrase rich apps; I don't think I've ever said such Enterprisey words before.

                    But yeah, things are different now. I still love Rails, and still use it for a lot of apps I build. But it's that silly rich apps phrase that's the problem. Users have way more comprehensive expectations than they did even five years ago, and the JS community does a good job at building tools and tech that tackle the problems of making heavy, complicated UI and frontend work.

                    Obviously there's a lot of things happening here, so just saying "JavaScript isn't terrible" might encompass a huge amount of libraries and frameworks. But if you're like me, yeah, give things another shot- I'm somehow not hating on JavaScript anymore and... gulp... I kinda love it.

                    Conor Myhrvold
                    Tech Brand Mgr, Office of CTO at Uber · 44 upvotes · 9.6M views

                    How Uber developed the open source, end-to-end distributed tracing system Jaeger, now a CNCF project:

                    Distributed tracing is quickly becoming a must-have component in the tools that organizations use to monitor their complex, microservice-based architectures. At Uber, our open source distributed tracing system Jaeger saw large-scale internal adoption throughout 2016, integrated into hundreds of microservices and now recording thousands of traces every second.

                    Here is the story of how we got here, from investigating off-the-shelf solutions like Zipkin, to why we switched from pull to push architecture, and how distributed tracing will continue to evolve:

                    https://eng.uber.com/distributed-tracing/

                    (GitHub Pages: https://www.jaegertracing.io/, GitHub: https://github.com/jaegertracing/jaeger)

                    Bindings/Operator: Python Java Node.js Go C++ Kubernetes JavaScript OpenShift C# Apache Spark

                    Git

                    Fast, scalable, distributed revision control system
                    PROS OF GIT
                    • 1.4K
                      Distributed version control system
                    • 1.1K
                      Efficient branching and merging
                    • 959
                      Fast
                    • 845
                      Open source
                    • 726
                      Better than svn
                    • 368
                      Great command-line application
                    • 306
                      Simple
                    • 291
                      Free
                    • 232
                      Easy to use
                    • 222
                      Does not require server
                    • 27
                      Distributed
                    • 22
                      Small & Fast
                    • 18
                      Feature based workflow
                    • 15
                      Staging Area
                    • 13
                      Most widespread VCS
                    • 11
                      Role-based codelines
                    • 11
                      Disposable Experimentation
                    • 7
                      Frictionless Context Switching
                    • 6
                      Data Assurance
                    • 5
                      Efficient
                    • 4
                      Just awesome
                    • 3
                      Github integration
                    • 3
                      Easy branching and merging
                    • 2
                      Compatible
                    • 2
                      Flexible
                    • 2
                      Possible to lose history and commits
                    • 1
                      Rebase supported natively; reflog; access to plumbing
                    • 1
                      Light
                    • 1
                      Team Integration
                    • 1
                      Fast, scalable, distributed revision control system
                    • 1
                      Easy
                    • 1
                      Flexible, easy, Safe, and fast
                    • 1
                      CLI is great, but the GUI tools are awesome
                    • 1
                      It's what you do
                    • 0
                      Phinx
                    CONS OF GIT
                    • 16
                      Hard to learn
                    • 11
                      Inconsistent command line interface
                    • 9
                      Easy to lose uncommitted work
                    • 7
                      Worst documentation ever possibly made
                    • 5
                      Awful merge handling
                    • 3
                      Nonexistent preventive security flows
                    • 3
                      Rebase hell
                    • 2
                      When --force is disabled, cannot rebase
                    • 2
                      Ironically even die-hard supporters screw up badly
                    • 1
                      Doesn't scale for big data

                    related Git posts

                    Simon Reymann
                    Senior Fullstack Developer at QUANTUSflow Software GmbH · 30 upvotes · 9M views

                    Our whole DevOps stack consists of the following tools:

                    • GitHub (incl. GitHub Pages/Markdown for Documentation, GettingStarted and HowTo's) as collaborative review and code management tool
                    • Git as revision control system
                    • SourceTree as Git GUI
                    • Visual Studio Code as IDE
                    • CircleCI for continuous integration (automating the development process)
                    • Prettier / TSLint / ESLint as code linter
                    • SonarQube as quality gate
                    • Docker as container management (incl. Docker Compose for multi-container application management)
                    • VirtualBox for operating system simulation tests
                    • Kubernetes as cluster management for docker containers
                    • Heroku for deploying in test environments
                    • nginx as web server (preferably used as facade server in production environment)
                    • SSLMate (using OpenSSL) for certificate management
                    • Amazon EC2 (incl. Amazon S3) for deploying in stage (production-like) and production environments
                    • PostgreSQL as preferred database system
                    • Redis as preferred in-memory database/store (great for caching)

                    The main reason we have chosen Kubernetes over Docker Swarm is related to the following artifacts:

                    • Key features: Easy and flexible installation, Clear dashboard, Great scaling operations, Monitoring is an integral part, Great load balancing concepts, Monitors the condition and ensures compensation in the event of failure.
                    • Applications: An application can be deployed using a combination of pods, deployments, and services (or micro-services).
                    • Functionality: Kubernetes has a complex installation and setup process, but it is not as limited as Docker Swarm.
                    • Monitoring: It supports multiple versions of logging and monitoring when the services are deployed within the cluster (Elasticsearch/Kibana (ELK), Heapster/Grafana, Sysdig cloud integration).
                    • Scalability: All-in-one framework for distributed systems.
                    • Other Benefits: Kubernetes is backed by the Cloud Native Computing Foundation (CNCF), huge community among container orchestration tools, it is an open source and modular tool that works with any OS.
                    Tymoteusz Paul
                    Devops guy at X20X Development LTD · 23 upvotes · 8M views

                    Often enough I have to explain my way of going about setting up a CI/CD pipeline with multiple deployment platforms. Since I am a bit tired of repeating the same thing every single time, I've decided to write it up, share it with the world, and send people to read it instead ;). I will explain it with a "live example" of how Rome got built, assuming that the current methodology consists only of a readme.md and wishes of good luck (as it usually does ;)).

                    It always starts with an app, whatever it may be, and with reading the available readmes while Vagrant and VirtualBox are installing and updating. Next comes the first hurdle: convert all the instructions/scripts into Ansible playbook(s), stopping only when a clean vagrant up or vagrant reload gives us a fully working environment. As our Vagrant environment is now functional, it's time to break it! This is the moment to look for how things can be done better (too rigid/too loose versioning? Sloppy environment setup?) and replace them with the right way to do stuff, one that won't bite us in the backside. This is the point, and the best opportunity, to upcycle the existing way of doing the dev environment to produce a proper, production-grade product.

                    I should probably digress here for a moment and explain why. I firmly believe that the way you deploy production is the same way you should deploy development, short of a few debugging-friendly settings. This way you avoid discrepancies between how production works and how development works, which almost always cause major pains in the back of the neck, and with the use of proper tools it should mean no more work for the developers. That's why we start with Vagrant, as developer boxes should be as easy as vagrant up, but the meat of our product lies in Ansible, which will do most of the work and can be applied to almost anything: AWS, bare metal, Docker, LXC, on the open net, behind a VPN - you name it.
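
The "same deployment, short of a few debugging-friendly settings" idea can be illustrated with a minimal Python sketch. It assumes settings are plain dictionaries; every name in it (BASE_SETTINGS, DEV_OVERRIDES, ALLOWED_DEV_KEYS, settings_for, drift) is hypothetical and not taken from any real project or from Ansible itself.

```python
# Hypothetical settings shared by every environment; the names are illustrative only.
BASE_SETTINGS = {"app_port": 8080, "workers": 4, "debug": False, "log_level": "INFO"}

# The only keys development is allowed to change (an assumption for this sketch).
ALLOWED_DEV_KEYS = {"debug", "log_level"}
DEV_OVERRIDES = {"debug": True, "log_level": "DEBUG"}


def settings_for(environment: str) -> dict:
    """Return the settings for an environment, starting from the shared base."""
    settings = dict(BASE_SETTINGS)
    if environment == "development":
        # Refuse overrides that would make development diverge beyond debugging.
        unexpected = set(DEV_OVERRIDES) - ALLOWED_DEV_KEYS
        if unexpected:
            raise ValueError(f"development overrides non-debug keys: {sorted(unexpected)}")
        settings.update(DEV_OVERRIDES)
    return settings


def drift(a: dict, b: dict) -> set:
    """Return the keys whose values differ between two environments."""
    return {key for key in a.keys() | b.keys() if a.get(key) != b.get(key)}
```

Calling drift(settings_for("production"), settings_for("development")) then lists exactly the keys where the two environments differ, which makes any accidental divergence easy to spot in a review.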

                    We must also give proper consideration to monitoring and log hoovering at this point. My generic answer here is to grab Elasticsearch, Kibana, and Logstash. While for different use cases there may be better solutions, this one is well battle-tested, performs reasonably and is very easy to scale both vertically (within some limits) and horizontally. Logstash rules are easy to write and are well supported in maintenance through Ansible, which, as I've mentioned earlier, is at the very core of things, and creating triggers/reports and alerts based on Elastic and Kibana is generally a breeze, including some quite complex aggregations.

                    If we are happy with the state of the Ansible setup, it's time to move on and put all those roles and playbooks to work. Namely, we need something to manage our CI/CD pipelines. For me, the choice is obvious: TeamCity. It's modern, robust and, unlike most of the lightweight alternatives, it's transparent. What I mean by that is that it doesn't tell you how to do things, doesn't limit your ways to deploy, or test, or package for that matter. Instead, it provides a developer-friendly and rich playground for your pipelines. You can do mostly the same with Jenkins, but it has a quite dated look and feel to it, while also missing some key functionality that must be brought in via plugins (like a quality REST API, which comes built in with TeamCity). It also comes with all the common, handy plugins like Slack or Apache Maven integration.

                    The exact flow between CI and CD varies too greatly from one application to another to describe, so I will outline a few rules that guide me in it:
                    1. Make build steps as small as possible. This way when something breaks, we know exactly where, without needing to dig and root around.
                    2. All security credentials besides those for the development environment must be sourced from individual Vault instances. Keys to those containers should exist only on the CI/CD box and be accessible by a few people (the fewer the better). This is pretty self-explanatory, as anything besides dev may contain sensitive data and, at times, be public-facing. Because of that, appropriate security must be present. TeamCity shines in this department with excellent secrets management.
                    3. Every part of the build chain shall consume and produce artifacts. If it creates nothing, it likely shouldn't be its own build. This way if any issue shows up with any environment or version, all a developer has to do is grab the appropriate artifacts to reproduce the issue locally.
                    4. Deployment builds should be directly tied to specific Git branches/tags. This enables much easier tracking of what caused an issue, including automatically identifying and tagging the author (nothing like automated regression testing!).
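
Rules 3 and 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: BuildStep, check_chain, and deploy_target are hypothetical helpers, not part of TeamCity or any other CI tool, and the branch and tag naming scheme (v-prefixed tags, release/ branches, a develop branch) is an assumed convention.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BuildStep:
    """One small build in the chain, described by the artifacts it uses and creates."""
    name: str
    consumes: List[str] = field(default_factory=list)
    produces: List[str] = field(default_factory=list)


def check_chain(steps: List[BuildStep]) -> List[str]:
    """Return a list of problems; an empty list means the chain follows rule 3."""
    problems = []
    available = set()
    for step in steps:
        if not step.produces:
            problems.append(f"{step.name}: produces no artifacts")
        for artifact in step.consumes:
            if artifact not in available:
                problems.append(f"{step.name}: needs {artifact}, which no earlier step produces")
        available.update(step.produces)
    return problems


def deploy_target(git_ref: str) -> Optional[str]:
    """Map a Git ref to an environment (rule 4); the naming scheme is an assumption."""
    if git_ref.startswith("refs/tags/v"):
        return "production"
    if git_ref.startswith("refs/heads/release/"):
        return "staging"
    if git_ref == "refs/heads/develop":
        return "development"
    return None  # any other ref is built and tested but never deployed
```

A chain such as compile, test, package passes the check as long as each step lists the artifact it hands on, and any step that creates nothing is flagged as a candidate for merging into its neighbour.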

                    Speaking of deployments, I generally try to keep it simple but also with a close eye on the wallet. Because of that, I am more than happy with AWS or another cloud provider, but I am also constantly peeking at the loads and at whether we get value for what we are paying. Often enough the pattern of use is not constantly erratic, but rather has a firm baseline which could be migrated away from the cloud and onto bare metal boxes. That is another part where this approach strongly triumphs over the common Docker and CircleCI setup, where you are very much tied to cloud providers and getting out is expensive. Here, to embrace bare-metal hosting all you need is the help of some container-based self-hosting software; my personal preference is Proxmox and LXC. Following that, all you must write are Ansible scripts to manage the Proxmox hardware, in a similar way as you do for Amazon EC2 (Ansible supports both well), and you are good to go. One does not exclude the other, quite the opposite, as they can live in great synergy and cut your costs dramatically (the heavier your base load, the bigger the savings) while providing production-grade resiliency.
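
The baseline argument can be made concrete with a rough Python sketch. It assumes load is recorded as the number of instances needed per hour and that the firm baseline is simply the minimum of those samples; split_load and compare_costs are hypothetical helpers, and the rates used with them are made-up units, not real provider prices.

```python
from typing import List, Tuple


def split_load(hourly_instances: List[int]) -> Tuple[int, int]:
    """Split a load profile into a constant baseline and the burst above it.

    Returns (baseline instances, total burst instance-hours).
    """
    baseline = min(hourly_instances)
    burst_hours = sum(n - baseline for n in hourly_instances)
    return baseline, burst_hours


def compare_costs(hourly_instances: List[int], cloud_rate: int, metal_rate: int) -> Tuple[int, int]:
    """Return (all-cloud cost, hybrid cost) for the sampled period.

    Rates are cost per instance-hour in arbitrary units (hypothetical values).
    In the hybrid case the baseline runs on bare metal and only bursts use the cloud.
    """
    baseline, burst_hours = split_load(hourly_instances)
    hours = len(hourly_instances)
    all_cloud = sum(hourly_instances) * cloud_rate
    hybrid = baseline * hours * metal_rate + burst_hours * cloud_rate
    return all_cloud, hybrid
```

With a profile like [4, 4, 6, 10, 4], four instances are always busy, so they are the part worth pricing on bare metal; the higher that minimum sits relative to the peaks, the larger the gap between the two totals becomes.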
