Alternatives to MLflow

Kubeflow, Airflow, TensorFlow, DVC, and Seldon are the most popular alternatives and competitors to MLflow.
202
512
9

What is MLflow and what are its top alternatives?

MLflow is an open source platform for managing the end-to-end machine learning lifecycle.
It is a tool in the Machine Learning Tools category of a tech stack.
Its open source repository on GitHub has 17.7K stars and 4K forks.

Top Alternatives to MLflow

  • Kubeflow

    The Kubeflow project is dedicated to making Machine Learning on Kubernetes easy, portable and scalable by providing a straightforward way for spinning up best of breed OSS solutions. ...

  • Airflow

    Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed. ...

  • TensorFlow

    TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. ...

  • DVC

    DVC is an open-source version control system for data science and machine learning projects. It is designed to handle large files, data sets, machine learning models, and metrics as well as code. ...

  • Seldon

    Seldon is an Open Predictive Platform that currently allows recommendations to be generated based on structured historical data. It has a variety of algorithms to produce these recommendations and can report a variety of statistics. ...

  • Metaflow

    Metaflow is a human-friendly Python library that helps scientists and engineers build and manage real-life data science projects. It was originally developed at Netflix to boost the productivity of data scientists who work on a wide variety of projects, from classical statistics to state-of-the-art deep learning. ...

  • JavaScript

    JavaScript is best known as the scripting language for Web pages, but it is also used in many non-browser environments such as Node.js or Apache CouchDB. It is a prototype-based, multi-paradigm scripting language that is dynamic and supports object-oriented, imperative, and functional programming styles. ...

  • Git

    Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. ...
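
The Airflow entry above describes workflows as directed acyclic graphs of tasks that run in dependency order. As a rough illustration of that idea only, here is a minimal sketch using Python's standard-library `graphlib` (Python 3.9+). It does not use Airflow's API, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "train": {"transform"},
    "report": {"transform"},
    "publish": {"train", "report"},
}


def execution_order(deps):
    """Return one valid run order; graphlib raises CycleError if deps contain a cycle."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Any order the sorter returns places a task after everything it depends on; "train" and "report" may appear in either order because neither depends on the other.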

MLflow alternatives & related posts

Kubeflow

197
580
18
Machine Learning Toolkit for Kubernetes
PROS OF KUBEFLOW
  • 9
    System designer
  • 3
    Google backed
  • 3
    Customisation
  • 3
    Kfp dsl
  • 0
    Azure

    related Kubeflow posts

    Biswajit Pathak
    Project Manager at Sony · 6 upvotes · 846.3K views

    Can you please advise which one to choose, FastText or Gensim, in terms of:

    1. Operability with ML Ops tools such as MLflow, Kubeflow, etc.
    2. Performance
    3. Customization of Intermediate steps
    4. FastText and Gensim both have the same underlying libraries
    5. Use cases each one tries to solve
    6. Unsupervised Vs Supervised dimensions
    7. Ease of Use.

    Please mention any other points that I may have missed here.

    Shared insights on Kubeflow, Kubernetes, and MLflow

    We are trying to standardise DevOps across both ML (model selection and deployment) and regular software. We want to minimise the number of tools we have to learn. We also want a scalable solution that is easy enough to start small with, e.g. on a powerful laptop, and that can eventually be deployed at scale. MLflow vs Kubernetes (Kubeflow)?

    Airflow

    1.7K
    2.7K
    126
    A platform to programmatically author, schedule and monitor data pipelines, by Airbnb
    PROS OF AIRFLOW
    • 51
      Features
    • 14
      Task Dependency Management
    • 12
      Beautiful UI
    • 12
      Cluster of workers
    • 10
      Extensibility
    • 6
      Open source
    • 5
      Complex workflows
    • 5
      Python
    • 3
      Good api
    • 3
      Apache project
    • 3
      Custom operators
    • 2
      Dashboard
    CONS OF AIRFLOW
    • 2
      Observability is not great when the DAGs exceed 250
    • 2
      Running it on a Kubernetes cluster is relatively complex
    • 2
      Open source - provides minimum or no support
    • 1
      Logical separation of DAGs is not straightforward

    related Airflow posts

    Data science and engineering teams at Lyft maintain several big data pipelines that serve as the foundation for various types of analysis throughout the business.

    Apache Airflow sits at the center of this big data infrastructure, allowing users to “programmatically author, schedule, and monitor data pipelines.” Airflow is an open source tool, and “Lyft is the very first Airflow adopter in production since the project was open sourced around three years ago.”

    There are several key components of the architecture. A web UI allows users to view the status of their queries, along with an audit trail of any modifications to the query. A metadata database stores things like job status and task instance status. A multi-process scheduler handles job requests and triggers the executor to execute those tasks.

    Airflow supports several executors, though Lyft uses CeleryExecutor to scale task execution in production. Airflow is deployed to three Amazon Auto Scaling Groups, each associated with a Celery queue.

    Audit logs supplied to the web UI are powered by the existing Airflow audit logs as well as Flask signals.

    Datadog, Statsd, Grafana, and PagerDuty are all used to monitor the Airflow system.
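
To make the division of labour between the metadata database, the scheduler, and the executor more concrete, here is a toy sketch. It is not Airflow code and not Lyft's setup: the table name, task names, and state strings are assumptions chosen for illustration, with an in-memory SQLite database standing in for the metadata store.

```python
import sqlite3

# Hypothetical tasks mapped to their upstream dependencies.
TASKS = {"load": [], "clean": ["load"], "aggregate": ["clean"]}


def run_pipeline(tasks):
    db = sqlite3.connect(":memory:")  # stands in for the metadata database
    db.execute("CREATE TABLE task_instance (task TEXT PRIMARY KEY, state TEXT)")
    db.executemany(
        "INSERT INTO task_instance VALUES (?, 'queued')", [(t,) for t in tasks]
    )

    def state(task):
        row = db.execute(
            "SELECT state FROM task_instance WHERE task = ?", (task,)
        ).fetchone()
        return row[0]

    def executor(task):
        # A real executor would hand the task to a worker; here it succeeds at once.
        db.execute("UPDATE task_instance SET state = 'success' WHERE task = ?", (task,))

    run_log = []
    # Scheduler loop: trigger every queued task whose upstream tasks have succeeded.
    while any(state(t) == "queued" for t in tasks):
        ready = [
            t for t in tasks
            if state(t) == "queued" and all(state(u) == "success" for u in tasks[t])
        ]
        if not ready:
            raise RuntimeError("no runnable task; dependencies may be cyclic")
        for task in ready:
            executor(task)
            run_log.append(task)

    states = dict(db.execute("SELECT task, state FROM task_instance"))
    db.close()
    return run_log, states


run_log, states = run_pipeline(TASKS)
print(run_log, states)
```

Because every state change goes through the database, a separate process such as a web UI could read task status without talking to the scheduler directly.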

    See more

    We are a young start-up with 2 developers and a team in India looking to choose our next ETL tool. We have a few processes in Azure Data Factory but are looking to switch to a better platform. We were debating Trifacta and Airflow. Or even staying with Azure Data Factory. The use case will be to feed data to front-end APIs.

    TensorFlow

    3.7K
    3.5K
    106
    Open Source Software Library for Machine Intelligence
    PROS OF TENSORFLOW
    • 32
      High Performance
    • 19
      Connect Research and Production
    • 16
      Deep Flexibility
    • 12
      Auto-Differentiation
    • 11
      True Portability
    • 6
      Easy to use
    • 5
      High level abstraction
    • 5
      Powerful
    CONS OF TENSORFLOW
    • 9
      Hard
    • 6
      Hard to debug
    • 2
      Documentation not very helpful

    related TensorFlow posts

    Tom Klein

    Google Analytics is a great tool to analyze your traffic. To debug our software and ask questions, we love to use Postman and Stack Overflow. Google Drive helps our team to share documents. We're able to build our great products through the APIs by Google Maps, CloudFlare, Stripe, PayPal, Twilio, Let's Encrypt, and TensorFlow.

    Shared insights on TensorFlow, Django, and Python

    Hi, I have an LMS application, currently developed in Python-Django.

    It all works very well: students can view their classes and submit exams. But I have noticed that some students are sharing exam answers with other students, and let's say they already have a model of the exams.

    With the help of artificial intelligence, I want the exams to have different questions, in a different order, for each student. What technology should I learn to develop something like this? I am a Python-Django developer, but my focus is on web development and I have never touched anything from A.I.

    What do you think about TensorFlow?

    Please, I would appreciate all your ideas and opinions, thank you very much in advance.

    DVC

    55
    91
    2
    Open-source Version Control System for Machine Learning Projects
    PROS OF DVC
    • 2
      Full reproducibility
    CONS OF DVC
    • 1
      Coupling between orchestration and version control
    • 1
      Requires working locally with the data
    • 1
      Doesn't scale for big data

    related DVC posts

    Shared insights on MLflow and DVC

    I already use DVC to track and store my datasets in my machine learning pipeline. I have also started to use MLflow to keep track of my experiments. However, I still don't know whether to use DVC or the MLflow artifact store for my model files. Or maybe these two serve different purposes, and it may be good to do both! Can anyone help, please?

    Seldon

    13
    46
    0
    Open-source predictive analytics and recommendation engine

        Metaflow

        15
        50
        0
        Build and manage real-life data science projects with ease (by Netflix)

            JavaScript

            353K
            268.5K
            8.1K
            Lightweight, interpreted, object-oriented language with first-class functions
            PROS OF JAVASCRIPT
            • 1.7K
              Can be used on frontend/backend
            • 1.5K
              It's everywhere
            • 1.2K
              Lots of great frameworks
            • 897
              Fast
            • 745
              Light weight
            • 425
              Flexible
            • 392
              You can't get a device today that doesn't run js
            • 286
              Non-blocking i/o
            • 237
              Ubiquitousness
            • 191
              Expressive
            • 55
              Extended functionality to web pages
            • 49
              Relatively easy language
            • 46
              Executed on the client side
            • 30
              Relatively fast to the end user
            • 25
              Pure Javascript
            • 21
              Functional programming
            • 15
              Async
            • 13
              Full-stack
            • 12
              Setup is easy
            • 12
              Future Language of The Web
            • 12
              Its everywhere
            • 11
              Because I love functions
            • 11
              JavaScript is the New PHP
            • 10
              Like it or not, JS is part of the web standard
            • 9
              Expansive community
            • 9
              Everyone use it
            • 9
              Can be used in backend, frontend and DB
            • 9
              Easy
            • 8
              Most Popular Language in the World
            • 8
              Powerful
            • 8
              Can be used both as frontend and backend as well
            • 8
              For the good parts
            • 8
              No need to use PHP
            • 8
              Easy to hire developers
            • 7
              Agile, packages simple to use
            • 7
              Love-hate relationship
            • 7
              Photoshop has 3 JS runtimes built in
            • 7
              Evolution of C
            • 7
              It's fun
            • 7
              Hard not to use
            • 7
              Versatile
            • 7
              Its fun and fast
            • 7
              Nice
            • 7
              Popularized Class-Less Architecture & Lambdas
            • 7
              Supports lambdas and closures
            • 6
              It lets me use Babel & TypeScript
            • 6
              Can be used on frontend/backend/Mobile/create PRO Ui
            • 6
              Client side JS uses the visitors CPU to save Server Res
            • 6
              Easy to make something
            • 5
              Clojurescript
            • 5
              Promise relationship
            • 5
              Stockholm Syndrome
            • 5
              Function expressions are useful for callbacks
            • 5
              Scope manipulation
            • 5
              Everywhere
            • 5
              Client processing
            • 5
              What to add
            • 4
              Because it is so simple and lightweight
            • 4
              Only Programming language on browser
            • 1
              Hard to learn
            • 1
              Not the best
            • 1
              Easy to understand
            • 1
              Easy to learn
            CONS OF JAVASCRIPT
            • 22
              A constant moving target, too much churn
            • 20
              Horribly inconsistent
            • 15
              Javascript is the New PHP
            • 9
              No ability to monitor memory utilization
            • 8
              Shows Zero output in case of ANY error
            • 7
              Thinks strange results are better than errors
            • 6
              Can be ugly
            • 3
              No GitHub
            • 2
              Slow

            related JavaScript posts

            Zach Holman

            Oof. I have truly hated JavaScript for a long time. Like, for over twenty years now. Like, since the Clinton administration. It's always been a nightmare to deal with all of the aspects of that silly language.

            But wowza, things have changed. Tooling is just way, way better. I'm primarily web-oriented, and using React and Apollo together the past few years really opened my eyes to building rich apps. And I deeply apologize for using the phrase rich apps; I don't think I've ever said such Enterprisey words before.

            But yeah, things are different now. I still love Rails, and still use it for a lot of apps I build. But it's that silly rich apps phrase that's the problem. Users have way more comprehensive expectations than they did even five years ago, and the JS community does a good job at building tools and tech that tackle the problems of making heavy, complicated UI and frontend work.

            Obviously there's a lot of things happening here, so just saying "JavaScript isn't terrible" might encompass a huge amount of libraries and frameworks. But if you're like me, yeah, give things another shot. I'm somehow not hating on JavaScript anymore and... gulp... I kinda love it.

            Conor Myhrvold
            Tech Brand Mgr, Office of CTO at Uber · 44 upvotes · 10.9M views

            How Uber developed Jaeger, the open source, end-to-end distributed tracing system that is now a CNCF project:

            Distributed tracing is quickly becoming a must-have component in the tools that organizations use to monitor their complex, microservice-based architectures. At Uber, our open source distributed tracing system Jaeger saw large-scale internal adoption throughout 2016, integrated into hundreds of microservices and now recording thousands of traces every second.

            Here is the story of how we got here, from investigating off-the-shelf solutions like Zipkin, to why we switched from pull to push architecture, and how distributed tracing will continue to evolve:

            https://eng.uber.com/distributed-tracing/

            (GitHub Pages: https://www.jaegertracing.io/, GitHub: https://github.com/jaegertracing/jaeger)

            Bindings/Operator: Python, Java, Node.js, Go, C++, Kubernetes, JavaScript, OpenShift, C#, Apache Spark

            Git

            292K
            175.1K
            6.6K
            Fast, scalable, distributed revision control system
            PROS OF GIT
            • 1.4K
              Distributed version control system
            • 1.1K
              Efficient branching and merging
            • 959
              Fast
            • 845
              Open source
            • 726
              Better than svn
            • 368
              Great command-line application
            • 306
              Simple
            • 291
              Free
            • 232
              Easy to use
            • 222
              Does not require server
            • 27
              Distributed
            • 22
              Small & Fast
            • 18
              Feature based workflow
            • 15
              Staging Area
            • 13
              Most widespread VCS
            • 11
              Role-based codelines
            • 11
              Disposable Experimentation
            • 7
              Frictionless Context Switching
            • 6
              Data Assurance
            • 5
              Efficient
            • 4
              Just awesome
            • 3
              Github integration
            • 3
              Easy branching and merging
            • 2
              Compatible
            • 2
              Flexible
            • 2
              Possible to lose history and commits
            • 1
              Rebase supported natively; reflog; access to plumbing
            • 1
              Light
            • 1
              Team Integration
            • 1
              Fast, scalable, distributed revision control system
            • 1
              Easy
            • 1
              Flexible, easy, Safe, and fast
            • 1
              CLI is great, but the GUI tools are awesome
            • 1
              It's what you do
            • 0
              Phinx
            CONS OF GIT
            • 16
              Hard to learn
            • 11
              Inconsistent command line interface
            • 9
              Easy to lose uncommitted work
            • 7
              Worst documentation ever possibly made
            • 5
              Awful merge handling
            • 3
              Unexistent preventive security flows
            • 3
              Rebase hell
            • 2
              When --force is disabled, cannot rebase
            • 2
              Ironically even die-hard supporters screw up badly
            • 1
              Doesn't scale for big data

            related Git posts

            Simon Reymann
            Senior Fullstack Developer at QUANTUSflow Software GmbH · 30 upvotes · 9.7M views

            Our whole DevOps stack consists of the following tools:

            • GitHub (incl. GitHub Pages/Markdown for documentation, Getting Started guides and how-tos) as collaborative review and code management tool
            • Git as revision control system
            • SourceTree as Git GUI
            • Visual Studio Code as IDE
            • CircleCI for continuous integration (automating the development process)
            • Prettier / TSLint / ESLint as code linters
            • SonarQube as quality gate
            • Docker as container management (incl. Docker Compose for multi-container application management)
            • VirtualBox for operating system simulation tests
            • Kubernetes as cluster management for docker containers
            • Heroku for deploying in test environments
            • nginx as web server (preferably used as facade server in production environment)
            • SSLMate (using OpenSSL) for certificate management
            • Amazon EC2 (incl. Amazon S3) for deploying in stage (production-like) and production environments
            • PostgreSQL as preferred database system
            • Redis as preferred in-memory database/store (great for caching)

            The main reasons we have chosen Kubernetes over Docker Swarm come down to the following aspects:

            • Key features: Easy and flexible installation, Clear dashboard, Great scaling operations, Monitoring is an integral part, Great load balancing concepts, Monitors the condition and ensures compensation in the event of failure.
            • Applications: An application can be deployed using a combination of pods, deployments, and services (or micro-services).
            • Functionality: Kubernetes has a complex installation and setup process, but it is not as limited as Docker Swarm.
            • Monitoring: It supports multiple versions of logging and monitoring when the services are deployed within the cluster (Elasticsearch/Kibana (ELK), Heapster/Grafana, Sysdig cloud integration).
            • Scalability: All-in-one framework for distributed systems.
            • Other Benefits: Kubernetes is backed by the Cloud Native Computing Foundation (CNCF), has a huge community among container orchestration tools, and is an open source, modular tool that works with any OS.

            Tymoteusz Paul
            Devops guy at X20X Development LTD · 23 upvotes · 8.7M views

            Often enough I have to explain how I go about setting up a CI/CD pipeline with multiple deployment platforms. Since I am a bit tired of yapping the same thing every single time, I've decided to write it up, share it with the world this way, and send people to read it instead ;). I will explain it with a "live example" of how Rome got built, assuming that the current methodology consists only of a readme.md and wishes of good luck (as it usually does ;)).

            It always starts with an app, whatever it may be, and with reading the available readmes while Vagrant and VirtualBox are installing and updating. Next comes the first hurdle: converting all the instructions/scripts into Ansible playbook(s), and stopping only when a clean vagrant up or vagrant reload gives us a fully working environment. As our Vagrant environment is now functional, it's time to break it! This is the moment to look for how things can be done better (too rigid/too loose versioning? Sloppy environment setup?) and replace them with the right way to do stuff, one that won't bite us in the backside. This is the point, and the best opportunity, to upcycle the existing way of doing the dev environment to produce a proper, production-grade product.

            I should probably digress here for a moment and explain why. I firmly believe that the way you deploy production is the same way you should deploy develop, short of a few debugging-friendly settings. This way you avoid the discrepancy between how production works and how development works, which almost always causes major pains in the back of the neck, and with the use of proper tools this should mean no more work for the developers. That's why we start with Vagrant, as developer boxes should be as easy as vagrant up, but the meat of our product lies in Ansible, which does the bulk of the work and can be applied to almost anything: AWS, bare metal, Docker, LXC, in open net, behind VPN - you name it.

            We must also give proper consideration to monitoring and log hoovering at this point. My generic answer here is to grab Elasticsearch, Kibana, and Logstash. While for different use cases there may be better solutions, this one is well battle-tested, performs reasonably and is very easy to scale both vertically (within some limits) and horizontally. Logstash rules are easy to write and are well supported in maintenance through Ansible, which, as I've mentioned earlier, is at the very core of things, and creating triggers/reports and alerts based on Elastic and Kibana is generally a breeze, including some quite complex aggregations.

            If we are happy with the state of the Ansible setup, it's time to move on and put all those roles and playbooks to work. Namely, we need something to manage our CI/CD pipelines. For me, the choice is obvious: TeamCity. It's modern, robust and, unlike most of the light-weight alternatives, it's transparent. What I mean by that is that it doesn't tell you how to do things, doesn't limit your ways to deploy, or test, or package for that matter. Instead, it provides a developer-friendly and rich playground for your pipelines. You can do much the same with Jenkins, but it has a quite dated look and feel to it, while also missing some key functionality that must be brought in via plugins (like the quality REST API which comes built in with TeamCity). It also comes with all the common handy plugins like Slack or Apache Maven integration.

            The exact flow between CI and CD varies too greatly from one application to another to describe, so I will outline a few rules that guide me in it:

            1. Make build steps as small as possible. This way when something breaks, we know exactly where, without needing to dig and root around.
            2. All security credentials besides the development environment must be sourced from individual Vault instances. Keys to those containers should exist only on the CI/CD box and be accessible by a few people (the fewer the better). This is pretty self-explanatory, as anything besides dev may contain sensitive data and, at times, be public-facing. Because of that, appropriate security must be present. TeamCity shines in this department with excellent secrets management.
            3. Every part of the build chain shall consume and produce artifacts. If it creates nothing, it likely shouldn't be its own build. This way, if any issue shows up with any environment or version, all a developer has to do is grab the appropriate artifacts to reproduce the issue locally.
            4. Deployment builds should be directly tied to specific Git branches/tags. This enables much easier tracking of what caused an issue, including automated identifying and tagging of the author (nothing like automated regression testing!).
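
Rules 3 and 4 can be checked mechanically. The sketch below is one hypothetical way to express them in plain Python; it is not TeamCity configuration, and the step names, artifact names, and ref-to-environment mapping are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildStep:
    name: str
    consumes: frozenset
    produces: frozenset


def validate_chain(steps, initial_artifacts):
    """Rule 3: every step produces an artifact and consumes only ones already available."""
    available = set(initial_artifacts)
    problems = []
    for step in steps:
        if not step.produces:
            problems.append(f"{step.name}: produces no artifact")
        missing = step.consumes - available
        if missing:
            problems.append(f"{step.name}: missing inputs {sorted(missing)}")
        available |= step.produces
    return problems


# Rule 4: hypothetical mapping from Git refs to deployment environments.
DEPLOY_TARGETS = {"refs/heads/develop": "staging", "refs/heads/main": "production"}


def deploy_target(git_ref):
    """Return the environment for a ref, or None if the ref must not be deployed."""
    if git_ref.startswith("refs/tags/v"):
        return "production"
    return DEPLOY_TARGETS.get(git_ref)


chain = [
    BuildStep("compile", frozenset({"source"}), frozenset({"binary"})),
    BuildStep("test", frozenset({"binary"}), frozenset({"test-report"})),
    BuildStep("notify", frozenset({"test-report"}), frozenset()),
]
problems = validate_chain(chain, {"source"})
print(problems)
print(deploy_target("refs/tags/v1.2.0"))
```

Here the "notify" step is flagged because it creates nothing, which under rule 3 suggests it should be folded into another build rather than stand alone.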

            Speaking of deployments, I generally try to keep it simple but also with a close eye on the wallet. Because of that, I am more than happy with AWS or another cloud provider, but I am also constantly peeking at the loads and asking whether we get value for what we are paying. Often enough the pattern of use is not constantly erratic, but rather has a firm baseline which could be migrated away from the cloud and onto bare metal boxes. That is another part where this approach strongly triumphs over the common Docker and CircleCI setup, where you are very much tied in to using cloud providers and getting out is expensive. Here, to embrace bare-metal hosting, all you need is the help of some container-based self-hosting software; my personal preference is Proxmox and LXC. Following that, all you must write are Ansible scripts to manage the hardware of Proxmox, in a similar way as you do for Amazon EC2 (Ansible supports both greatly), and you are good to go. One does not exclude another, quite the opposite, as they can live in great synergy and cut your costs dramatically (the heavier your base load, the bigger the savings) while providing production-grade resiliency.
