DVC vs Mercurial: What are the differences?

Introduction

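This page compares DVC (Data Version Control) and Mercurial. Both track changes over time, but they target different problems: DVC versions data and machine learning artifacts alongside a Git repository, while Mercurial is a general-purpose distributed version control system for source code. The key differences are outlined below.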

  1. Integration with Data Science Workflow: DVC is designed for data scientists and machine learning engineers. It runs on top of Git and fits alongside common data science tools such as Jupyter notebooks and cloud storage platforms. Mercurial, by contrast, is a general-purpose distributed version control system that suits many kinds of projects but is not tailored to data science workflows.

  2. Data Versioning: DVC focuses on versioning large datasets and models. The data itself lives in remote storage such as AWS S3 or Google Cloud Storage, while small metadata files that reference it are committed to Git. This gives teams an explicit way to version, share, and collaborate on large data projects. Mercurial primarily versions source code files. Large binary files require an extension such as the bundled largefiles extension, and Mercurial has no dataset-oriented versioning features.

  3. Workflow Automation: DVC supports automated, reproducible experiments. A pipeline is described as a set of stages with declared dependencies and outputs, which together form a DAG (Directed Acyclic Graph). DVC can display that graph (dvc dag) and rerun only the stages affected by a change (dvc repro), so data scientists can track dependencies and reproduce complex data pipelines. Mercurial has no built-in features for workflow automation or reproducibility.

  4. Branching and Merging: Both tools work with branches, but the approach and scope differ. DVC does not implement branching itself. It relies on the branches of the underlying Git repository, and switching branches changes which versions of the data artifacts are checked out. Mercurial's branching and merging are built into the tool and designed for source code management: developers create, switch between, and merge branches of code.

  5. Collaboration and Remote Work: Mercurial has been used for many years in open-source projects, so it is widely deployed and well documented. It supports collaboration, code review, and remote work scenarios. DVC is gaining popularity in the data science community, but it is newer and may have fewer resources and established practices for collaboration and remote work.

  6. Community Support and Ecosystem: Mercurial has a large community of users and developers, resulting in a rich ecosystem of plugins, extensions, and integrations with other tools. It has been extensively tested and used in a wide range of projects. DVC, being more focused on data science workflows, has a smaller but growing community and ecosystem. While DVC integrates well with common data science tools, it may have limited support for non-data-science-specific use cases.
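The DAG idea in point 3 can be illustrated with a small sketch. This is not DVC's implementation or API. It is a minimal, standard-library example of the underlying technique: stages declare their dependencies, a topological sort gives a valid run order, and a change to one stage marks everything downstream as needing a rerun. The stage names and the helper functions `execution_order` and `stages_to_rerun` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each stage maps to the set of stages it depends on.
STAGES = {
    "prepare": set(),
    "featurize": {"prepare"},
    "train": {"featurize"},
    "evaluate": {"train", "featurize"},
}


def execution_order(stages):
    """Return stage names in an order that respects every dependency."""
    return list(TopologicalSorter(stages).static_order())


def stages_to_rerun(stages, changed):
    """Return the changed stages plus all downstream stages, in run order."""
    stale = set(changed)
    order = execution_order(stages)
    for name in order:
        # A stage is stale if any of its dependencies is stale.
        if stages[name] & stale:
            stale.add(name)
    return [name for name in order if name in stale]


print(execution_order(STAGES))
print(stages_to_rerun(STAGES, {"featurize"}))
```

Because dependencies are visited in topological order, a single pass is enough to propagate staleness through the whole graph. A real tool would also compare file hashes to decide which stages changed in the first place. That step is left out here.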

In summary, DVC and Mercurial differ in their focus on data science workflows, data versioning capabilities, workflow automation features, approach to branching and merging, collaboration and remote work support, and the size and maturity of their communities and ecosystems.

Pros of DVC
  • Full reproducibility (2 upvotes)

Pros of Mercurial
  • A lot easier to extend than git (18 upvotes)
  • Easy-to-grasp system with nice tools (17 upvotes)
  • Works on windows natively without cygwin nonsense (13 upvotes)
  • Written in python (11 upvotes)
  • Free (9 upvotes)
  • Fast (8 upvotes)
  • Better than Git (6 upvotes)
  • Best GUI (6 upvotes)
  • Better than svn (4 upvotes)
  • Hg inc (2 upvotes)
  • Good user experience (2 upvotes)
  • TortoiseHg - Unified free gui for all platforms (2 upvotes)
  • Consistent UI (2 upvotes)
  • Easy-to-use (2 upvotes)
  • Native support to all platforms (2 upvotes)
  • Free to use (1 upvote)


Cons of DVC
  • Coupling between orchestration and version control (1 upvote)
  • Requires working locally with the data (1 upvote)
  • Doesn't scale for big data (1 upvote)

Cons of Mercurial
  • Track single upstream only (0 upvotes)
  • Does not distinguish between local and remote head (0 upvotes)


What is DVC?

DVC is an open-source version control system for data science and machine learning projects. It is designed to handle large files, datasets, machine learning models, and metrics alongside code.
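One common way to version large files next to code is to keep the file contents in a content-addressed store and commit only a small pointer file. The sketch below illustrates that general technique with the Python standard library. It is not DVC's code, and DVC's actual file formats and hashing differ: the `.ptr` suffix, the JSON layout, the use of SHA-256, and the function names `add_to_cache` and `restore` are all assumptions made for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def add_to_cache(data_file: Path, cache_dir: Path) -> Path:
    """Store data_file in a content-addressed cache and write a pointer file."""
    content = data_file.read_bytes()
    digest = hashlib.sha256(content).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    # The cache entry is named after the hash of its content.
    (cache_dir / digest).write_bytes(content)
    # The small pointer file is what would be committed to version control.
    pointer = data_file.with_name(data_file.name + ".ptr")
    pointer.write_text(json.dumps({"sha256": digest, "size": len(content)}))
    return pointer


def restore(pointer: Path, cache_dir: Path) -> bytes:
    """Read the content that a pointer file refers to."""
    digest = json.loads(pointer.read_text())["sha256"]
    return (cache_dir / digest).read_bytes()


# Usage: track a file, then recover its content from the pointer alone.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    dataset = root / "data.csv"
    dataset.write_bytes(b"a,b\n1,2\n")
    ptr = add_to_cache(dataset, root / "cache")
    print(ptr.read_text())
    print(restore(ptr, root / "cache"))
```

Because the pointer is a few dozen bytes regardless of the dataset size, the code repository stays small, and checking out an older commit selects an older pointer, which in turn selects the matching data.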

What is Mercurial?

Mercurial is dedicated to speed and efficiency with a sane user interface. It is written in Python. Mercurial's implementation and data structures are designed to be fast. You can generate diffs between revisions, or jump back in time within seconds.

What are some alternatives to DVC and Mercurial?
Pachyderm
Pachyderm is an open source MapReduce engine that uses Docker containers for distributed computations.
MLflow
MLflow is an open source platform for managing the end-to-end machine learning lifecycle.
Git
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
SVN (Subversion)
Subversion exists to be universally recognized and adopted as an open-source, centralized version control system characterized by its reliability as a safe haven for valuable data; the simplicity of its model and usage; and its ability to support the needs of a wide variety of users and projects, from individuals to large-scale enterprise operations.
Plastic SCM
Plastic SCM is a distributed version control designed for big projects. It excels on branching and merging, graphical user interfaces, and can also deal with large files and even file-locking (great for game devs). It includes "semantic" features like refactor detection to ease diffing complex refactors.