Druid vs Snowflake: What are the differences?

Introduction

This page compares Druid and Snowflake across six key differences.

  1. Scalability: Druid is designed for real-time querying and analysis of large datasets with sub-second query latencies. It uses a distributed architecture and can scale to petabytes of data. Snowflake, in contrast, is a cloud data warehouse built for massive amounts of structured and semi-structured data. It scales well for large datasets and many concurrent users, but its query latency is typically higher than Druid's.

  2. Data Types and Storage: Druid is optimized for time-series data and has native support for time-based aggregations and filtering. It stores data in a columnar format, which enables efficient compression and fast querying. Snowflake also uses columnar storage, but it targets general-purpose warehousing: it supports a wide range of data types, including semi-structured data, and stores and retrieves them efficiently.

  3. Querying Abilities: Druid supports interactive queries on real-time and historical data and provides fast aggregations and filtering. Data ingested from streams becomes queryable as it arrives. Snowflake offers advanced SQL querying capabilities, including complex joins and aggregations on large datasets, and can serve both near-real-time and batch workloads.

  4. Data Ingestion and Updates: Druid supports real-time data ingestion through its native ingestion framework, which continuously ingests and indexes data for real-time querying. It also supports batch ingestion for historical data. Snowflake likewise supports both real-time and batch ingestion. It has built-in connectors for various data sources and supports updates and deletes on existing data.

  5. Query Performance and Optimization: Druid is optimized for fast queries and provides features such as pre-aggregation and indexing to improve query speed. It also uses caching and query optimization techniques to minimize latency. Snowflake combines columnar storage, query optimization, and workload management. It scales resources automatically based on the workload and optimizes queries automatically.

  6. Cost and Pricing Model: Druid is an open-source project, so there is no license fee for using it. Running it, however, incurs infrastructure setup and maintenance costs. Snowflake is a cloud-based service with a pay-as-you-go pricing model. It offers different pricing tiers based on usage, and resources can be scaled up or down as needs change.
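The pre-aggregation mentioned in point 5 can be illustrated with a small sketch. This is not Druid code; it is a minimal Python imitation of the idea, assuming events are grouped by a truncated timestamp plus one dimension. The field names (`time`, `page`, `bytes`) and the one-minute granularity are hypothetical choices made for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def floor_to_minute(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its minute."""
    return ts.replace(second=0, microsecond=0)


def rollup(events):
    """Pre-aggregate raw events by (minute, page), keeping a row count and a byte sum."""
    buckets = defaultdict(lambda: {"count": 0, "bytes": 0})
    for event in events:
        key = (floor_to_minute(event["time"]), event["page"])
        buckets[key]["count"] += 1
        buckets[key]["bytes"] += event["bytes"]
    return dict(buckets)


def utc(hour, minute, second):
    """Helper for building sample timestamps on an arbitrary day."""
    return datetime(2024, 1, 1, hour, minute, second, tzinfo=timezone.utc)


# Hypothetical raw events; in a real deployment these would arrive from a stream or batch file.
events = [
    {"time": utc(12, 0, 5), "page": "/home", "bytes": 100},
    {"time": utc(12, 0, 40), "page": "/home", "bytes": 250},
    {"time": utc(12, 0, 59), "page": "/docs", "bytes": 50},
    {"time": utc(12, 1, 10), "page": "/home", "bytes": 75},
]

rolled = rollup(events)
for (minute, page), metrics in sorted(rolled.items()):
    print(minute.isoformat(), page, metrics)
```

Four raw events collapse into three stored rows here. Queries that only need per-minute totals then scan fewer rows, at the cost of losing the individual events.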

In summary, Druid and Snowflake differ in scalability, data types and storage, querying abilities, data ingestion and updates, query performance and optimization, and cost and pricing model.
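To make the pay-as-you-go model from point 6 concrete, here is a rough cost estimate in Python. It assumes usage is metered in credits per hour that double with each warehouse size, billed per second with a 60-second minimum per start. Treat those rates as assumptions to check against current Snowflake pricing. The price per credit is a hypothetical figure, since it varies by edition and region.

```python
# Assumed credit consumption per hour for each warehouse size (doubles per size step).
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8}

# Assumed minimum billed duration each time a warehouse starts or resumes.
MIN_BILLED_SECONDS = 60


def warehouse_cost(size: str, seconds_running: float, price_per_credit: float) -> float:
    """Estimate the cost of one warehouse run under the assumptions above."""
    billed_seconds = max(seconds_running, MIN_BILLED_SECONDS)
    credits = CREDITS_PER_HOUR[size] * billed_seconds / 3600
    return credits * price_per_credit


# Hypothetical price per credit, in dollars.
PRICE = 3.0

half_hour_medium = warehouse_cost("M", 1800, PRICE)
ten_second_xs = warehouse_cost("XS", 10, PRICE)
print(f"Medium warehouse, 30 minutes: ${half_hour_medium:.2f}")
print(f"X-Small warehouse, 10 seconds (billed as 60): ${ten_second_xs:.2f}")
```

A self-hosted Druid cluster has no comparable per-query meter: its cost is the fixed infrastructure and the staff time to operate it, whether or not queries are running.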

Pros of Druid
  • Real Time Aggregations (15 upvotes)
  • Batch and Real-Time Ingestion (6 upvotes)
  • OLAP (5 upvotes)
  • OLAP + OLTP (3 upvotes)
  • Combining stream and historical analytics (2 upvotes)
  • OLTP (1 upvote)

Pros of Snowflake
  • Public and Private Data Sharing (7 upvotes)
  • Multicloud (4 upvotes)
  • Good Performance (4 upvotes)
  • User Friendly (4 upvotes)
  • Great Documentation (3 upvotes)
  • Serverless (2 upvotes)
  • Economical (1 upvote)
  • Usage based billing (1 upvote)
  • Innovative (1 upvote)

Cons of Druid
  • Limited SQL support (3 upvotes)
  • Joins are not supported well (2 upvotes)
  • Complexity (1 upvote)

Cons of Snowflake
  • None listed yet

What is Druid?

Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte-sized data sets. It supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations.
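As a rough illustration of the filters, exact calculations, and approximate algorithms mentioned above, the sketch below builds the JSON body for a Druid SQL request. It only constructs the payload and does not contact a cluster. The datasource `page_events` and the columns `user_id` and `country` are hypothetical; `__time` is Druid's built-in timestamp column, and the body is meant for the SQL endpoint at `/druid/v2/sql`.

```python
import json

# Hypothetical datasource name, chosen only for illustration.
DATASOURCE = "page_events"


def build_druid_sql_request(datasource: str) -> str:
    """Return a JSON body suitable for a POST to Druid's SQL endpoint."""
    query = f"""
    SELECT
      TIME_FLOOR(__time, 'PT1H') AS hour_bucket,       -- time-based grouping
      COUNT(*) AS row_count,                           -- exact calculation
      APPROX_COUNT_DISTINCT(user_id) AS approx_users   -- approximate algorithm
    FROM "{datasource}"
    WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
      AND country = 'US'                               -- dimension filter
    GROUP BY 1
    ORDER BY 1
    """
    return json.dumps({"query": query.strip()})


body = build_druid_sql_request(DATASOURCE)
print(body)
```

Sending `body` with a `Content-Type: application/json` header to a broker or router would return one row per hour, assuming the datasource and columns exist.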

What is Snowflake?

Snowflake eliminates the administration and management demands of traditional data warehouses and big data platforms. It is a data warehouse delivered as a service, originally on Amazon Web Services (AWS) and now also on Microsoft Azure and Google Cloud, with no infrastructure to manage and no knobs to turn.

What are some alternatives to Druid and Snowflake?

HBase
Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop.

MongoDB
MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding.

Cassandra
Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added to and removed from the cluster. Row store means that, like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.

Prometheus
Prometheus is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true.

Elasticsearch
Elasticsearch is a distributed, RESTful search and analytics engine capable of storing data and searching it in near real time. Elasticsearch, Kibana, Beats, and Logstash make up the Elastic Stack (sometimes called the ELK Stack).