Druid

Fast column-oriented distributed data store

What is Druid?

Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte-sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations.
Druid is a tool in the Big Data Tools category of a tech stack.
Druid is an open source tool with 11.3K GitHub stars and 3.1K GitHub forks; its source repository is hosted on GitHub.
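
To make the mention of filters, exact calculations, and approximate algorithms more concrete, here is a minimal sketch of a request body in the shape of Druid's native JSON query format. The datasource name `events` and the columns `country`, `clicks`, and `user_id` are hypothetical, and the helper `build_timeseries_query` is purely illustrative. The sketch only builds and prints the payload; it does not contact a cluster.

```python
import json


def build_timeseries_query(datasource, interval, country):
    """Build a native-style timeseries query as a plain dict.

    All datasource and column names are hypothetical placeholders.
    """
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "hour",
        "intervals": [interval],
        # Filter: keep only rows whose `country` dimension equals the given value.
        "filter": {"type": "selector", "dimension": "country", "value": country},
        "aggregations": [
            # Exact calculations: row count and a sum over a numeric column.
            {"type": "count", "name": "rows"},
            {"type": "longSum", "name": "total_clicks", "fieldName": "clicks"},
            # Approximate algorithm: estimated number of distinct users.
            {"type": "cardinality", "name": "approx_users", "fields": ["user_id"]},
        ],
    }


query = build_timeseries_query("events", "2024-01-01/2024-01-02", "US")
payload = json.dumps(query, indent=2)
print(payload)
```

In a real deployment a payload like this would be sent to the cluster's query endpoint; check the field names against the documentation for the Druid version in use before relying on them.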

Who uses Druid?

Companies
49 companies reportedly use Druid in their tech stacks, including Airbnb, Instacart, and Hepsiburada.

Developers
253 developers on StackShare have stated that they use Druid.

Druid Integrations

ZooKeeper, strongDM, Metabase Cloud, Querybook, and SigNoz are some of the popular tools that integrate with Druid; 6 tools in total are listed as integrating with Druid.

Pros of Druid

14 · Real Time Aggregations
5 · Batch and Real-Time Ingestion
4 · OLAP
3 · OLAP + OLTP
2 · Combining stream and historical analytics
1 · OLTP

Decisions about Druid

Here are some stack decisions, common use cases and reviews by companies and developers who chose Druid in their tech stack.

Umair Iftikhar
Technical Architect at Vappar | 3 upvotes · 123.3K views

I am developing a solution that collects telemetry data from different devices: a minimum of about 1,000 devices and a maximum of 12,000. Each device sends 2 packets per second. This is time-series data; the data definitions and various reports (building information, maintenance records, etc.) are stored in PostgreSQL. I want to know the best solution. The data is needed to run different math and ML algorithms. Also, the data is raw, without the definitions and information stored in PostgreSQL. Initially, I went with TimescaleDB because of its PostgreSQL support, but as the number of sites increased, I started facing many issues with TimescaleDB in terms of flexibility in storing data.

Database replication for reporting and other purposes is also a major requirement. You may also suggest options other than Druid and Cassandra, but an open source solution is appreciated.
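
For a sense of scale, the ingest rate implied by the figures in this post (1,000 to 12,000 devices, 2 packets per device per second) can be worked out as below. This is a rough sketch that assumes every device sends continuously around the clock; the helper name `ingest_rate` is illustrative only.

```python
PACKETS_PER_DEVICE_PER_SECOND = 2  # from the post: 2 packets per second per device
SECONDS_PER_DAY = 24 * 60 * 60


def ingest_rate(devices):
    """Return (packets per second, packets per day) for a given device count."""
    per_second = devices * PACKETS_PER_DEVICE_PER_SECOND
    per_day = per_second * SECONDS_PER_DAY
    return per_second, per_day


for devices in (1_000, 12_000):
    per_second, per_day = ingest_rate(devices)
    print(f"{devices:>6} devices: {per_second:,} packets/s, {per_day:,} packets/day")
```

Under these assumptions the workload ranges from 2,000 to 24,000 packets per second, or roughly 173 million to 2.07 billion packets per day.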


Druid Alternatives & Comparisons

What are some alternatives to Druid?
HBase
Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop.
MongoDB
MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding.
Cassandra
Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added to and removed from the cluster. Row store means that, like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.
Prometheus
Prometheus is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true.
Elasticsearch
Elasticsearch is a distributed, RESTful search and analytics engine capable of storing data and searching it in near real time. Elasticsearch, Kibana, Beats and Logstash are the Elastic Stack (sometimes called the ELK Stack).

Druid's Followers
658 developers follow Druid to keep up with related blogs and decisions.