Druid vs Elasticsearch

Druid vs Elasticsearch: What are the differences?

Introduction:

Druid and Elasticsearch are both open-source, distributed data stores used for real-time analytics and search. They share some similarities but differ in functionality and typical use cases. The key differences are described below.

Data Model and Querying: Druid is designed for time-series and event data. Every row carries a primary timestamp (the __time column) and data is partitioned by time, which makes Druid well suited to analyzing streaming events. It excels at fast aggregations and time-filtered queries, often answering them in under a second, and is queried through Druid SQL or its native JSON query language. Elasticsearch, on the other hand, is a document-oriented search engine that stores JSON documents and is optimized for full-text search and complex queries on structured and unstructured data. Its Query DSL offers powerful search features such as fuzzy matching and relevance scoring.
Scalability: Both Druid and Elasticsearch scale horizontally by adding nodes. Druid is built for high ingestion rates: it can ingest streams directly from Apache Kafka or Amazon Kinesis and make events queryable shortly after they arrive, which suits append-heavy workloads with a continuous flow of new events (updating or deleting existing rows is comparatively costly because Druid's segments are immutable). Elasticsearch splits each index into shards distributed across nodes, with replica shards for availability; it can hold very large volumes of indexed data and is commonly used for log analysis, monitoring, and search.
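Storage and Indexing: Druid stores data in immutable, time-partitioned segments using a compressed columnar format. String columns are dictionary-encoded and carry bitmap indexes, so filters and aggregations read only the columns and rows they need; segments are memory-mapped, so frequently queried data is served from memory. Elasticsearch, built on Apache Lucene, instead uses an inverted index that maps each term to the documents containing it, distributed across shards. It is flexible about schema (fields can be mapped dynamically as documents arrive) and supports near-real-time indexing: new documents become searchable after the next refresh, which by default happens every second.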
Aggregation Capabilities: Druid is known for powerful, efficient aggregations, making it a common choice for analyzing high-dimensional, time-based data. It can optionally roll up data at ingestion time, pre-aggregating rows that share a truncated timestamp and identical dimension values, and it performs slicing and dicing, grouping, and approximate calculations (such as distinct counts) on large datasets quickly. Elasticsearch also supports aggregations, such as terms and date histograms computed over columnar doc values, but high-cardinality or deeply nested aggregations over large indexes can be memory-intensive and slower than in Druid.
Real-Time Analytics vs. Real-Time Search: Both systems offer real-time capabilities, but they focus on different aspects of real-time data processing. Druid is optimized for real-time analytics and exploratory data analysis, with fast responses to complex analytical queries. Elasticsearch excels at near-real-time search, letting users run fast, relevance-ranked full-text searches over large, constantly changing datasets.
Use Cases: Due to their differences in data model and capabilities, Druid and Elasticsearch cater to different use cases. Druid is commonly used for operational analytics, time-series analysis, and real-time monitoring, making it well-suited for applications in the IoT, ad tech, and log analytics domains. Elasticsearch, on the other hand, finds applications in search and recommendation engines, log analysis, e-commerce search, and content management systems.

In summary, Druid and Elasticsearch differ in their data model and querying, scalability, storage and indexing approach, aggregation capabilities, focus on real-time analytics versus real-time search, and typical use cases.
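
To make the Data Model and Querying difference concrete, here is a minimal Python sketch that only builds request bodies and sends nothing. It assumes a hypothetical Druid datasource named events and a hypothetical Elasticsearch index with a title field. The endpoints named in the comments (Druid's /druid/v2/sql and Elasticsearch's /<index>/_search) are the standard ones, but check them against the documentation for your versions.

```python
import json

def druid_hourly_counts(datasource: str, hours: int) -> dict:
    """Body for POST /druid/v2/sql: event counts per hour over the last `hours` hours.

    Relies on Druid's mandatory __time column. The datasource name is
    interpolated directly, so it must come from trusted code, not user input.
    """
    sql = (
        "SELECT TIME_FLOOR(__time, 'PT1H') AS hour_start, COUNT(*) AS events "
        f'FROM "{datasource}" '
        f"WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '{int(hours)}' HOUR "
        "GROUP BY 1 ORDER BY 1"
    )
    return {"query": sql}

def es_fuzzy_match(field: str, text: str) -> dict:
    """Body for POST /<index>/_search: a typo-tolerant match query ranked by _score."""
    return {"query": {"match": {field: {"query": text, "fuzziness": "AUTO"}}}}

druid_body = druid_hourly_counts("events", 24)            # hypothetical datasource
es_body = es_fuzzy_match("title", "wireles headphnes")    # misspelled on purpose
print(json.dumps(druid_body, indent=2))
print(json.dumps(es_body, indent=2))
```

The Druid query groups by time buckets, while the Elasticsearch query returns individual documents ordered by relevance, which mirrors the analytics-versus-search split described above.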
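
As a sketch of the Scalability point, the Python below builds two configuration bodies. The first is a Druid Kafka supervisor spec, which would be POSTed to /druid/indexer/v1/supervisor to start continuous ingestion from a topic. The second is an Elasticsearch index-creation body (for PUT /<index>) that sets the number of primary and replica shards. The datasource, topic, broker address, and column names are hypothetical, and the spec is trimmed to a few commonly used fields; treat it as a starting point to check against the Druid ingestion docs rather than a complete production spec.

```python
import json

def kafka_supervisor_spec(datasource: str, topic: str, brokers: str) -> dict:
    """Minimal Druid Kafka supervisor spec (hypothetical names, trimmed to common fields)."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Every Druid row needs a primary timestamp; here it is read from "ts".
                "timestampSpec": {"column": "ts", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["country", "page"]},
                "metricsSpec": [{"type": "count", "name": "count"}],
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "hour",  # segments are cut per hour
                    "queryGranularity": "minute",  # timestamps truncated to the minute
                    "rollup": True,
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": brokers},
                "taskCount": 2,  # parallel ingestion tasks reading the topic
                "replicas": 1,
                "taskDuration": "PT1H",
            },
            "tuningConfig": {"type": "kafka"},
        },
    }

def es_index_settings(shards: int, replicas: int) -> dict:
    """Body for PUT /<index>. The primary shard count is set at creation time;
    the replica count can be changed later."""
    return {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}

spec = kafka_supervisor_spec("clicks", "clicks", "localhost:9092")
index_body = es_index_settings(shards=3, replicas=1)
print(json.dumps(spec, indent=2))
print(json.dumps(index_body))
```

In both systems, scaling out means adding nodes: Druid spreads ingestion tasks and segments across them, while Elasticsearch spreads shards.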
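
The Storage and Indexing contrast can be illustrated with a toy, in-memory Python model. It builds an inverted index from terms to document ids, in the spirit of what Lucene builds for Elasticsearch, and a dictionary-encoded string column with one bitmap per value, in the spirit of how Druid stores string dimensions. This is a simplification for illustration only. Real Lucene indexes and Druid segments are compressed on-disk structures (Druid uses compressed bitmap formats such as Roaring), whereas here a plain Python int stands in for a bitmap and the sample data is made up.

```python
from collections import defaultdict

# Toy corpus for the inverted index (made-up documents).
docs = [
    "fast analytics on event data",
    "full text search on documents",
    "fast search and fast aggregations",
]

def build_inverted_index(texts):
    """Map each lowercase token to the sorted ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(texts):
        for token in text.lower().split():  # naive whitespace tokenizer
            postings[token].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def dictionary_encode(values):
    """Dictionary-encode a string column and build one bitmap (a Python int) per value."""
    dictionary = sorted(set(values))                  # distinct values, sorted
    code_of = {v: i for i, v in enumerate(dictionary)}
    codes = [code_of[v] for v in values]              # column stored as small ints
    bitmaps = {v: 0 for v in dictionary}
    for row, v in enumerate(values):
        bitmaps[v] |= 1 << row                        # set bit `row` for value v
    return dictionary, codes, bitmaps

inverted = build_inverted_index(docs)
print(inverted["fast"])    # [0, 2]
print(inverted["search"])  # [1, 2]

country = ["US", "DE", "US", "FR", "DE"]
dictionary, codes, bitmaps = dictionary_encode(country)
print(dictionary, codes)   # ['DE', 'FR', 'US'] [2, 0, 2, 1, 0]

# A filter like country IN ('US', 'FR') becomes a bitwise OR of two bitmaps.
mask = bitmaps["US"] | bitmaps["FR"]
print([row for row in range(len(country)) if (mask >> row) & 1])  # [0, 2, 3]
```

The inverted index answers "which documents contain this term?", which is the core of full-text search; the encoded column plus bitmaps answers "which rows match this filter?" cheaply, which is what columnar aggregation needs.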
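
The ingestion-time roll-up mentioned under Aggregation Capabilities can be modeled in a few lines of Python: rows that fall in the same hour and share the same dimension values collapse into one row holding a count and a summed metric. The sample events and column names are made up, and this is a conceptual model rather than Druid's implementation. For contrast, the sketch also builds an Elasticsearch date_histogram aggregation body, which computes similar hourly buckets at query time instead of at ingestion time; the timestamp and bytes field names are assumptions.

```python
import json
from collections import defaultdict
from datetime import datetime

# Made-up raw events: (ISO timestamp, country, page, bytes).
events = [
    ("2024-05-01T10:05:00", "US", "/home", 120),
    ("2024-05-01T10:40:00", "US", "/home", 80),
    ("2024-05-01T10:59:59", "DE", "/home", 50),
    ("2024-05-01T11:02:00", "US", "/home", 200),
]

def hourly_rollup(rows):
    """Collapse rows with the same hour and dimension values into one aggregated row."""
    out = defaultdict(lambda: {"count": 0, "bytes": 0})
    for ts, country, page, nbytes in rows:
        # Truncate the timestamp to the hour (comparable to an hourly queryGranularity).
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        agg = out[(hour.isoformat(), country, page)]
        agg["count"] += 1       # how many raw rows were merged
        agg["bytes"] += nbytes  # summed metric
    return dict(out)

rolled = hourly_rollup(events)
for key in sorted(rolled):
    print(key, rolled[key])
# ('2024-05-01T10:00:00', 'DE', '/home') {'count': 1, 'bytes': 50}
# ('2024-05-01T10:00:00', 'US', '/home') {'count': 2, 'bytes': 200}
# ('2024-05-01T11:00:00', 'US', '/home') {'count': 1, 'bytes': 200}

# Elasticsearch computes comparable buckets at query time (field names assumed).
es_hourly = {
    "size": 0,  # return only aggregation results, no hits
    "aggs": {
        "per_hour": {
            "date_histogram": {"field": "timestamp", "fixed_interval": "1h"},
            "aggs": {"total_bytes": {"sum": {"field": "bytes"}}},
        }
    },
}
print(json.dumps(es_hourly))
```

Four raw rows become three stored rows here; on real event streams with low-cardinality dimensions the reduction is usually much larger, which is one reason pre-aggregated queries in Druid stay fast.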
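
For the real-time search side, documents usually reach Elasticsearch through the _bulk API. Its body is newline-delimited JSON: an action line followed by the document, with a mandatory trailing newline. The Python sketch below only builds such a body for a hypothetical logs index; it would be sent with POST /_bulk and the Content-Type application/x-ndjson. Indexed documents become searchable after the next refresh (every second by default), or the request can add the refresh=wait_for parameter to wait for it.

```python
import json
from typing import Iterable

def bulk_body(index: str, docs: Iterable[dict]) -> str:
    """Build an NDJSON body for Elasticsearch's _bulk endpoint."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action/metadata line
        lines.append(json.dumps(doc))                           # the document itself
    return "\n".join(lines) + "\n"  # _bulk requires a final newline

body = bulk_body("logs", [  # hypothetical index and sample documents
    {"@timestamp": "2024-05-01T10:05:00Z", "message": "user login failed"},
    {"@timestamp": "2024-05-01T10:05:01Z", "message": "user login succeeded"},
])
print(body, end="")
```

Batching many documents per request keeps indexing throughput high, while the refresh cycle is what makes the search "near" real time rather than instantaneous.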

Advice on Elasticsearch, Druid

Rana Usman

Chief Technology Officer at TechAvanza

Jun 4, 2020

Needs advice on Firebase, Elasticsearch, and Algolia

Hey everybody! (1) I am developing an Android application. I have around 3 million records of data (less than a TB) that I want to store in the cloud. Which company provides the best cloud database service for my scenario? It should be secure, usable long-term, and offer good service. I decided to use the Firebase Realtime Database. Should I stick with Firebase, or are there other companies that provide a better service?

(2) My app also needs to search that same data (less than a TB). Which search solution should I use? I found Elasticsearch and Algolia. It should be secure and fast. If any other company provides better services than these, please feel free to suggest them.

Thank you!

408k views

Detailed Comparison

Elasticsearch
Description: Elasticsearch is a distributed, RESTful search and analytics engine capable of storing data and searching it in near real time. Together with Kibana, Beats, and Logstash it makes up the Elastic Stack (sometimes called the ELK Stack).
Key features: distributed and highly available search engine; multi-tenant with multiple types (mapping types were deprecated in 7.0 and removed in 8.0); a varied set of APIs, including RESTful ones; clients in many languages, including Java, Python, .NET (C#), Groovy, and more; document oriented; reliable, asynchronous write-behind for long-term persistence; (near) real-time search; built on top of Apache Lucene; per-operation consistency; inverted indices with finite state transducers for full-text querying; BKD trees for storing numeric and geo data; column store for analytics; compatible with Hadoop via the ES-Hadoop connector; licensed under Apache 2.0 up to version 7.10, with later versions offered under the Elastic License 2.0 and SSPL (AGPLv3 was added as a further option in 2024).
Statistics: Stacks 35.5K, Followers 27.1K, Votes 1.6K
Pros (votes): Powerful API (329); Great search engine (315); Open source (231); RESTful (214); Near real-time search (200)
Cons (votes): Resource hungry (7); Difficult to get started (6); Expensive (5); Hard to keep stable at large scale (4)
Integrations: Kibana, Beats, Logstash

Druid
Description: Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte-sized datasets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations.
Key features: none listed
Statistics: Stacks 376, Followers 867, Votes 32
Pros (votes): Real-time aggregations (15); Batch and real-time ingestion (6); OLAP (5); OLAP + OLTP (3); Combining stream and historical analytics (2)
Cons (votes): Limited SQL support (3); Joins are not supported well (2); Complexity (1)
Integrations: ZooKeeper

What are some alternatives to Elasticsearch and Druid?

Algolia

Our mission is to make you a search expert. Push data to our API to make it searchable in real time. Build your dream front end with one of our web or mobile UI libraries. Tune relevance and get analytics right from your dashboard.

Apache Spark

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

Presto

Distributed SQL Query Engine for Big Data

Amazon Athena

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

Typesense

It is an open-source, typo-tolerant search engine that delivers fast and relevant results out of the box. It has been built from scratch to offer a delightful search experience, from instant search and autosuggest to faceted search.

Apache Flink

Apache Flink is an open source system for fast and versatile data analytics in clusters. Flink supports batch and streaming analytics, in one system. Analytical programs can be written in concise and elegant APIs in Java and Scala.

lakeFS

It is an open-source data version control system for data lakes. It provides a “Git for data” platform enabling you to implement best practices from software engineering on your data lake, including branching and merging, CI/CD, and production-like dev/test environments.

Amazon CloudSearch

Amazon CloudSearch enables you to search large collections of data such as web pages, document files, forum posts, or product information. With a few clicks in the AWS Management Console, you can create a search domain, upload the data you want to make searchable to Amazon CloudSearch, and the search service automatically provisions the required technology resources and deploys a highly tuned search index.

Amazon Elasticsearch Service

Amazon Elasticsearch Service (renamed Amazon OpenSearch Service in 2021) is a fully managed service that makes it easy to deploy, secure, and operate Elasticsearch at scale with zero downtime.

Apache Kylin

Apache Kylin™ is an open source Distributed Analytics Engine designed to provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop/Spark supporting extremely large datasets, originally contributed from eBay Inc.
