Need advice about which tool to choose?Ask the StackShare community!
Druid vs Pig: What are the differences?
What is Druid? Fast column-oriented distributed data store. Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations.
What is Pig? Platform for analyzing large data sets. Pig is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin. A Pig Latin program consists of a directed acyclic graph where each node represents an operation that transforms data Operations are of two flavors: (1) relational-algebra style operations such as join, filter, project; (2) functional-programming style operators such as map, reduce. .
Druid and Pig can be categorized as "Big Data" tools.
Druid and Pig are both open source tools. It seems that Druid with 8.31K GitHub stars and 2.08K forks on GitHub has more adoption than Pig with 583 GitHub stars and 449 GitHub forks.
Airbnb, Instacart, and Dial Once are some of the popular companies that use Druid, whereas Pig is used by Netflix, Outbrain, and Cobrain. Druid has a broader approval, being mentioned in 24 company stacks & 12 developers stacks; compared to Pig, which is listed in 9 company stacks and 4 developer stacks.
Pros of Druid
- Real Time Aggregations15
- Batch and Real-Time Ingestion6
- OLAP5
- OLAP + OLTP3
- Combining stream and historical analytics2
- OLTP1
Pros of Pig
- Finer-grained control on parallelization2
- Proven at Petabyte scale1
- Open-source1
- Join optimizations for highly skewed data1
Sign up to add or upvote prosMake informed product decisions
Cons of Druid
- Limited sql support3
- Joins are not supported well2
- Complexity1