Clickhouse

387
517
78
InfluxDB

1K
1.2K
174
Clickhouse vs InfluxDB: What are the differences?

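Clickhouse and InfluxDB are both popular choices for storing and analyzing time-series data: Clickhouse is a general-purpose column-oriented analytical (OLAP) database, while InfluxDB is a purpose-built time-series database. Let's explore the key differences between them.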

  1. Data Model: Clickhouse uses a columnar data model, storing each column's values together rather than row by row. This allows efficient compression and fast scans for analytical queries that read only a few columns. InfluxDB organizes data into measurements, where each point carries a timestamp, indexed tags (key-value metadata), and fields (the measured values). This model is optimized for storing and retrieving time-series data, making InfluxDB a common choice for real-time monitoring and IoT applications.

  2. Query Language: Clickhouse uses its own SQL dialect, which provides a wide range of analytical functions and supports complex queries for data analysis. InfluxDB uses its own SQL-like query language, InfluxQL, which is designed specifically for time-series data (InfluxDB 2.x also offers Flux, a functional scripting language). InfluxQL has a simplified syntax focused on retrieving and manipulating time-series data efficiently.

  3. Architecture: Clickhouse is a distributed columnar database with a shared-nothing architecture, which allows horizontal scaling and high availability. It can handle massive amounts of data and run distributed queries across multiple nodes. The open-source version of InfluxDB runs as a single node; clustering for high availability and scalability is available only in the paid version. InfluxDB's architecture is optimized for real-time data ingestion and query performance on single nodes or small clusters.

  4. Data Ingestion: Clickhouse supports batch ingestion, streaming ingestion, and data replication. It reads formats such as CSV and JSON and can consume streams from Apache Kafka. InfluxDB excels at real-time data ingestion and provides a built-in HTTP API that accepts its text-based line protocol. It also integrates with popular monitoring and IoT platforms, making it easier to ingest data from various sources.

  5. Data Processing: Clickhouse is designed primarily for analytical (OLAP) workloads and supports complex queries with window functions, subqueries, and joins. It also supports materialized views and data aggregation. InfluxDB focuses on real-time data processing and supports continuous queries, downsampling, and data retention policies. It also offers built-in functions for anomaly detection.

  6. Ecosystem and Integrations: Clickhouse has a growing ecosystem of integrations with popular data processing frameworks like Apache Spark and Apache Hadoop. It also provides drivers for different programming languages like Python, Java, and Go. InfluxDB, on the other hand, has a strong ecosystem of integrations with monitoring and visualization tools like Grafana and Prometheus. It also provides libraries and clients for various programming languages, making it easy to integrate with existing workflows.
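
The columnar layout described in item 1 can be illustrated with a small sketch. The sample readings and helper names below are hypothetical, and run-length encoding is only a stand-in chosen for illustration; it is not a description of the codecs Clickhouse actually uses. The point is that once a column's values sit next to each other, an aggregate touches only that column, and repetitive values compress well.

```python
from itertools import groupby

# Hypothetical sensor readings in row-oriented form (one dict per record).
rows = [
    {"ts": 1, "host": "a", "cpu": 0.5},
    {"ts": 2, "host": "a", "cpu": 0.7},
    {"ts": 3, "host": "a", "cpu": 0.6},
    {"ts": 4, "host": "b", "cpu": 0.9},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# An aggregate over one field reads only that column's list.
mean_cpu = sum(columns["cpu"]) / len(columns["cpu"])

# Repeated neighbouring values collapse into a few (value, count) pairs.
host_rle = run_length_encode(columns["host"])
```

In the row-oriented form, computing the same mean would require walking every record and skipping the fields that are not needed.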

In summary, Clickhouse excels at large-scale analytical workloads, while InfluxDB is optimized for real-time monitoring and IoT applications.
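
To make the downsampling and retention ideas from item 5 concrete, here is a minimal pure-Python sketch. It does not use either database; the function names, the sample points, and the window sizes are assumptions made for illustration. It shows the two operations in their simplest form: averaging raw points into fixed time windows, and dropping points older than a retention period.

```python
from collections import defaultdict


def downsample_mean(points, window_seconds):
    """Average (timestamp, value) points into fixed-size time windows.

    Returns a list of (window_start, mean_value) pairs sorted by window start.
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        window_start = timestamp - (timestamp % window_seconds)
        buckets[window_start].append(value)
    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]


def apply_retention(points, now, max_age_seconds):
    """Keep only points that are at most max_age_seconds old."""
    return [(ts, value) for ts, value in points if now - ts <= max_age_seconds]


# Hypothetical raw readings: (unix_seconds, value).
raw = [(0, 10.0), (30, 20.0), (60, 30.0), (90, 50.0), (120, 70.0)]

per_minute = downsample_mean(raw, window_seconds=60)
recent = apply_retention(raw, now=120, max_age_seconds=60)
```

A database that supports continuous queries and retention policies runs the equivalent of these two steps automatically, so the application does not have to schedule them itself.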

Advice on Clickhouse and InfluxDB
Needs advice on InfluxDB, MongoDB, and TimescaleDB

We are building an IoT service with heavy write throughput and fewer reads (we need to downsample records). We want good data reliability and policy-based data retention.

So we are looking for the best underlying database for ingesting a lot of data and querying it easily.

Replies (3)
Yaron Lavi
Recommends PostgreSQL

We had a similar challenge. We started with DynamoDB, Timescale, and even InfluxDB and Mongo, and eventually settled on PostgreSQL. Assuming the inbound data pipeline is queued (for example, Kinesis/Kafka -> S3 -> some Lambda functions), PostgreSQL gave us better performance by far.

Recommends Druid

Druid is amazing for this use case. It is a cloud-native solution that can be deployed on any cloud infrastructure or on Kubernetes:

  • Easy to scale horizontally
  • Column-oriented database
  • SQL to query data
  • Streaming and batch ingestion
  • Native search indexes

It can serve as a time-series database or a data warehouse, and it has time-optimized partitioning.

Ankit Malik
Software Developer at CloudCover · 3 upvotes · 321.4K views
Recommends Google BigQuery

If you want a serverless solution with plenty of storage and SQL-like querying, Google BigQuery is the best solution for that.

Decisions about Clickhouse and InfluxDB
Benoit Larroque
Principal Engineer at Sqreen · 2 upvotes · 133.4K views

I chose TimescaleDB to be the backend system of our production monitoring system. We needed to be able to keep track of multiple high-cardinality dimensions.

The drawback of this decision is that our monitoring system is a bit more ad hoc than it used to be (with New Relic Insights).

We are combining this with Grafana for display and Telegraf for data collection.

Pros of Clickhouse
  • Fast, very very fast (19)
  • Good compression ratio (11)
  • Horizontally scalable (6)
  • Great CLI (5)
  • Utilizes all CPU resources (5)
  • RESTful (5)
  • Buggy (4)
  • Open-source (4)
  • Great number of SQL functions (4)
  • Server crashes, it's normal :( (3)
  • Has no transactions (3)
  • Flexible connection options (2)
  • Highly available (2)
  • ODBC (2)
  • Flexible compression options (2)
  • In IDEA, data import via HTTP interface not working (1)

Pros of InfluxDB
  • Time-series data analysis (58)
  • Easy setup, no dependencies (30)
  • Fast, scalable & open source (24)
  • Open source (21)
  • Real-time analytics (20)
  • Continuous Query support (6)
  • Easy Query Language (5)
  • HTTP API (4)
  • Out-of-the-box, automatic Retention Policy (4)
  • Offers Enterprise version (1)
  • Free Open Source version (1)

Cons of Clickhouse
Cons of InfluxDB
  • 5
    Slow insert operations
  • 4
    Instability
  • 1
    Proprietary query language
  • 1
    HA or Clustering is only in paid version

What is Clickhouse?

Clickhouse allows analysis of data that is updated in real time. It offers instant results in most cases: the data is processed faster than it takes to create a query.

What is InfluxDB?

InfluxDB is a scalable datastore for metrics, events, and real-time analytics. It has a built-in HTTP API so you don't have to write any server side code to get up and running. InfluxDB is designed to be scalable, simple to install and manage, and fast to get data in and out.
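
Points written through the HTTP API are encoded in InfluxDB's text-based line protocol: a measurement name, optional comma-separated tags, one or more fields, and a timestamp. The following is a minimal sketch of building one such line. The helper names are hypothetical, the sample measurement and values are made up, and the escaping covers only the common cases (commas, spaces, and equals signs in keys and tag values, quotes and backslashes in string fields). It only formats the text and does not send anything over the network.

```python
def escape_tag(text):
    """Escape commas, equals signs, and spaces in tag keys, tag values, and field keys."""
    for char in (",", "=", " "):
        text = text.replace(char, "\\" + char)
    return text


def escape_measurement(text):
    """Escape commas and spaces in a measurement name."""
    for char in (",", " "):
        text = text.replace(char, "\\" + char)
    return text


def format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, or quoted strings."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns):
    """Build one line-protocol line; tags are sorted by key."""
    if not fields:
        raise ValueError("a point needs at least one field")
    tag_part = "".join(
        f",{escape_tag(key)}={escape_tag(str(value))}"
        for key, value in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_tag(key)}={format_field(value)}" for key, value in fields.items()
    )
    return f"{escape_measurement(measurement)}{tag_part} {field_part} {timestamp_ns}"


line = to_line(
    "cpu",
    {"region": "eu", "host": "server 1"},
    {"usage": 0.64, "cores": 8},
    1700000000000000000,
)
```

In practice an official client library would normally handle this encoding; the sketch is only meant to show how tags, fields, and the timestamp fit together in a single point.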

What are some alternatives to Clickhouse and InfluxDB?
Cassandra
Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added and removed from the cluster. Row store means that, like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.
Elasticsearch
Elasticsearch is a distributed, RESTful search and analytics engine capable of storing data and searching it in near real time. Elasticsearch, Kibana, Beats and Logstash are the Elastic Stack (sometimes called the ELK Stack).
MySQL
The MySQL software delivers a very fast, multi-threaded, multi-user, and robust SQL (Structured Query Language) database server. MySQL Server is intended for mission-critical, heavy-load production systems as well as for embedding into mass-deployed software.
Druid
Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations.
MongoDB
MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding.