Cassandra vs Vertica

Overview

Cassandra

Stacks: 3.6K
Followers: 3.5K
Votes: 507
GitHub Stars: 9.5K
Forks: 3.8K

Vertica

Stacks: 90
Followers: 120
Votes: 16

Cassandra vs Vertica: What are the differences?

Introduction

Cassandra and Vertica are both widely used distributed databases, but they target different workloads: Cassandra is built for high-volume operational reads and writes, while Vertica is built for analytical queries over large datasets. This article gives a concise overview of the key differences.

Data Model: Cassandra is a NoSQL database with a wide-column data model: rows are grouped into partitions by a partition key, and tables are designed around the queries the application will run. It is designed for scalability and high availability, allowing for massive amounts of structured, semi-structured, and unstructured data. Vertica, by contrast, is a relational database with the familiar table-based model. What distinguishes it is columnar storage, which keeps each column's values together on disk and so speeds up analytical queries that scan a few columns across many rows.
Scalability: Both systems scale out by adding nodes, but toward different goals. Cassandra is known for near-linear scaling of read and write throughput as nodes are added. It achieves this through its masterless architecture: every node plays the same role and any node can coordinate a request, so no single master becomes a bottleneck. Vertica is a massively parallel processing (MPP) database, also built on a shared-nothing cluster of interconnected nodes, which distributes data and parallelizes each query across the cluster for high-performance analytics.
Consistency Model: Another key difference lies in the consistency model. Cassandra offers tunable consistency, letting each read or write trade consistency against availability and latency. Consistency levels range from ONE (a single replica responds) through QUORUM to ALL. Reads are strongly consistent when the number of replicas read plus the number of replicas written exceeds the replication factor, and eventually consistent otherwise. In contrast, Vertica ensures strong consistency within a transaction, maintaining the ACID (Atomicity, Consistency, Isolation, Durability) properties traditionally associated with relational databases.
Query Language: Cassandra and Vertica have different query languages. Cassandra uses the Cassandra Query Language (CQL), whose syntax resembles SQL but is tailored to Cassandra's data model: it has no joins, and queries are expected to target a partition key. CQL version 3 provides features such as collection and user-defined data types, lightweight transactions, and secondary indexes. Vertica, being a relational database, supports SQL for querying and manipulating data, including joins and aggregations, with additional optimizations for analytics and data processing.
Workload Types: Cassandra and Vertica are optimized for different workload types. Cassandra is designed for high write throughput and can handle real-time applications that require low-latency data access. It excels in use cases such as time-series data, IoT (Internet of Things) data, and high-volume event logging. On the other hand, Vertica is built for analytics workloads and is often used for business intelligence, data warehousing, and advanced analytics tasks that involve complex queries and aggregations on large datasets.
Data Replication: Cassandra and Vertica have different approaches to data replication. Cassandra uses peer-to-peer replication for high availability and fault tolerance: consistent hashing maps each partition key to a position on a token ring, virtual nodes spread each machine's ownership around that ring, and a per-keyspace replication factor sets how many nodes hold a copy. Vertica provides redundancy through K-safety: it keeps copies of each data segment on other nodes so that the database remains available if up to K nodes fail.
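
To make the replication and consistency points above more concrete, here is a minimal sketch of a token ring with virtual nodes, plus the replica-overlap rule behind quorum reads and writes. It is a toy model in the spirit of Cassandra's design, not its actual implementation: the `Ring` class, the `token` and `overlaps` helpers, the node names, and the MD5-based hash are all assumptions made for illustration.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (toy hash, not Cassandra's partitioner)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    """Hypothetical token ring: each physical node owns several virtual nodes."""

    def __init__(self, nodes, vnodes=8):
        # Sort all (token, node) pairs so lookups can use binary search.
        self._tokens = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [t for t, _ in self._tokens]

    def replicas(self, partition_key: str, rf: int):
        """Walk clockwise from the key's token, collecting rf distinct nodes."""
        start = bisect.bisect_right(self._keys, token(partition_key))
        found = []
        for offset in range(len(self._tokens)):
            node = self._tokens[(start + offset) % len(self._tokens)][1]
            if node not in found:
                found.append(node)
            if len(found) == rf:
                break
        return found


def overlaps(read_replicas: int, write_replicas: int, rf: int) -> bool:
    """True when any read set must intersect any write set (R + W > RF)."""
    return read_replicas + write_replicas > rf


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("sensor-42", rf=3)  # three distinct nodes hold this key
quorum = 3 // 2 + 1                        # majority of 3 replicas
print(owners, overlaps(quorum, quorum, rf=3), overlaps(1, 1, rf=3))
```

With a replication factor of 3, reading and writing at a quorum of 2 guarantees that a read touches at least one replica holding the latest write, while reading and writing a single replica each does not.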

In summary, Cassandra and Vertica differ in their data models, scalability approaches, consistency models, query languages, workload optimizations, and data replication strategies. These differences make them suitable for different use cases and highlight the specific strengths of each database management system.
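
The data model and workload differences can be sketched with two in-memory layouts. This is only an illustration of the idea, not how either database stores data on disk; the sample readings and the names `by_partition`, `columns`, and `average` are made up for the example.

```python
# Hypothetical sample data: sensor readings.
rows = [
    {"sensor": "a", "ts": 1, "temp": 20.5},
    {"sensor": "a", "ts": 2, "temp": 21.0},
    {"sensor": "b", "ts": 1, "temp": 19.0},
]

# Partition-oriented layout (wide-column style): all rows for one key sit
# together, so "everything for sensor a" is a single lookup.
by_partition = {}
for row in rows:
    by_partition.setdefault(row["sensor"], []).append(row)

# Column-oriented layout (columnar style): one array per column, so an
# aggregate over "temp" reads only that column and skips the others.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


readings_for_a = by_partition["a"]   # point lookup by partition key
avg_temp = average(columns["temp"])  # scan of a single column
print(len(readings_for_a), round(avg_temp, 2))
```

The first access pattern is the kind Cassandra is designed around; the second is the kind a columnar analytics engine such as Vertica is optimized for.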


Advice on Cassandra, Vertica

Vinay

Head of Engineering

Sep 19, 2019

Needs advice

The problem I have is that we need to process and change (update/insert) 55M records every 2 minutes, and the updated data must be available to a REST API for filtering and selection. The REST API response time should be under 1 second.

The most important factor for me is the 2-minute window for processing and storing. There need to be two views of the data: one for selection and one for the changed data.
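
As a rough sizing aid for the question above, the 55M changes every 2 minutes can be turned into a sustained write rate. This is back-of-the-envelope arithmetic only: the node counts are hypothetical, and it assumes the load spreads evenly and ignores replication, which multiplies the writes each cluster actually performs.

```python
rows_per_cycle = 55_000_000  # from the question: 55M updates/inserts
cycle_seconds = 2 * 60       # from the question: every 2 minutes

# Sustained write rate needed just to keep up with each cycle.
required_rows_per_second = rows_per_cycle / cycle_seconds


def per_node_rate(node_count: int) -> float:
    """Rows per second each node must absorb if load is spread evenly."""
    return required_rows_per_second / node_count


for nodes in (5, 10, 20):  # hypothetical cluster sizes
    print(nodes, round(per_node_rate(nodes)))
```

The cluster as a whole has to sustain roughly 458,000 writes per second, so any candidate database should be benchmarked against that rate with the real schema and replication settings.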


Detailed Comparison

Cassandra	Vertica
Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added and removed from the cluster. Row store means that, like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.	It provides a best-in-class, unified analytics platform that will forever be independent from underlying infrastructure.
-	Analyze All of Your Data. No longer move data or settle for siloed views; Achieve Scale and Performance; Fear of growing data volumes and users is a thing of the past; Future-Proof Your Analytics
Statistics
GitHub Stars 9.5K	GitHub Stars -
GitHub Forks 3.8K	GitHub Forks -
Stacks 3.6K	Stacks 90
Followers 3.5K	Followers 120
Votes 507	Votes 16
Pros & Cons
Cassandra pros (votes): Distributed (119), High performance (98), High availability (81), Easy scalability (74), Replication (53)
Cassandra cons (votes): Reliability of replication (3), Size (1), Updates (1)
Vertica pros (votes): Shared nothing or shared everything architecture (3), Partition pruning and predicate push down on Parquet (1), Vertica is the only product which offers partition prun (1), Query-Optimized Storage (1), Fully automated Database Designer tool (1)
Integrations
No integrations available	Oracle Golang MongoDB MySQL Sass Mode PowerBI Tableau Talend

What are some alternatives to Cassandra, Vertica?

MongoDB

MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding.

MySQL

The MySQL software delivers a very fast, multi-threaded, multi-user, and robust SQL (Structured Query Language) database server. MySQL Server is intended for mission-critical, heavy-load production systems as well as for embedding into mass-deployed software.

PostgreSQL

PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, user-defined types and functions.

Microsoft SQL Server

Microsoft® SQL Server is a database management and analysis system for e-commerce, line-of-business, and data warehousing solutions.

SQLite

SQLite is an embedded SQL database engine. Unlike most other SQL databases, SQLite does not have a separate server process. SQLite reads and writes directly to ordinary disk files. A complete SQL database with multiple tables, indices, triggers, and views, is contained in a single disk file.

Memcached

Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.

MariaDB

Started by core members of the original MySQL team, MariaDB actively works with outside developers to deliver the most featureful, stable, and sanely licensed open SQL server in the industry. MariaDB is designed as a drop-in replacement of MySQL(R) with more features, new storage engines, fewer bugs, and better performance.

RethinkDB

RethinkDB is built to store JSON documents and to scale to multiple machines with very little effort. It has a pleasant query language that supports really useful queries like table joins and group by, and it is easy to set up and learn.

ArangoDB

A distributed free and open-source database with a flexible data model for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions.

InfluxDB

InfluxDB is a scalable datastore for metrics, events, and real-time analytics. It has a built-in HTTP API so you don't have to write any server side code to get up and running. InfluxDB is designed to be scalable, simple to install and manage, and fast to get data in and out.
