Cassandra

A partitioned row store. Rows are organized into tables with a required primary key.

What is Cassandra?

Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added to and removed from the cluster. Row store means that, like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.
Cassandra is a tool in the Databases category of a tech stack.
Cassandra is an open source tool with 7.7K GitHub stars and 3.3K GitHub forks. Its open source repository is hosted on GitHub.
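The partitioning behavior described above can be illustrated with a toy token ring. This is only a sketch under stated assumptions, not Cassandra's actual implementation: the `Ring` class, the `token` helper, the node names, and the use of MD5 as the hash are all hypothetical choices made for illustration. It shows the general idea that each partition key hashes to a position on a ring, and that adding a machine moves only part of the data.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is a stand-in hash for this sketch)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy token ring: each node owns several tokens; a key belongs to the
    node that owns the next token clockwise from the key's own token."""

    def __init__(self, nodes, tokens_per_node=64):
        self.tokens_per_node = tokens_per_node
        self._tokens = []  # sorted token positions
        self._owner = {}   # token position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Give the node several positions so load spreads more evenly.
        for i in range(self.tokens_per_node):
            t = token(f"{node}#{i}")
            self._owner[t] = node
            bisect.insort(self._tokens, t)

    def node_for(self, partition_key: str) -> str:
        # Find the first token after the key's token, wrapping around the ring.
        idx = bisect.bisect_right(self._tokens, token(partition_key))
        return self._owner[self._tokens[idx % len(self._tokens)]]


keys = [f"user-{i}" for i in range(10_000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # simulate a machine joining the cluster
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

In this sketch, every key that changes owner moves to the newly added node, and the remaining keys stay where they were. The application only ever calls `node_for`, which is the sense in which the placement is transparent to it.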

Who uses Cassandra?

Companies
516 companies reportedly use Cassandra in their tech stacks, including Uber, Facebook, and Netflix.

Developers
2739 developers on StackShare have stated that they use Cassandra.

Cassandra Integrations

Datadog, Kong, DataGrip, Liquibase, and Redash are some of the popular tools that integrate with Cassandra. In total, 55 tools are listed as integrating with Cassandra.
Pros of Cassandra
Distributed (116)
High performance (97)
High availability (81)
Easy scalability (74)
Replication (52)
Reliable (26)
Multi datacenter deployments (26)
OLTP (9)
Schema optional (8)
Open source (8)
Workload separation (via MDC) (2)
Fast (1)
Decisions about Cassandra

Here are some stack decisions, common use cases and reviews by companies and developers who chose Cassandra in their tech stack.

Bhanu Sai Pavan Kumar Pothuri
Developer at SAP Labs India Private Ltd · 2 upvotes · 10.4K views

I was reading an Instagram system design question.

He said he will be using Cassandra for the tables storing user data, photo metadata, and user follows. I didn't understand why he is using Cassandra and not a regular RDBMS like PostgreSQL or a NoSQL database like MongoDB.

Why should I consider DataStax Enterprise instead of vanilla Cassandra?

Umair Iftikhar
Technical Architect at ERP Studio · 3 upvotes · 299.7K views

I am developing a solution that collects telemetry data from different devices: a minimum of nearly 1,000 devices and a maximum of 12,000. Each device sends 2 packets per second. This is time-series data; the data definitions and various reports, such as building information and maintenance records, are saved in PostgreSQL. I want to know the best solution. The data is needed for math and ML workloads that run different algorithms. The data is also raw, without the definitions and information stored in PostgreSQL. Initially, I went with TimescaleDB because of its PostgreSQL support, but as the number of sites increased, I started facing many issues with TimescaleDB in terms of flexibility in storing data.

Replication of the database for reporting and other purposes is also a major requirement. You may also suggest options other than Druid and Cassandra, but an open source solution is appreciated.
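As a rough sizing aid for the figures in this post, the arithmetic can be worked out as follows. This is only a back-of-the-envelope sketch: it assumes every device sends exactly 2 packets per second around the clock, and the `packet_rates` helper is a hypothetical name introduced here for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
PACKETS_PER_DEVICE_PER_SECOND = 2  # figure taken from the post above


def packet_rates(devices: int) -> tuple[int, int]:
    """Return (packets per second, packets per day) for a given device count."""
    per_second = devices * PACKETS_PER_DEVICE_PER_SECOND
    return per_second, per_second * SECONDS_PER_DAY


# Minimum and maximum device counts mentioned in the post.
for devices in (1_000, 12_000):
    per_second, per_day = packet_rates(devices)
    print(f"{devices:>6} devices: {per_second:,} packets/s, {per_day:,} packets/day")
```

Under these assumptions, the workload ranges from 2,000 to 24,000 writes per second, or roughly 172.8 million to 2.07 billion packets per day.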

I'm trying to establish a data lake (or maybe puddle) for my org's Data Sharing project. The idea is that outside partners would send cuts of their PHI data, regardless of format/variables/systems, to our Data Team, who would then harmonize the data, create data marts, and eventually use it for something. End-to-end, I'm envisioning:

  1. Ingestion -> Secure, role-based, self-service portal for users to upload data (1a. bonus points if it can perform basic validations/masking)
  2. Storage -> Amazon S3 seems like the cheapest. We probably won't need very much, even at full capacity. Our current storage is a secure Box folder that has ~4GB with several batches of test data, code, presentations, and planning docs.
  3. Data Catalog -> AWS Glue? Azure Data Factory? Snowplow? Is the main difference basically the vendor? We will also have Data Dictionaries/Codebooks from submitters. Where would they fit in?
  4. Partitions -> I've seen Cassandra and YARN mentioned, but have no experience with either.
  5. Processing -> We want to use SAS if at all possible. What will work with SAS code?
  6. Pipeline/Automation -> The check-in and verification processes that have been outlined are rather involved. Some sort of automated messaging or approval workflow would be nice.
  7. I have very little guidance on what a "Data Mart" should look like, so I'm going with the idea that it would be another "experimental" partition. Unless there's an actual mart-building paradigm I've missed?
  8. An end user might use the catalog to pull certain de-identified data sets from the marts. Again, role-based access and a self-service GUI would be preferable.

I'm the only full-time tech person on this project, but I'm mostly an OOP, HTML, JavaScript, and some SQL programmer. Most of this is out of my repertoire. I've done a lot of research, but I can't be an effective evangelist without hands-on experience. Since we're starting a new year of our grant, they've finally decided to let me try some stuff out. Any pointers would be appreciated!

Jobs that mention Cassandra as a desired skillset

CBRE
Richardson, Texas, United States of America

Cassandra Alternatives & Comparisons

What are some alternatives to Cassandra?
HBase
Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop.
Google Cloud Bigtable
Google Cloud Bigtable offers you a fast, fully managed, massively scalable NoSQL database service that's ideal for web, mobile, and Internet of Things applications requiring terabytes to petabytes of data. Unlike comparable market offerings, Cloud Bigtable doesn't require you to sacrifice speed, scale, or cost efficiency when your applications grow. Cloud Bigtable has been battle-tested at Google for more than 10 years—it's the database driving major applications such as Google Analytics and Gmail.
Hadoop
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
Redis
Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker. Redis provides data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes, and streams.
Couchbase
Developed as an alternative to traditionally inflexible SQL databases, the Couchbase NoSQL database is built on an open source foundation and architected to help developers solve real-world problems and meet high scalability demands.

Cassandra's Followers
3363 developers follow Cassandra to keep up with related blogs and decisions.