
Apache Cassandra® Documentation

Production-grade reference for architecture, CQL, and operations.

Documentation Scope

This reference documentation covers Apache Cassandra versions 4.0 through 5.x, with emphasis on production deployments. Cassandra 5.0 (September 2024) introduced major features including Storage-Attached Indexes (SAI), Vector Search, and Unified Compaction Strategy (UCS).

Version Range | Java Requirement | Documentation Status
3.11.x | Java 8 | Legacy reference
4.0.x | Java 8/11 | Supported
4.1.x | Java 8/11 | Fully Documented
5.0.x | Java 11/17 | Current (5.0.6)



What's New

Cassandra 5.0.6 (October 2025) - Current Release

  • Bug fixes and stability improvements

Cassandra 5.0.0 (September 2024) - Major Release

  • Storage-Attached Indexes (SAI) (CEP-7) - Efficient secondary indexing within storage layer
  • Vector data type and search (CEP-30) - Approximate nearest neighbor searching via SAI
  • Unified Compaction Strategy (UCS) (CEP-26) - Adaptive compaction replacing multiple strategies
  • Trie memtables (CEP-19) - Trie-based in-memory data structures
  • Trie SSTables (CEP-25) - Trie-indexed SSTable format
  • Dynamic Data Masking (CEP-20) - Selective redaction of sensitive data at query time
  • Java 17 support - recommended for Cassandra 5.0
  • TTL and writetime on collections/UDTs - Extended metadata for complex types
  • CIDR-based authorizer (CEP-33) - Network-based access control
  • New math functions: abs, exp, log, log10, round

Cassandra 4.1.10 (September 2025)

  • Bug fixes and stability improvements

Cassandra 4.1.0 (December 2022)

  • Paxos v2 - Enhanced lightweight transaction protocol
  • Guardrails - Operational safety boundaries and limits
  • Partition denylist - Block access to problematic partitions
  • Top partition tracking - Per-table monitoring of hot partitions
  • Native transport rate limiting - Request throughput controls
  • Client-side password hashing - Enhanced authentication security
  • Pluggable memtables - Custom memtable implementations

Cassandra 4.0.19 (October 2025)

  • Bug fixes and stability improvements

Cassandra 4.0.0 (July 2021)

  • Virtual tables - System information via CQL queries
  • Audit logging - Comprehensive query audit trail
  • Full query logging - Capture all queries for replay
  • Incremental repair improvements - More efficient anti-entropy
  • Zero-copy streaming - Faster data transfer between nodes
  • Java 11 support - Modern JVM compatibility

Apache Cassandra is a widely adopted distributed database, but much of its operational and architectural knowledge has historically lived in mailing lists, conference talks, and tribal knowledge rather than formal documentation.

This documentation provides a comprehensive, production-focused reference for Apache Cassandra, covering storage engine internals, compaction strategies, indexing, CQL semantics, data modeling, and operational tooling. Content is designed for developers, operators, and architects building and maintaining Cassandra deployments at scale.

This documentation complements the Official Apache Cassandra Documentation, providing deeper explanations of behavioral contracts, failure semantics, and practical guidance for real-world deployments.


Apache Cassandra is a distributed NoSQL database designed for extreme scale, exceptional performance, and continuous availability. There is no master node—every node can handle reads and writes, so the failure of any single node (or even an entire datacenter) does not take down the database.

Cassandra excels at write-heavy workloads, time-series data, and applications requiring geographic distribution. Cassandra is less suited for complex queries, ad-hoc analytics, or workloads requiring strong consistency with frequent cross-partition transactions.


About This Documentation

This documentation serves as a comprehensive reference for Apache Cassandra, covering architecture, configuration, operations, data modeling, CQL, and troubleshooting. The goal is to provide complete, accurate, and practical guidance for developers, operators, and architects working with Cassandra in production environments.

Principle | Description
Source Code Verified | Configuration options, default values, and behavior are cross-referenced against the Cassandra source code to ensure accuracy
CEP Aligned | New features reference their corresponding Cassandra Enhancement Proposals (CEPs) for design rationale and implementation details
Version Aware | Documentation notes version-specific differences between Cassandra 4.x and 5.x releases
Operationally Focused | Content prioritizes practical operational guidance derived from production experience

Topics are organized for both learning and reference. New users can follow the Getting Started guides sequentially, while experienced operators can use the detailed reference sections for specific configuration options, JMX metrics, and operational procedures.


What is Apache Cassandra?

History and Origins

Cassandra was created at Facebook in 2007 by Avinash Lakshman and Prashant Malik to power Facebook's Inbox Search feature—a system requiring high write throughput across hundreds of millions of users with strict latency requirements. Lakshman, a co-author of Amazon's Dynamo paper, brought distributed systems expertise that shaped Cassandra's architecture.

Year | Milestone
2007 | Development begins at Facebook
2008 | Open sourced under Apache License 2.0 (July)
2009 | Enters Apache Incubator (March)
2010 | Graduates to Apache Top-Level Project (February)
2011 | Cassandra 1.0 released
2013 | Cassandra 2.0 introduces lightweight transactions
2015 | Cassandra 3.0 brings materialized views (SASI follows in 3.4, 2016)
2021 | Cassandra 4.0 after extensive testing focus
2024 | Cassandra 5.0 introduces vectors, SAI, and UCS

The project is licensed under the Apache License 2.0, permitting commercial use, modification, and distribution.

Design Influences

Cassandra's design draws from two foundational distributed systems papers. Google's Bigtable (2006) provided the storage model: SSTables, memtables, and the LSM-tree architecture. Amazon's Dynamo (2007) provided the distribution model: consistent hashing, gossip-based cluster membership, and tunable consistency levels.
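
As a rough illustration of the consistent hashing idea taken from Dynamo, the sketch below places nodes on a hash ring and walks clockwise from a key's token to pick replicas. It is a toy model under stated assumptions: the node names, the `TokenRing` class, and the use of MD5 are hypothetical stand-ins. Cassandra's default partitioner is Murmur3, and real clusters also use vnodes and a replication strategy.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a key to a ring position (MD5 is only a stand-in hash here)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    """Toy consistent-hash ring: one token per node, no vnodes."""

    def __init__(self, nodes):
        # Keep (token, node) pairs sorted so lookups can use binary search.
        self._ring = sorted((token_for(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str, rf: int):
        """Return up to rf nodes found walking clockwise from the key's token."""
        rf = min(rf, len(self._ring))
        start = bisect.bisect_left(self._tokens, token_for(partition_key))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(rf)]


# Hypothetical four-node cluster with replication factor 3.
ring = TokenRing(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("user:42", rf=3)
```

Because only the hash of the key decides placement, any node can compute the owners of a partition without asking a coordinator or master.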

Performance Characteristics

The figures below are typical for a well-tuned cluster; actual numbers depend on hardware, data model, and workload:

Metric | Typical Performance | Notes
Write Throughput | 100,000+ writes/sec per node | Sequential I/O to commit log; parallel memtable inserts
Read Latency (P99) | 1-5 ms | With proper data modeling and warm caches
Write Latency (P99) | 1-2 ms | Commit log append + memtable insert
Scalability | Linear to 1000+ nodes | Proven in production at petabyte scale

Performance derives from Cassandra's architecture:

  • Log-structured writes: All writes append sequentially to the commit log, avoiding random disk seeks
  • Memtable buffering: Recent writes held in memtables before flushing to disk
  • Parallel execution: Requests distributed across nodes; no single bottleneck
  • Token-aware routing: Drivers send requests directly to replica nodes, avoiding extra network hops
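
The first two points above can be sketched as a toy write path: append to a log, insert into an in-memory table, and flush a sorted run once a threshold is reached. This is a simplified model, not Cassandra's implementation. The `ToyWritePath` class, the file name, and the entry-count threshold are assumptions made for illustration.

```python
import os
import tempfile


class ToyWritePath:
    """Toy model of a log-structured write path."""

    def __init__(self, directory: str, flush_threshold: int = 3):
        path = os.path.join(directory, "commitlog.txt")
        self.commit_log = open(path, "a", encoding="utf-8")
        self.memtable = {}
        self.sstables = []  # each flush yields one immutable, sorted run
        self.flush_threshold = flush_threshold

    def write(self, key: str, value: str) -> None:
        # 1. Sequential append: no seek and no read-before-write.
        self.commit_log.write(f"{key}\t{value}\n")
        self.commit_log.flush()
        # 2. In-memory insert; a newer value replaces an older one for the same key.
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # 3. Persist the memtable as a sorted, immutable run (stand-in for an SSTable).
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def close(self) -> None:
        self.commit_log.close()


with tempfile.TemporaryDirectory() as tmp:
    store = ToyWritePath(tmp, flush_threshold=2)
    store.write("k2", "b")
    store.write("k1", "a")  # second write reaches the threshold and triggers a flush
    store.write("k3", "c")
    store.close()
    flushed = store.sstables
    pending = store.memtable
```

After the run, `flushed` holds one sorted run containing `k1` and `k2`, while `k3` is still buffered in `pending`.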

Fault Tolerance

Cassandra is designed to survive failures at every level:

Failure Scenario | Cassandra Behavior
Single node failure | Remaining replicas serve requests; hinted handoff queues writes for recovery
Rack failure | Rack-aware replication ensures replicas exist in other racks
Datacenter failure | Multi-DC replication provides geographic redundancy; clients can be routed to a surviving datacenter
Network partition | Nodes continue serving requests independently; reconciliation occurs on recovery
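
The rack failure row can be illustrated with a simplified placement rule: walk the nodes in ring order and prefer racks that do not hold a replica yet. This is a sketch of the idea only, not the algorithm Cassandra's replication strategies actually run. The function name, node names, and rack layout are hypothetical.

```python
def place_replicas(ring_order, rack_of, rf):
    """Pick rf nodes in ring order, preferring racks not used so far."""
    chosen, used_racks, skipped = [], set(), []
    for node in ring_order:
        if len(chosen) == rf:
            break
        if rack_of[node] in used_racks:
            skipped.append(node)  # same rack as an earlier replica: defer
        else:
            chosen.append(node)
            used_racks.add(rack_of[node])
    # With fewer racks than rf, fall back to the deferred nodes.
    chosen.extend(skipped[: rf - len(chosen)])
    return chosen


# Hypothetical six-node datacenter spread over three racks.
racks = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2", "n5": "r3", "n6": "r3"}
replicas = place_replicas(["n1", "n2", "n3", "n4", "n5", "n6"], racks, rf=3)
```

With three racks and a replication factor of 3, each replica lands on a different rack, so losing one rack removes at most one copy of any partition.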

Unlike primary-replica databases that fail over to a standby, Cassandra has no failover—all nodes are active and capable of serving any request. This eliminates failover latency and split-brain scenarios.

Key Features

Feature | Description
Distributed Architecture | Data is automatically distributed across multiple nodes
Linear Scalability | Add capacity by adding nodes with no downtime
High Availability | No single point of failure; survives node and datacenter failures
Tunable Consistency | Choose consistency level per operation
Multi-Datacenter Replication | Built-in support for geographically distributed clusters
Flexible Schema | Wide-column store with support for complex data types
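
Tunable consistency comes down to simple arithmetic: a read is guaranteed to overlap the latest acknowledged write when the replicas read plus the replicas written exceed the replication factor. The helper names below are hypothetical and the sketch covers a single datacenter only.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond for QUORUM: floor(rf / 2) + 1."""
    return rf // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, rf: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf


rf = 3
q = quorum(rf)               # 2 of 3 replicas
strong = overlaps(q, q, rf)  # QUORUM reads + QUORUM writes: 2 + 2 > 3
weak = overlaps(1, 1, rf)    # ONE reads + ONE writes: 1 + 1 > 3 does not hold
```

The same arithmetic shows the availability cost: with a replication factor of 3, QUORUM operations tolerate one unavailable replica, while ALL tolerates none.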

Common Misconceptions

Understanding what Cassandra is not helps set appropriate expectations.

Misconception | Reality
"Cassandra is eventually consistent" | Cassandra offers tunable consistency. When the number of replicas read plus the number of replicas written exceeds the replication factor (for example, QUORUM reads with QUORUM writes), reads see the latest acknowledged write. "Eventually consistent" describes weaker combinations, such as reading and writing at ONE.
"Cassandra doesn't support transactions" | Cassandra supports lightweight transactions (LWT) using Paxos for compare-and-set operations. Accord, a general-purpose distributed transaction protocol, is under active development for a future release. LWT provides linearizable consistency for specific use cases, though not ACID transactions across arbitrary rows.
"Cassandra can't do joins" | Correct, by design. Cassandra optimizes for fast reads at scale by denormalizing data. Model data according to query patterns rather than normalizing and joining at read time.
"Cassandra is only for write-heavy workloads" | Cassandra handles read-heavy workloads effectively when data is modeled correctly. The key is designing tables around query patterns, not write patterns.
"Cassandra requires expensive hardware" | Cassandra runs effectively on both commodity hardware and high-end servers. Modern Cassandra scales well both horizontally (adding nodes) and vertically (larger instances with more CPU cores and memory).
"Cassandra is hard to operate" | Modern tooling such as AxonOps automates most operational tasks. The learning curve exists, but operational complexity is manageable with proper tooling and training.
"Data modeling is too difficult" | Query-first modeling is different from relational modeling, not harder. Once the principles are understood (partition keys, clustering columns, denormalization), modeling becomes straightforward. Tools like AxonOps Workbench provide visual data modeling assistance.
"Cassandra loses data" | Data loss occurs from misconfiguration (improper gc_grace_seconds, skipped repairs) or hardware failures beyond the replication factor, not from Cassandra itself. With proper operations, Cassandra provides strong durability guarantees.
"Cassandra is an in-memory database" | Cassandra is a persistent, disk-based database. While memtables buffer recent writes in memory, every write is also appended to the commit log (synced to disk according to the commitlog_sync setting) and later flushed to SSTables on disk. Memory caches improve read performance but are not the primary storage.

Getting Started

New to Cassandra? Begin with installation and initial configuration.

  • Installation


    Install Cassandra on Linux, Docker, or Kubernetes environments.

    Installation Guide

  • First Cluster


    Create and configure a first Cassandra cluster step by step.

    First Cluster

  • Client Drivers


    Connect applications using Java, Python, Go, and other drivers.

    Driver Setup

  • CQL Quickstart


    Learn Cassandra Query Language basics with hands-on examples.

    CQL Quickstart


Architecture

Understand Cassandra's distributed architecture and storage engine.

  • Architecture Overview


    Distributed architecture fundamentals, gossip protocol, and cluster topology.

    Architecture Overview

  • Data Distribution


    Partitioning, token rings, and virtual nodes (vnodes) explained.

    Data Distribution

  • Replication


    Replication strategies, consistency levels, and fault tolerance.

    Replication

  • Storage Engine


    Memtables, SSTables, commit log, and write path internals.

    Storage Engine

  • Compaction


    STCS, LCS, TWCS, and UCS compaction strategies explained.

    Compaction Strategies


CQL Reference

Complete Cassandra Query Language documentation.

  • CQL Overview


    CQL language reference and query syntax fundamentals.

    CQL Overview

  • Data Types


    Native, collection, and user-defined types reference.

    Data Types

  • DDL Commands


    CREATE, ALTER, DROP statements for schema management.

    DDL Commands

  • DML Commands


    SELECT, INSERT, UPDATE, DELETE for data manipulation.

    DML Commands

  • Indexing


    Secondary indexes, SASI, and Storage-Attached Indexing (SAI).

    Indexing

  • Functions


    Built-in and user-defined functions reference.

    Functions


Data Modeling

Design effective Cassandra data models.

  • Data Modeling Guide


    Query-first design methodology and denormalization patterns.

    Data Modeling Guide

  • Key Concepts


    Partition keys, clustering columns, and primary key design.

    Key Concepts

  • Anti-Patterns


    Common data modeling mistakes and how to avoid them.

    Anti-Patterns


Operations

Production deployment, monitoring, and maintenance procedures.

  • Cluster Management


    Add, remove, replace, and decommission nodes safely.

    Cluster Management

  • Backup & Restore


    Snapshots, incremental backups, and disaster recovery.

    Backup & Restore

  • Repair


    Anti-entropy repair to maintain data consistency.

    Repair

  • Configuration


    cassandra.yaml, JVM options, and snitch configuration.

    Configuration

  • Maintenance


    Routine maintenance tasks and operational procedures.

    Maintenance


Monitoring & Performance

Monitor clusters and optimize performance.

  • Monitoring


    JMX metrics, key metrics to track, and alerting strategies.

    Monitoring Guide

  • JMX Reference


    500+ metrics with thresholds and 30 MBeans documented.

    JMX Reference

  • Performance Tuning


    Hardware sizing, JVM tuning, and OS optimization.

    Performance Guide

  • Query Optimization


    Write efficient queries and avoid performance pitfalls.

    Query Optimization


Security

Authentication, authorization, and encryption for Cassandra deployments.

  • Authentication


    Internal authentication, LDAP integration, and Kerberos.

    Authentication

  • Authorization


    Role-based access control and permission management.

    Authorization

  • Encryption


    TLS for client and internode encryption, encryption at rest.

    Encryption


Tools

Essential Cassandra command-line and administration tools.

  • nodetool


    Cluster management commands for operations and diagnostics.

    nodetool Reference

  • cqlsh


    Interactive CQL shell for queries and schema management.

    cqlsh Reference

  • CQLAI


    Modern AI-powered CQL shell with intelligent assistance.

    CQLAI

  • cassandra-stress


    Load testing and benchmarking tool for Cassandra.

    cassandra-stress


Troubleshooting

Diagnostic procedures and solutions for common issues.

  • Diagnosis


    Root cause analysis procedures and diagnostic workflows.

    Diagnosis Guide

  • Log Analysis


    Interpreting logs, log patterns, and log configuration.

    Log Analysis

  • Common Errors


    ReadTimeout, WriteTimeout, and other common errors explained.

    Troubleshooting Guide


Quick Reference

  • Reference


    Quick reference for configuration, metrics, and commands.

    Reference


By Experience Level

Beginners: Installation, First Cluster, CQL Quickstart

Developers: Data Modeling, CQL Reference, Drivers

Operators: Operations, Monitoring, Troubleshooting

Performance Engineers: JMX Metrics, Performance Tuning, Benchmarking

Common Tasks

Task | Documentation
Install Cassandra | Installation Guide
Design a data model | Data Modeling Guide
Fix timeout errors | ReadTimeoutException
Manage cluster nodes | Cluster Management
Configure backups | Backup Guide
Monitor the cluster | Monitoring Guide
Tune performance | Performance Guide

Version Compatibility

Supported Versions

Version | Release Date | End of Support | Status
5.0.x | September 2024 | Until 5.3.0 release | Current
4.1.x | December 2022 | Until 5.2.0 release | Supported
4.0.x | July 2021 | Until 5.1.0 release | Supported
3.11.x | June 2017 | Unmaintained | Legacy

Upgrade Path

Direct upgrades skipping major versions are not supported. To upgrade from 3.11.x to 5.0.x:

  1. Upgrade 3.11.x → 4.0.x
  2. Upgrade 4.0.x → 4.1.x
  3. Upgrade 4.1.x → 5.0.x
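
The stepwise rule above can be expressed as a small planning helper. This is a sketch that mirrors only the release lines listed in this document. `RELEASE_LINES` and `upgrade_path` are hypothetical names, not part of any Cassandra tooling.

```python
# Release lines in upgrade order, as listed in this document.
RELEASE_LINES = ["3.11", "4.0", "4.1", "5.0"]


def upgrade_path(current: str, target: str):
    """Return the ordered hops between two release lines, one line at a time."""
    start = RELEASE_LINES.index(current)
    end = RELEASE_LINES.index(target)
    if start > end:
        raise ValueError("downgrades are not covered by this sketch")
    return [(RELEASE_LINES[i], RELEASE_LINES[i + 1]) for i in range(start, end)]


steps = upgrade_path("3.11", "5.0")
```

Each tuple in `steps` is one hop to complete across the whole cluster before starting the next.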

Documentation Conventions

This documentation uses RFC 2119 terminology (must, should, may) to indicate requirement levels. Version-specific behaviors are explicitly noted with the applicable Cassandra version range.


Contributing

This documentation is maintained by AxonOps. Found an error or want to contribute? Visit the GitHub repository.