
Getting Started with Apache Cassandra

Getting Cassandra running takes about five minutes with Docker—the real learning curve is understanding how to use it effectively. Cassandra is not a drop-in replacement for PostgreSQL or MySQL; it requires a different mental model.

The core difference: in relational databases, the schema is designed first and queried as needed. In Cassandra, tables are designed around specific queries. This means denormalized tables, no JOINs, and knowing access patterns upfront. This is a trade-off—query flexibility is exchanged for predictable performance at any scale.
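The query-first idea can be sketched without a running cluster. The following is a minimal illustration, not Cassandra code: each Python dict stands in for one table, the dict key plays the role of the partition key, and the names `users_by_id`, `users_by_email`, and `save_user` are hypothetical. It shows one "table" per query and a write path that populates both.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)
class User:
    user_id: UUID
    username: str
    email: str


# One stand-in "table" per query the application needs to answer.
# The dict key plays the role of the partition key.
users_by_id: Dict[UUID, User] = {}
users_by_email: Dict[str, User] = {}


def save_user(user: User) -> None:
    """Write the same row to every table that serves a query (denormalization)."""
    users_by_id[user.user_id] = user
    users_by_email[user.email] = user


def find_by_id(user_id: UUID) -> Optional[User]:
    """Query 1: look a user up by ID, a single key lookup."""
    return users_by_id.get(user_id)


def find_by_email(email: str) -> Optional[User]:
    """Query 2: look a user up by email, served by its own table, no JOIN."""
    return users_by_email.get(email)


user = User(uuid4(), "john_doe", "john_doe@example.com")
save_user(user)
print(find_by_email("john_doe@example.com"))
```

The cost of this design is visible in `save_user`: every write touches each table, and adding a new query usually means adding a new table.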

This guide walks through installation, initial configuration, and first queries.

Learning Path

[Diagram: learning path]

Quick Start Options

Choose the appropriate path:

5-Minute Quick Start (Docker)

Get Cassandra running in 5 minutes for development:

# Pull and run Cassandra
docker run --name cassandra -d -p 9042:9042 cassandra:latest

# Wait for startup (about 60 seconds)
sleep 60

# Connect with cqlsh
docker exec -it cassandra cqlsh
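A fixed `sleep 60` can be too short on a slow machine and needlessly long on a fast one. One alternative is to poll until a trivial CQL statement succeeds. The sketch below assumes the container is named `cassandra` as in the command above and that `docker` is on the `PATH`; the function name `wait_until_ready` and the timeout values are illustrative choices, not part of any Cassandra tooling.

```python
import subprocess
import time
from typing import Sequence

# Probe command: run a trivial CQL statement inside the container named
# "cassandra" (the name is taken from the docker run command above).
PROBE = ("docker", "exec", "cassandra", "cqlsh", "-e", "DESCRIBE KEYSPACES")


def wait_until_ready(command: Sequence[str] = PROBE,
                     timeout: float = 180.0,
                     interval: float = 5.0) -> bool:
    """Re-run `command` until it exits with status 0.

    Returns True on the first success and False if `timeout` seconds
    pass without one.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = subprocess.run(
            list(command),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            return True
        # Give up if another wait would run past the deadline.
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_ready()` with no arguments blocks until cqlsh can connect or three minutes pass. Probing with cqlsh rather than checking that port 9042 is open avoids false positives from Docker's port forwarding, which may accept connections before Cassandra is listening.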

Production Installation

For production deployments, follow our detailed guides:

  1. Installation Overview - Various installation methods
  2. Cloud Deployment - AWS, GCP, Azure

Getting Started Guides

1. Understanding Cassandra

  • What is Cassandra?
  • Core concepts and terminology
  • When to use Cassandra
  • Cassandra vs other databases

2. Installation

3. First Cluster Setup

  • First Cluster Setup
  • Single-node development setup
  • Multi-node cluster configuration
  • Basic configuration options

4. CQL Quickstart

  • CQL Tutorial
  • Connect to Cassandra
  • Create keyspaces and tables
  • Insert and query data
  • Basic operations

5. Connect the Application

6. Production Readiness

  • Production Checklist
  • Hardware requirements
  • Configuration recommendations
  • Security setup
  • Monitoring setup

Essential Concepts

Before diving in, familiarize yourself with these key concepts:

| Concept | Description |
| --- | --- |
| Cluster | A collection of nodes that together store data |
| Node | A single Cassandra server instance |
| Keyspace | A namespace that defines data replication (like a database) |
| Table | A collection of rows with a defined schema |
| Partition Key | Determines which node stores the data |
| Clustering Column | Determines sort order within a partition |
| Replication Factor | Number of copies of data stored across nodes |
| Consistency Level | How many replicas must respond for a successful operation |
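How partition key, replication factor, and consistency level fit together can be sketched as a toy token ring. This is a simplified model under stated assumptions, not Cassandra's implementation: MD5 stands in for the real partitioner (Murmur3 by default), each node gets a single token (real clusters usually assign many virtual nodes per node), and placement follows a SimpleStrategy-like clockwise walk. All function names are hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple

Ring = List[Tuple[int, str]]


def token_for(value: str) -> int:
    """Hash a string to a token. MD5 is a stand-in for the real partitioner."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(nodes: List[str]) -> Ring:
    """Assign one token per node and sort by token (no virtual nodes)."""
    return sorted((token_for(node), node) for node in nodes)


def replicas_for(partition_key: str, ring: Ring,
                 replication_factor: int) -> List[str]:
    """Return the nodes holding a partition.

    The first replica is the node whose token is the first one at or
    after the key's token; the remaining replicas are the next nodes
    clockwise around the ring.
    """
    tokens = [token for token, _ in ring]
    start = bisect.bisect_left(tokens, token_for(partition_key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: a majority of the copies."""
    return replication_factor // 2 + 1


ring = build_ring(["node1", "node2", "node3", "node4"])
print(replicas_for("user-42", ring, replication_factor=3))
print(quorum(3))
```

With a replication factor of 3, `quorum(3)` is 2, so a QUORUM read or write still succeeds when one of the three replicas is down.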

Example: First 10 Minutes

Here is what can be accomplished in the first 10 minutes:

1. Start Cassandra (2 minutes)

# Using Docker
docker run --name my-cassandra -d -p 9042:9042 cassandra:5.0

2. Connect (1 minute)

# Wait for startup, then connect
docker exec -it my-cassandra cqlsh

3. Create a Keyspace (1 minute)

CREATE KEYSPACE my_app WITH replication = {
    'class': 'SimpleStrategy',
    'replication_factor': 1
};

USE my_app;

4. Create a Table (2 minutes)

CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    username TEXT,
    email TEXT,
    created_at TIMESTAMP
);

5. Insert Data (2 minutes)

INSERT INTO users (user_id, username, email, created_at)
VALUES (uuid(), 'john_doe', 'john_doe@example.com', toTimestamp(now()));

INSERT INTO users (user_id, username, email, created_at)
VALUES (uuid(), 'jane_smith', 'jane_smith@example.com', toTimestamp(now()));

6. Query Data (2 minutes)

-- Get all users
SELECT * FROM users;

-- Get specific columns
SELECT username, email FROM users;

Common Questions

What is the minimum hardware for development?

For development, Cassandra can run with:

  • 2 CPU cores
  • 4GB RAM (8GB recommended)
  • 10GB disk space

Should I use SimpleStrategy or NetworkTopologyStrategy?

  • SimpleStrategy: Suitable only for single-datacenter development and testing
  • NetworkTopologyStrategy: Use for all production deployments, even with a single datacenter
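As an illustration of the NetworkTopologyStrategy form, the sketch below builds a `CREATE KEYSPACE` statement from a per-datacenter replica count. The helper name `create_keyspace_cql` is hypothetical, and the datacenter name `datacenter1` is an assumption; the name must match what the cluster actually reports (shown by `nodetool status`). The helper does no escaping, so it is only suitable for trusted, hard-coded names.

```python
from typing import Dict


def create_keyspace_cql(keyspace: str, replicas_per_dc: Dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    `replicas_per_dc` maps each datacenter name to its replication factor.
    """
    if not replicas_per_dc:
        raise ValueError("at least one datacenter is required")
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# "datacenter1" is an assumed name; replace it with the cluster's own.
print(create_keyspace_cql("my_app", {"datacenter1": 3}))
```

The printed statement can be pasted into cqlsh. Compared with the SimpleStrategy example, the only change is that the replication factor is given per datacenter, which is what lets a second datacenter be added later without recreating the keyspace.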

What is the difference between cqlsh and CQLAI?

| Feature | cqlsh | CQLAI |
| --- | --- | --- |
| Language | Python | Go (single binary) |
| AI Query Generation | No | Yes |
| Tab Completion | Basic | Context-aware |
| Output Formats | Basic | Table, JSON, CSV, Parquet |
| Dependencies | Python required | None |

Learn more about CQLAI

Next Steps

After getting started:

  1. Learn Data Modeling - Design effective schemas
  2. Understand Architecture - How Cassandra works
  3. Explore CQL - Full query language reference
  4. Set Up Monitoring - Monitor the cluster

Getting Help