Getting Started with Apache Cassandra¶
Getting Cassandra running takes about five minutes with Docker—the real learning curve is understanding how to use it effectively. Cassandra is not a drop-in replacement for PostgreSQL or MySQL; it requires a different mental model.
The core difference: in relational databases, the schema is designed first and queried as needed. In Cassandra, tables are designed around specific queries. This means denormalized tables, no JOINs, and knowing access patterns upfront. This is a trade-off—query flexibility is exchanged for predictable performance at any scale.
This guide walks through installation, basic configuration, and first queries.
Quick Start Options¶
Choose the appropriate path:
5-Minute Quick Start (Docker)¶
Get Cassandra running in 5 minutes for development:
# Pull and run Cassandra
docker run --name cassandra -d -p 9042:9042 cassandra:latest
# Wait for startup (about 60 seconds)
sleep 60
# Connect with cqlsh
docker exec -it cassandra cqlsh
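A fixed `sleep 60` can be too short on a slow machine and too long on a fast one. As a minimal sketch, the wait can be replaced with a polling loop. The helper name `retry_until` is hypothetical, and using `cqlsh -e "DESCRIBE KEYSPACES"` as the readiness probe is an assumption, not something this guide prescribes.

```shell
# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, sleeping DELAY_SECONDS after each failure.
# Returns 0 on the first success, 1 if every attempt fails.
retry_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Hypothetical usage against the container started above
# (30 attempts, 5 seconds apart):
#   retry_until 30 5 docker exec cassandra cqlsh -e "DESCRIBE KEYSPACES" \
#     && docker exec -it cassandra cqlsh
```

The probe succeeds only once the node accepts CQL connections, so the interactive `cqlsh` session starts as soon as Cassandra is ready.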
Production Installation¶
For production deployments, follow our detailed guides:
- Installation Overview - Various installation methods
- Cloud Deployment - AWS, GCP, Azure
Getting Started Guides¶
1. Understanding Cassandra¶
- What is Cassandra?
- Core concepts and terminology
- When to use Cassandra
- Cassandra vs other databases
2. Installation¶
- Installation Overview - Various installation methods
- Cloud Providers - AWS, GCP, Azure
3. First Cluster Setup¶
- First Cluster Setup
- Single-node development setup
- Multi-node cluster configuration
- Basic configuration options
4. CQL Quickstart¶
- CQL Tutorial
- Connect to Cassandra
- Create keyspaces and tables
- Insert and query data
- Basic operations
5. Connect the Application¶
- Driver Setup - Client drivers for various languages
6. Production Readiness¶
- Production Checklist
- Hardware requirements
- Configuration recommendations
- Security setup
- Monitoring setup
Essential Concepts¶
Before diving in, familiarize yourself with these key concepts:
| Concept | Description |
|---|---|
| Cluster | A collection of nodes that together store data |
| Node | A single Cassandra server instance |
| Keyspace | A namespace that defines data replication (like a database) |
| Table | A collection of rows with a defined schema |
| Partition Key | Hashed to a token that determines which nodes store the row |
| Clustering Column | Determines sort order within a partition |
| Replication Factor | Number of copies of data stored across nodes |
| Consistency Level | How many replicas must respond for a successful operation |
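Replication factor and consistency level interact. As one illustration, the `QUORUM` consistency level requires a majority of replicas to respond, which is `floor(RF / 2) + 1`. The sketch below works through that arithmetic; the function name `quorum_size` is hypothetical.

```shell
# quorum_size RF
# Prints how many replicas must respond at consistency level QUORUM
# for a keyspace with replication factor RF.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

for rf in 1 3 5; do
  echo "RF=$rf -> QUORUM needs $(quorum_size "$rf") replica(s)"
done
# Prints:
#   RF=1 -> QUORUM needs 1 replica(s)
#   RF=3 -> QUORUM needs 2 replica(s)
#   RF=5 -> QUORUM needs 3 replica(s)
```

With a replication factor of 3, a `QUORUM` read or write still succeeds while one replica is down.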
Example: First 10 Minutes¶
Here is what can be accomplished in the first 10 minutes:
1. Start Cassandra (2 minutes)¶
# Using Docker (stop the quick-start container first if it is still running; both publish port 9042)
docker run --name my-cassandra -d -p 9042:9042 cassandra:5.0
2. Connect (1 minute)¶
# Wait for startup (about 60 seconds), then connect
docker exec -it my-cassandra cqlsh
3. Create a Keyspace (1 minute)¶
CREATE KEYSPACE my_app WITH replication = {
'class': 'SimpleStrategy',
'replication_factor': 1
};
USE my_app;
4. Create a Table (2 minutes)¶
CREATE TABLE users (
user_id UUID PRIMARY KEY,
username TEXT,
email TEXT,
created_at TIMESTAMP
);
5. Insert Data (2 minutes)¶
INSERT INTO users (user_id, username, email, created_at)
VALUES (uuid(), 'john_doe', 'john_doe@example.com', toTimestamp(now()));
INSERT INTO users (user_id, username, email, created_at)
VALUES (uuid(), 'jane_smith', 'jane_smith@example.com', toTimestamp(now()));
6. Query Data (2 minutes)¶
-- Get all users
SELECT * FROM users;
-- Get specific columns
SELECT username, email FROM users;
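Both queries above read the whole table, which is fine for two rows but not for a real workload. Following the principle that tables are designed around specific queries, looking up a user by email would normally get its own table keyed by email. The sketch below writes such a schema to a file; the table name `users_by_email`, the file name, and the sample address are hypothetical.

```shell
# Write a hypothetical query-specific table definition to a CQL file.
# The quoted heredoc delimiter keeps the shell from expanding anything inside.
cat > users_by_email.cql <<'CQL'
-- Hypothetical table serving the query "find a user by email".
-- email is the partition key, so the lookup goes directly to the
-- replicas that own that partition instead of scanning the table.
CREATE TABLE IF NOT EXISTS my_app.users_by_email (
    email TEXT PRIMARY KEY,
    user_id UUID,
    username TEXT
);

-- The lookup this table is designed for:
SELECT user_id, username FROM my_app.users_by_email
WHERE email = 'someone@example.com';
CQL
```

The file could then be loaded with `cqlsh -f users_by_email.cql` from wherever `cqlsh` can reach the node. The cost of this denormalization is that the application writes each user to both `users` and `users_by_email`.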
Common Questions¶
What is the minimum hardware for development?¶
For development, Cassandra can run with:
- 2 CPU cores
- 4GB RAM (8GB recommended)
- 10GB disk space
Should I use SimpleStrategy or NetworkTopologyStrategy?¶
- SimpleStrategy: Only for development or testing in a single datacenter
- NetworkTopologyStrategy: Always use for production, even with a single datacenter
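For comparison with the `SimpleStrategy` keyspace created earlier, a `NetworkTopologyStrategy` keyspace sets a replication factor per datacenter. The sketch below writes one to a file. The keyspace name `my_app_prod`, the file name, and the datacenter name `datacenter1` are assumptions; the datacenter name must match what the cluster actually reports.

```shell
# Write a hypothetical production-style keyspace definition to a CQL file.
cat > create_keyspace_nts.cql <<'CQL'
-- 'datacenter1' is an assumed datacenter name; `nodetool status` shows the real one.
-- A replication factor of 3 only gives three copies when that datacenter
-- has at least three nodes.
CREATE KEYSPACE IF NOT EXISTS my_app_prod WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'datacenter1': 3
};
CQL
```

Starting with `NetworkTopologyStrategy` means that adding a second datacenter later only requires altering the replication map, not switching strategy.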
What is the difference between cqlsh and CQLAI?¶
| Feature | cqlsh | CQLAI |
|---|---|---|
| Language | Python | Go (single binary) |
| AI Query Generation | No | Yes |
| Tab Completion | Basic | Context-aware |
| Output Formats | Basic | Table, JSON, CSV, Parquet |
| Dependencies | Python required | None |
Next Steps¶
After getting started:
- Learn Data Modeling - Design effective schemas
- Understand Architecture - How Cassandra works
- Explore CQL - Full query language reference
- Set Up Monitoring - Monitor the cluster
Getting Help¶
- Troubleshooting Guide - Common issues and solutions
- AxonOps Community - Community support
- Apache Cassandra Slack - Community chat