Data Modeling Concepts¶
Data modeling in Cassandra works backwards from what might be expected. Instead of designing tables around entities and then writing queries, start with the queries and design tables to serve them. Need to look up users by email? That is a table. Need to look up users by ID? That is another table—possibly with the same data.
This denormalized approach feels wasteful to those accustomed to relational databases, but it is the key to Cassandra's performance. Every query hits exactly the data it needs, stored together on disk, on a predictable set of nodes. No joins, no cross-partition lookups, no surprises at scale.
The Wide-Column Model¶
Cassandra's storage model derives from Google's Bigtable—a sorted map structure:
(row_key, column_family:column_qualifier, timestamp) → value
Cassandra adapted this to:
(partition_key, clustering_columns...) → column_values
Before CQL (2011), Cassandra used the Thrift API, which exposed this model directly:
// Thrift API (circa 2010)
ColumnPath path = new ColumnPath("Users");
path.setColumn("email".getBytes());
client.insert("Keyspace1", "user123", path,
"user@example.com".getBytes(), timestamp,
ConsistencyLevel.QUORUM);
Thrift required working with raw bytes and column families. CQL provides a SQL-like abstraction, but the underlying storage model remains the same.
How CQL Maps to Internal Storage¶
Consider this CQL statement:
CREATE TABLE user_events (
user_id UUID,
event_time TIMESTAMP,
event_type TEXT,
data TEXT,
PRIMARY KEY ((user_id), event_time, event_type)
);
INSERT INTO user_events (user_id, event_time, event_type, data)
VALUES (123e4567-e89b-12d3-a456-426614174000, '2024-01-15 10:30:00', 'click', '{"page": "/home"}');
Cassandra stores it internally like this:
Partition Key: 123e4567-e89b-12d3-a456-426614174000
| Cell Key (clustering columns) | Column | Value |
|---|---|---|
| (2024-01-15 10:30:00, click) | data | '{"page": "/home"}' |
The clustering columns (event_time, event_type) become part of each cell's internal key. The partition key determines which node stores the data. All cells with the same partition key are stored together, sorted by clustering column.
This is why Cassandra queries have strict rules:
- Partition key required: Without it, Cassandra does not know which node to ask
- Clustering columns in order: They form a composite key—columns cannot be skipped
- Range queries only on the last restricted column: earlier clustering columns must be restricted by equality; the range applies to the final column in the query
Understanding this mapping explains Cassandra's query restrictions.
Cassandra data modeling differs fundamentally from the relational approach: there are no JOINs, normalization hurts performance, and tables are designed around query patterns rather than entity relationships.
How Cassandra Data Modeling Is Different¶
Cassandra's distributed architecture fundamentally changes how data modeling is approached. In relational databases, tables are designed around entities (normalization), then queries are written against them. In Cassandra, tables are designed around queries, then data is arranged to serve them efficiently.
This is not a limitation—it is how Cassandra achieves consistent performance at any scale.
Traditional RDBMS Approach:
- Identify entities (Users, Orders, Products)
- Normalize to eliminate redundancy
- Write queries with JOINs
- Add indexes when queries are slow
- Scale vertically when database is overwhelmed
Result: Flexible queries, but joins + indexes + locks = unpredictable latency
Cassandra Approach:
- List ALL queries the application needs
- Design one table per query pattern
- Denormalize data across tables
- Accept write duplication for read optimization
- Scale horizontally by adding nodes
Result: Predictable latency regardless of data size, but queries must be planned
A poorly designed Cassandra table does not just slow down—it can bring down the entire cluster:
| Bad Design | Consequence |
|---|---|
| Query without partition key | Full cluster scan, timeouts |
| Unbounded partition growth | Node instability, OOM |
| High-cardinality secondary index | Every node queried per request |
| Frequent deletes without TTL | Tombstone accumulation, read amplification |
The Primary Key: Foundation of Everything¶
The primary key determines:
1. Where data lives (which node stores it)
2. How data is organized (sort order within a partition)
3. What queries are efficient (only partition key + clustering queries work)
PRIMARY KEY ((partition_key_col1, partition_key_col2), clustering_col1, clustering_col2)
            └───────────────────┬───────────────────┘  └───────────────┬───────────────┘
                                │                                       │
                    Composite Partition Key                    Clustering Columns
                  (Determines node placement)               (Determines sort order)
Partition Key¶
The partition key is hashed (using Murmur3) to determine which node owns the data.
Critical implications:
- All rows with the same partition key are stored together on the same nodes
- Queries must include the partition key (or suffer full cluster scans)
- Partition key choice determines whether data is evenly distributed
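To make the placement rule concrete, here is a minimal Python sketch of a token-ring lookup. It is illustrative only: the four-node ring and the hash function (blake2b standing in for Cassandra's 64-bit Murmur3) are assumptions, not Cassandra internals. The point is that a given partition key always hashes to the same token, and therefore to the same owner.

# Conceptual sketch only: how a hashed partition key maps to an owner on the
# token ring. Real clusters use the 64-bit Murmur3 hash and vnodes; blake2b
# stands in here so the example is self-contained and runnable.
import bisect
import hashlib

# Hypothetical 4-node ring: each node owns the range ending at its token.
ring = [(-6148914691236517206, "node1"), (-2, "node2"),
        (3074457345618258601, "node3"), (9223372036854775806, "node4")]
tokens = [t for t, _ in ring]

def token_for(partition_key: str) -> int:
    # Stand-in for Murmur3: map a partition key to a signed 64-bit token.
    digest = hashlib.blake2b(partition_key.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)

def owner(partition_key: str) -> str:
    # First node whose token is >= the key's token, wrapping around the ring.
    i = bisect.bisect_left(tokens, token_for(partition_key)) % len(ring)
    return ring[i][1]

print(owner("user123"))  # the same key always lands on the same node
print(owner("user456"))  # a different key may land elsewhere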
Composite Partition Keys¶
When a single column does not provide enough distribution:
-- Single partition key (potential hot spot)
CREATE TABLE events (
tenant_id TEXT,
event_time TIMESTAMP,
data TEXT,
PRIMARY KEY ((tenant_id), event_time)
);
-- Problem: Tenant "bigcorp" has 90% of traffic → hot partition
-- Composite partition key (better distribution)
CREATE TABLE events (
tenant_id TEXT,
day DATE,
event_time TIMESTAMP,
data TEXT,
PRIMARY KEY ((tenant_id, day), event_time)
);
-- Each tenant-day combination is a separate partition
Clustering Columns¶
Clustering columns determine sort order within a partition:
CREATE TABLE user_events (
user_id UUID,
event_time TIMESTAMP,
event_type TEXT,
data TEXT,
PRIMARY KEY ((user_id), event_time, event_type)
) WITH CLUSTERING ORDER BY (event_time DESC, event_type ASC);
Partition: user_id = alice (sorted by event_time DESC, then event_type ASC):
| Row | event_time | event_type | data |
|---|---|---|---|
| 1 | 2024-01-03 14:00 | click | ... |
| 2 | 2024-01-03 13:00 | login | ... |
| 3 | 2024-01-03 12:00 | click | ... |
| 4 | 2024-01-02 18:00 | logout | ... |
| ... | ... | ... | ... |
Clustering column rules:
- Must be queried in order (cannot skip columns)
- Range queries allowed on the last queried column
- Defines row uniqueness within partition
-- Valid queries
SELECT * FROM user_events WHERE user_id = ?;
SELECT * FROM user_events WHERE user_id = ? AND event_time = ?;
SELECT * FROM user_events WHERE user_id = ? AND event_time > '2024-01-01';
SELECT * FROM user_events WHERE user_id = ? AND event_time = ? AND event_type = ?;
-- Invalid queries (without ALLOW FILTERING)
SELECT * FROM user_events WHERE event_time = ?; -- Missing partition key
SELECT * FROM user_events WHERE user_id = ? AND event_type = ?; -- Skipped event_time
Partition Design Deep Dive¶
What Makes a Good Partition Key¶
High cardinality - Many unique values distribute data across nodes:
-- Good: user_id has millions of unique values
PRIMARY KEY ((user_id), ...)
-- Bad: status has only 3 values → 3 partitions total
PRIMARY KEY ((status), ...) -- active/inactive/pending
Even distribution - Avoids hot spots:
-- Problem: Geographic data skewed toward certain countries
PRIMARY KEY ((country), city, user_id)
-- USA partition has 100x more data than Luxembourg
-- Better: Add hash bucket for distribution
PRIMARY KEY ((country, user_bucket), city, user_id)
-- user_bucket = hash(user_id) % 10
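A minimal application-side sketch of this bucketing scheme, assuming the DataStax Python driver, a hypothetical users_by_country table with the composite key above, and an illustrative keyspace name:

# Sketch of hash bucketing for PRIMARY KEY ((country, user_bucket), city, user_id).
# Table name, keyspace, and driver usage (DataStax Python driver) are assumptions.
import zlib
from cassandra.cluster import Cluster

NUM_BUCKETS = 10  # must match the scheme used on the write path

def user_bucket(user_id: str) -> int:
    # Use a hash that is stable across processes (Python's built-in hash() is not).
    return zlib.crc32(user_id.encode()) % NUM_BUCKETS

session = Cluster(["127.0.0.1"]).connect("my_keyspace")

# Write path: compute the bucket once and store the row under it.
insert = session.prepare(
    "INSERT INTO users_by_country (country, user_bucket, city, user_id) "
    "VALUES (?, ?, ?, ?)")
session.execute(insert, ("US", user_bucket("user123"), "Boston", "user123"))

# Read path for "all users in a country": fan out over every bucket.
select = session.prepare(
    "SELECT * FROM users_by_country WHERE country = ? AND user_bucket = ?")
rows = [r for b in range(NUM_BUCKETS) for r in session.execute(select, ("US", b))]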
Query alignment - Matches access patterns:
-- Query: Get all orders for a user
SELECT * FROM orders WHERE user_id = ?;
-- Partition key must be user_id
PRIMARY KEY ((user_id), order_date, order_id)
Bounded growth - Partitions should not grow forever:
-- Dangerous: Partition grows indefinitely
PRIMARY KEY ((sensor_id), reading_time)
-- After 5 years: 157 million rows per sensor
-- Safe: Time-bounded partitions
PRIMARY KEY ((sensor_id, month), reading_time)
-- Each partition covers 1 month
Partition Size Calculations¶
Target partition size: 10MB - 100MB
Partition Size Formula:
partition_size = Nv × (Ck + Cs + average_column_sizes) + Nstatic × static_column_sizes
| Variable | Description |
|---|---|
| Nv | Number of rows in the partition |
| Ck | Clustering key size |
| Cs | Per-cell overhead (~23 bytes) |
| Nstatic | 1 if static columns exist, 0 otherwise |
Practical example:
CREATE TABLE sensor_readings (
sensor_id TEXT,
day DATE,
reading_time TIMESTAMP,
temperature DOUBLE,
humidity DOUBLE,
pressure DOUBLE,
PRIMARY KEY ((sensor_id, day), reading_time)
);
Per-row storage:
| Component | Size |
|---|---|
| Clustering key (timestamp) | 8 bytes |
| temperature (double) | 8 bytes |
| humidity (double) | 8 bytes |
| pressure (double) | 8 bytes |
| Per-cell overhead (4 cells) | 92 bytes |
| Total per row | ~124 bytes |
Partition size projections:
| Frequency | Readings/Day | Daily Partition Size | Status |
|---|---|---|---|
| 1/second | 86,400 | 10.7 MB | ✓ Good |
| 10/second | 864,000 | 107 MB | ⚠ Borderline |
High-Frequency Solution
Use hourly buckets instead of daily. Hourly partition size: 36,000 × 124 = 4.5 MB ✓
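These projections are plain arithmetic; the short Python sketch below restates the worked example (~124 bytes per row) so other frequencies and bucket sizes can be checked the same way.

# Reproduce the sensor_readings projections above: bytes per row comes from the
# per-row breakdown (clustering key + three doubles + per-cell overhead).
CLUSTERING_KEY = 8          # reading_time (timestamp)
DATA_COLUMNS = 3 * 8        # temperature, humidity, pressure (doubles)
CELL_OVERHEAD = 4 * 23      # ~23 bytes of overhead per cell
BYTES_PER_ROW = CLUSTERING_KEY + DATA_COLUMNS + CELL_OVERHEAD  # 124 bytes

def partition_mb(readings_per_second: float, bucket_seconds: int) -> float:
    rows = readings_per_second * bucket_seconds
    return rows * BYTES_PER_ROW / 1_000_000

print(partition_mb(1, 86_400))    # daily bucket, 1/second   -> ~10.7 MB (good)
print(partition_mb(10, 86_400))   # daily bucket, 10/second  -> ~107 MB (borderline)
print(partition_mb(10, 3_600))    # hourly bucket, 10/second -> ~4.5 MB (good)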
Monitoring Partition Size in Production¶
# Check partition size distribution
nodetool tablehistograms keyspace.table
# Output interpretation:
Partition Size (bytes)
Percentile Value
50% 12.5MB # Median - good
75% 35.2MB # OK
95% 89.1MB # Acceptable
99% 234.5MB # WARNING: Some large partitions
Max 891.2MB # CRITICAL: Outliers need investigation
To identify specific large partitions, avoid full-table-scan queries; inspect individual SSTables with sstablemetadata instead.
Denormalization Strategies¶
Why Denormalize¶
Cassandra has no JOINs. If data from multiple "entities" is needed in one query, that data must be in one table.
-- Relational (normalized) - Requires JOIN
SELECT o.order_id, o.total, u.name, u.email
FROM orders o
JOIN users u ON o.user_id = u.user_id
WHERE o.order_id = 123;
-- Cassandra (denormalized) - Single query
SELECT order_id, total, user_name, user_email
FROM orders
WHERE order_id = 123;
Denormalization Patterns¶
Embedding related data:
-- Instead of a separate address table, embed addresses in the user row
CREATE TYPE address_type (
label TEXT, -- 'home', 'work'
street TEXT,
city TEXT,
postal_code TEXT,
country TEXT
);
CREATE TABLE users (
user_id UUID PRIMARY KEY,
name TEXT,
email TEXT,
-- Embedded addresses
addresses LIST<FROZEN<address_type>>
);
Duplicating for different access patterns:
-- Query 1: Get order by order_id
CREATE TABLE orders (
order_id UUID PRIMARY KEY,
user_id UUID,
user_email TEXT, -- Denormalized
product_count INT,
total DECIMAL,
created_at TIMESTAMP
);
-- Query 2: Get orders by user (recent first)
CREATE TABLE orders_by_user (
user_id UUID,
order_date DATE,
order_id UUID,
product_count INT,
total DECIMAL,
PRIMARY KEY ((user_id), order_date, order_id)
) WITH CLUSTERING ORDER BY (order_date DESC, order_id DESC);
-- Same data, different organization
Pre-computing aggregations:
-- Instead of: SELECT COUNT(*) FROM orders WHERE user_id = ?
-- Maintain a counter
CREATE TABLE user_stats (
user_id UUID PRIMARY KEY,
order_count COUNTER,
total_spent COUNTER -- Store as cents
);
-- Increment on each order
UPDATE user_stats SET order_count = order_count + 1
WHERE user_id = ?;
Managing Denormalization¶
The cost of denormalization is write amplification and consistency complexity:
| Schema Type | Writes Required |
|---|---|
| Single write (normalized) | 1 table |
| Denormalized (2 tables) | 2 writes |
| Denormalized (5 tables) | 5 writes |
Consistency Concern
What if write #3 of 5 fails? Logged batches guarantee that every statement in the batch is eventually applied (atomicity, though not isolation).
Batch writes for atomicity:
-- Logged batch ensures all-or-nothing (same partition preferred)
BEGIN BATCH
INSERT INTO orders (order_id, user_id, total) VALUES (?, ?, ?);
INSERT INTO orders_by_user (user_id, order_date, order_id, total) VALUES (?, ?, ?, ?);
APPLY BATCH;
Handling stale denormalized data:
-- When user updates their email:
-- 1. Update users table
UPDATE users SET email = ? WHERE user_id = ?;
-- 2. Future orders use new email
-- 3. Old orders retain old email (often acceptable)
-- If old orders must update (expensive):
-- Query all user's orders, update each one
-- Consider: Is this worth the complexity?
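A sketch of that expensive backfill path, assuming the orders and orders_by_user tables above, a hypothetical keyspace name, and the DataStax Python driver:

# Propagate a changed email to the user's existing orders (the expensive path).
# Keyspace name and driver usage (DataStax Python driver) are assumptions.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("shop")

def update_user_email(user_id, new_email):
    # 1. Update the source-of-truth row.
    session.execute("UPDATE users SET email = %s WHERE user_id = %s",
                    (new_email, user_id))

    # 2. Collect the user's order IDs (single-partition read, cheap).
    order_ids = [row.order_id for row in session.execute(
        "SELECT order_id FROM orders_by_user WHERE user_id = %s", (user_id,))]

    # 3. Rewrite the denormalized copy on each order (one write per order).
    update_order = session.prepare(
        "UPDATE orders SET user_email = ? WHERE order_id = ?")
    for order_id in order_ids:
        session.execute(update_order, (new_email, order_id))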
Query-First Design Process¶
Step 1: List All Queries¶
Before creating any tables, enumerate every query the application needs:
Application: E-commerce Platform
| Query | Description |
|---|---|
| Q1 | Get product details by product_id |
| Q2 | List products in a category (paginated) |
| Q3 | Get user profile by user_id |
| Q4 | Authenticate user by email |
| Q5 | Get user's recent orders |
| Q6 | Get order details with items |
| Q7 | Get shopping cart contents |
| Q8 | Check inventory for product |
| Q9 | Get product reviews (recent first) |
| Q10 | Search products by name prefix |
Step 2: Analyze Query Requirements¶
For each query, identify:
Q5: Get user's recent orders
| Attribute | Value |
|---|---|
| Access pattern | Point lookup + range scan |
| Partition key | user_id (equality) |
| Clustering | order_date (descending range) |
| Columns needed | order_id, order_date, status, total, item_count |
| Cardinality | High (millions of users) |
| Growth pattern | Bounded (users have limited orders) |
| Access frequency | High (every user profile view) |
| Latency SLA | < 50ms p99 |
Step 3: Design Tables¶
Map each query to a table:
-- Q5: Get user's recent orders
CREATE TABLE orders_by_user (
user_id UUID, -- Partition key (equality in query)
order_date DATE, -- Clustering (for date ranges)
order_id UUID, -- Clustering (uniqueness)
status TEXT,
total DECIMAL,
item_count INT,
PRIMARY KEY ((user_id), order_date, order_id)
) WITH CLUSTERING ORDER BY (order_date DESC, order_id DESC);
-- Query implementation
SELECT * FROM orders_by_user
WHERE user_id = ?
LIMIT 20;
-- With date filter
SELECT * FROM orders_by_user
WHERE user_id = ?
AND order_date >= '2024-01-01'
AND order_date <= '2024-12-31';
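For the paginated variant of Q5, driver-level paging returns one page at a time instead of the whole partition. A sketch assuming the DataStax Python driver and a hypothetical keyspace:

# Driver-side paging for Q5 (orders_by_user); driver usage is an assumption.
import uuid
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect("shop")
user_id = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")

query = SimpleStatement(
    "SELECT order_id, order_date, status, total, item_count "
    "FROM orders_by_user WHERE user_id = %s",
    fetch_size=20)                        # one page of recent orders per request

result = session.execute(query, (user_id,))
page = result.current_rows                # newest first (CLUSTERING ORDER BY ... DESC)
next_page = result.paging_state           # opaque token; hand back to resume later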
Step 4: Validate Design¶
Checklist for each table:
| Check | Requirement |
|---|---|
| ✓ | Primary access query uses partition key equality |
| ✓ | Range queries on clustering columns (in order) |
| ✓ | Partition size bounded (< 100 MB) |
| ✓ | Even distribution (high cardinality partition key) |
| ✓ | No ALLOW FILTERING needed |
| ✓ | All columns needed by query are present |
| ✓ | Sort order matches query requirements |
Common Patterns¶
Wide Row Pattern¶
Store related items in one partition:
-- User's notifications (bounded by application logic)
CREATE TABLE notifications (
user_id UUID,
notification_id TIMEUUID,
message TEXT,
read BOOLEAN,
created_at TIMESTAMP,
PRIMARY KEY ((user_id), notification_id)
) WITH CLUSTERING ORDER BY (notification_id DESC)
AND default_time_to_live = 2592000; -- 30 days
-- Query: Recent notifications (filter the read flag in the application)
SELECT * FROM notifications
WHERE user_id = ?
LIMIT 50;
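Because read is a regular column rather than part of the primary key, the unread filter happens in application code after fetching the newest rows. A short sketch, assuming the DataStax Python driver and a hypothetical keyspace:

# Fetch the newest notifications and filter unread client-side
# (driver usage and keyspace name are assumptions).
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("app")

def unread_notifications(user_id, limit=50):
    rows = session.execute(
        "SELECT notification_id, message, read, created_at "
        "FROM notifications WHERE user_id = %s LIMIT %s", (user_id, limit))
    return [r for r in rows if not r.read]   # newest first thanks to DESC ordering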
Time Bucketing Pattern¶
Prevent unbounded partition growth:
-- High-volume event stream
CREATE TABLE events (
tenant_id TEXT,
hour TIMESTAMP, -- Bucket: truncate to hour
event_id TIMEUUID,
event_type TEXT,
payload TEXT,
PRIMARY KEY ((tenant_id, hour), event_id)
) WITH CLUSTERING ORDER BY (event_id DESC);
-- Application calculates bucket:
-- hour = event_time - (event_time % 3600000)
-- Query single hour
SELECT * FROM events
WHERE tenant_id = 'acme'
AND hour = '2024-01-15 14:00:00+0000'
LIMIT 100;
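The bucket value is computed by the application and must match on the write and read paths. A plain-Python sketch (no driver) of the hour truncation and the fan-out across hourly partitions:

# Compute the hour bucket in the application; write path and read path must agree.
from datetime import datetime, timedelta, timezone

def hour_bucket(ts: datetime) -> datetime:
    # Truncate to the start of the hour -- the datetime equivalent of
    # event_time - (event_time % 3600000) on a millisecond epoch.
    return ts.replace(minute=0, second=0, microsecond=0)

def hour_buckets(start: datetime, end: datetime):
    # Every hourly partition touched by [start, end]; query each one separately.
    bucket = hour_bucket(start)
    while bucket <= end:
        yield bucket
        bucket += timedelta(hours=1)

start = datetime(2024, 1, 15, 13, 20, tzinfo=timezone.utc)
end = datetime(2024, 1, 15, 16, 5, tzinfo=timezone.utc)
print(list(hour_buckets(start, end)))   # 13:00, 14:00, 15:00, 16:00 -- four partitions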
Materialized View Alternative¶
When the same data needs to be accessible by different keys:
-- Base table: users by ID
CREATE TABLE users (
user_id UUID PRIMARY KEY,
email TEXT,
name TEXT,
created_at TIMESTAMP
);
-- Manual materialized view: users by email
CREATE TABLE users_by_email (
email TEXT PRIMARY KEY,
user_id UUID,
name TEXT
);
-- Application maintains both:
BEGIN BATCH
INSERT INTO users (user_id, email, name, created_at) VALUES (?, ?, ?, ?);
INSERT INTO users_by_email (email, user_id, name) VALUES (?, ?, ?);
APPLY BATCH;
Note: Native materialized views exist but are marked experimental and have production stability issues. Manual denormalization is recommended.
Data Types for Modeling¶
Choosing the Right Types¶
| Use Case | Type | Notes |
|---|---|---|
| Unique identifiers | UUID or TIMEUUID | TIMEUUID includes timestamp |
| Timestamps | TIMESTAMP | Millisecond precision |
| Dates only | DATE | More efficient than TIMESTAMP |
| Money | DECIMAL | Never use FLOAT for currency |
| Small integers | INT | 32-bit signed |
| Large integers | BIGINT | 64-bit signed |
| Flags | BOOLEAN | |
| Short strings | TEXT | Unbounded, UTF-8 |
| ASCII-only strings | ASCII | US-ASCII characters only |
| Binary data | BLOB | Images, serialized objects |
| IP addresses | INET | IPv4 or IPv6 |
Collections¶
-- List: Ordered, allows duplicates
tags LIST<TEXT>
-- Insert: ['cassandra', 'database', 'cassandra'] ✓
-- Max recommended: ~100 elements
-- Set: Unordered, unique values
categories SET<TEXT>
-- Insert: {'electronics', 'computers'} (sorted internally)
-- Max recommended: ~100 elements
-- Map: Key-value pairs
metadata MAP<TEXT, TEXT>
-- Insert: {'color': 'blue', 'size': 'large'}
-- Max recommended: ~100 entries
Collection limitations:
- Entire collection read/written atomically
- No querying inside collections (without SAI)
- Collections > 100 elements impact performance
- Consider separate table for large collections
User-Defined Types (UDTs)¶
CREATE TYPE address (
street TEXT,
city TEXT,
state TEXT,
postal_code TEXT,
country TEXT
);
CREATE TABLE users (
user_id UUID PRIMARY KEY,
name TEXT,
home_address FROZEN<address>,
work_address FROZEN<address>
);
-- FROZEN required: stored as single blob
-- Cannot update individual fields
UPDATE users SET home_address = {
street: '123 Main St',
city: 'Boston',
state: 'MA',
postal_code: '02101',
country: 'USA'
} WHERE user_id = ?;
Static Columns¶
Static columns are shared across all rows in a partition:
CREATE TABLE team_members (
team_id UUID,
member_id UUID,
team_name TEXT STATIC, -- Same for all members
team_budget DECIMAL STATIC, -- Same for all members
member_name TEXT,
member_role TEXT,
PRIMARY KEY ((team_id), member_id)
);
INSERT INTO team_members (team_id, member_id, team_name, team_budget, member_name, member_role)
VALUES (?, ?, 'Engineering', 100000, 'Alice', 'Developer');
INSERT INTO team_members (team_id, member_id, member_name, member_role)
VALUES (?, ?, 'Bob', 'Designer');
-- team_name and team_budget already set for this partition
Partition: team_id = engineering-team
| Column Type | Column | Value |
|---|---|---|
| STATIC | team_name | 'Engineering' |
| STATIC | team_budget | 100000 |
| member_id | member_name | member_role |
|---|---|---|
| alice | 'Alice' | 'Developer' |
| bob | 'Bob' | 'Designer' |
| carol | 'Carol' | 'PM' |
Use cases:
- Partition metadata (team info, account settings)
- Avoiding denormalization of partition-level data
- Aggregate values (counts, sums) for the partition
Modeling Best Practices Summary¶
Do¶
| Practice |
|---|
| Start with queries, not entities |
| One table per query pattern |
| Choose partition keys with high cardinality |
| Keep partitions bounded (< 100 MB) |
| Use clustering columns for sort order |
| Denormalize for read performance |
| Use TTL for temporary data |
| Use batches for atomic multi-table writes |
| Use counters for aggregations |
| Test with production-like data volumes |
Don't¶
| Anti-Practice |
|---|
| Design tables before knowing queries |
| Use ALLOW FILTERING in production |
| Create unbounded partitions |
| Use secondary indexes as primary access pattern |
| Store large collections (> 100 elements) |
| Update individual UDT fields (use FROZEN) |
| Rely on Cassandra materialized views |
| Use Cassandra as a queue |
| Perform frequent deletes (use TTL instead) |
| Normalize data like in RDBMS |
Next Steps¶
- Time Bucketing Pattern - Managing time-series data
- Anti-Patterns - Mistakes to avoid
- E-Commerce Example - Complete schema design
- CQL Reference - Query language details