CQL Indexing Reference¶
Indexes in Cassandra do not work like relational indexes, and that surprises many. A relational index is a global structure—query it, and all matching rows are returned. A Cassandra secondary index is local to each node—query it without a partition key, and Cassandra has to ask every node in the cluster. That is an all-node scatter query, and it does not scale.
This makes secondary indexes useful only in specific situations: low-cardinality columns (status fields, categories) or queries that always include the partition key. For high-cardinality lookups like email addresses, a separate denormalized table is usually more appropriate.
Cassandra 5.0 introduced SAI (Storage-Attached Indexes), which handle more use cases efficiently. Materialized views offer another option—automated denormalization at the cost of write amplification.
This guide covers when each approach makes sense.
Behavioral Guarantees¶
What Indexes Guarantee¶
- Index updates are applied synchronously as part of the write path
- Indexes are local to each node (each node indexes only its own data)
- Queries with partition key plus indexed column contact only partition replicas
- Index build is asynchronous; queries may return partial results during build
- DROP INDEX removes the index immediately from schema
What Indexes Do NOT Guarantee¶
Undefined Behavior
The following behaviors are undefined and must not be relied upon:
- Global ordering: Index queries without partition key return results in undefined order
- Latency bounds: Scatter-gather queries have unbounded latency proportional to cluster size
- Memory usage: Large result sets from index queries may exhaust coordinator memory
- Build time: Index build duration depends on data volume; no progress or completion guarantee
- Result completeness during build: Queries may miss data on nodes still building the index
- Performance consistency: Query performance varies significantly based on data distribution
Index Type Comparison Contract¶
| Index Type | Query Types | Cardinality | Production Status |
|---|---|---|---|
| Legacy (2i) | Equality only | Low | Supported |
| SASI | Equality, range, LIKE | Medium | Deprecated |
| SAI | Equality, range, LIKE, ANN | Any | Recommended (5.0+) |
Query Execution Contract¶
| Query Pattern | Nodes Contacted | Performance |
|---|---|---|
pk = ? AND indexed = ? |
RF replicas | Fast |
indexed = ? |
All nodes | Slow (scatter-gather) |
indexed = ? LIMIT n |
All nodes (early termination) | Variable |
indexed > ? AND indexed < ? |
All nodes (SAI/SASI only) | Variable |
Index Selection Contract¶
| Scenario | Recommended Index | Alternative |
|---|---|---|
| Low cardinality + partition key | Legacy 2i or SAI | None needed |
| High cardinality lookup | SAI | Denormalized table |
| Text search (prefix/suffix) | SAI | Application-side filtering |
| Vector similarity | SAI with ANN | External vector database |
| Frequent alternative access pattern | Materialized view | Application-managed table |
Failure Semantics¶
| Failure Mode | Outcome | Client Action |
|---|---|---|
| Index build incomplete | Partial results | Wait for build or query specific partitions |
| Node unavailable | Results from available nodes only | Retry or accept partial results |
| Query timeout | Partial results or exception | Add partition key or reduce scope |
| Index not found | Query fails | Create index or use ALLOW FILTERING |
Version-Specific Behavior¶
| Version | Behavior |
|---|---|
| All | Legacy secondary indexes (2i) |
| 3.4+ | SASI indexes (experimental) |
| 4.0+ | SASI officially not recommended for production |
| 5.0+ | SAI (Storage-Attached Indexes) as recommended default (CEP-7) |
Index Types Overview¶
| Type | Best For | Limitations |
|---|---|---|
| Secondary Index | Low cardinality, occasional queries | Full cluster scan |
| SAI (Cassandra 5.0+) | General purpose, high cardinality | Requires Cassandra 5.0+ |
| SASI (deprecated) | Prefix/suffix search | Being replaced by SAI |
| Materialized View | Frequent alternative queries | Write amplification |
Secondary Indexes¶
Creating Secondary Indexes¶
-- Basic secondary index
CREATE INDEX ON users (email);
-- Named index
CREATE INDEX users_email_idx ON users (email);
-- Index on collection values
CREATE INDEX ON users (tags); -- For set/list
-- Index on map keys
CREATE INDEX ON users (KEYS(preferences));
-- Index on map values
CREATE INDEX ON users (VALUES(preferences));
-- Index on map entries
CREATE INDEX ON users (ENTRIES(preferences));
-- Index on full frozen collection
CREATE INDEX ON users (FULL(address));
Querying with Secondary Indexes¶
-- Query by indexed column
SELECT * FROM users WHERE email = '[email protected]';
-- With collection index
SELECT * FROM users WHERE tags CONTAINS 'premium';
-- With map index
SELECT * FROM users WHERE preferences CONTAINS KEY 'theme';
SELECT * FROM users WHERE preferences CONTAINS 'dark';
SELECT * FROM users WHERE preferences['theme'] = 'dark';
When to Use Secondary Indexes¶
Good use cases: - Low cardinality columns (status, country) - Infrequent queries - Combined with partition key - Small result sets expected
Bad use cases: - High cardinality columns (user_id, email) - Frequently queried columns - Large result sets - Primary access pattern
Secondary Index Limitations¶
-- These queries require ALLOW FILTERING or fail:
-- No partition key, no index
SELECT * FROM users WHERE age > 30; -- Error
-- Range query on indexed column
SELECT * FROM users WHERE age > 30 ALLOW FILTERING; -- Slow
-- Multiple conditions without partition key
SELECT * FROM users WHERE status = 'active' AND country = 'US';
-- Slow even if both indexed
Collection Index Behavior¶
Secondary indexes on collections have special query semantics:
| Collection Type | Index Type | Query Operator | Example |
|---|---|---|---|
| SET/LIST | Values | CONTAINS |
WHERE tags CONTAINS 'premium' |
| MAP | KEYS() |
CONTAINS KEY |
WHERE preferences CONTAINS KEY 'theme' |
| MAP | VALUES() |
CONTAINS |
WHERE preferences CONTAINS 'dark' |
| MAP | ENTRIES() |
[] accessor |
WHERE preferences['theme'] = 'dark' |
| FROZEN | FULL() |
= (full match) |
WHERE address = {street: '...', city: '...'} |
Collection Index Restrictions
CONTAINSandCONTAINS KEYonly work with indexed collections- Cannot combine multiple
CONTAINSon same collection in one query - Each element in collection is indexed separately (storage overhead)
FULL()index requires exact match of entire frozen collection- Collection indexes still result in scatter-gather without partition key
-- ERROR: Cannot use multiple CONTAINS on same column
SELECT * FROM users
WHERE tags CONTAINS 'premium' AND tags CONTAINS 'verified';
-- WORKAROUND: Filter in application or use intersection
High-Cardinality Performance Pitfalls¶
Avoid Secondary Indexes on High-Cardinality Columns
Secondary indexes on high-cardinality columns (many unique values) cause severe performance problems:
| Issue | Impact |
|---|---|
| Index size | Approaches table size; one entry per unique value |
| Read amplification | Must check index on every node |
| Hotspots | Popular values create uneven load |
| Tombstone buildup | Deleted values leave tombstones in index |
| Memory pressure | Large indexes consume heap on each node |
Example of problematic indexing:
-- BAD: High cardinality
CREATE INDEX ON users (email); -- Unique per user
CREATE INDEX ON orders (order_id); -- Unique per order
CREATE INDEX ON events (timestamp); -- High cardinality
-- BETTER: Low cardinality
CREATE INDEX ON users (account_type); -- Few values: 'free', 'premium', 'enterprise'
CREATE INDEX ON orders (status); -- Few values: 'pending', 'shipped', 'delivered'
For high-cardinality lookups, use:
- Denormalized tables with the lookup column as partition key
- SAI indexes (Cassandra 5.0+) which handle cardinality better
- Application-side caching for frequently accessed values
Storage-Attached Indexes (SAI)¶
SAI is the next-generation indexing in Cassandra 5.0+, offering better performance and more capabilities than secondary indexes.
Creating SAI Indexes¶
-- Basic SAI index
CREATE CUSTOM INDEX ON users (email)
USING 'StorageAttachedIndex';
-- Named SAI index
CREATE CUSTOM INDEX users_email_sai ON users (email)
USING 'StorageAttachedIndex';
-- SAI with options
CREATE CUSTOM INDEX ON users (description)
USING 'StorageAttachedIndex'
WITH OPTIONS = {
'case_sensitive': 'false',
'normalize': 'true'
};
-- Numeric index for range queries
CREATE CUSTOM INDEX ON products (price)
USING 'StorageAttachedIndex';
SAI Query Capabilities¶
-- Equality queries
SELECT * FROM users WHERE email = '[email protected]';
-- Range queries (numeric)
SELECT * FROM products WHERE price >= 100 AND price < 500;
-- AND queries with multiple SAI columns
SELECT * FROM products
WHERE category = 'electronics'
AND price < 1000
AND in_stock = true;
-- Combined with partition key (most efficient)
SELECT * FROM orders
WHERE user_id = ?
AND status = 'pending';
SAI vs Secondary Index¶
| Feature | Secondary Index | SAI |
|---|---|---|
| High cardinality | Poor | Good |
| Range queries | No | Yes |
| Multiple conditions | Poor | Better |
| Write overhead | Lower | Higher |
| Text search | No | Limited |
| Availability | All versions | 5.0+ |
SAI Limitations¶
- Still scans all nodes for queries without partition key
- Not a replacement for proper data modeling
- Higher storage overhead than secondary indexes
- Text search is basic (no fuzzy matching)
SASI Indexes (Deprecated)¶
SASI (SSTable Attached Secondary Index) is deprecated but still available. Use SAI instead for new deployments.
Creating SASI Indexes¶
-- Prefix search index
CREATE CUSTOM INDEX ON users (username)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
'mode': 'PREFIX',
'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
'case_sensitive': 'false'
};
-- Contains search index
CREATE CUSTOM INDEX ON users (bio)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
'mode': 'CONTAINS',
'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
'case_sensitive': 'false'
};
-- Numeric sparse index
CREATE CUSTOM INDEX ON orders (total)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
'mode': 'SPARSE'
};
SASI Query Patterns¶
-- Prefix search (LIKE 'prefix%')
SELECT * FROM users WHERE username LIKE 'john%';
-- Contains search (LIKE '%substring%')
SELECT * FROM users WHERE bio LIKE '%developer%';
-- Range queries
SELECT * FROM orders WHERE total > 100 AND total < 500;
Materialized Views¶
Materialized views automatically maintain denormalized copies of data with different primary keys.
Creating Materialized Views¶
-- Base table
CREATE TABLE users (
user_id UUID PRIMARY KEY,
username TEXT,
email TEXT,
country TEXT,
created_at TIMESTAMP
);
-- Materialized view by email
CREATE MATERIALIZED VIEW users_by_email AS
SELECT * FROM users
WHERE email IS NOT NULL AND user_id IS NOT NULL
PRIMARY KEY (email, user_id);
-- View with filtered data
CREATE MATERIALIZED VIEW active_users_by_country AS
SELECT user_id, username, country, created_at FROM users
WHERE country IS NOT NULL
AND user_id IS NOT NULL
AND status = 'active'
PRIMARY KEY (country, created_at, user_id)
WITH CLUSTERING ORDER BY (created_at DESC);
Querying Materialized Views¶
-- Query the view like a regular table
SELECT * FROM users_by_email WHERE email = '[email protected]';
SELECT * FROM active_users_by_country
WHERE country = 'US'
ORDER BY created_at DESC
LIMIT 100;
Materialized View Rules¶
Requirements: - All base table primary key columns must be in the view's primary key - One new column can be added to the partition key - All non-null filters must use IS NOT NULL - View primary key must include base table primary key
-- Base table
CREATE TABLE orders (
order_id UUID,
user_id UUID,
status TEXT,
amount DECIMAL,
created_at TIMESTAMP,
PRIMARY KEY ((order_id))
);
-- Valid view: adds user_id to partition key
CREATE MATERIALIZED VIEW orders_by_user AS
SELECT * FROM orders
WHERE user_id IS NOT NULL AND order_id IS NOT NULL
PRIMARY KEY ((user_id), created_at, order_id)
WITH CLUSTERING ORDER BY (created_at DESC);
-- Invalid: order_id not in primary key
-- CREATE MATERIALIZED VIEW invalid_view AS
-- SELECT * FROM orders
-- WHERE user_id IS NOT NULL
-- PRIMARY KEY ((user_id), created_at); -- Missing order_id!
Materialized View Limitations¶
| Limitation | Impact |
|---|---|
| Write amplification | Every write to base table writes to all views |
| Eventual consistency | Views may lag behind base table |
| Limited transformations | Cannot use functions or aggregates |
| Repair complexity | Views need repair too |
| Schema changes | Cannot alter view; must drop and recreate |
When to Use Materialized Views¶
Good use cases: - Small number of views per table - Low write throughput - Need guaranteed consistency with base table - Simple alternative access patterns
Avoid when: - High write throughput - Many views needed - Complex transformations required - Critical latency requirements
Index Management¶
Listing Indexes¶
-- List all indexes
SELECT * FROM system_schema.indexes;
-- Indexes for specific keyspace
SELECT * FROM system_schema.indexes WHERE keyspace_name = 'my_keyspace';
-- Indexes for specific table
SELECT * FROM system_schema.indexes
WHERE keyspace_name = 'my_keyspace' AND table_name = 'users';
Dropping Indexes¶
-- Drop by name
DROP INDEX IF EXISTS users_email_idx;
-- Drop by keyspace.name
DROP INDEX my_keyspace.users_email_idx;
Rebuilding Indexes¶
# Rebuild all indexes on a table
nodetool rebuild_index my_keyspace users
# Rebuild specific index
nodetool rebuild_index my_keyspace users users_email_idx
Managing Materialized Views¶
-- List views
SELECT * FROM system_schema.views WHERE keyspace_name = 'my_keyspace';
-- Drop view
DROP MATERIALIZED VIEW IF EXISTS users_by_email;
-- Cannot alter views; must drop and recreate
Index Best Practices¶
Secondary Index Guidelines¶
DO:
✓ Use for low-cardinality columns
✓ Combine with partition key when possible
✓ Use for occasional queries
✓ Index columns queried with equality
DON'T:
✗ Index high-cardinality columns (use SAI instead)
✗ Rely on indexes for primary access patterns
✗ Use for range queries (use SAI instead)
✗ Create many indexes on single table
SAI Guidelines (5.0+)¶
DO:
✓ Use for high-cardinality columns
✓ Use for range queries
✓ Combine multiple SAI indexes in queries
✓ Use with partition key for best performance
DON'T:
✗ Use as replacement for good data model
✗ Query without partition key for large datasets
✗ Expect full-text search capabilities
Materialized View Guidelines¶
DO:
✓ Limit to 2-3 views per table
✓ Use for guaranteed consistency needs
✓ Use for simple access pattern changes
✓ Include all base table PK columns
DON'T:
✗ Create many views per table
✗ Use with high write throughput
✗ Use for complex data transformations
✗ Forget to repair views
Index Selection Decision Tree¶
Need to query by non-PK column?
│
├── Is this a frequent/primary access pattern?
│ └── Yes: Create a new table (denormalization)
│
├── Need guaranteed consistency with base table?
│ └── Yes: Consider Materialized View
│
├── Running Cassandra 5.0+?
│ ├── Yes: Use SAI
│ │ ├── High cardinality: SAI
│ │ ├── Range queries: SAI
│ │ └── Multiple conditions: SAI
│ │
│ └── No: Secondary Index or SASI
│ ├── Low cardinality: Secondary Index
│ ├── Prefix/contains search: SASI
│ └── High cardinality: Consider data model change
│
└── Is performance acceptable?
├── Yes: Keep current solution
└── No: Redesign data model
Next Steps¶
- Data Modeling - Query-first design
- DML Reference - Query syntax
- Performance Tuning - Optimization
- Anti-Patterns - What to avoid