Index Commands¶

Secondary indexes enable queries on columns that are not part of the primary key. Cassandra supports multiple index implementations, each with different performance characteristics and query capabilities.

Behavioral Guarantees¶

What Index Operations Guarantee¶

CREATE INDEX initiates asynchronous index building; the command returns before indexing completes
Index updates are applied synchronously as part of the write path once the index is built
IF NOT EXISTS provides idempotent index creation
DROP INDEX removes the index immediately; the command returns after schema propagation
Each index is local to each node (node indexes only its own data)

What Index Operations Do NOT Guarantee¶

Undefined Behavior

The following behaviors are undefined and must not be relied upon:

Query availability during build: Queries using the index may fail or return partial results until build completes on all nodes
Build completion time: Index build duration depends on data volume and cluster load; no timeout or progress guarantee
Query performance: Scatter-gather queries (without partition key) have unbounded latency depending on cluster size and data distribution
Result completeness during failures: If nodes are unavailable, indexed queries may return incomplete results
Index-only scans: All index queries require reading the base table; index does not store complete row data

Index Build Contract¶

State	Query Behavior	How to Check
Building	Queries may fail or return partial results	`nodetool indexbuildstatus`
Built	Queries return complete results from available nodes	`SELECT * FROM system_schema.indexes`
Failed	Index unusable; must drop and recreate	Check logs for build errors

Query Execution Contract¶

Query Type	Nodes Contacted	Performance
Index query with partition key	Replicas for that partition	Fast (single partition)
Index query without partition key	All nodes in cluster	Slow (scatter-gather)
Index query with token range	Nodes owning that range	Variable

Failure Semantics¶

Failure Mode	Outcome	Client Action
Timeout during CREATE INDEX	Schema may have propagated; build may be in progress	Check index status
Node failure during build	Build continues on other nodes; failed node rebuilds on restart	Monitor build status
Query timeout on indexed column	Partial results possible	Retry or add partition key constraint
`IndexNotAvailableException`	Index build incomplete	Wait for build to complete

Version-Specific Behavior¶

Version	Behavior
3.4+	SASI indexes available (CASSANDRA-10661)
4.0+	Improved index build handling
5.0+	SAI as default index implementation (CEP-7), SASI deprecated

Index Architecture¶

How Secondary Indexes Work¶

Secondary indexes in Cassandra are local indexes—each node indexes only the data it owns. When querying an indexed column without the partition key:

Client Query → Coordinator → Scatter to ALL Nodes → Gather Results → Return
                                    │
                                    └── Each node searches local index

This architecture has implications:

Without partition key: Query must contact all nodes (scatter-gather)
With partition key: Query contacts only replica nodes (efficient)
Index maintenance: Indexes are updated synchronously with writes

Performance Consideration

Secondary indexes perform best when:

The query includes the partition key
The indexed column has low-to-medium cardinality
The data distribution is relatively uniform

For high-cardinality columns or queries without partition keys, consider denormalized tables or materialized views instead.

Index Types¶

Type	Implementation	Best For	Cassandra Version
Legacy Secondary	`2i`	Low cardinality, equality queries	All
SASI	`SASIIndex`	Text search, ranges	3.4+
SAI	`StorageAttachedIndex`	General purpose, ranges, text	5.0+

CREATE INDEX¶

Create a secondary index on a table column.

Synopsis¶

CREATE [ CUSTOM ] INDEX [ IF NOT EXISTS ] [ *index_name* ]
    ON [ *keyspace_name*. ] *table_name* ( *index_target* )
    [ USING '*index_class*' ]
    [ WITH OPTIONS = { *option_map* } ]

index_target:

*column_name*
| KEYS ( *map_column* )
| VALUES ( *map_column* )
| ENTRIES ( *map_column* )
| FULL ( *frozen_collection_column* )

Description¶

CREATE INDEX creates a secondary index enabling queries on the specified column. The index is built asynchronously after creation; the command returns before indexing completes.

Parameters¶

index_name¶

Optional name for the index. If omitted, Cassandra generates a name in the format table_column_idx.

-- Named index
CREATE INDEX users_email_idx ON users (email);

-- Auto-named (becomes users_email_idx)
CREATE INDEX ON users (email);

IF NOT EXISTS¶

Prevents error if index already exists.

index_target¶

Specifies what to index:

Column Index¶

Index regular column values:

CREATE INDEX ON users (email);
CREATE INDEX ON users (country);

Collection Indexes¶

Index collection elements:

-- Index SET or LIST elements
CREATE INDEX ON users (tags);  -- SET<TEXT>
CREATE INDEX ON users (phone_numbers);  -- LIST<TEXT>

-- Index MAP keys
CREATE INDEX ON users (KEYS(attributes));

-- Index MAP values
CREATE INDEX ON users (VALUES(attributes));

-- Index MAP key-value pairs
CREATE INDEX ON users (ENTRIES(attributes));

-- Index entire frozen collection
CREATE INDEX ON users (FULL(frozen_addresses));

Target	Collection Type	Enables Query
`column`	`SET`, `LIST`	`WHERE column CONTAINS value`
`KEYS(column)`	`MAP`	`WHERE column CONTAINS KEY key`
`VALUES(column)`	`MAP`	`WHERE column CONTAINS value`
`ENTRIES(column)`	`MAP`	`WHERE column[key] = value`
`FULL(column)`	`FROZEN`	`WHERE column = entire_collection`

USING¶

Specifies the index implementation class.

Legacy Secondary Index (Default)¶

CREATE INDEX ON users (country);

Hash-based index
Equality queries only
Best for low-cardinality columns

Storage-Attached Index (SAI)¶

CREATE CUSTOM INDEX ON users (email)
    USING 'StorageAttachedIndex';

SAI (Cassandra 5.0+) provides:

Efficient numeric range queries
Text pattern matching (LIKE)
Better performance than legacy indexes
Lower storage overhead

SASI Index¶

CREATE CUSTOM INDEX ON users (username)
    USING 'org.apache.cassandra.index.sasi.SASIIndex';

SASI Status

SASI is marked as experimental and is not recommended for production use. Use SAI (Cassandra 5.0+) instead.

WITH OPTIONS¶

Index-specific configuration options.

SAI Options¶

CREATE CUSTOM INDEX ON users (email)
    USING 'StorageAttachedIndex'
    WITH OPTIONS = {
        'case_sensitive': 'false',
        'normalize': 'true',
        'ascii': 'true'
    };

Option	Type	Default	Description
`case_sensitive`	boolean	`true`	Case-sensitive text comparison
`normalize`	boolean	`false`	Unicode normalization
`ascii`	boolean	`false`	ASCII folding (é → e)

SASI Options¶

CREATE CUSTOM INDEX ON products (description)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
    WITH OPTIONS = {
        'mode': 'CONTAINS',
        'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
        'case_sensitive': 'false'
    };

Option	Values	Description
`mode`	`PREFIX`, `CONTAINS`, `SPARSE`	Query matching mode
`analyzer_class`	Class name	Text analyzer for tokenization
`case_sensitive`	`true`/`false`	Case sensitivity
`max_compaction_flush_memory_in_mb`	Number	Memory limit for indexing

Examples¶

Basic Indexes¶

-- Simple column index
CREATE INDEX ON users (email);

-- Named index
CREATE INDEX users_country_idx ON users (country);

-- Index with IF NOT EXISTS
CREATE INDEX IF NOT EXISTS ON users (last_name);

SAI Indexes (Cassandra 5.0+)¶

-- Numeric column for range queries
CREATE CUSTOM INDEX ON products (price)
    USING 'StorageAttachedIndex';

-- Case-insensitive text search
CREATE CUSTOM INDEX ON users (username)
    USING 'StorageAttachedIndex'
    WITH OPTIONS = {'case_sensitive': 'false'};

-- Multiple SAI indexes on same table
CREATE CUSTOM INDEX ON orders (status) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX ON orders (total) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX ON orders (created_at) USING 'StorageAttachedIndex';

Collection Indexes¶

-- Table with collections
CREATE TABLE user_profiles (
    user_id UUID PRIMARY KEY,
    tags SET<TEXT>,
    attributes MAP<TEXT, TEXT>,
    scores LIST<INT>
);

-- Index SET elements
CREATE INDEX ON user_profiles (tags);
-- Query: WHERE tags CONTAINS 'premium'

-- Index MAP keys
CREATE INDEX ON user_profiles (KEYS(attributes));
-- Query: WHERE attributes CONTAINS KEY 'department'

-- Index MAP values
CREATE INDEX ON user_profiles (VALUES(attributes));
-- Query: WHERE attributes CONTAINS 'engineering'

-- Index MAP entries
CREATE INDEX ON user_profiles (ENTRIES(attributes));
-- Query: WHERE attributes['department'] = 'engineering'

Querying Indexed Columns¶

-- Equality query
SELECT * FROM users WHERE email = '[email protected]';

-- SAI range query
SELECT * FROM products WHERE price > 100 AND price < 500;

-- SAI pattern matching
SELECT * FROM users WHERE username LIKE 'john%';

-- Collection query
SELECT * FROM user_profiles WHERE tags CONTAINS 'premium';

-- With partition key (most efficient)
SELECT * FROM orders
WHERE customer_id = ? AND status = 'pending';

Restrictions¶

Restrictions

Cannot index partition key columns (already indexed)
Cannot create multiple indexes on the same column
Cannot index COUNTER columns
Legacy indexes do not support range queries
LIKE queries require SAI or SASI indexes

Cardinality Considerations

Cardinality	Example	Index Recommendation
Very low	`boolean`, `status` (few values)	Secondary index OK
Low-medium	`country`, `category`	Secondary index OK
High	`email`, `user_id`	Avoid; use primary key or denormalize
Unique	`uuid`, `timestamp`	Never index; use primary key

High-cardinality indexes create large index structures on each node, causing: - High memory usage - Slow index lookups - Heavy read amplification

Notes¶

Index building occurs asynchronously after CREATE INDEX returns
Monitor index build progress: nodetool compactionstats
Indexes are stored in separate SSTable files
Dropping a table automatically drops all its indexes
Index updates are synchronous with writes, adding write latency

Index Build Time

For existing tables with data, index building can take significant time:

# Monitor index build progress
nodetool compactionstats

# View pending index builds
nodetool compactionstats | grep "Secondary index"

DROP INDEX¶

Remove a secondary index.

Synopsis¶

DROP INDEX [ IF EXISTS ] [ *keyspace_name*. ] *index_name*

Description¶

DROP INDEX removes a secondary index. Queries using the index will fail after removal. The index data is deleted asynchronously.

Parameters¶

IF EXISTS¶

Prevents error if index does not exist.

index_name¶

The name of the index to drop. Must be qualified with keyspace if not using USE.

Examples¶

-- Drop by name
DROP INDEX users_email_idx;

-- With keyspace qualification
DROP INDEX my_keyspace.users_email_idx;

-- Safe drop
DROP INDEX IF EXISTS users_country_idx;

Finding Index Names¶

-- List all indexes in keyspace
SELECT index_name, table_name, options
FROM system_schema.indexes
WHERE keyspace_name = 'my_keyspace';

-- Describe table to see indexes
DESCRIBE TABLE users;

Restrictions¶

Restrictions

Cannot drop index while queries are actively using it (they will fail)
Requires DROP permission on the table

Notes¶

Index drop is a metadata operation; data files are deleted asynchronously
Queries using the dropped index will fail with error
Consider application impact before dropping indexes in production

Index Selection Guidelines¶

When to Use Secondary Indexes¶

Good Use Cases

Low-to-medium cardinality columns (< 1000 unique values per partition)
Queries that usually include the partition key
Filtering within partitions
Collection element searches

When to Avoid Secondary Indexes¶

Avoid Indexes When

Column has high cardinality (many unique values)
Queries never include the partition key
Column is frequently updated
Table has very large partitions

Alternative Approaches¶

Scenario	Alternative to Secondary Index
High-cardinality lookups	Create a lookup table with the column as partition key
Complex queries	Use materialized views
Full-text search	External search engine (Elasticsearch, Solr)
Range queries on multiple columns	Denormalized tables

Materialized Views - Alternative to indexes for query patterns
Data Modeling - Query-first design
SAI Documentation - Storage-Attached Index details

Index Commands¶

Behavioral Guarantees¶

What Index Operations Guarantee¶

What Index Operations Do NOT Guarantee¶

Index Build Contract¶

Query Execution Contract¶

Failure Semantics¶

Version-Specific Behavior¶

Index Architecture¶

How Secondary Indexes Work¶

Index Types¶

CREATE INDEX¶

Synopsis¶

Description¶

Parameters¶

index_name¶

IF NOT EXISTS¶

index_target¶

Column Index¶

Collection Indexes¶

USING¶

Legacy Secondary Index (Default)¶

Storage-Attached Index (SAI)¶

SASI Index¶

WITH OPTIONS¶

SAI Options¶

SASI Options¶

Examples¶

Basic Indexes¶

SAI Indexes (Cassandra 5.0+)¶

Collection Indexes¶

Querying Indexed Columns¶

Restrictions¶

Notes¶

DROP INDEX¶

Synopsis¶

Description¶

Parameters¶

IF EXISTS¶

index_name¶

Examples¶

Finding Index Names¶

Restrictions¶

Notes¶

Index Selection Guidelines¶

When to Use Secondary Indexes¶

When to Avoid Secondary Indexes¶

Alternative Approaches¶

Related Documentation¶