DDL Commands¶

Data Definition Language (DDL) commands manage schema objects in Apache Cassandra. This section provides comprehensive reference documentation for all DDL statements.

Overview¶

DDL commands create, modify, and remove schema objects: keyspaces, tables, indexes, materialized views, user-defined types, functions, and aggregates. Unlike traditional relational databases where schema changes may require table locks or data migration, Cassandra applies schema modifications as metadata operations that propagate cluster-wide through the gossip protocol.

Schema Architecture¶

Distributed Schema Management¶

Cassandra maintains schema information in the system_schema keyspace. Unlike user keyspaces where replication factor determines how many nodes store data, every node in the cluster stores a complete copy of the schema. This ensures all nodes can independently validate queries and understand the data model without contacting other nodes.

The system_schema keyspace uses LocalStrategy for replication, meaning each node manages its own local copy. Schema synchronization occurs through the gossip protocol rather than normal read/write replication:

DESCRIBE KEYSPACE system_schema;

CREATE KEYSPACE system_schema
    WITH REPLICATION = { 'class': 'LocalStrategy' }
    AND DURABLE_WRITES = true;

Client → Coordinator → Local Schema Update → Gossip Propagation → All Nodes
                              │                                        │
                              └── system_schema tables updated         └── Every node receives
                                                                           complete schema

When a DDL statement executes:

The coordinator node validates the DDL statement
Schema metadata is written to local system_schema tables
A schema mutation is broadcast via gossip to all nodes
Each node applies the schema change to its local system_schema independently
Schema agreement is reached when all nodes have matching schema versions

Schema vs Data Replication

Schema replication is independent of keyspace replication settings:

Schema: Always stored on every node (via gossip)
Data: Stored on nodes determined by partition key and replication factor

A keyspace with replication_factor: 3 stores data on 3 nodes, but the schema definition for that keyspace exists on all nodes in the cluster.

Schema Agreement¶

Before returning success for a DDL operation, Cassandra waits for schema agreement—a state where all live nodes have the same schema version. The schema version is a UUID computed from the hash of all schema metadata.

-- Check current schema versions across the cluster
SELECT schema_version FROM system.local;
SELECT peer, schema_version FROM system.peers;

-- Using nodetool
-- nodetool describecluster | grep -A 10 "Schema versions"

Schema Disagreement

If schema agreement cannot be reached within the timeout (default 10 seconds), the DDL statement returns a warning but the change may still propagate. Operations during schema disagreement may produce unpredictable results. Monitor nodetool describecluster for schema version mismatches.

Schema Disagreement Consequences¶

Queries May Fail or Return Inconsistent Results

When nodes have different schema versions, queries can behave unpredictably:

Scenario	Potential Outcome
Query column not yet on all nodes	`InvalidRequestException` on some nodes
Query new table before propagation	`TableNotFoundException` on some nodes
Query with different column types	Data corruption or read failures
Concurrent DDL operations	Conflicting schemas across cluster

Real-world failure scenarios:

# DDL executed but not propagated
CREATE TABLE new_table (...);

# Immediate query may fail
SELECT * FROM new_table;
-- Node A: Success
-- Node B: UnconfiguredTableException: new_table

Waiting for Schema Agreement¶

After DDL operations, ensure schema agreement before executing DML:

-- Configuration in cassandra.yaml
max_schema_agreement_wait_seconds: 10  -- Default wait time

-- Programmatic check (driver)
-- Most drivers provide schema agreement wait methods

Best practices:

Wait for agreement: Use driver's schema agreement API after DDL
Avoid concurrent DDL: Execute schema changes sequentially
Monitor disagreement: Alert on prolonged schema version mismatches
Rolling restarts: Wait for schema agreement between node restarts
Repair after issues: Run nodetool repair -pr on nodes with stale schema

# Check for schema disagreement
nodetool describecluster

# Output showing disagreement:
# Schema versions:
#     a1b2c3d4-... : [10.0.0.1, 10.0.0.2, 10.0.0.3]
#     e5f6g7h8-... : [10.0.0.4]  # This node has different schema!

Handling Schema Timeouts¶

Timeout Scenario	Client Action
DDL returns timeout warning	Check schema versions, retry if needed
DDL fails completely	Verify schema state, rerun DDL
Prolonged disagreement (> 30s)	Investigate node health, repair or restart
Permanent disagreement	Restart affected nodes with correct schema

System Schema Tables¶

Schema metadata is stored in the system_schema keyspace:

Table	Contents
`keyspaces`	Keyspace definitions and replication settings
`tables`	Table schemas, options, and flags
`columns`	Column definitions for each table
`types`	User-defined type definitions
`functions`	User-defined function code and signatures
`aggregates`	User-defined aggregate definitions
`indexes`	Secondary index metadata
`views`	Materialized view definitions
`triggers`	Trigger configurations

Online Schema Changes¶

Cassandra supports online schema modifications without blocking reads or writes. This capability stems from the storage engine architecture:

How Online Changes Work¶

Adding Columns

New columns are metadata-only changes. Existing SSTables are not modified. When reading rows written before the column was added, Cassandra returns null for the new column.

Dropping Columns

Dropped columns are marked in metadata but data remains in SSTables until compaction. Reading dropped column data is prevented at the storage layer.

Altering Types

Type changes for non-primary-key columns are permitted if the new type is compatible (e.g., widening INT to BIGINT). The storage engine interprets existing bytes according to the new type.

Primary Key Immutability

Primary key columns (partition key and clustering columns) cannot be modified after table creation. The primary key structure determines data distribution and physical storage layout.

Schema Change Propagation Time¶

Schema changes propagate at gossip speed, typically completing cluster-wide within seconds. Factors affecting propagation:

Cluster size: Larger clusters require more gossip rounds
Network latency: Cross-datacenter propagation adds delay
Node health: Unresponsive nodes delay agreement

Typical propagation times:
  - 3-node cluster: < 1 second
  - 50-node cluster: 1-3 seconds
  - 200-node multi-DC cluster: 3-10 seconds

Schema Versioning¶

Each schema modification increments the schema version. Cassandra tracks schema history but does not provide built-in rollback capabilities.

Schema Version Control

Maintain DDL scripts in version control. Use migration tools like cassandra-migration or application-level schema management to track and apply schema changes systematically.

Snapshot Before Schema Changes¶

Before significant schema modifications, create snapshots:

# Snapshot specific table before altering
nodetool snapshot -t pre_alter_backup keyspace_name table_name

# Snapshot entire keyspace
nodetool snapshot -t pre_migration keyspace_name

Schema Objects¶

Hierarchy¶

Cluster
└── Keyspace (namespace + replication configuration)
    ├── Table (column families)
    │   ├── Column definitions
    │   ├── Primary key structure
    │   ├── Indexes
    │   ├── Materialized views
    │   └── Triggers
    ├── User-Defined Types
    ├── Functions
    └── Aggregates

Naming Rules¶

Identifier Syntax¶

All schema object names follow these rules:

Rule	Unquoted Identifiers	Quoted Identifiers
First character	Letter (a-z, A-Z)	Any character
Subsequent characters	Letters, digits (0-9), underscore (_)	Any character
Case sensitivity	Case-insensitive (stored as lowercase)	Case-sensitive (preserved exactly)
Reserved words	Not allowed	Allowed
Maximum length	48 characters	48 characters

-- Unquoted: stored as lowercase
CREATE TABLE UserEvents (...);   -- Stored as 'userevents'
CREATE TABLE user_events (...);  -- Stored as 'user_events'

-- Quoted: case and special characters preserved
CREATE TABLE "UserEvents" (...); -- Stored as 'UserEvents'
CREATE TABLE "user-events" (...); -- Stored as 'user-events'
CREATE TABLE "select" (...);     -- Reserved word allowed when quoted

Maximum Lengths by Object Type¶

Object Type	Maximum Length	Notes
Keyspace name	48 characters
Table name	48 characters
Column name	65535 characters	Practical limit ~100 for readability
Index name	48 characters	Auto-generated: `table_column_idx`
Materialized view name	48 characters
User-defined type name	48 characters
UDT field name	65535 characters
Function name	48 characters
Aggregate name	48 characters
Role name	256 characters

Directory Name Limits

On disk, keyspace and table names become directory names. Some filesystems have path length limits (255 characters for most). Very long names combined with Cassandra's data directory path may exceed filesystem limits.

Permitted Characters¶

Unquoted identifiers (recommended for portability):

Letters: a-z, A-Z (ASCII only)
Digits: 0-9 (not as first character)
Underscore: _

Quoted identifiers allow additional characters:

Spaces and hyphens: "my table", "user-events"
Unicode characters: "日本語テーブル"
Special characters: "table.name", "column@v2"
Reserved words: "select", "table", "index"

-- Valid unquoted names
CREATE TABLE user_events_2024 (...);
CREATE TABLE t1 (...);

-- Invalid unquoted names (require quoting)
CREATE TABLE 2024_events (...);    -- ERROR: starts with digit
CREATE TABLE user-events (...);    -- ERROR: contains hyphen
CREATE TABLE select (...);         -- ERROR: reserved word

-- Valid quoted equivalents
CREATE TABLE "2024_events" (...);
CREATE TABLE "user-events" (...);
CREATE TABLE "select" (...);

Reserved Words¶

CQL reserves certain keywords that cannot be used as unquoted identifiers. Common reserved words include:


ADD	ALTER	AND	AS	ASC
BATCH	BEGIN	BY	COLUMN	CREATE
DELETE	DESC	DROP	EXISTS	FROM
GRANT	IF	IN	INDEX	INSERT
INTO	KEYSPACE	LIMIT	NOT	NULL
OF	ON	OR	ORDER	PRIMARY
REVOKE	ROLE	SELECT	SET	TABLE
TO	TOKEN	TRUNCATE	UPDATE	USE
USING	VALUES	WHERE	WITH

For a complete list, consult the CQL specification.

Best Practice

Use lowercase unquoted identifiers with underscores for maximum compatibility:

user_events ✓
UserEvents ✓ (stored as userevents)
"user-events" - works but requires quoting everywhere
"select" - works but confusing

-- These are equivalent
CREATE TABLE users (...);
CREATE TABLE Users (...);
CREATE TABLE USERS (...);

-- This preserves case
CREATE TABLE "UserAccounts" (...);

Command Reference¶

Keyspace Commands¶

Keyspaces define namespaces and replication configuration for tables.

Command	Description
CREATE KEYSPACE	Create a new keyspace with replication settings
ALTER KEYSPACE	Modify keyspace replication or options
DROP KEYSPACE	Remove a keyspace and all contents
USE	Set the current keyspace for the session

Table Commands¶

Tables store data as rows organized by primary key.

Command	Description
CREATE TABLE	Define a new table with columns and primary key
ALTER TABLE	Add/drop columns or modify table options
DROP TABLE	Remove a table and all its data
TRUNCATE	Remove all rows from a table

Index Commands¶

Indexes enable queries on non-primary-key columns.

Command	Description
CREATE INDEX	Create a secondary index or SAI index
DROP INDEX	Remove an index

Collection Indexing¶

Cassandra supports indexing collection types (SET, LIST, MAP) to enable CONTAINS and element-specific queries:

Collection Type	Index Target	Query Enabled
`SET<T>`	Column name	`WHERE set_col CONTAINS value`
`LIST<T>`	Column name	`WHERE list_col CONTAINS value`
`MAP<K,V>`	`KEYS(column)`	`WHERE map_col CONTAINS KEY key`
`MAP<K,V>`	`VALUES(column)`	`WHERE map_col CONTAINS value`
`MAP<K,V>`	`ENTRIES(column)`	`WHERE map_col[key] = value`
`FROZEN<collection>`	`FULL(column)`	`WHERE frozen_col = entire_value`

-- Index SET elements
CREATE INDEX ON users (tags);
SELECT * FROM users WHERE tags CONTAINS 'premium';

-- Index MAP entries for key-value lookups
CREATE INDEX ON users (ENTRIES(attributes));
SELECT * FROM users WHERE attributes['role'] = 'admin';

User-Defined Type Indexing¶

UDT indexing capabilities depend on whether the type is frozen and the index implementation:

UDT State	Index Type	Capability
`FROZEN<udt>`	2i, SAI	Index entire frozen value for equality matching
Non-frozen	SAI (5.0+)	Index individual UDT fields

Frozen UDT indexing:

CREATE TYPE address (
    street TEXT,
    city TEXT,
    zip TEXT
);

CREATE TABLE customers (
    id UUID PRIMARY KEY,
    home_address FROZEN<address>
);

-- Index entire frozen UDT
CREATE INDEX ON customers (home_address);

-- Query requires exact match of all fields
SELECT * FROM customers
WHERE home_address = {street: '123 Main St', city: 'NYC', zip: '10001'};

Non-frozen UDT field indexing (SAI, Cassandra 5.0+):

CREATE TABLE customers (
    id UUID PRIMARY KEY,
    home_address address  -- non-frozen
);

-- Index specific UDT field
CREATE CUSTOM INDEX ON customers (home_address.city)
    USING 'StorageAttachedIndex';

-- Query individual field
SELECT * FROM customers WHERE home_address.city = 'NYC';

Frozen vs Non-Frozen UDT Indexing

Frozen UDTs: Serialized as single value; only equality matching on complete UDT supported
Non-frozen UDTs: Individual fields addressable; SAI enables field-level indexing and queries
Legacy secondary indexes (2i) do not support non-frozen UDT field indexing

Materialized View Commands¶

Materialized views maintain denormalized copies of base table data.

Command	Description
CREATE MATERIALIZED VIEW	Create an auto-maintained view
ALTER MATERIALIZED VIEW	Modify view options
DROP MATERIALIZED VIEW	Remove a materialized view

Trigger Commands¶

Triggers execute custom server-side Java code on data mutations.

Command	Description
CREATE TRIGGER	Attach a trigger to a table
DROP TRIGGER	Remove a trigger from a table

User-Defined Type Commands¶

UDTs define composite types with named fields.

Command	Description
CREATE TYPE	Define a new user-defined type
ALTER TYPE	Add fields or rename existing fields
DROP TYPE	Remove a user-defined type

Function Commands¶

User-defined functions extend CQL with custom scalar operations.

Command	Description
CREATE FUNCTION	Create a user-defined scalar function
DROP FUNCTION	Remove a user-defined function

Aggregate Commands¶

User-defined aggregates process multiple rows into a single value.

Command	Description
CREATE AGGREGATE	Create a user-defined aggregate
DROP AGGREGATE	Remove a user-defined aggregate

Best Practices¶

Schema Design¶

Design Principles

Design tables for specific query patterns (query-first modeling)
Denormalize data to avoid joins
Keep partition sizes under 100MB
Limit clustering columns to support required query patterns

Schema Changes in Production¶

Production Deployments

Test schema changes in a staging environment first
Create snapshots before applying changes
Apply changes during low-traffic periods when possible
Monitor schema agreement after changes
Run repairs after replication changes

Avoiding Common Issues¶

Issue	Cause	Prevention
Schema disagreement	Network issues, slow nodes	Monitor cluster health, increase timeout if needed
Orphaned data	Dropping columns without compaction	Run `nodetool compact` after dropping columns
Replication lag	Changing RF without repair	Always run repair after replication changes
Type mismatches	Incompatible column type changes	Only widen types (INT→BIGINT), never narrow

Compatibility¶

CQL Version Features¶

Feature	Minimum CQL Version	Cassandra Version
Basic DDL	3.0	2.0+
User-Defined Types	3.0	2.1+
User-Defined Functions	3.0	2.2+
Materialized Views	3.4	3.0+
SASI Indexes	3.4	3.4+
Storage-Attached Indexes	3.4.7	5.0+
Vector Types	3.4.7	5.0+

DML Commands - Data manipulation: SELECT, INSERT, UPDATE, DELETE
Security Commands - Roles, permissions, GRANT, REVOKE
Data Types - Native types, collections, UDTs, vectors
Data Modeling - Query-first design principles
Architecture - Storage engine and distributed systems