Materialized Views: Distributed Coordination¶

Materialized Views (MVs) attempt to automate denormalization across a distributed cluster. This page examines the distributed coordination mechanisms, their inherent challenges, and the consistency issues that have made MVs controversial in production environments.

Historical Context¶

The Problem MVs Aimed to Solve¶

Cassandra's query-first data modeling requires denormalization—storing the same data in multiple tables with different primary keys to support different query patterns:

-- Base table: query by user_id
CREATE TABLE users (
    user_id uuid PRIMARY KEY,
    email text,
    city text
);

-- Denormalized table: query by email
CREATE TABLE users_by_email (
    email text PRIMARY KEY,
    user_id uuid,
    city text
);

-- Application must maintain both tables
INSERT INTO users (user_id, email, city) VALUES (...);
INSERT INTO users_by_email (email, user_id, city) VALUES (...);

This approach has challenges:

Application complexity to maintain consistency
Partial failure handling
No atomic guarantee across tables

The MV Solution (Cassandra 3.0, 2015)¶

Materialized Views were introduced to automate this denormalization:

CREATE MATERIALIZED VIEW users_by_email AS
    SELECT * FROM users
    WHERE email IS NOT NULL AND user_id IS NOT NULL
    PRIMARY KEY (email, user_id);

-- Single write, automatic propagation
INSERT INTO users (user_id, email, city) VALUES (...);
-- users_by_email automatically updated

The promise: write once, query many ways, with Cassandra handling synchronization.

Development Timeline¶

Date	Event	Significance
2014	CASSANDRA-6477	Initial MV proposal
2015	Cassandra 3.0	MV implementation merged
2016-2017	Production issues	Multiple consistency bugs discovered
2017	CASSANDRA-13883	Major view inconsistency issues documented
2018-2020	Ongoing fixes	Numerous patches for edge cases
2021+	Cautionary status	Community recommends careful evaluation

Distributed Coordination Architecture¶

The Core Challenge¶

MVs face a fundamental distributed systems challenge: a single write to the base table must atomically update views that may reside on completely different nodes.

Write Path Coordination¶

When a write arrives, the MV coordination involves multiple asynchronous steps:

Critical observation: The client receives acknowledgment after the base table write succeeds, but before view updates are guaranteed complete.

The Batchlog Mechanism¶

To provide some durability guarantee for view updates, Cassandra uses a local batchlog:

View Update with Batchlog:

1. Base table mutation applied
2. View mutations written to local batchlog (commitlog-backed)
3. View mutations sent to view replicas
4. On success: batchlog entry cleared
5. On failure: batchlog replayed later

Batchlog replay:
- Runs periodically (every 60 seconds by default)
- Resends pending view mutations
- Eventually clears after success or expiry

Batchlog Parameter	Default	Description
`batchlog_replay_throttle_in_kb`	1024	Replay throughput limit
View mutation timeout	1 hour	When batchlog entries expire

Batchlog Limitations

The batchlog provides best-effort delivery, not guaranteed consistency:

If a node crashes before batchlog write, view update is lost
Expired batchlog entries are discarded, not retried
No mechanism to detect permanently lost view updates

Read-Before-Write Requirement¶

Why Updates Require Reading¶

For INSERT operations, Cassandra can directly compute the view mutation. For UPDATE and DELETE, Cassandra must first read the current row to determine which view rows to modify:

-- Base table
CREATE TABLE users (
    user_id uuid PRIMARY KEY,
    email text,
    city text
);

-- View partitioned by email
CREATE MATERIALIZED VIEW users_by_email AS ...
    PRIMARY KEY (email, user_id);

-- Scenario: Update email
UPDATE users SET email = '[email protected]' WHERE user_id = X;

-- Cassandra must:
-- 1. Read current row to find old email ('[email protected]')
-- 2. DELETE from users_by_email WHERE email = '[email protected]'
-- 3. INSERT into users_by_email WHERE email = '[email protected]'

Performance Impact¶

Operation	Without MV	With MV
INSERT	Write only	Write + view writes
UPDATE	Write only	Read + Write + delete old view + insert new view
DELETE	Write only	Read + Write + delete view rows

The read-before-write pattern introduces:

Additional read latency on updates/deletes
Potential for read timeouts affecting writes
Increased load on base table replicas

Consistency Problems¶

Fundamental Consistency Gap¶

The MV architecture creates an inherent consistency window:

Timeline of a write:

T0: Client sends INSERT to base table
T1: Base table replicas receive and apply mutation
T2: Client receives SUCCESS acknowledgment  ←── Client thinks write is complete
T3: View mutations sent to view replicas (async)
T4: View replicas receive mutations
T5: View replicas apply mutations  ←── View now consistent

Gap: Between T2 and T5, base and view are inconsistent

This gap is not just theoretical—it manifests in production as:

Read-your-writes violations: Write to base, immediately query view, get stale data
View lag: View falls behind base during high write throughput
Permanent divergence: In failure scenarios, view may never receive updates

Known Consistency Issues¶

Issue 1: Lost View Updates (CASSANDRA-13883)¶

Under specific failure scenarios, view updates can be permanently lost:

Failure scenario:

1. Write arrives at coordinator
2. Base table mutation succeeds on RF replicas
3. Client receives acknowledgment
4. Coordinator dies before writing to batchlog
5. View update never sent
6. Base and view permanently inconsistent

No Self-Healing

Unlike base table inconsistencies which anti-entropy repair can detect, lost view updates cannot be automatically detected or repaired without full base table scans.

Issue 2: Read-Before-Write Race Conditions¶

Concurrent updates can cause incorrect view state:

Race condition scenario:

Thread 1: UPDATE users SET email='B' WHERE user_id=X  (was 'A')
Thread 2: UPDATE users SET email='C' WHERE user_id=X  (was 'A')

Interleaving:
T1: Thread 1 reads current email = 'A'
T2: Thread 2 reads current email = 'A'
T3: Thread 1 applies update, computes: DELETE 'A', INSERT 'B'
T4: Thread 2 applies update, computes: DELETE 'A', INSERT 'C'
T5: View receives: DELETE 'A', INSERT 'B', DELETE 'A', INSERT 'C'

Result: View contains both 'B' and 'C' entries pointing to user X
        (should only contain 'C')

Issue 3: Timestamp Ordering Violations¶

MV mutations may arrive out of order at view replicas:

Timestamp ordering issue:

Base table:
T100: INSERT user_id=X, email='A'
T200: UPDATE email='B' WHERE user_id=X
T300: DELETE WHERE user_id=X

View receives (out of order due to network):
T300: DELETE email='B'  (fails - row doesn't exist yet)
T100: INSERT email='A'
T200: DELETE email='A', INSERT email='B'

Final view state: Contains email='B' → user_id=X
Expected state: Empty (row was deleted)

Repair Complexity¶

Repairing MV inconsistencies requires special handling:

# Standard repair does NOT fix MV inconsistencies
nodetool repair keyspace table

# View repair only reconciles view replicas with each other
# Does NOT reconcile view with base table
nodetool repair keyspace view_name

# Full reconciliation requires:
# 1. Repair base table
# 2. Rebuild view (drop and recreate)
# Or use custom tooling to compare base vs view

No Built-in Consistency Check

Cassandra provides no built-in tool to detect base table / view divergence. Operators must implement custom validation or accept eventual consistency.

Why MVs Are Considered Problematic¶

Community Sentiment¶

The Cassandra community has developed cautionary guidance around MVs:

Concern	Details
Consistency bugs	Numerous JIRA issues, some unfixable without redesign
Operational complexity	Repair, monitoring, and debugging more difficult
Performance unpredictability	Read-before-write latency varies widely
Hidden costs	Write amplification not visible to applications
Limited testing	Edge cases poorly understood until production incidents

Architectural Limitations¶

The MV design has fundamental constraints that cannot be fixed without major redesign:

Asynchronous by design: View consistency requires blocking writes on view updates, which defeats the purpose
Different replica sets: Base and view partitions map to different nodes, requiring coordination
No transaction support: No mechanism for atomic base + view updates
Batchlog is best-effort: Provides durability, not consistency
Timestamp-based resolution: Works for base table, breaks for derived view state

Production Failure Modes¶

Real-world MV issues observed in production:

Failure Mode	Symptom	Impact
View lag	Queries return stale data	User-visible inconsistency
Missing rows	View lacks rows present in base	Silent data loss from view perspective
Orphan rows	View has rows not in base	Phantom results
Duplicate rows	Same base row appears multiple times	Incorrect query results
Repair failures	nodetool repair errors on views	Operational burden

Alternatives¶

Application-Managed Denormalization¶

Write to multiple tables with logged batches:

BEGIN BATCH
    INSERT INTO users (user_id, email, city) VALUES (...);
    INSERT INTO users_by_email (email, user_id, city) VALUES (...);
APPLY BATCH;

Aspect	MV	Logged Batch
Atomicity	No	Yes (within batch)
Consistency	Eventual, potentially divergent	Atomic visibility
Write latency	Lower (async)	Higher (sync)
Application code	Simpler	More complex
Failure recovery	Batchlog (unreliable)	Batchlog (reliable)

SAI (Cassandra 5.0+)¶

For many MV use cases, SAI provides a better alternative:

-- Instead of MV for email lookup
CREATE INDEX ON users (email) USING 'sai';

-- Query directly
SELECT * FROM users WHERE email = '[email protected]';

SAI advantages:

No write amplification
No separate consistency domain
Standard repair works
Simpler operations

Change Data Capture (CDC)¶

For complex derived data needs:

Enable CDC on base table
External process reads CDC log
External process maintains derived views/tables

Provides full control over consistency and transformation logic.

Recommendations¶

When MVs Might Be Acceptable¶

Scenario	Rationale
Development/prototyping	Faster iteration, consistency less critical
Read-heavy, write-light workloads	Fewer opportunities for inconsistency
Tolerance for eventual consistency	Application handles stale reads
Simple, infrequent updates	Less read-before-write overhead

When to Avoid MVs¶

Scenario	Alternative
High write throughput	Application-managed tables
Strong consistency required	Logged batches
Query flexibility	SAI (Cassandra 5.0+)
Complex transformations	CDC + external processing
Mission-critical data	Anything but MVs

If Using MVs¶

Limit to 1-2 views per table: Minimize write amplification
Monitor closely: Track view lag and inconsistencies
Implement consistency checks: Periodic base vs view validation
Plan for rebuild: Know how to drop and recreate views
Test failure scenarios: Understand behavior during node failures
Consider SAI first: Especially for Cassandra 5.0+

Materialized Views (Storage Engine) - Syntax and usage
Consistency - Consistency levels and guarantees
SAI - Recommended alternative
Replica Synchronization - Anti-entropy repair