Materialized Views: Distributed Coordination¶
Materialized Views (MVs) attempt to automate denormalization across a distributed cluster. This page examines the distributed coordination mechanisms, their inherent challenges, and the consistency issues that have made MVs controversial in production environments.
Historical Context¶
The Problem MVs Aimed to Solve¶
Cassandra's query-first data modeling requires denormalization—storing the same data in multiple tables with different primary keys to support different query patterns:
-- Base table: query by user_id
CREATE TABLE users (
user_id uuid PRIMARY KEY,
email text,
city text
);
-- Denormalized table: query by email
CREATE TABLE users_by_email (
email text PRIMARY KEY,
user_id uuid,
city text
);
-- Application must maintain both tables
INSERT INTO users (user_id, email, city) VALUES (...);
INSERT INTO users_by_email (email, user_id, city) VALUES (...);
This approach has challenges:
- Application complexity to maintain consistency
- Partial failure handling
- No atomic guarantee across tables
The MV Solution (Cassandra 3.0, 2015)¶
Materialized Views were introduced to automate this denormalization:
CREATE MATERIALIZED VIEW users_by_email AS
SELECT * FROM users
WHERE email IS NOT NULL AND user_id IS NOT NULL
PRIMARY KEY (email, user_id);
-- Single write, automatic propagation
INSERT INTO users (user_id, email, city) VALUES (...);
-- users_by_email automatically updated
The promise: write once, query many ways, with Cassandra handling synchronization.
Development Timeline¶
| Date | Event | Significance |
|---|---|---|
| 2014 | CASSANDRA-6477 | Initial MV proposal |
| 2015 | Cassandra 3.0 | MV implementation merged |
| 2016-2017 | Production issues | Multiple consistency bugs discovered |
| 2017 | CASSANDRA-13883 | Major view inconsistency issues documented |
| 2018-2020 | Ongoing fixes | Numerous patches for edge cases |
| 2021+ | Cautionary status | Community recommends careful evaluation |
Distributed Coordination Architecture¶
The Core Challenge¶
MVs face a fundamental distributed systems challenge: a single write to the base table must atomically update views that may reside on completely different nodes.
Write Path Coordination¶
When a write arrives, the MV coordination involves multiple asynchronous steps:
Critical observation: The client receives acknowledgment after the base table write succeeds, but before view updates are guaranteed complete.
The Batchlog Mechanism¶
To provide some durability guarantee for view updates, Cassandra uses a local batchlog:
View Update with Batchlog:
1. Base table mutation applied
2. View mutations written to local batchlog (commitlog-backed)
3. View mutations sent to view replicas
4. On success: batchlog entry cleared
5. On failure: batchlog replayed later
Batchlog replay:
- Runs periodically (every 60 seconds by default)
- Resends pending view mutations
- Eventually clears after success or expiry
| Batchlog Parameter | Default | Description |
|---|---|---|
batchlog_replay_throttle_in_kb |
1024 | Replay throughput limit |
| View mutation timeout | 1 hour | When batchlog entries expire |
Batchlog Limitations
The batchlog provides best-effort delivery, not guaranteed consistency:
- If a node crashes before batchlog write, view update is lost
- Expired batchlog entries are discarded, not retried
- No mechanism to detect permanently lost view updates
Read-Before-Write Requirement¶
Why Updates Require Reading¶
For INSERT operations, Cassandra can directly compute the view mutation. For UPDATE and DELETE, Cassandra must first read the current row to determine which view rows to modify:
-- Base table
CREATE TABLE users (
user_id uuid PRIMARY KEY,
email text,
city text
);
-- View partitioned by email
CREATE MATERIALIZED VIEW users_by_email AS ...
PRIMARY KEY (email, user_id);
-- Scenario: Update email
UPDATE users SET email = '[email protected]' WHERE user_id = X;
-- Cassandra must:
-- 1. Read current row to find old email ('[email protected]')
-- 2. DELETE from users_by_email WHERE email = '[email protected]'
-- 3. INSERT into users_by_email WHERE email = '[email protected]'
Performance Impact¶
| Operation | Without MV | With MV |
|---|---|---|
| INSERT | Write only | Write + view writes |
| UPDATE | Write only | Read + Write + delete old view + insert new view |
| DELETE | Write only | Read + Write + delete view rows |
The read-before-write pattern introduces:
- Additional read latency on updates/deletes
- Potential for read timeouts affecting writes
- Increased load on base table replicas
Consistency Problems¶
Fundamental Consistency Gap¶
The MV architecture creates an inherent consistency window:
Timeline of a write:
T0: Client sends INSERT to base table
T1: Base table replicas receive and apply mutation
T2: Client receives SUCCESS acknowledgment ←── Client thinks write is complete
T3: View mutations sent to view replicas (async)
T4: View replicas receive mutations
T5: View replicas apply mutations ←── View now consistent
Gap: Between T2 and T5, base and view are inconsistent
This gap is not just theoretical—it manifests in production as:
- Read-your-writes violations: Write to base, immediately query view, get stale data
- View lag: View falls behind base during high write throughput
- Permanent divergence: In failure scenarios, view may never receive updates
Known Consistency Issues¶
Issue 1: Lost View Updates (CASSANDRA-13883)¶
Under specific failure scenarios, view updates can be permanently lost:
Failure scenario:
1. Write arrives at coordinator
2. Base table mutation succeeds on RF replicas
3. Client receives acknowledgment
4. Coordinator dies before writing to batchlog
5. View update never sent
6. Base and view permanently inconsistent
No Self-Healing
Unlike base table inconsistencies which anti-entropy repair can detect, lost view updates cannot be automatically detected or repaired without full base table scans.
Issue 2: Read-Before-Write Race Conditions¶
Concurrent updates can cause incorrect view state:
Race condition scenario:
Thread 1: UPDATE users SET email='B' WHERE user_id=X (was 'A')
Thread 2: UPDATE users SET email='C' WHERE user_id=X (was 'A')
Interleaving:
T1: Thread 1 reads current email = 'A'
T2: Thread 2 reads current email = 'A'
T3: Thread 1 applies update, computes: DELETE 'A', INSERT 'B'
T4: Thread 2 applies update, computes: DELETE 'A', INSERT 'C'
T5: View receives: DELETE 'A', INSERT 'B', DELETE 'A', INSERT 'C'
Result: View contains both 'B' and 'C' entries pointing to user X
(should only contain 'C')
Issue 3: Timestamp Ordering Violations¶
MV mutations may arrive out of order at view replicas:
Timestamp ordering issue:
Base table:
T100: INSERT user_id=X, email='A'
T200: UPDATE email='B' WHERE user_id=X
T300: DELETE WHERE user_id=X
View receives (out of order due to network):
T300: DELETE email='B' (fails - row doesn't exist yet)
T100: INSERT email='A'
T200: DELETE email='A', INSERT email='B'
Final view state: Contains email='B' → user_id=X
Expected state: Empty (row was deleted)
Repair Complexity¶
Repairing MV inconsistencies requires special handling:
# Standard repair does NOT fix MV inconsistencies
nodetool repair keyspace table
# View repair only reconciles view replicas with each other
# Does NOT reconcile view with base table
nodetool repair keyspace view_name
# Full reconciliation requires:
# 1. Repair base table
# 2. Rebuild view (drop and recreate)
# Or use custom tooling to compare base vs view
No Built-in Consistency Check
Cassandra provides no built-in tool to detect base table / view divergence. Operators must implement custom validation or accept eventual consistency.
Why MVs Are Considered Problematic¶
Community Sentiment¶
The Cassandra community has developed cautionary guidance around MVs:
| Concern | Details |
|---|---|
| Consistency bugs | Numerous JIRA issues, some unfixable without redesign |
| Operational complexity | Repair, monitoring, and debugging more difficult |
| Performance unpredictability | Read-before-write latency varies widely |
| Hidden costs | Write amplification not visible to applications |
| Limited testing | Edge cases poorly understood until production incidents |
Architectural Limitations¶
The MV design has fundamental constraints that cannot be fixed without major redesign:
- Asynchronous by design: View consistency requires blocking writes on view updates, which defeats the purpose
- Different replica sets: Base and view partitions map to different nodes, requiring coordination
- No transaction support: No mechanism for atomic base + view updates
- Batchlog is best-effort: Provides durability, not consistency
- Timestamp-based resolution: Works for base table, breaks for derived view state
Production Failure Modes¶
Real-world MV issues observed in production:
| Failure Mode | Symptom | Impact |
|---|---|---|
| View lag | Queries return stale data | User-visible inconsistency |
| Missing rows | View lacks rows present in base | Silent data loss from view perspective |
| Orphan rows | View has rows not in base | Phantom results |
| Duplicate rows | Same base row appears multiple times | Incorrect query results |
| Repair failures | nodetool repair errors on views | Operational burden |
Alternatives¶
Application-Managed Denormalization¶
Write to multiple tables with logged batches:
BEGIN BATCH
INSERT INTO users (user_id, email, city) VALUES (...);
INSERT INTO users_by_email (email, user_id, city) VALUES (...);
APPLY BATCH;
| Aspect | MV | Logged Batch |
|---|---|---|
| Atomicity | No | Yes (within batch) |
| Consistency | Eventual, potentially divergent | Atomic visibility |
| Write latency | Lower (async) | Higher (sync) |
| Application code | Simpler | More complex |
| Failure recovery | Batchlog (unreliable) | Batchlog (reliable) |
SAI (Cassandra 5.0+)¶
For many MV use cases, SAI provides a better alternative:
-- Instead of MV for email lookup
CREATE INDEX ON users (email) USING 'sai';
-- Query directly
SELECT * FROM users WHERE email = '[email protected]';
SAI advantages:
- No write amplification
- No separate consistency domain
- Standard repair works
- Simpler operations
Change Data Capture (CDC)¶
For complex derived data needs:
- Enable CDC on base table
- External process reads CDC log
- External process maintains derived views/tables
Provides full control over consistency and transformation logic.
Recommendations¶
When MVs Might Be Acceptable¶
| Scenario | Rationale |
|---|---|
| Development/prototyping | Faster iteration, consistency less critical |
| Read-heavy, write-light workloads | Fewer opportunities for inconsistency |
| Tolerance for eventual consistency | Application handles stale reads |
| Simple, infrequent updates | Less read-before-write overhead |
When to Avoid MVs¶
| Scenario | Alternative |
|---|---|
| High write throughput | Application-managed tables |
| Strong consistency required | Logged batches |
| Query flexibility | SAI (Cassandra 5.0+) |
| Complex transformations | CDC + external processing |
| Mission-critical data | Anything but MVs |
If Using MVs¶
- Limit to 1-2 views per table: Minimize write amplification
- Monitor closely: Track view lag and inconsistencies
- Implement consistency checks: Periodic base vs view validation
- Plan for rebuild: Know how to drop and recreate views
- Test failure scenarios: Understand behavior during node failures
- Consider SAI first: Especially for Cassandra 5.0+
Related Documentation¶
- Materialized Views (Storage Engine) - Syntax and usage
- Consistency - Consistency levels and guarantees
- SAI - Recommended alternative
- Replica Synchronization - Anti-entropy repair