Consistency¶
Consistency in Cassandra is tunable—the number of replicas that must acknowledge reads and writes is configurable per operation. This flexibility allows trading consistency for availability and latency.
Consistency Is Per-Request¶
Unlike traditional databases where consistency is a system property, Cassandra allows specifying consistency per statement:
-- Strong consistency for this critical write
CONSISTENCY QUORUM;
INSERT INTO orders (id, amount) VALUES (uuid(), 100.00);
-- Weaker consistency for this non-critical read
CONSISTENCY ONE;
SELECT * FROM page_views WHERE page_id = 'homepage';
Different operations can be optimized differently within the same application.
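The same per-request tuning is available from client drivers. A minimal sketch with the DataStax Python driver (the contact point, keyspace, and tables are assumptions carried over from the example above):

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])      # assumed contact point
session = cluster.connect("shop")     # assumed keyspace

# Strong consistency for a critical write
session.execute(SimpleStatement(
    "INSERT INTO orders (id, amount) VALUES (uuid(), 100.00)",
    consistency_level=ConsistencyLevel.QUORUM,
))

# Weaker consistency for a non-critical read
rows = session.execute(SimpleStatement(
    "SELECT * FROM page_views WHERE page_id = 'homepage'",
    consistency_level=ConsistencyLevel.ONE,
))
```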
The Coordinator's Role¶
Every client request goes to a coordinator node, which forwards it to the replicas for the affected partition and enforces the requested consistency level.
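Conceptually, the coordinator sends the write to every replica but replies to the client as soon as the requested number of acknowledgments has arrived. A toy sketch of that logic in plain Python (not the real implementation; replicas and send_write are illustrative stand-ins):

```python
import concurrent.futures

def coordinate_write(replicas, required_acks, send_write):
    """Forward a write to every replica; return once required_acks have responded."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(send_write, replica) for replica in replicas]
    acks = 0
    for future in concurrent.futures.as_completed(futures):
        if future.result():               # replica acknowledged the write
            acks += 1
        if acks >= required_acks:         # e.g. 2 of 3 for QUORUM at RF=3
            pool.shutdown(wait=False)     # slower replicas finish in the background
            return True
    pool.shutdown(wait=False)
    return False                          # consistency level could not be met
```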
What Acknowledgment Means¶
For writes, a replica acknowledges after:
- Writing to commit log (durability)
- Writing to memtable (memory)
The data is not necessarily flushed to an SSTable yet, but it is durable because of the commit log.
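A tiny sketch of that write-side ordering (the object names are illustrative, not Cassandra internals):

```python
def apply_write(mutation, commit_log, memtable):
    """Replica-side write path: append to the log for durability, update the memtable, then ack."""
    commit_log.append(mutation)   # survives a crash even before any SSTable flush
    memtable.put(mutation)        # visible to reads served by this replica
    return True                   # acknowledgment returned to the coordinator
```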
For reads, a replica acknowledges by returning its data.
Write Consistency Levels¶
ANY: Maximum Availability¶
With ANY, a write succeeds as soon as any node has accepted it; if no replica is reachable, the coordinator stores a hint and acknowledges the client anyway.
Data Loss Risk
If the coordinator crashes before delivering the hint, the data is permanently lost. Use ANY only for truly non-critical data.
When to use: Almost never. Only for truly non-critical data where losing some writes is acceptable.
ONE: Single Replica¶
With RF = 3, ONE requires a single acknowledgment. The other replicas receive the write asynchronously.
When to use:
- High-throughput writes where some inconsistency is acceptable
- Time-series data with many writes per second
- When combined with ALL reads (R + W > N)
QUORUM: Majority of All Replicas¶
| RF | QUORUM | Formula |
|---|---|---|
| 3 | 2 | floor(3/2) + 1 |
| 5 | 3 | floor(5/2) + 1 |
| 7 | 4 | floor(7/2) + 1 |
Why majority matters:
With RF=3, QUORUM=2: Write to {A, B}, Read from {B, C}. The overlap (B) guarantees the read sees the write.
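The formula is simple to compute, and note that plain QUORUM counts every replica across all datacenters. A quick illustrative check in Python:

```python
def quorum(rf: int) -> int:
    """QUORUM for a given replication factor: floor(RF / 2) + 1."""
    return rf // 2 + 1

print(quorum(3), quorum(5), quorum(7))   # 2 3 4

# Plain QUORUM counts every replica in every datacenter:
print(quorum(3 + 3))                     # 4, with RF=3 in each of two DCs
```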
Multi-datacenter QUORUM:
With RF = 3 in each of two datacenters, the total RF is 6 and QUORUM is 4, so the coordinator must wait for at least one cross-DC network round trip (50-200ms).
When to use:
- Single datacenter deployments
- When global consistency across all DCs is required
LOCAL_QUORUM: Majority in Local Datacenter¶
Two acknowledgments from the local datacenter (DC1) are enough for success. The data is still sent to DC2, but the coordinator does not wait for it.
Latency Advantage
| Consistency | Path | Typical Latency |
|---|---|---|
| QUORUM (multi-DC) | Client → DC1 → DC2 → DC1 → Client | 50-200ms |
| LOCAL_QUORUM | Client → DC1 → Client | 1-5ms |
LOCAL_QUORUM is 10-100x faster for multi-DC deployments.
When to use: Multi-DC deployments (almost always the right choice).
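In the DataStax Python driver, LOCAL_QUORUM is usually paired with a DC-aware load-balancing policy so that a node in the local datacenter coordinates each request. A minimal sketch; the contact point, datacenter name, and keyspace are assumptions:

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Route requests to the local DC and wait only for its quorum
profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc="DC1")),
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
cluster = Cluster(["10.0.1.1"], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect("shop")

session.execute("INSERT INTO orders (id, amount) VALUES (uuid(), 100.00)")
```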
EACH_QUORUM: Quorum in Every Datacenter¶
Must wait for the slowest DC to achieve quorum.
When to use:
- Regulatory requirements for cross-DC consistency
- Financial transactions that must be in all regions before acknowledgment
- Rare—most applications do not need this
ALL: Every Replica¶
Availability Risk
If any replica is down, the write fails. Single node failure causes complete write unavailability.
When to use: Almost never for writes. Single node failure causes all writes to fail.
Read Consistency Levels¶
ONE: Fastest Reads¶
If the single contacted replica has stale data, stale data is returned. Background read repair may fix the inconsistency later.
QUORUM: Strong Consistency Reads¶
The coordinator returns the newest data based on write timestamps. If replicas disagree, it triggers read repair.
LOCAL_QUORUM: Fast Strong Reads¶
Same logic as QUORUM, but only contacts local DC replicas. Provides strong consistency within the datacenter with lower latency.
EACH_QUORUM: Not Supported for Reads¶
Note
EACH_QUORUM is write-only. For reads, use QUORUM (for cross-DC consistency) or LOCAL_QUORUM (for local consistency).
ALL: Read From Every Replica¶
Warning
If any replica is down or slow, the read times out.
When ALL reads make sense:
- Combined with ONE writes (R + W > N)
- Write-heavy workloads with few reads, where fast writes matter more than read latency
Even then, a single node failure causes every read to fail.
The Strong Consistency Formula¶
R + W > N¶
The fundamental rule for strong consistency:
R = Number of replicas read
W = Number of replicas written
N = Replication factor
If R + W > N, reads will see the latest writes.
Why it works:
N = 3 (three replicas total)
W = 2 (write to two replicas)
R = 2 (read from two replicas)
R + W = 4 > 3 = N
The sets of written replicas and read replicas MUST overlap.
Write went to: {A, B}
Read contacts: {B, C}
Overlap: B ← Has the write
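You can convince yourself of the overlap argument by brute force; the sketch below enumerates every possible write set and read set for RF = 3 (plain Python, illustrative only):

```python
from itertools import combinations

replicas = {"A", "B", "C"}   # N = 3
W, R = 2, 2                  # QUORUM writes and QUORUM reads

print(all(
    set(written) & set(read)
    for written in combinations(replicas, W)
    for read in combinations(replicas, R)
))                           # True: R + W = 4 > 3, every pair of sets intersects

# With W = 1, R = 2 (R + W = N) the guarantee disappears:
print(all(
    set(written) & set(read)
    for written in combinations(replicas, 1)
    for read in combinations(replicas, 2)
))                           # False: write to {A}, read from {B, C}
```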
Common Combinations¶
| Combination | R + W | Strong? | Use Case |
|---|---|---|---|
| W=QUORUM, R=QUORUM | 4 > 3 | Yes | Standard strong consistency |
| W=ONE, R=ALL | 4 > 3 | Yes | Write-heavy, few reads |
| W=ALL, R=ONE | 4 > 3 | Yes | Read-heavy, few writes |
| W=LOCAL_QUORUM, R=LOCAL_QUORUM | 4 > 3 per DC | Yes (per DC) | Multi-DC standard |
| W=ONE, R=ONE | 2 < 3 | No | High throughput, eventual |
| W=ONE, R=QUORUM | 3 = 3 | No* | Sometimes inconsistent |
*R + W = N is not sufficient; must be strictly greater than.
Multi-DC Complexity¶
| Configuration | QUORUM | LOCAL_QUORUM |
|---|---|---|
| DC1: RF=3, DC2: RF=3 | 4 (global) | 2 (per DC) |
| Write behavior | 4 replicas (any DC) | 2 replicas (local DC only) |
| Read behavior | 4 replicas (any DC) | 2 replicas (local DC only) |
| Consistency | Global | Per-DC (cross-DC eventual) |
For most applications, LOCAL_QUORUM is sufficient because cross-DC replication happens in milliseconds.
Failure Scenarios¶
Single Node Failure (RF=3)¶
| CL | Works? | Reason |
|---|---|---|
| ONE | ✓ | 2 replicas available |
| QUORUM (2) | ✓ | 2 of 3 available |
| ALL | ✗ | One of the three replicas is down |
Two Node Failure (RF=3)¶
| CL | Works? | Reason |
|---|---|---|
| ONE | ✓ | 1 replica available |
| QUORUM (2) | ✗ | Only 1 of 3 available |
| ALL | ✗ | Two of the three replicas are down |
Entire Datacenter Failure¶
| CL | Works? | Reason |
|---|---|---|
| LOCAL_ONE | ✓ | Only needs local DC |
| LOCAL_QUORUM | ✓ | Only needs local DC |
| QUORUM (4/6) | ✗ | Only 3 of 6 available |
| EACH_QUORUM | ✗ | DC2 has no quorum |
Key Insight
LOCAL_QUORUM survives entire DC failure while maintaining strong local consistency. This is why it is recommended for multi-DC deployments.
Lightweight Transactions (LWT)¶
When Regular Consistency Is Not Enough¶
Regular QUORUM consistency does not provide linearizability for compare-and-set operations:
Problem scenario (race condition):
Time 0: Both Client A and Client B read balance = 100
Time 1: Client A: UPDATE SET balance = 100 - 50
Time 2: Client B: UPDATE SET balance = 100 - 30
Result: balance = 70 (should be 20!)
Both clients read 100 and both subtracted from 100; the later write simply overwrote the earlier one.
LWT Solves This With Paxos¶
Lightweight transactions use the Paxos consensus algorithm (Lamport, L., 1998, "The Part-Time Parliament") to achieve linearizable consistency for compare-and-set operations.
-- Compare-and-set with IF clause triggers LWT
UPDATE account SET balance = 50 WHERE id = 1 IF balance = 100;
-- Returns [applied] = true if balance was 100
-- Returns [applied] = false if balance was different
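From the Python driver, the outcome of the condition can be checked on the returned result set via was_applied, which reflects the [applied] column. A sketch assuming a session connected as in the earlier examples:

```python
result = session.execute(
    "UPDATE account SET balance = 50 WHERE id = 1 IF balance = 100"
)
if result.was_applied:   # [applied] = true: the balance really was 100
    print("debit applied")
else:
    print("balance changed concurrently; re-read and retry")
```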
How LWT Works¶
LWT uses four Paxos phases, requiring 4 round trips compared to 1 for regular writes:
| Aspect | Regular Write | LWT Write |
|---|---|---|
| Round trips | 1 | 4 |
| Latency | ~1-5ms | ~4-20ms |
| Throughput | High | Low |
Serial Consistency Levels¶
-- SERIAL: Paxos across ALL datacenters
SERIAL CONSISTENCY SERIAL;
INSERT INTO unique_emails (email, user_id)
VALUES ('[email protected]', 123) IF NOT EXISTS;
-- LOCAL_SERIAL: Paxos within local datacenter only
SERIAL CONSISTENCY LOCAL_SERIAL;
INSERT INTO user_sessions (session_id, user_id)
VALUES (uuid(), 123) IF NOT EXISTS;
| Serial CL | Scope | Latency | Use Case |
|---|---|---|---|
| SERIAL | All DCs | High (cross-DC) | Global uniqueness |
| LOCAL_SERIAL | Local DC | Lower | DC-local uniqueness |
LWT Best Practices¶
- Use sparingly: LWT is 4x slower than regular writes
- Batch related LWTs: Multiple LWTs on the same partition can share a Paxos round
- Consider alternatives: Often application-level locking or redesign is better
- Monitor contention: High LWT contention causes performance collapse
-- Good: Single LWT for compare-and-set
-- (read qty, compute the new value, then write it only if qty has not changed)
UPDATE inventory SET qty = ? WHERE product_id = ? IF qty = ?;
-- Bad: Using LWT unnecessarily
INSERT INTO events (id, data) VALUES (uuid(), ?) IF NOT EXISTS;
-- uuid() is always unique, so LWT is not needed
Speculative Execution¶
Speculative execution sends duplicate requests to reduce tail latency when one replica is slow.
Without Speculative Execution¶
When one replica is slow (for example, due to a GC pause), the coordinator, and therefore the client, must wait for it.
With Speculative Execution¶
When a replica exceeds the expected latency threshold, the coordinator sends a speculative request to another replica:
| Scenario | Latency |
|---|---|
| Without speculative execution | 500ms (waiting for the slow replica) |
| With speculative execution | 15ms (another replica responds instead) |
Configuration¶
-- Per-table speculative retry setting
ALTER TABLE users WITH speculative_retry = '99percentile';
-- Options:
-- 'Xpercentile': Retry after X percentile latency
-- 'Yms': Retry after Y milliseconds
-- 'ALWAYS': Always send to extra replica immediately
-- 'NONE': Disable speculative retry
Trade-off: Speculative execution increases replica load but reduces tail latency.
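The DataStax Python driver offers a client-side counterpart: a speculative execution policy that resends a request to another node when the first one is slow. Statements must be marked idempotent to be eligible. A sketch; the 50 ms delay, attempt count, contact point, and keyspace are assumptions:

```python
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import ConstantSpeculativeExecutionPolicy
from cassandra.query import SimpleStatement

profile = ExecutionProfile(
    # Resend to another node after 0.05 s without a response, at most 2 extra times
    speculative_execution_policy=ConstantSpeculativeExecutionPolicy(0.05, 2),
)
cluster = Cluster(["127.0.0.1"], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect("shop")

# Only idempotent statements are retried speculatively
query = SimpleStatement(
    "SELECT * FROM users WHERE user_id = 123", is_idempotent=True
)
rows = session.execute(query)
```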
Debugging Consistency Issues¶
"I wrote data but cannot read it"¶
Diagnosis steps:
- Check consistency levels: Write CL + Read CL > RF?
- Check for clock skew: run ntpq -p on each node
- Check for failed writes: look in application logs for timeouts
- Enable tracing: TRACING ON;
Tracing¶
TRACING ON;
SELECT * FROM users WHERE user_id = 123;
-- Output shows:
-- - Which replicas were contacted
-- - Response times from each
-- - Whether read repair occurred
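The same information is available programmatically; with the Python driver, pass trace=True and read the trace from the result (session setup as in the earlier sketches):

```python
result = session.execute("SELECT * FROM users WHERE user_id = 123", trace=True)
trace = result.get_query_trace()

print(trace.coordinator, trace.duration)
for event in trace.events:
    # which replica did what, and how long into the request it happened
    print(event.source, event.source_elapsed, event.description)
```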
JMX Metrics¶
# Unavailable errors (CL could not be satisfied)
org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Unavailables
org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Unavailables
# Timeouts
org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Timeouts
org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Timeouts
# LWT metrics
org.apache.cassandra.metrics:type=ClientRequest,scope=CASWrite,name=Latency
org.apache.cassandra.metrics:type=ClientRequest,scope=CASWrite,name=ContentionHistogram
Best Practices¶
Choosing Consistency Levels¶
| Scenario | Write CL | Read CL | Rationale |
|---|---|---|---|
| Single DC, strong consistency | QUORUM | QUORUM | R+W > N |
| Multi-DC, low latency | LOCAL_QUORUM | LOCAL_QUORUM | No cross-DC wait |
| Multi-DC, global consistency | QUORUM | QUORUM | Cross-DC consensus |
| High throughput, eventual OK | ONE | ONE | Fastest |
| Write-heavy | ONE | ALL | Fast writes |
| Read-heavy | ALL | ONE | Fast reads |
| Time-series metrics | LOCAL_ONE | LOCAL_ONE | Volume over consistency |
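One way to encode such a policy in application code is one execution profile per scenario; the profile names, levels, and tables below are illustrative:

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

profiles = {
    EXEC_PROFILE_DEFAULT: ExecutionProfile(          # multi-DC, low latency
        consistency_level=ConsistencyLevel.LOCAL_QUORUM
    ),
    "metrics": ExecutionProfile(                     # volume over consistency
        consistency_level=ConsistencyLevel.LOCAL_ONE
    ),
}
cluster = Cluster(["127.0.0.1"], execution_profiles=profiles)
session = cluster.connect("shop")

session.execute("INSERT INTO orders (id, amount) VALUES (uuid(), 100.00)")
session.execute(
    "SELECT * FROM page_views WHERE page_id = 'homepage'",
    execution_profile="metrics",
)
```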
Common Mistakes¶
| Mistake | Consequence |
|---|---|
| Using ALL in production | Single node failure breaks everything |
| QUORUM in multi-DC when LOCAL_QUORUM suffices | Unnecessary latency |
| ONE/ONE without understanding | No consistency guarantee |
| Overusing LWT | Performance degradation |
| Ignoring clock skew | Silent data loss via timestamp conflicts |
Monitoring¶
| Metric | Alert Threshold | Meaning |
|---|---|---|
| Unavailables | >0 | CL cannot be met |
| Timeouts | >1% | Replicas too slow |
| Read repair rate | >10% | High inconsistency |
| LWT contention | >10% | Redesign needed |
Related Documentation¶
- Distributed Data Overview - How partitioning, replication, and consistency work together
- Replication - How replicas are placed
- Replica Synchronization - How replicas converge
- Operations - Running repair