Lightweight Transactions¶
Lightweight Transactions (LWT) provide linearizable consistency through compare-and-set operations. They use the Paxos consensus protocol to ensure that only one concurrent operation succeeds when multiple clients attempt to modify the same data.
Implementation Reference
LWT was introduced in Cassandra 2.0 (CASSANDRA-5062). The Paxos implementation is defined in StorageProxy.cas() and PaxosState.java.
Behavioral Guarantees¶
What LWT Guarantees¶
- Operations on the same partition appear to execute in some total order consistent with real-time (linearizability)
- The IF condition evaluation and subsequent mutation are atomic
- LWT guarantees apply only within a single partition
- With SERIAL consistency, all datacenters participate in consensus
- With LOCAL_SERIAL, only the local datacenter participates
What LWT Does NOT Guarantee¶
Undefined Behavior
The following behaviors are undefined and must not be relied upon:
- Cross-partition atomicity: LWT provides no guarantees across different partitions. Multi-partition batches with LWT may leave partitions in inconsistent states on partial failure.
- Ordering across partitions: Two LWT operations on different partitions have no defined ordering relationship.
- Timeout outcomes: If an LWT times out, the operation may or may not have been applied. The outcome is undefined.
- Retry safety without idempotency: Retrying a failed LWT without checking the result may cause duplicate application.
Version-Specific Behavior¶
| Version | Behavior |
|---|---|
| 2.0 - 2.1 | Initial Paxos implementation. Contention handling less efficient. |
| 3.0+ | Improved Paxos state management (CASSANDRA-9143) |
| 4.0+ | Paxos state auto-cleanup, configurable timeouts (CASSANDRA-12126) |
| 5.0+ | Accord transaction protocol available as alternative (CEP-15) |
Overview¶
The Problem LWT Solves¶
Standard Cassandra writes use last-write-wins semantics—concurrent writes to the same row can overwrite each other unpredictably:
Problem: Last-Write-Wins Race Condition
| Step | Client A | Client B |
|---|---|---|
| Initial | quantity = 1 |
quantity = 1 |
| Read | Reads quantity = 1 | Reads quantity = 1 |
| Write | UPDATE SET quantity = 0 |
UPDATE SET quantity = 0 |
| Result | quantity = 0 | quantity = 0 |
Problem: Two items were "sold" but inventory only decremented by 1!
How LWT Solves It¶
LWT ensures read-modify-write is atomic:
Historical Context¶
| Version | LWT Feature |
|---|---|
| 2.0 | Initial LWT implementation (Paxos) |
| 2.1 | Performance improvements |
| 3.0 | Batch LWT support improvements |
| 4.0 | Paxos state cleanup, better timeouts |
| 5.0 | Accord transaction protocol (future) |
Paxos Consensus¶
What Is Paxos?¶
Paxos is a distributed consensus algorithm that ensures agreement among nodes even when some nodes fail. Cassandra implements a variant called "single-decree Paxos" for LWT.
Paxos Phases¶
Why LWT Is Slow¶
| Operation | Round Trips | Typical Latency |
|---|---|---|
| Regular write | 1 | 1-5ms |
| Regular read | 1 | 1-5ms |
| LWT | 4 | 10-50ms |
Contributing factors:
- Multiple network round trips
- Contention handling (ballot conflicts)
- Serial execution per partition
Synopsis¶
INSERT IF NOT EXISTS¶
INSERT INTO *table*
( *columns* )
VALUES ( *values* )
IF NOT EXISTS
UPDATE IF / IF EXISTS¶
UPDATE *table*
SET *assignments*
WHERE *primary_key*
IF EXISTS
| IF *condition* [ AND *condition* ... ]
DELETE IF / IF EXISTS¶
DELETE FROM *table*
WHERE *primary_key*
IF EXISTS
| IF *condition* [ AND *condition* ... ]
condition:
*column_name* *operator* *value*
| *column_name* [ *index* ] *operator* *value*
| *column_name* [ *key* ] *operator* *value*
| *column_name* IN ( *values* )
operator:
= | != | < | > | <= | >= | IN
INSERT IF NOT EXISTS¶
Ensures row creation only if it doesn't exist:
INSERT INTO users (user_id, username, email)
VALUES (uuid(), 'alice', '[email protected]')
IF NOT EXISTS;
Results¶
When applied (row didn't exist):
[applied]
-----------
True
When not applied (row exists):
[applied] | user_id | username | email
-----------+--------------------------------------+----------+---------------------
False | 550e8400-e29b-41d4-a716-446655440000 | alice | [email protected]
The existing row values are returned for client decision-making.
Use Cases¶
| Use Case | Example |
|---|---|
| Unique usernames | INSERT INTO usernames (username, user_id) VALUES (?, ?) IF NOT EXISTS |
| Idempotent writes | Prevent duplicate event processing |
| Resource allocation | First-come-first-served |
UPDATE IF EXISTS¶
Updates only if row exists:
UPDATE users
SET last_login = toTimestamp(now())
WHERE user_id = ?
IF EXISTS;
Use cases:
- Don't create accidental rows
- Verify row presence before modification
- Avoid orphan data
UPDATE IF Condition¶
Conditional update based on column values:
UPDATE inventory
SET quantity = quantity - 1
WHERE product_id = 'SKU-001'
IF quantity > 0;
Multiple Conditions¶
UPDATE accounts
SET balance = balance - 100
WHERE account_id = ?
IF balance >= 100
AND status = 'active'
AND frozen = false;
Collection Conditions¶
-- Check list element
UPDATE users
SET phone_numbers = ?
WHERE user_id = ?
IF phone_numbers[0] = '+1-555-0100';
-- Check map entry
UPDATE users
SET preferences = preferences + {'theme': 'dark'}
WHERE user_id = ?
IF preferences['theme'] = 'light';
DELETE IF EXISTS / IF Condition¶
-- Delete only if exists
DELETE FROM sessions
WHERE session_id = ?
IF EXISTS;
-- Conditional delete
DELETE FROM users
WHERE user_id = ?
IF status = 'inactive'
AND last_login < '2023-01-01';
Serial Consistency Levels¶
LWT operations use special consistency levels:
SERIAL¶
Global linearizability across all datacenters:
-- All replicas participate in Paxos
CONSISTENCY SERIAL;
UPDATE accounts SET balance = 100 WHERE id = ? IF balance = 50;
Behavior:
- Paxos runs across all replicas cluster-wide
- Highest consistency guarantee
- Highest latency (cross-DC round trips)
LOCAL_SERIAL¶
Linearizability within local datacenter only:
-- Only local DC participates
CONSISTENCY LOCAL_SERIAL;
UPDATE accounts SET balance = 100 WHERE id = ? IF balance = 50;
Behavior:
- Paxos limited to local datacenter
- Lower latency than SERIAL
- Not linearizable across datacenters
Choosing Serial Consistency¶
| Scenario | Recommended |
|---|---|
| Single datacenter | Either (same behavior) |
| Multi-DC, local consistency acceptable | LOCAL_SERIAL |
| Multi-DC, global consistency required | SERIAL |
| Low latency priority | LOCAL_SERIAL |
Contention and Retries¶
Contention Handling¶
When multiple clients try LWT on the same partition:
Client-Side Retry Logic¶
// Java driver example with retry
int maxRetries = 5;
for (int i = 0; i < maxRetries; i++) {
ResultSet rs = session.execute(lwtStatement);
Row row = rs.one();
if (row.getBool("[applied]")) {
return true; // Success
}
// Read current value and decide whether to retry
int currentValue = row.getInt("quantity");
if (currentValue <= 0) {
return false; // Can't complete operation
}
// Exponential backoff
Thread.sleep((long) Math.pow(2, i) * 100);
}
throw new RuntimeException("Max retries exceeded");
CAS (Compare-And-Set) Pattern¶
-- Read current state
SELECT version, content FROM documents WHERE doc_id = ?;
-- Attempt update with version check
UPDATE documents
SET content = 'new content', version = 6
WHERE doc_id = ?
IF version = 5;
-- If [applied] = false, re-read and retry
Performance Considerations¶
When to Use LWT¶
Good LWT Use Cases
- Unique constraints: Username uniqueness, email uniqueness
- Inventory management: Prevent overselling
- Idempotent operations: Exactly-once processing
- Optimistic locking: Version-based updates
- Resource allocation: First-come-first-served
When to Avoid LWT¶
Avoid LWT For
- High-throughput operations: Consider different data model
- Non-critical uniqueness: Eventually consistent may suffice
- Counters: Use native counter columns instead
- Cross-partition atomicity: LWT is per-partition only
Performance Metrics¶
# Typical LWT latencies
single_partition_lwt: 15-30ms
contended_lwt: 50-200ms
cross_dc_lwt: 50-100ms
# Throughput impact
regular_write_throughput: 10000/s per partition
lwt_throughput: 500-1000/s per partition
Monitoring LWT¶
# Cassandra metrics
nodetool proxyhistograms # Look at CAS latencies
# CQL tracing
TRACING ON;
UPDATE users SET name = 'Alice' WHERE id = 1 IF name = 'Bob';
Batches with LWT¶
LWT can be used in batches with special semantics:
BEGIN BATCH
INSERT INTO users (user_id, username) VALUES (?, 'alice') IF NOT EXISTS;
INSERT INTO usernames (username, user_id) VALUES ('alice', ?) IF NOT EXISTS;
APPLY BATCH;
Batch LWT Behavior¶
- All conditions must be on same partition OR...
- Batch becomes a multi-partition Paxos (very expensive)
- All conditions evaluated atomically
- If ANY condition fails, entire batch fails
Multi-Partition LWT Batches
Multi-partition LWT batches require Paxos across all involved partitions, dramatically increasing latency and contention. Avoid if possible.
Failure Semantics¶
Understanding failure behavior is critical for correct LWT usage.
Failure Modes and Outcomes¶
| Failure Mode | Outcome | Client Action |
|---|---|---|
[applied] = true |
Operation succeeded | None required |
[applied] = false |
Condition not met, operation rejected | Read returned values, retry with updated condition if appropriate |
WriteTimeoutException |
Undefined - may or may not have been applied | Read to determine current state, retry if needed |
UnavailableException |
Operation not applied | Safe to retry |
ReadTimeoutException during CAS |
Undefined - Paxos read phase failed | Read to determine current state |
Timeout Handling Contract¶
Timeout Does Not Mean Failure
When an LWT operation times out:
- The operation may have been successfully applied
- The operation may have partially executed (Paxos PREPARE succeeded, PROPOSE failed)
- The outcome is undefined and must be verified by reading current state
// CORRECT: Verify state after timeout
try {
session.execute(lwtStatement);
} catch (WriteTimeoutException e) {
// Must read to determine actual state
Row current = session.execute(readStatement).one();
// Decide based on current state
}
// INCORRECT: Assume failure and retry blindly
try {
session.execute(lwtStatement);
} catch (WriteTimeoutException e) {
session.execute(lwtStatement); // May cause duplicate application
}
Idempotency Requirements¶
LWT operations should be designed for safe retry:
| Pattern | Idempotent | Notes |
|---|---|---|
INSERT ... IF NOT EXISTS |
✅ Yes | Safe to retry - second attempt returns [applied]=false |
UPDATE ... IF column = X SET column = Y |
✅ Yes | Safe to retry - condition fails after first success |
UPDATE ... SET counter = counter + 1 IF ... |
❌ No | Counter operations not supported with LWT |
DELETE ... IF EXISTS |
✅ Yes | Safe to retry - second attempt returns [applied]=false |
Consistency During Failure¶
Preserved guarantees:
- Linearizability is maintained even during failures
- No partial application visible to other transactions
- Paxos ballots ensure exactly-one-winner semantics
Not guaranteed:
- Client notification of success (timeout may occur after commit)
- Bounded latency under contention
- Progress under continuous contention (livelock possible)
Restrictions¶
Hard Constraints
The following restrictions are enforced by Cassandra and will result in errors:
Timestamps:
USING TIMESTAMPmust not be used with IF conditions—Paxos manages timestamps internally- Attempting to specify timestamp results in
InvalidRequest
Counters:
- Counter columns must not be used with IF conditions
- Use regular counter increment/decrement instead
Scope:
- LWT operates on single partition only
- Cross-partition coordination requires batch (with significant limitations)
- No cross-table LWT
Conditions:
- Conditions must only reference non-primary-key columns
- Conditions must not reference columns in SET clause (no self-reference)
- Collection element conditions require proper syntax
Examples¶
Unique Username Registration¶
-- Reserve username
INSERT INTO usernames (username, user_id, created_at)
VALUES ('desired_name', ?, toTimestamp(now()))
IF NOT EXISTS;
-- If applied, create user
-- If not applied, username taken
Inventory Decrement¶
UPDATE inventory
SET quantity = quantity - 1,
last_sale = toTimestamp(now())
WHERE product_id = 'SKU-001'
IF quantity > 0;
-- Handle result
-- [applied] = true: sale completed
-- [applied] = false: out of stock
Optimistic Locking¶
-- Attempt update
UPDATE documents
SET content = 'updated content',
version = 5,
updated_at = toTimestamp(now()),
updated_by = 'user123'
WHERE doc_id = ?
IF version = 4;
-- If [applied] = false, someone else modified
-- Re-read, merge changes, retry
Account Balance Transfer¶
-- Debit source (with balance check)
UPDATE accounts
SET balance = balance - 100
WHERE account_id = 'source'
IF balance >= 100;
-- Only if debit succeeded, credit destination
-- (Application handles coordination)
Session Management¶
-- Create session if user has no active session
INSERT INTO user_sessions (user_id, session_id, created_at)
VALUES (?, uuid(), toTimestamp(now()))
IF NOT EXISTS;
-- Invalidate specific session
DELETE FROM user_sessions
WHERE user_id = ? AND session_id = ?
IF EXISTS;
Distributed Lock¶
-- Acquire lock
INSERT INTO locks (lock_name, owner, acquired_at)
VALUES ('resource_x', 'node_1', toTimestamp(now()))
IF NOT EXISTS;
-- Release lock (verify ownership)
DELETE FROM locks
WHERE lock_name = 'resource_x'
IF owner = 'node_1';
Future: Accord Transactions¶
Cassandra 5.0+ introduces Accord, a new transaction protocol:
| Aspect | Paxos (LWT) | Accord |
|---|---|---|
| Scope | Single partition | Multi-partition |
| Consistency | Linearizable | Serializable |
| Latency | 4 round trips | 2 round trips (optimistic) |
| Throughput | Limited | Higher |
Accord enables true multi-partition transactions without the limitations of batch LWT.