BATCH¶

The BATCH statement groups multiple INSERT, UPDATE, and DELETE statements into a single atomic operation. Batches ensure that all statements either complete or none do, providing atomicity guarantees. However, batches are frequently misused as a performance optimization—they are not.

Behavioral Guarantees¶

What BATCH Guarantees¶

All statements in a logged batch eventually execute or none execute
Statements to the same partition are atomic at the storage layer
Logged batches write to batch log before executing mutations
USING TIMESTAMP applies the same timestamp to all statements
If any IF condition fails, no statements in the batch execute

What BATCH Does NOT Guarantee¶

Undefined Behavior

The following behaviors are undefined and must not be relied upon:

Isolation: Other reads may see partial batch results before completion
Performance improvement: Batches do not improve throughput over parallel writes
Cross-datacenter atomicity: Logged batches provide atomicity within a datacenter, not globally
Unlogged batch atomicity: UNLOGGED batches have no atomicity guarantee on coordinator failure
Order of execution: Statements within a batch may execute in any order

Logged vs Unlogged Contracts¶

Aspect	Logged Batch	Unlogged Batch
Coordinator failure	Recovers via batch log	Undefined - partial execution possible
Multi-partition	Atomic (via batch log)	Not atomic
Same-partition	Atomic	Atomic (storage layer)
Performance	Higher latency (batch log overhead)	Lower latency

Failure Semantics¶

Failure Mode	Logged Batch	Unlogged Batch
Coordinator fails after batch log write	Batch log replays mutations	Partial execution possible
Coordinator fails before batch log write	Not applied	Not applied
`WriteTimeoutException`	May have been logged for replay	Undefined
`UnavailableException`	Not applied	Not applied

Version-Specific Behavior¶

Version	Behavior
1.2+	Basic BATCH support
2.0+	UNLOGGED BATCH, improved batch log (CASSANDRA-4542)
2.1+	COUNTER BATCH separated from regular batches
3.0+	Improved batch log performance (CASSANDRA-9673)
4.0+	Enhanced multi-partition batch handling

Overview¶

What BATCH Is For¶

BATCH provides atomicity, not performance:

Use Case	Example	Result
Correct: Atomicity	`INSERT INTO users ...` `INSERT INTO users_by_email ...`	Both succeed or both fail ✓
Wrong: 'Performance'	`INSERT INTO table_a ...` `INSERT INTO table_b ...` `INSERT INTO table_c ...`	Slower than individual inserts! ✗

Historical Context¶

BATCH was introduced in CQL 3.0 (Cassandra 1.2) to address the problem of maintaining consistency across denormalized tables:

Version	Feature
1.2	Basic BATCH with atomicity
2.0	UNLOGGED BATCH, batch log improvements
2.1	COUNTER BATCH separated
3.0	Improved batch log efficiency
4.0	Better multi-partition batch handling

Synopsis¶

BEGIN [ UNLOGGED | COUNTER ] BATCH
    [ USING TIMESTAMP *microseconds* ]
    *dml_statement* ;
    [ *dml_statement* ; ... ]
APPLY BATCH

dml_statement:

INSERT ... | UPDATE ... | DELETE ...

Batch Types¶

Logged Batch (Default)¶

BEGIN BATCH
    INSERT INTO users (user_id, username) VALUES (?, 'alice');
    INSERT INTO usernames (username, user_id) VALUES ('alice', ?);
APPLY BATCH;

Execution:

Batch Log Architecture¶

The batch log ensures durability by writing to multiple nodes before executing mutations:

Coordinator selects batch log endpoints: Two nodes in the local datacenter (chosen to minimize latency)
Batch log written: The serialized batch is written to each endpoint's local batch log table
Mutations executed: After batch log is durable, mutations are sent to replicas
Batch log removed: After all mutations acknowledged, batch log entries are deleted

Batch Log Storage

The batch log uses LocalStrategy (RF=1), meaning each node stores only its own batch log entries. Durability comes from writing to multiple endpoints, not from replication.

Batch Log Table Versions¶

The batch log implementation has evolved across Cassandra versions:

Version	Table	Key Changes
< 2.2	`system.batchlog`	Original implementation
2.2+	`system.batches`	New table format, improved performance
3.0+	`system.batches`	Batch log replay improvements
4.0+	`system.batches`	Enhanced timeout handling

-- View pending batches (should normally be empty)
SELECT * FROM system.batches;

-- Columns vary by version, but typically include:
-- id (timeuuid), version (int), written_at (timestamp), data (blob)

Batch Log Replay¶

Batch log entries are replayed automatically:

Each node periodically scans its local batch log
Entries older than batchlog_replay_throttle are candidates for replay
Mutations are re-executed to ensure completion
Successfully replayed entries are removed

# cassandra.yaml
batchlog_replay_throttle_in_kb: 1024  # Throttle replay bandwidth

Guarantees:

All statements eventually execute or none do
Coordinator failure doesn't lose the batch (batch log on other nodes)
Batch log written to 2 endpoints for redundancy

Cost:

Additional writes to 2 batch log endpoints
Higher latency than unlogged (must wait for batch log durability)
Increased coordinator memory usage

Unlogged Batch¶

BEGIN UNLOGGED BATCH
    UPDATE user_profile SET name = 'Alice' WHERE user_id = ?;
    UPDATE user_profile SET email = '[email protected]' WHERE user_id = ?;
    UPDATE user_profile SET updated_at = toTimestamp(now()) WHERE user_id = ?;
APPLY BATCH;

Execution:

No batch log write
Statements sent directly to replicas
No recovery if coordinator fails

Use when:

All statements target the same partition
Atomicity across coordinator failure not required
Lower latency needed

Counter Batch¶

BEGIN COUNTER BATCH
    UPDATE page_stats SET views = views + 1 WHERE page_id = 'home';
    UPDATE page_stats SET views = views + 1 WHERE page_id = 'about';
    UPDATE daily_stats SET requests = requests + 2 WHERE date = '2024-01-15';
APPLY BATCH;

Restrictions:

Can only contain counter updates
Cannot mix counter and non-counter statements
No TTL allowed

Parameters¶

USING TIMESTAMP¶

Applies a single timestamp to all statements:

BEGIN BATCH USING TIMESTAMP 1705315800000000
    INSERT INTO table1 (id, data) VALUES (1, 'a');
    INSERT INTO table2 (id, data) VALUES (2, 'b');
APPLY BATCH;

Behavior:

All mutations share the same timestamp
Individual statements cannot override
Cannot be used with LWT (IF conditions)

Same-Partition vs Multi-Partition Batches¶

Same-Partition Batch (Recommended)¶

All statements target the same partition:

-- Good: Single partition batch
BEGIN UNLOGGED BATCH
    INSERT INTO user_events (user_id, event_id, type) VALUES (123, uuid(), 'login');
    INSERT INTO user_events (user_id, event_id, type) VALUES (123, uuid(), 'page_view');
    INSERT INTO user_events (user_id, event_id, type) VALUES (123, uuid(), 'click');
APPLY BATCH;

Benefits:

Single coordinator to single replica set
Atomic at storage layer (same SSTable)
Minimal coordination overhead

Multi-Partition Batch (Use Carefully)¶

Statements target different partitions:

-- Acceptable: Denormalized tables that must stay consistent
BEGIN BATCH
    INSERT INTO users (user_id, email) VALUES (?, '[email protected]');
    INSERT INTO users_by_email (email, user_id) VALUES ('[email protected]', ?);
APPLY BATCH;

Costs:

Coordinator must contact multiple nodes
Batch log adds latency
Memory pressure on coordinator

Batch Anti-Patterns¶

Anti-Pattern 1: Batching for Performance¶

-- WRONG: Batching unrelated writes doesn't improve performance
BEGIN BATCH
    INSERT INTO users (user_id, name) VALUES (1, 'Alice');
    INSERT INTO users (user_id, name) VALUES (2, 'Bob');
    INSERT INTO users (user_id, name) VALUES (3, 'Charlie');
    -- ... 100 more unrelated inserts
APPLY BATCH;

Why it's slow:

Coordinator must track all mutations in memory
Single point of coordination
Larger network payload than parallel requests

Better approach:

// Parallel async inserts
List<CompletionStage<AsyncResultSet>> futures = new ArrayList<>();
for (User user : users) {
    futures.add(session.executeAsync(insertStmt.bind(user)));
}
CompletableFuture.allOf(futures.toArray()).join();

Anti-Pattern 2: Large Batches¶

-- WRONG: Batch too large
BEGIN BATCH
    -- 1000 INSERT statements
APPLY BATCH;

Problems:

Exceeds batch size thresholds
Coordinator memory exhaustion
Timeout likelihood increases

Thresholds (cassandra.yaml):

batch_size_warn_threshold_in_kb: 5
batch_size_fail_threshold_in_kb: 50

Anti-Pattern 3: Batching Different Tables Without Need¶

-- WRONG: No atomicity requirement
BEGIN BATCH
    INSERT INTO audit_log (id, action) VALUES (uuid(), 'user_created');
    INSERT INTO metrics (id, count) VALUES ('users', 1);
    INSERT INTO notifications (id, message) VALUES (uuid(), 'Welcome!');
APPLY BATCH;

Better approach:

Execute independently—if one fails, others can still succeed.

Conditional Batches (LWT)¶

Batches can include lightweight transaction conditions:

BEGIN BATCH
    INSERT INTO users (user_id, username) VALUES (?, 'alice') IF NOT EXISTS;
    INSERT INTO usernames (username, user_id) VALUES ('alice', ?) IF NOT EXISTS;
APPLY BATCH;

LWT Batch Behavior¶

Important: When ANY statement has an IF condition, ALL statements use Paxos:

Key points:

Entire batch is all-or-nothing
One failed condition aborts all statements
Significant performance impact

Restrictions¶

Cannot mix conditional and unconditional statements
All conditions must be on same partition
Cannot use USING TIMESTAMP with IF

Batch Size Limits¶

Configuration¶

# cassandra.yaml
batch_size_warn_threshold_in_kb: 5     # Log warning
batch_size_fail_threshold_in_kb: 50    # Reject batch

Monitoring¶

# Check for batch size warnings
grep "Batch" /var/log/cassandra/system.log

# Metrics
nodetool tablestats system.batches

Calculating Batch Size¶

Approximate formula:

batch_size ≈ sum(mutation_sizes) + overhead
mutation_size ≈ key_size + sum(column_sizes) + metadata

Best Practices¶

Do Use Batches For¶

Good Batch Use Cases

Denormalized table consistency: Keep related tables in sync
Same-partition atomicity: Multiple writes to one partition
Conditional group operations: LWT across related rows

Don't Use Batches For¶

Batch Anti-Patterns

Performance optimization: Use async parallel writes instead
Large bulk loads: Use SSTable loader or async writes
Unrelated writes: No atomicity needed
Cross-datacenter atomicity: Batches don't provide this

Sizing Guidelines¶

Batch Type	Recommended Size
Same-partition	Up to 100 statements
Multi-partition	2-5 partitions
With LWT	2-3 statements

Monitoring¶

-- Check batch log table
SELECT * FROM system.batches;

-- Should typically be empty (batches complete quickly)

Restrictions¶

Restrictions

Statement Types:

Only INSERT, UPDATE, DELETE allowed
Cannot include SELECT statements
Cannot include DDL statements

Counter Batches:

Must use COUNTER keyword
Cannot mix counter and non-counter statements in same batch
No TTL allowed on counter updates
No IF conditions allowed with counters
Counter batches are always unlogged internally

Conditional Batches:

All IF conditions must be on same partition
Cannot use USING TIMESTAMP
Cannot mix conditional and unconditional statements

General:

Maximum batch size enforced by configuration
Batch log adds overhead to logged batches
Multi-partition batches not atomic if using UNLOGGED

Atomicity Scope¶

Batch Atomicity is Partition-Scoped

A critical misconception: batches are only atomic within a single partition at the storage layer.

Batch Type	Single Partition	Multi-Partition
LOGGED	Atomic	Eventual atomicity via batch log
UNLOGGED	Atomic	Not atomic - partial execution possible
COUNTER	Atomic	Not atomic

Multi-partition "atomicity" via logged batches:

Batch log ensures eventual delivery, not instant atomicity
Other reads may see partial results during batch execution
If batch log replay fails repeatedly, mutations may be lost

-- This is NOT instantly atomic across partitions
BEGIN BATCH
    INSERT INTO users (user_id, ...) VALUES ('user1', ...);    -- Partition 1
    INSERT INTO users (user_id, ...) VALUES ('user2', ...);    -- Partition 2
APPLY BATCH;
-- A concurrent read might see user1 but not user2

Large Batch Performance Degradation¶

Large Batches Cause Performance Problems

Batches are NOT a performance optimization. Large batches cause severe issues:

Issue	Impact
Coordinator memory	Entire batch held in memory until complete
GC pressure	Large batches trigger garbage collection
Batch log pressure	Logged batches write to batch log first
Timeout risk	Large batches more likely to timeout
Replay storms	Failed large batches cause replay overhead

Configuration limits:

# cassandra.yaml
batch_size_warn_threshold_in_kb: 5     # Warn above 5KB
batch_size_fail_threshold_in_kb: 50    # Fail above 50KB

Warning signs in logs:

WARN  Batch for [table] is of size 52KB, exceeding specified threshold of 5KB

Best practices:

Keep batches small (< 5KB, ideally < 20 statements)
Use batches for atomicity, not throughput
For bulk loading, use parallel individual writes or SSTable loader
Monitor BatchMetrics in JMX for batch sizes

Examples¶

Denormalized Table Insert¶

-- Maintain consistency between users and users_by_email
BEGIN BATCH
    INSERT INTO users (user_id, email, username, created_at)
    VALUES (?, '[email protected]', 'alice', toTimestamp(now()));

    INSERT INTO users_by_email (email, user_id, username)
    VALUES ('[email protected]', ?, 'alice');
APPLY BATCH;

Same-Partition Event Logging¶

BEGIN UNLOGGED BATCH
    INSERT INTO user_events (user_id, event_time, event_type, data)
    VALUES (123, toTimestamp(now()), 'login', '{"ip": "192.168.1.1"}');

    UPDATE user_stats SET login_count = login_count + 1
    WHERE user_id = 123;
APPLY BATCH;

Conditional User Registration¶

BEGIN BATCH
    INSERT INTO users (user_id, username, email)
    VALUES (?, 'desired_username', '[email protected]')
    IF NOT EXISTS;

    INSERT INTO usernames (username, user_id)
    VALUES ('desired_username', ?)
    IF NOT EXISTS;
APPLY BATCH;

Counter Aggregation¶

BEGIN COUNTER BATCH
    UPDATE daily_metrics SET page_views = page_views + 1
    WHERE date = '2024-01-15' AND page = 'home';

    UPDATE daily_metrics SET page_views = page_views + 1
    WHERE date = '2024-01-15' AND page = 'about';

    UPDATE total_metrics SET total_views = total_views + 2
    WHERE metric_id = 'all_time';
APPLY BATCH;

Batch with Timestamp¶

BEGIN BATCH USING TIMESTAMP 1705315800000000
    INSERT INTO events (event_id, type) VALUES (uuid(), 'imported');
    INSERT INTO audit_log (log_id, action) VALUES (uuid(), 'data_import');
APPLY BATCH;

INSERT - Single row inserts
UPDATE - Single row updates
DELETE - Single row deletes
Lightweight Transactions - Conditional batch details

BATCH¶

Behavioral Guarantees¶

What BATCH Guarantees¶

What BATCH Does NOT Guarantee¶

Logged vs Unlogged Contracts¶

Failure Semantics¶

Version-Specific Behavior¶

Overview¶

What BATCH Is For¶

Historical Context¶

Synopsis¶

Batch Types¶

Logged Batch (Default)¶

Batch Log Architecture¶

Batch Log Table Versions¶

Batch Log Replay¶

Unlogged Batch¶

Counter Batch¶

Parameters¶

USING TIMESTAMP¶

Same-Partition vs Multi-Partition Batches¶

Same-Partition Batch (Recommended)¶

Multi-Partition Batch (Use Carefully)¶

Batch Anti-Patterns¶

Anti-Pattern 1: Batching for Performance¶

Anti-Pattern 2: Large Batches¶

Anti-Pattern 3: Batching Different Tables Without Need¶

Conditional Batches (LWT)¶

LWT Batch Behavior¶

Restrictions¶

Batch Size Limits¶

Configuration¶

Monitoring¶

Calculating Batch Size¶

Best Practices¶

Do Use Batches For¶

Don't Use Batches For¶

Sizing Guidelines¶

Monitoring¶

Restrictions¶

Atomicity Scope¶

Large Batch Performance Degradation¶

Examples¶

Denormalized Table Insert¶

Same-Partition Event Logging¶

Conditional User Registration¶

Counter Aggregation¶

Batch with Timestamp¶

Related Documentation¶