Commit Log¶

The commit log is Cassandra's write-ahead log (WAL), providing durability for all mutations. Every write is appended to the commit log before being acknowledged, ensuring data can be recovered after a crash.

For an overview of how the commit log fits into the write path, see Write Path.

Purpose and Guarantees¶

The commit log serves a single purpose: crash recovery. It is not used for reads—all read operations go through memtables and SSTables.

Guarantee	Description
Durability	Acknowledged writes survive node restart
Ordering	Mutations are replayed in write order
Atomicity	Individual mutations are atomic (all-or-nothing)

Commit Log vs Replication

The commit log provides single-node durability. For cluster-wide durability, Cassandra relies on replication. With RF=3 and QUORUM writes, data survives even if one node loses its commit log before flushing.

Segment Architecture¶

The commit log is organized into segments—fixed-size files that are allocated, filled, and eventually deleted.

Segment States¶

State	Description
Available	Allocated, empty, queued for use
Allocating	Currently receiving mutations
Active	Full (reached `commitlog_segment_size_in_mb`), awaiting memtable flush
Clean	All referenced data flushed to SSTables; segment is deleted

Segment Allocation¶

The CommitLogSegmentManager maintains a pool of available segments to ensure a new segment is always ready when the current one fills. Segments are allocated on demand—they are not pre-allocated to their full size on disk.

Memory-mapped segments (default): The segment file is created and memory-mapped. The file grows as mutations are appended, up to commitlog_segment_size_in_mb.

Compressed/encrypted segments: Mutations are buffered in memory and written in blocks. The on-disk size depends on compression ratio.

Historical: Pre-allocation and Recycling (removed in 2.2)

Prior to Cassandra 2.2, segments were pre-allocated to their full size (32MB default) and recycled after use. This was removed to reduce page cache pressure and simplify the code. Modern Cassandra deletes segments when clean rather than recycling them.

Dirty Interval Tracking¶

Each segment tracks, per table, the range of positions that hold mutations not yet flushed to SSTables:

Segment: CommitLog-7-1702345678901.log

Dirty Intervals:
┌─────────────────┬────────────────────┐
│ Table ID        │ Position Range     │
├─────────────────┼────────────────────┤
│ users           │ [1024, 15360]      │
│ orders          │ [2048, 31744]      │
│ events          │ [8192, 28672]      │
└─────────────────┴────────────────────┘

When a memtable flushes, it reports its commit log position range. The segment marks those intervals as clean. Once all intervals are clean, the segment is deleted.
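The bookkeeping described above can be sketched as a small model. This is an illustrative simplification, not Cassandra's implementation: the `Segment` class, its method names, and the use of a single integer position per table are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Segment:
    """Toy model of one segment's dirty-interval bookkeeping (hypothetical)."""
    name: str
    # table name -> (first dirty position, last dirty position)
    dirty: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def record_write(self, table: str, start: int, end: int) -> None:
        """Extend the table's dirty interval to cover a newly appended mutation."""
        if table in self.dirty:
            lo, hi = self.dirty[table]
            self.dirty[table] = (min(lo, start), max(hi, end))
        else:
            self.dirty[table] = (start, end)

    def mark_flushed(self, table: str, flushed_up_to: int) -> None:
        """Drop or shrink the table's interval after a memtable flush."""
        if table not in self.dirty:
            return
        lo, hi = self.dirty[table]
        if flushed_up_to >= hi:
            del self.dirty[table]  # everything for this table is in SSTables
        elif flushed_up_to >= lo:
            self.dirty[table] = (flushed_up_to + 1, hi)

    def is_clean(self) -> bool:
        """A segment with no dirty intervals is eligible for deletion."""
        return not self.dirty


seg = Segment("CommitLog-7-1702345678901.log")
seg.record_write("users", 1024, 15360)
seg.record_write("orders", 2048, 31744)

seg.mark_flushed("users", 15360)   # users fully flushed
seg.mark_flushed("orders", 20000)  # orders only partly flushed
partly_flushed = dict(seg.dirty)

seg.mark_flushed("orders", 31744)  # remaining orders data flushed
```

After the last call no dirty intervals remain, so `seg.is_clean()` returns true and the segment could be deleted.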

Segment File Format¶

Each segment file contains a header followed by sync blocks of serialized mutations.

File Structure¶

┌─────────────────────────────────────────────────────────────┐
│                         HEADER                              │
├──────────┬───────────┬─────────────┬────────────┬───────────┤
│ Version  │ Segment   │ Params Len  │ Params     │ Header    │
│ (4 bytes)│ ID (8)    │ (4 bytes)   │ (JSON)     │ CRC (4)   │
└──────────┴───────────┴─────────────┴────────────┴───────────┘

┌─────────────────────────────────────────────────────────────┐
│                      SYNC BLOCK 1                           │
├─────────────────────────────────────────────────────────────┤
│ Sync Marker: [next block offset (4) | marker CRC (4)]       │
├─────────────────────────────────────────────────────────────┤
│ Mutation 1: [size (4) | size CRC (4) | data | data CRC (4)] │
│ Mutation 2: [size (4) | size CRC (4) | data | data CRC (4)] │
│ ...                                                         │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                      SYNC BLOCK 2                           │
│                         ...                                 │
└─────────────────────────────────────────────────────────────┘

Header Fields¶

Field	Size	Description
Version	4 bytes	Commit log format version
Segment ID	8 bytes	Unique segment identifier
Parameters length	4 bytes	Length of JSON parameters
Parameters	Variable	JSON: compression/encryption settings
Header CRC	4 bytes	CRC32 checksum of header

Sync Block Structure¶

Each sync block begins with a marker indicating the offset to the next block:

Field	Size	Description
Next offset	4 bytes	File position of next sync block
Marker CRC	4 bytes	CRC32 of the offset value

Mutation Entry Format¶

Field	Size	Description
Size	4 bytes	Serialized mutation size
Size CRC	4 bytes	CRC32 of size field
Data	Variable	Serialized mutation bytes
Data CRC	4 bytes	CRC32 of mutation data

The dual CRC design (size + data) allows detection of both truncation and corruption during replay.

Sync Modes¶

The sync mode controls when data is flushed from OS page cache to persistent storage.

Periodic (Default)¶

commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

Mutations are written to the page cache and acknowledged immediately. A background thread calls fsync() every commitlog_sync_period_in_ms milliseconds.

Trade-off: Lowest latency, but up to commitlog_sync_period_in_ms of data can be lost on power failure (mitigated by replication).

Batch¶

commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 2

Mutations are batched for up to commitlog_sync_batch_window_in_ms milliseconds, then flushed together with a single fsync(). A write is acknowledged only after that fsync() completes.

Trade-off: Higher latency (waits for batch window), but stronger single-node durability.

Group¶

commitlog_sync: group
commitlog_sync_group_window_in_ms: 1000

Similar to batch, but with a larger default window. Writes that arrive within the window are grouped and acknowledged after one shared fsync().

Mode	Default Window	Latency	Durability
periodic	10,000 ms	Lowest	Weakest (single-node)
group	1,000 ms	Low	Moderate
batch	2 ms	Higher	Strongest
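The difference between acknowledging before and after fsync() can be illustrated with a toy append-only log. This is a sketch under simplifying assumptions, not how Cassandra schedules syncs: `ToyCommitLog` is a hypothetical class, the batch branch syncs on every append instead of sharing one fsync() across concurrent writers, and group mode is not modeled.

```python
import os
import tempfile


class ToyCommitLog:
    """Append-only file that tracks how many bytes are not yet fsynced."""

    def __init__(self, path: str, mode: str):
        if mode not in ("periodic", "batch"):
            raise ValueError(f"unsupported mode: {mode}")
        self.mode = mode
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.unsynced_bytes = 0

    def append(self, record: bytes) -> None:
        """Write a record; returning from this call stands in for the ack."""
        os.write(self.fd, record)  # lands in the OS page cache
        self.unsynced_bytes += len(record)
        if self.mode == "batch":
            self.sync()  # ack only after the data is on stable storage

    def sync(self) -> None:
        """What a background thread would do once per sync period."""
        os.fsync(self.fd)
        self.unsynced_bytes = 0

    def close(self) -> None:
        os.close(self.fd)


with tempfile.TemporaryDirectory() as d:
    periodic = ToyCommitLog(os.path.join(d, "periodic.log"), "periodic")
    batch = ToyCommitLog(os.path.join(d, "batch.log"), "batch")
    for log in (periodic, batch):
        log.append(b"mutation-1")
        log.append(b"mutation-2")
    # Bytes that a power failure at this instant could lose, per mode.
    at_risk = (periodic.unsynced_bytes, batch.unsynced_bytes)
    periodic.sync()
    after_sync = periodic.unsynced_bytes
    periodic.close()
    batch.close()
```

In periodic mode both acknowledged records remain at risk until the next sync; in batch mode nothing acknowledged is at risk, at the cost of an fsync() on the write path.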

Segment Types¶

The segment type determines how data is written to disk.

Memory-Mapped (Default)¶

# No special configuration - this is the default

Uses memory-mapped I/O. The segment is mapped into virtual memory, and writes go directly to the mapped region. Dirty pages are forced to disk according to the sync mode; the OS may also write them back earlier on its own.

Characteristics:

Simplest implementation
Relies on OS page cache management
No compression overhead

Compressed¶

commitlog_compression:
  - class_name: LZ4Compressor
    parameters: {}

Mutations are compressed in memory before writing to disk.

Algorithm	Class Name	Ratio	CPU
LZ4	`LZ4Compressor`	~2-3x	Very low
Snappy	`SnappyCompressor`	~2x	Low
Deflate	`DeflateCompressor`	~4-5x	High

Characteristics:

Reduces disk I/O
Smaller commit log footprint
Slight CPU overhead
Compression happens in fixed-size buffers before writing
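The effect of compressing in fixed-size buffers can be sketched with Python's standard `zlib` module, which implements Deflate. This is only an illustration: the 64 KiB buffer size and the function names are assumptions, and LZ4 and Snappy are not shown because they are not in the standard library.

```python
import zlib

BUFFER_SIZE = 64 * 1024  # hypothetical buffer size, chosen for illustration


def compress_in_buffers(payload: bytes, buffer_size: int = BUFFER_SIZE) -> list:
    """Split the payload into fixed-size buffers and Deflate-compress each."""
    return [
        zlib.compress(payload[i:i + buffer_size])
        for i in range(0, len(payload), buffer_size)
    ]


def decompress_buffers(blocks) -> bytes:
    """Reverse compress_in_buffers by decompressing and concatenating."""
    return b"".join(zlib.decompress(block) for block in blocks)


# Repetitive, mutation-like payload; real ratios depend on the data.
payload = b"INSERT INTO users (id, name) VALUES (42, 'example');\n" * 5000
blocks = compress_in_buffers(payload)
compressed_size = sum(len(block) for block in blocks)
ratio = len(payload) / compressed_size
```

Because each buffer is compressed independently, a reader can decompress one block without the others, which is what makes block-at-a-time replay possible.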

Encrypted¶

transparent_data_encryption_options:
  enabled: true
  chunk_length_kb: 64
  cipher: AES/CBC/PKCS5Padding
  key_alias: testing:1
  key_provider:
    - class_name: org.apache.cassandra.security.JKSKeyProvider
      parameters:
        - keystore: conf/.keystore
          keystore_password: changeit
          store_type: JCEKS
          key_password: changeit

Data is written in configurable-size blocks. Each block is compressed (if enabled) then encrypted.

Block Structure (Encrypted):

┌───────────────────────────────────────────┐
│ Total Block Length (unencrypted, 4 bytes) │
│ Encrypted Data Length (unencrypted, 4)    │
│ Encrypted Data (variable)                 │
│   └── Contains: compressed mutation data  │
└───────────────────────────────────────────┘

The length fields are unencrypted to allow reading block boundaries without decryption.
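A reader can therefore walk block boundaries using the two plaintext length fields alone. The sketch below assumes big-endian 4-byte lengths and assumes that the total block length counts the whole block, including the two length fields; neither detail is stated above, and `make_block` produces opaque placeholder bytes, not real ciphertext.

```python
import struct

HEADER = struct.Struct(">II")  # assumed: two big-endian 4-byte lengths


def make_block(ciphertext: bytes) -> bytes:
    """Build a demo block; `ciphertext` is opaque bytes, not real AES output."""
    total_len = HEADER.size + len(ciphertext)  # assumed to include the header
    return HEADER.pack(total_len, len(ciphertext)) + ciphertext


def block_boundaries(buf: bytes) -> list:
    """Return (start, end) offsets of each block without decrypting anything."""
    bounds, offset = [], 0
    while offset + HEADER.size <= len(buf):
        total_len, enc_len = HEADER.unpack_from(buf, offset)
        if total_len < HEADER.size + enc_len or offset + total_len > len(buf):
            break  # malformed or truncated block: stop scanning
        bounds.append((offset, offset + total_len))
        offset += total_len
    return bounds


segment_body = make_block(b"\x01" * 16) + make_block(b"\x02" * 32)
bounds = block_boundaries(segment_body)
```

The consistency check on the two lengths also guarantees forward progress, since a zero or undersized total length ends the scan instead of looping.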

Replay Process¶

On startup, Cassandra replays commit log segments to recover mutations that weren't flushed to SSTables.

Replay Algorithm¶

Replay Filtering¶

Not all mutations in a segment need replay. Each SSTable records the commit log position at flush time. During replay:

Read the mutation's commit log position
Check if the mutation's table has an SSTable flushed after that position
If yes, skip (data already durable in SSTable)
If no, apply mutation to memtable
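The filtering steps above can be sketched as follows. The `Mutation` type, the function name, and the use of a single integer as a commit log position are assumptions for the example; the sketch also assumes that a mutation at exactly the flushed position is already covered by the SSTable.

```python
from typing import Dict, List, NamedTuple


class Mutation(NamedTuple):
    table: str
    position: int  # simplified: one integer instead of a segment/offset pair
    payload: bytes


def mutations_to_replay(mutations: List[Mutation],
                        flushed_positions: Dict[str, int]) -> List[Mutation]:
    """Keep mutations written after their table's last flushed position."""
    return [
        m for m in mutations
        # Tables with no SSTable yet default to -1, so everything is replayed.
        if m.position > flushed_positions.get(m.table, -1)
    ]


log = [
    Mutation("users", 1024, b"u1"),
    Mutation("orders", 2048, b"o1"),
    Mutation("users", 20000, b"u2"),
]
# Hypothetical state: only `users` has an SSTable, flushed up to 15360.
flushed = {"users": 15360}
to_apply = mutations_to_replay(log, flushed)
```

Here the first `users` mutation is skipped because an SSTable already contains it, while the `orders` mutation and the later `users` mutation are applied to memtables.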

Handling Corruption¶

Corruption Type	Detection	Recovery
Header corruption	Header CRC mismatch	Skip entire segment
Sync marker corruption	Marker CRC mismatch	Skip to next segment
Size field corruption	Size CRC mismatch	Skip to next sync block
Data corruption	Data CRC mismatch	Skip mutation, continue
Truncation	Unexpected EOF	Stop replay at truncation point

The CRC-based design allows partial recovery—corruption in one mutation doesn't prevent replaying subsequent valid mutations.
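A replay loop that follows the recovery column of the table for a single sync block might look like the sketch below. The framing (big-endian fields, CRC32 over the raw size and data bytes) and the function names are assumptions for illustration, and header and sync marker checks are left out.

```python
import struct
import zlib


def frame(data: bytes) -> bytes:
    """Demo framing: [size | size CRC | data | data CRC], big-endian (assumed)."""
    size = struct.pack(">I", len(data))
    return (size + struct.pack(">I", zlib.crc32(size))
            + data + struct.pack(">I", zlib.crc32(data)))


def replay_block(block: bytes):
    """Return (recovered mutations, reason the scan ended) for one sync block."""
    recovered, offset = [], 0
    while offset < len(block):
        if offset + 8 > len(block):
            return recovered, "truncated"
        size_bytes = block[offset:offset + 4]
        (size_crc,) = struct.unpack_from(">I", block, offset + 4)
        if zlib.crc32(size_bytes) != size_crc:
            # Without a trusted size the next entry cannot be located.
            return recovered, "size CRC mismatch"
        (size,) = struct.unpack(">I", size_bytes)
        end = offset + 8 + size + 4
        if end > len(block):
            return recovered, "truncated"
        data = block[offset + 8:offset + 8 + size]
        (data_crc,) = struct.unpack_from(">I", block, offset + 8 + size)
        if zlib.crc32(data) == data_crc:
            recovered.append(data)
        # On a data CRC mismatch only this entry is skipped: its size is
        # still trusted, so the scan continues at the next entry.
        offset = end
    return recovered, "end of block"


good = frame(b"a") + frame(b"b") + frame(b"c")

damaged = bytearray(good)
damaged[21] ^= 0xFF  # flip the data byte of the second entry
skipped_one = replay_block(bytes(damaged))

cut_short = replay_block(good[:-2])  # simulate a torn final write
```

With the second entry damaged, the first and third mutations are still recovered; with the tail cut off, replay keeps the first two and stops at the truncation point.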

Configuration Reference¶

Core Settings¶

Parameter	Default	Description
`commitlog_directory`	`$CASSANDRA_HOME/data/commitlog`	Commit log location
`commitlog_segment_size_in_mb`	32	Max segment size before switching
`commitlog_total_space_in_mb`	8192	Max total commit log space

Sync Settings¶

Parameter	Default	Description
`commitlog_sync`	`periodic`	Sync mode: periodic, batch, or group
`commitlog_sync_period_in_ms`	10000	Periodic sync interval
`commitlog_sync_batch_window_in_ms`	2	Batch mode window
`commitlog_sync_group_window_in_ms`	1000	Group mode window

Compression Settings¶

Parameter	Default	Description
`commitlog_compression`	none	Compression algorithm configuration

Encryption Settings¶

Parameter	Default	Description
`transparent_data_encryption_options.enabled`	false	Enable encryption
`transparent_data_encryption_options.chunk_length_kb`	64	Encryption block size
`transparent_data_encryption_options.cipher`	AES/CBC/PKCS5Padding	Cipher algorithm

Operational Considerations¶

Storage Recommendations¶

Recommendation	Rationale
Dedicated disk/volume	Isolate commit log I/O from data I/O
Fast storage (NVMe/SSD)	Commit log is write-intensive
Battery-backed cache	Allows safer periodic sync with durability
Separate from `data_file_directories`	Prevents commit log from competing with compaction

Monitoring¶

# Check commit log size
du -sh /var/lib/cassandra/commitlog/

# Count segments
ls -1 /var/lib/cassandra/commitlog/*.log | wc -l

# Monitor via JMX
nodetool sjk mx -b "org.apache.cassandra.metrics:type=CommitLog,name=TotalCommitLogSize" -f Value
nodetool sjk mx -b "org.apache.cassandra.metrics:type=CommitLog,name=PendingTasks" -f Value

Metric	Description	Alert Threshold
`TotalCommitLogSize`	Current size of all segments	> 75% of `commitlog_total_space_in_mb`
`PendingTasks`	Mutations awaiting sync	Sustained high values
`WaitingOnCommit`	Time waiting for fsync	> 100ms average
`WaitingOnSegmentAllocation`	Time waiting for new segment	Should be ~0
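Where JMX is not available, the size threshold can be approximated from the filesystem. The sketch below is a hypothetical helper, not part of Cassandra or nodetool: it sums the `*.log` files in a directory and compares the total with 75% of the configured `commitlog_total_space_in_mb`. The demo uses a temporary directory in place of the real commit log path.

```python
import os
import tempfile


def commitlog_usage(directory: str, total_space_mb: int = 8192,
                    alert_fraction: float = 0.75):
    """Return (segment_count, total_bytes, over_threshold) for *.log files."""
    sizes = [
        os.path.getsize(os.path.join(directory, name))
        for name in os.listdir(directory)
        if name.endswith(".log")
    ]
    total_bytes = sum(sizes)
    limit_bytes = total_space_mb * 1024 * 1024 * alert_fraction
    return len(sizes), total_bytes, total_bytes > limit_bytes


# Demo with fake segments and a 1 MB limit so the alert triggers.
with tempfile.TemporaryDirectory() as d:
    for name, size in (("CommitLog-7-1.log", 600_000),
                       ("CommitLog-7-2.log", 300_000)):
        with open(os.path.join(d, name), "wb") as f:
            f.write(b"\0" * size)
    report = commitlog_usage(d, total_space_mb=1)
```

On a real node the first argument would be the configured `commitlog_directory`, and `total_space_mb` should match the value in `cassandra.yaml`.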

Troubleshooting¶

Symptom	Possible Cause	Resolution
High `WaitingOnCommit`	Slow disk, high load	Faster storage, tune sync mode
Growing commit log	Memtables not flushing	Check `memtable_flush_writers`, disk space
Slow startup	Large commit log replay	More frequent flushing, check flush triggers
Segment allocation delays	Disk full or slow	Free space, faster storage

Version History¶

Version	Cassandra	Changes
6	3.0 - 3.11	Introduced with storage engine rewrite
7	4.0+	Current format, improved checksums

See Segment Allocation for details on segment recycling removal in 2.2.

Write Path - How writes flow through the storage engine
Change Data Capture (CDC) - Exposing commit log for external consumption
Memtables - In-memory write buffer
Backup and Restore - Commit log in backup strategies