Compaction

Compaction is the process of merging SSTables to reduce read amplification, reclaim space from tombstones, and maintain manageable file counts. Selecting an appropriate strategy and configuration is critical for cluster performance.

Cassandra 5.0+ Recommendation

For new deployments on Cassandra 5.0+, Unified Compaction Strategy (UCS) is the recommended default. UCS provides a single, configurable strategy that can emulate STCS, LCS, or hybrid behaviors through its scaling_parameters option. Existing clusters can continue using their current strategies or migrate to UCS with a simple ALTER TABLE.
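
For example, the migration is a single schema change. A sketch using a placeholder keyspace and table; 'T4' is one possible scaling parameter, discussed under Strategy Selection below:

-- Migrate an existing table to UCS with tiered-like behavior (fan factor 4)
ALTER TABLE my_keyspace.my_table
  WITH compaction = {
    'class': 'UnifiedCompactionStrategy',
    'scaling_parameters': 'T4'
  };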


SSTable Accumulation

As described in the Write Path, Cassandra writes first go to the commit log and memtable. When a memtable reaches its threshold, it flushes to disk as an immutable SSTable. This append-only design enables fast writes but creates a side effect: SSTables accumulate continuously.


Each flush creates a new SSTable containing:

  • Data from the memtable at flush time
  • Potentially overlapping partition keys with existing SSTables
  • Updated values for previously written rows
  • Tombstones for deleted data

Without intervention, a table receiving continuous writes accumulates hundreds or thousands of SSTables. This creates problems for both reads and disk management.
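
This accumulation is easy to observe on a test node, since every explicit flush adds one SSTable (keyspace and table names below are placeholders):

# Force a memtable flush, producing one new SSTable
nodetool flush my_keyspace my_table

# Watch the SSTable count grow with each flush
nodetool tablestats my_keyspace.my_table | grep "SSTable count"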


Why Compaction is Necessary

Without compaction, SSTable accumulation degrades read performance.

Each read must check the bloom filter of every SSTable that might contain the requested partition. Even with well-tuned bloom filters, false positives accumulate as the SSTable count grows, potentially forcing disk seeks into dozens of SSTables for a single partition read.


The Compaction Process

Compaction merges a set of input SSTables into one or more outputs: rows with the same partition key are consolidated, only the newest version of each cell is kept, tombstones older than gc_grace_seconds are purged, and the input SSTables are deleted once the new ones are written.

Benefits

Benefit                       Description
Reduces read amplification    Fewer SSTables to check per read
Reclaims space                Removes tombstones after gc_grace_seconds
Removes obsolete data         Discards old versions of updated cells
Improves compression          Larger, consolidated data compresses better
Updates statistics            Refreshes min/max values, partition sizes
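
The tombstone grace period referenced above is a per-table setting (default 864000 seconds = 10 days). A sketch of tuning it, using a placeholder table name; lower it only if repairs reliably complete within the new window:

-- Shorten the tombstone grace period to 4 days
ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 345600;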

Amplification Factors

Every compaction strategy involves trade-offs between three types of amplification.

Write Amplification

How many times data is written to disk over its lifetime.

\[\text{Write Amplification} = \frac{\text{Total Bytes Written to Disk}}{\text{Bytes Received from Client}}\]

Example:

  • Client writes 1GB of data
  • Data is written once to commit log
  • Data is written once to SSTable (memtable flush)
  • Data is rewritten 3× during compaction
  • Write amplification = \(\frac{1 + 1 + 3}{1} = 5\times\)

Impact:

  • Higher write amplification increases disk I/O
  • SSD lifetime is measured in total bytes written
  • Write amplification of 10× means SSD wears 10× faster
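
As a rough worked example with assumed numbers (a 1TB SSD rated for 600TB of total writes, 100GB/day of client writes, and 5× write amplification):

\[\text{Daily disk writes} = 100\text{GB} \times 5 = 500\text{GB/day}\]

\[\text{Drive lifetime} \approx \frac{600\text{TB}}{0.5\text{TB/day}} = 1200\text{ days} \approx 3.3\text{ years}\]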

Read Amplification

How many SSTables must be checked per read.

\[\text{Read Amplification} = \text{Number of SSTables Touched per Read}\]

Case                          Read Amplification
Ideal (fully consolidated)    \(1\)
Worst (no compaction)         \(N\) (one SSTable per flush)

Impact:

  • Higher read amplification increases disk seeks and latency
  • With HDDs: Each SSTable check ≈ 10ms seek time
  • With SSDs: Each SSTable check ≈ 0.1ms
  • Significantly affects P99 latency
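
A worked example using the seek times above (20 SSTables is an assumed count for illustration):

\[\text{HDD: } 20 \times 10\text{ms} = 200\text{ms} \qquad \text{SSD: } 20 \times 0.1\text{ms} = 2\text{ms}\]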

Space Amplification

How much extra disk space is needed beyond raw data size.

\[\text{Space Amplification} = \frac{\text{Disk Space Used}}{\text{Actual Data Size}}\]

Example:

  • Raw data: 100GB
  • Tombstones pending cleanup: 10GB
  • Old SSTable versions during compaction: 100GB (temporary)
  • Space amplification = \(\frac{210\text{GB}}{100\text{GB}} = 2.1\times\)

Impact:

  • Determines required disk headroom
  • Some strategies need 2× space temporarily during compaction
  • Running out of disk during compaction causes failures

Strategy Comparison

Strategy    Write Amp    Read Amp    Space Amp    Best For
STCS        Low          High        Medium       Write-heavy workloads
LCS         High         Low         Low          Read-heavy workloads
TWCS        Low          Low         Low          Time-series with TTL
UCS         Varies       Varies      Low          Adaptive (Cassandra 5.0+)
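
Each strategy is selected per table via ALTER TABLE. A sketch with placeholder table names and commonly tuned options:

-- Size-tiered (the pre-5.0 default)
ALTER TABLE ks.events WITH compaction = {'class': 'SizeTieredCompactionStrategy'};

-- Leveled, with the default 160MB target SSTable size made explicit
ALTER TABLE ks.profiles WITH compaction = {'class': 'LeveledCompactionStrategy',
                                           'sstable_size_in_mb': 160};

-- Time-windowed, with one-day windows for TTL'd time-series data
ALTER TABLE ks.metrics WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                                          'compaction_window_unit': 'DAYS',
                                          'compaction_window_size': 1};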

Strategy Selection

The right strategy depends on the workload's read/write mix and its data lifecycle:

Strategy by Workload Pattern

Workload                     Cassandra 5.0+     Pre-5.0        Rationale
Write-heavy (>90% writes)    UCS (T8)           STCS           Low write amplification
Read-heavy (>70% reads)      UCS (L10)          LCS            Low read amplification
Time-series with TTL         TWCS or UCS        TWCS           Efficient TTL expiration
Mixed workload               UCS (T4)           STCS           Balanced trade-offs
Frequently updated data      UCS (L4)           LCS            Consolidates versions quickly
Append-only logs             UCS (T4) or TWCS   STCS or TWCS   Minimal rewrites
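
The T/L values in the table map directly to UCS's scaling_parameters option: T values give tiered-like behavior (lower write amplification), L values give leveled-like behavior (lower read amplification), and the number is the fan factor. A sketch for the write-heavy and read-heavy rows (table names are placeholders):

-- Write-heavy workload: tiered-like, fan factor 8
ALTER TABLE ks.ingest WITH compaction = {'class': 'UnifiedCompactionStrategy',
                                         'scaling_parameters': 'T8'};

-- Read-heavy workload: leveled-like, fan factor 10
ALTER TABLE ks.lookup WITH compaction = {'class': 'UnifiedCompactionStrategy',
                                         'scaling_parameters': 'L10'};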

Monitoring

Key Metrics

Metric                     Warning            Critical           Action
Pending compactions        >50                >200               Increase throughput
SSTable count (STCS)       >20                >50                Check compaction progress
L0 SSTable count (LCS)     >8                 >32                Throttle writes or switch strategy
Compaction throughput      <50% configured    <25% configured    Check disk I/O
Disk space free            <30%               <20%               Add storage or run compaction

Commands

# Current compaction activity
nodetool compactionstats

# Per-table SSTable count and sizes
nodetool tablestats keyspace.table

# Compaction history
nodetool compactionhistory

# SSTable count per level (LCS)
nodetool tablestats keyspace.table | grep "SSTables in each level"
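
A minimal check against the warning threshold above, suitable for cron; it assumes the first line of compactionstats output has the form "pending tasks: N" (the exact format varies slightly between versions):

#!/bin/sh
# Alert when pending compactions exceed the warning threshold (50)
PENDING=$(nodetool compactionstats | awk '/^pending tasks:/ {print $3; exit}')
if [ "${PENDING:-0}" -gt 50 ]; then
    echo "WARNING: ${PENDING} pending compactions on $(hostname)"
fi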

JMX Metrics

# Pending compactions
org.apache.cassandra.metrics:type=Compaction,name=PendingTasks

# Compaction throughput (bytes/second)
org.apache.cassandra.metrics:type=Compaction,name=BytesCompacted

# Per-table metrics
org.apache.cassandra.metrics:type=Table,keyspace=*,scope=*,name=LiveSSTableCount
org.apache.cassandra.metrics:type=Table,keyspace=*,scope=*,name=PendingCompactions

Best Practices

General

  1. Never disable auto-compaction unless there is a specific operational reason
  2. Avoid major compaction during normal production operations
  3. Monitor pending tasks - sustained growth indicates a problem
  4. Maintain 30%+ free disk space for compaction headroom
  5. Run repair within gc_grace_seconds to prevent deleted data from reappearing (zombie data) after tombstone removal

Configuration

# cassandra.yaml

# Maximum compaction throughput per node (MB/s)
# Higher = faster compaction, more disk I/O competition
# 0 = unlimited
# (Cassandra 4.1+ renames this setting to compaction_throughput, e.g. 64MiB/s)
compaction_throughput_mb_per_sec: 64

# Number of concurrent compaction threads
# Default: the smaller of the number of data disks and CPU cores
concurrent_compactors: 4

# Adjust at runtime
nodetool setcompactionthroughput 128
nodetool setconcurrentcompactors 4
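
Note that nodetool changes do not persist across restarts; update cassandra.yaml for permanent changes. The active values can be verified with the matching get commands (getconcurrentcompactors is available in recent Cassandra releases):

# Confirm the currently active settings
nodetool getcompactionthroughput
nodetool getconcurrentcompactors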