Compaction¶
Compaction is the process of merging SSTables to reduce read amplification, reclaim space from tombstones, and maintain manageable file counts. Selecting an appropriate strategy and configuration is critical for cluster performance.
Cassandra 5.0+ Recommendation
For new deployments on Cassandra 5.0+, Unified Compaction Strategy (UCS) is the recommended default. UCS provides a single, configurable strategy that can emulate STCS, LCS, or hybrid behaviors through its scaling_parameters option. Existing clusters can continue using their current strategies or migrate to UCS with a simple ALTER TABLE.
SSTable Accumulation¶
As described in the Write Path, Cassandra writes first go to the commit log and memtable. When a memtable reaches its threshold, it flushes to disk as an immutable SSTable. This append-only design enables fast writes but creates a side effect: SSTables accumulate continuously.
Each flush creates a new SSTable containing:
- Data from the memtable at flush time
- Potentially overlapping partition keys with existing SSTables
- Updated values for previously written rows
- Tombstones for deleted data
Without intervention, a table receiving continuous writes accumulates hundreds or thousands of SSTables. This creates problems for both reads and disk management.
Why Compaction is Necessary¶
Without compaction, SSTable accumulation degrades read performance:
Each read must check bloom filters across all SSTables. Even with bloom filter optimization, false positives accumulate—potentially requiring disk seeks to dozens of SSTables for a single partition read.
The Compaction Process¶
Benefits¶
| Benefit | Description |
|---|---|
| Reduces read amplification | Fewer SSTables to check per read |
| Reclaims space | Removes tombstones after gc_grace_seconds |
| Removes obsolete data | Discards old versions of updated cells |
| Improves compression | Larger, consolidated data compresses better |
| Updates statistics | Refreshes min/max values, partition sizes |
Amplification Factors¶
Every compaction strategy involves trade-offs between three types of amplification.
Write Amplification¶
How many times data is written to disk over its lifetime.
Example:
- Client writes 1GB of data
- Data is written once to commit log
- Data is written once to SSTable (memtable flush)
- Data is rewritten 3× during compaction
- Write amplification = \(\frac{1 + 1 + 3}{1} = 5\times\)
Impact:
- Higher write amplification increases disk I/O
- SSD lifetime is measured in total bytes written
- Write amplification of 10× means SSD wears 10× faster
Read Amplification¶
How many SSTables must be checked per read.
| Case | Read Amplification |
|---|---|
| Ideal (fully consolidated) | \(1\) |
| Worst (no compaction) | \(N\) (one per flush) |
Impact:
- Higher read amplification increases disk seeks and latency
- With HDDs: Each SSTable check ≈ 10ms seek time
- With SSDs: Each SSTable check ≈ 0.1ms
- Significantly affects P99 latency
Space Amplification¶
How much extra disk space is needed beyond raw data size.
Example:
- Raw data: 100GB
- Tombstones pending cleanup: 10GB
- Old SSTable versions during compaction: 100GB (temporary)
- Space amplification = \(\frac{210\text{GB}}{100\text{GB}} = 2.1\times\)
Impact:
- Determines required disk headroom
- Some strategies need 2× space temporarily during compaction
- Running out of disk during compaction causes failures
Strategy Comparison¶
| Strategy | Write Amp | Read Amp | Space Amp | Best For |
|---|---|---|---|---|
| STCS | Low | High | Medium | Write-heavy workloads |
| LCS | High | Low | Low | Read-heavy workloads |
| TWCS | Low | Low | Low | Time-series with TTL |
| UCS | Varies | Varies | Low | Adaptive (Cassandra 5.0+) |
Strategy Selection¶
Strategy by Workload Pattern¶
| Workload | Cassandra 5.0+ | Pre-5.0 | Rationale |
|---|---|---|---|
| Write-heavy (>90% writes) | UCS (T8) | STCS | Low write amplification |
| Read-heavy (>70% reads) | UCS (L10) | LCS | Low read amplification |
| Time-series with TTL | TWCS or UCS | TWCS | Efficient TTL expiration |
| Mixed workload | UCS (T4) | STCS | Balanced trade-offs |
| Frequently updated data | UCS (L4) | LCS | Consolidates versions quickly |
| Append-only logs | UCS (T4) or TWCS | STCS or TWCS | Minimal rewrites |
Monitoring¶
Key Metrics¶
| Metric | Warning | Critical | Action |
|---|---|---|---|
| Pending compactions | >50 | >200 | Increase throughput |
| SSTable count (STCS) | >20 | >50 | Check compaction progress |
| L0 SSTable count (LCS) | >8 | >32 | Throttle writes or switch strategy |
| Compaction throughput | <50% configured | <25% configured | Check disk I/O |
| Disk space free | <30% | <20% | Add storage or run compaction |
Commands¶
# Current compaction activity
nodetool compactionstats
# Per-table SSTable count and sizes
nodetool tablestats keyspace.table
# Compaction history
nodetool compactionhistory
# SSTable count per level (LCS)
nodetool tablestats keyspace.table | grep "SSTables in each level"
JMX Metrics¶
# Pending compactions
org.apache.cassandra.metrics:type=Compaction,name=PendingTasks
# Compaction throughput (bytes/second)
org.apache.cassandra.metrics:type=Compaction,name=BytesCompacted
# Per-table metrics
org.apache.cassandra.metrics:type=Table,keyspace=*,scope=*,name=LiveSSTableCount
org.apache.cassandra.metrics:type=Table,keyspace=*,scope=*,name=PendingCompactions
Best Practices¶
General¶
- Never disable auto-compaction unless there is a specific operational reason
- Avoid major compaction during normal production operations
- Monitor pending tasks - sustained growth indicates a problem
- Maintain 30%+ free disk space for compaction headroom
- Run repair regularly to prevent zombie data after tombstone removal
Configuration¶
# cassandra.yaml
# Maximum compaction throughput per node (MB/s)
# Higher = faster compaction, more disk I/O competition
# 0 = unlimited
compaction_throughput_mb_per_sec: 64
# Number of concurrent compaction threads
# Default: min(4, number_of_disks)
concurrent_compactors: 4
# Adjust at runtime
nodetool setcompactionthroughput 128
nodetool setconcurrentcompactors 4
Related Documentation¶
- Size-Tiered Compaction (STCS) - Default strategy for write-heavy workloads
- Leveled Compaction (LCS) - Optimized for read-heavy workloads
- Time-Window Compaction (TWCS) - Designed for time-series data
- Unified Compaction (UCS) - Adaptive strategy in Cassandra 5.0+
- Compaction Management - Tuning, troubleshooting, and maintenance
- Tombstones - How compaction removes deleted data
- SSTable Reference - SSTable file format