
Unified Compaction Strategy (UCS)

UCS (Cassandra 5.0+) is an adaptive compaction strategy that combines concepts from STCS, LCS, and TWCS. It uses sharding and density-based triggering to provide flexible, efficient compaction across varying workloads.


Background and History

Origins

Unified Compaction Strategy was introduced in Cassandra 5.0 (released in 2024) as the culmination of years of research into compaction efficiency. It was developed primarily by DataStax engineers, building on academic work around "write-optimized" and "read-optimized" LSM-tree variants and on practical experience with the limitations of STCS, LCS, and TWCS.

The key insight driving UCS was that traditional compaction strategies forced users to choose between write amplification (LCS) and read amplification (STCS), with no middle ground. UCS provides a unified framework where this trade-off is configurable via a single parameter.

Design Motivation

Each traditional strategy has fundamental limitations:

STCS limitations:

  • No upper bound on read amplification
  • Large SSTables accumulate without compacting
  • Space amplification during major compactions

LCS limitations:

  • High write amplification (~10× per level)
  • L0 backlog under write pressure
  • Cannot keep pace with write-heavy workloads

TWCS limitations:

  • Only suitable for append-only time-series
  • Breaks with out-of-order writes or updates
  • Tombstone handling issues

UCS addresses these by:

  1. Unifying the compaction model: Single configurable strategy replaces three separate implementations
  2. Sharding: Token range divided into independent units for parallel compaction
  3. Density-based triggering: Compaction decisions based on data density, not just counts
  4. Configurable read/write trade-off: Single parameter (scaling_parameters) adjusts behavior

Academic Foundation

UCS draws from research on LSM-tree optimization:

  • Dostoevsky (2018): Demonstrated that tiered and leveled compaction exist on a continuum
  • Monkey (2017): Showed how to tune bloom filters and compaction together
  • Lazy Leveling: Hybrid approach combining tiered and leveled properties

The scaling_parameters option in UCS directly implements these theoretical insights, allowing operators to position their compaction strategy anywhere on the tiered-to-leveled spectrum.


How UCS Works in Theory

Core Concepts

UCS introduces several concepts that differ from traditional strategies:

Shards: The token range is divided into fixed segments that compact independently. This enables parallelism and bounds the scope of each compaction.

Runs: A sequence of SSTables that collectively represent one "generation" of data. Runs replace the concept of levels (LCS) or tiers (STCS).

Density: The amount of data per token range unit. UCS triggers compaction when density reaches thresholds, not when SSTable counts reach thresholds.

Scaling Parameter: A single value (e.g., T4, L10) that determines whether UCS behaves more like STCS (tiered) or LCS (leveled).

Shard Structure

Each shard maintains its own set of SSTables and compacts without coordination with other shards. This provides:

  • Parallelism: Multiple CPU cores compact different shards simultaneously
  • Bounded scope: Each compaction touches only one shard's data
  • Independent progress: Slow shards don't block fast shards

Tiered Mode (T)

When scaling_parameters starts with T (e.g., T4), UCS behaves similarly to STCS:

The number after T specifies the fanout—how many SSTables accumulate before compaction. Higher values (T8, T16) reduce write amplification but increase read amplification.

Leveled Mode (L)

When scaling_parameters starts with L (e.g., L10), UCS behaves similarly to LCS:

The number after L specifies the size ratio between levels. L10 means each level is 10× larger than the previous, matching classic LCS behavior.

Density-Based Triggering

Unlike traditional strategies that trigger on SSTable count, UCS uses density:

\[d = \frac{s}{v}\]

Where:

  • \(d\) = density
  • \(s\) = SSTable size
  • \(v\) = fraction of token space covered by the SSTable

Traditional (count-based): Trigger when \(\text{SSTable\_count} \geq \text{min\_threshold}\)

  • Problem: Doesn't account for SSTable sizes or overlap

UCS (density-based): Trigger when density exceeds threshold for shard

  • Advantage: Considers actual data distribution

Example:

  • Shard covers 25% of token range (\(v = 0.25\))
  • Shard contains 10GB of data (\(s = 10\text{GB}\))
  • Density \(d = \frac{10\text{GB}}{0.25} = 40\text{GB}\) effective
  • If target density is 32GB, compaction triggers

This approach prevents the "large SSTable accumulation" problem of STCS while avoiding unnecessary compaction of sparse data.
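
As a quick illustration of the arithmetic above, the following Python sketch recomputes the worked example. The function name and the fixed 32 GB threshold are illustrative only, not part of Cassandra's API:

# Illustrative recomputation of the density example above (not Cassandra code).
GB = 1024 ** 3

def density(sstable_bytes: float, token_fraction: float) -> float:
    """d = s / v: data volume normalised to the full token range."""
    return sstable_bytes / token_fraction

d = density(10 * GB, 0.25)           # shard holds 10 GB over 25% of the token range
target_density = 32 * GB             # example trigger threshold from the text
print(d / GB, d >= target_density)   # 40.0 True -> compaction triggers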

Level Assignment

SSTables are assigned to levels based on their size relative to the memtable flush size:

\[L = \begin{cases} \left\lfloor \log_f \frac{s}{m} \right\rfloor & \text{if } s \geq m \\ 0 & \text{otherwise} \end{cases}\]

Where:

  • \(L\) = level number
  • \(f\) = fanout (derived from scaling parameter)
  • \(s\) = SSTable size (or density in sharded mode)
  • \(m\) = memtable flush size (observed or overridden via flush_size_override)

This creates exponentially-growing size ranges per level. Level 0 contains SSTables up to size \(m\), level 1 contains up to \(m \times f\), level 2 up to \(m \times f^2\), etc.

Example with \(f=4\) and \(m=100\text{MB}\):

| SSTable Size | Calculation | Level |
|---|---|---|
| 50 MB | \(s < m\) | 0 |
| 100 MB | \(\lfloor \log_4(1) \rfloor\) | 0 |
| 400 MB | \(\lfloor \log_4(4) \rfloor\) | 1 |
| 1.6 GB | \(\lfloor \log_4(16) \rfloor\) | 2 |
| 6.4 GB | \(\lfloor \log_4(64) \rfloor\) | 3 |

Total levels for a dataset:

\[\text{Number of levels} = \begin{cases} \left\lfloor \log_f \frac{D}{m} \right\rfloor & \text{if } D \geq m \\ 0 & \text{otherwise} \end{cases}\]

Where \(D\) = total dataset density.
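
The level-assignment rule can be checked with a few lines of Python. This is a sketch of the formula above, not the Cassandra implementation; the small epsilon guards against floating-point error at exact fanout boundaries:

# Sketch of the level-assignment formula (illustrative, not Cassandra source).
import math

def level(size_bytes: float, flush_size_bytes: float, fanout: int) -> int:
    """L = floor(log_f(s / m)) when s >= m, otherwise level 0."""
    if size_bytes < flush_size_bytes:
        return 0
    return math.floor(math.log(size_bytes / flush_size_bytes, fanout) + 1e-9)

MB = 1024 ** 2
m, f = 100 * MB, 4
for s in (50 * MB, 100 * MB, 400 * MB, 1600 * MB, 6400 * MB):
    print(f"{s // MB} MB -> level {level(s, m, f)}")   # 0, 0, 1, 2, 3

This reproduces the table above for \(f=4\) and \(m=100\text{MB}\).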


Benefits

Configurable Trade-offs

The primary advantage of UCS is tunable behavior:

  • T4: STCS-like, optimize for writes
  • L10: LCS-like, optimize for reads
  • T8, L4, etc.: Intermediate points on the spectrum
  • Change with ALTER TABLE—no data rewrite needed

Parallel Compaction

Sharding enables efficient use of modern hardware:

  • Multiple compactions run simultaneously
  • Different shards progress independently
  • Scales with CPU core count
  • Reduces wall-clock time for compaction

Bounded Read Amplification

Even in tiered mode, UCS provides better bounds than STCS:

  • Sharding limits SSTable count per token range
  • Density-based triggering prevents unbounded accumulation
  • More predictable read performance

Unified Codebase

Single strategy implementation simplifies Cassandra:

  • Fewer edge cases than maintaining STCS + LCS + TWCS
  • Consistent behavior across configurations
  • Easier to test and maintain
  • Bug fixes benefit all configurations

Smooth Migration

Transitioning from traditional strategies is straightforward:

  • ALTER TABLE to enable UCS immediately
  • Existing SSTables remain valid
  • Gradual transition as compaction runs
  • No major compaction required

Drawbacks

Newer Strategy

UCS has less production history than traditional strategies:

  • Introduced in Cassandra 5.0 (released in 2024)
  • Less community experience with edge cases
  • Fewer tuning guides and best practices available
  • Some workloads may have undiscovered issues

Learning Curve

Different conceptual model requires adjustment:

  • Shards and runs vs levels and tiers
  • Density-based vs count-based triggering
  • New configuration parameters to understand
  • Existing STCS/LCS expertise doesn't directly transfer

Time-Series Trade-offs

TWCS may still be preferable for pure time-series:

  • TWCS drops entire SSTables on TTL expiry (O(1))
  • UCS must compact to remove expired data
  • Window-based organization provides better locality

Shard Overhead

Sharding adds some overhead:

  • Minimum SSTable size per shard
  • More metadata to track
  • Very small tables may not benefit
  • Configuration requires understanding data size

Major Compaction Behavior

Major Compaction May Not Parallelize as Expected

The official documentation states that major compaction under UCS results in base_shard_count concurrent compaction tasks, each containing SSTables from one shard. However, community testing has observed that nodetool compact may initiate a single compaction task containing all SSTables in the table, rather than parallel per-shard compactions.

This behavior has been observed with configurations such as:

'base_shard_count': '4',
'class': 'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy',
'scaling_parameters': 'T4'

When planning maintenance windows that rely on parallel major compaction, verify actual behavior in the target environment before assuming parallel execution.

Migration Considerations

While migration is smooth, considerations exist:

  • Existing tuning may not transfer directly
  • Monitoring dashboards need updates
  • Operational procedures may need revision
  • Testing required before production migration

When to Use UCS

Ideal Use Cases

| Workload Pattern | Configuration | Why UCS Works |
|---|---|---|
| New Cassandra 5.0+ clusters | T4 (default) | Modern, unified approach |
| Mixed read/write workloads | T4 | Balanced trade-offs |
| High-core-count servers | base_shard_count=16 | Parallel compaction |
| Workloads that evolve | T4, then adjust | Easy reconfiguration |
| Large datasets | T4, target_sstable_size=5GiB | Efficient compaction |

Avoid UCS When

| Workload Pattern | Alternative | Rationale |
|---|---|---|
| Pure time-series with TTL | TWCS | More efficient TTL handling |
| Cassandra < 5.0 | STCS/LCS | UCS not available |
| Well-tuned existing cluster | Keep current | Migration has risk |
| Very small tables | STCS | Shard overhead not justified |

Sharding

UCS divides the token range into shards, enabling:

  • Parallel compaction: Different shards compact independently
  • Reduced compaction scope: Smaller units of work
  • Better resource utilization: Multiple CPU cores used effectively

Token range: -2^63 to 2^63
With 4 base shards:

Shard 0: tokens -2^63 to -2^62
Shard 1: tokens -2^62 to 0
Shard 2: tokens 0 to 2^62
Shard 3: tokens 2^62 to 2^63

Each shard compacts independently with its own SSTable hierarchy
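
A minimal sketch of where equal-width boundaries fall on the Murmur3 token range. The real ShardManager also accounts for locally owned ranges and disk boundaries, so treat this as an approximation:

# Equal-width split of the full Murmur3 token range into base shards (sketch only;
# Cassandra's ShardManager also considers owned ranges and disk boundaries).
def shard_boundaries(base_shard_count: int) -> list[int]:
    lo, hi = -(2 ** 63), 2 ** 63
    width = (hi - lo) // base_shard_count
    return [lo + i * width for i in range(base_shard_count + 1)]

print(shard_boundaries(4))
# boundaries at -2^63, -2^62, 0, 2^62, 2^63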

Shard Count Calculation

The number of shards scales dynamically with data density using a four-case formula:

\[S = \begin{cases} 1 & \text{if } d < m \\ \min\left(2^{\lfloor \log_2 \frac{d}{m} \rfloor}, b\right) & \text{if } d < m \cdot b \\ b & \text{if } d < t \cdot b \\ 2^{\lfloor (1-\lambda) \cdot \log_2 (\frac{d}{t} \cdot \frac{1}{b}) \rfloor} \cdot b & \text{otherwise} \end{cases}\]

Where:

  • \(S\) = number of shards
  • \(d\) = density (data size / token fraction)
  • \(m\) = min_sstable_size (default: 100 MiB)
  • \(b\) = base_shard_count (default: 4)
  • \(t\) = target_sstable_size (default: 1 GiB)
  • \(\lambda\) = sstable_growth (default: 0.333)

Case breakdown:

  1. Very small data (\(d < m\)): Single shard, no splitting
  2. Small data (\(d < m \cdot b\)): Shard count grows up to base count
  3. Medium data (\(d < t \cdot b\)): Fixed at base shard count
  4. Large data: Shard count doubles as density increases, modulated by growth factor \(\lambda\)

This ensures power-of-two boundaries for efficient token range splitting while preventing over-sharding of small datasets.
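
The four cases translate directly into code. This Python sketch mirrors the formula above using the documented defaults; it is illustrative rather than a copy of the Cassandra implementation:

# Sketch of the four-case shard-count formula (illustrative, not Cassandra source).
import math

def shard_count(d, m=100 * 1024**2, b=4, t=1024**3, lam=0.333):
    """d = density, m = min_sstable_size, b = base_shard_count,
    t = target_sstable_size, lam = sstable_growth."""
    if d < m:                                   # very small data: one shard
        return 1
    if d < m * b:                               # small data: grow up to the base count
        return min(2 ** math.floor(math.log2(d / m)), b)
    if d < t * b:                               # medium data: fixed at the base count
        return b
    # large data: power-of-two growth, damped by the growth factor
    return 2 ** math.floor((1 - lam) * math.log2(d / (t * b))) * b

MiB, GiB = 1024**2, 1024**3
for d in (50 * MiB, 300 * MiB, 2 * GiB, 64 * GiB):
    print(shard_count(d))   # 1, 2, 4, 16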

Growth Component (\(\lambda\)) Effects

The sstable_growth parameter (\(\lambda\)) controls the trade-off between increasing shard count vs. increasing SSTable size:

| \(\lambda\) Value | Shard Growth | SSTable Size | Use Case |
|---|---|---|---|
| 0 | Grows linearly with density | Fixed at target | Many small SSTables, maximum parallelism |
| 0.333 (default) | Grows as \(d^{2/3}\) | Cube-root growth (\(d^{1/3}\)) | Balanced: both grow moderately |
| 0.5 | Square-root growth | Square-root growth | Equal growth for both |
| 1 | Fixed at base count | Grows linearly with density | Fewer large SSTables, less parallelism |

Detailed effects:

  • \(\lambda = 0\): Shard count grows proportionally with density; SSTable size stays fixed at target_sstable_size
  • \(\lambda = 0.333\): When density quadruples, SSTable size grows by \(\sqrt[3]{4} \approx 1.6\times\) and shard count grows by \(4/1.6 \approx 2.5\times\)
  • \(\lambda = 0.5\): When density quadruples, both SSTable size and shard count double
  • \(\lambda = 1\): Shard count fixed at base_shard_count; SSTable size grows linearly with density
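
The relationship behind these bullets is that SSTable size scales roughly as \(d^{\lambda}\) and shard count as \(d^{1-\lambda}\); a short sketch makes the quadrupling example concrete:

# How a 4x density increase is split between SSTable size and shard count (sketch).
for lam in (0.0, 0.333, 0.5, 1.0):
    size_growth = 4 ** lam          # SSTable size scales roughly as d ** lambda
    shard_growth = 4 ** (1 - lam)   # shard count scales roughly as d ** (1 - lambda)
    print(f"lambda={lam}: size x{size_growth:.2f}, shards x{shard_growth:.2f}")
# lambda=0.0:   size x1.00, shards x4.00
# lambda=0.333: size x1.59, shards x2.52
# lambda=0.5:   size x2.00, shards x2.00
# lambda=1.0:   size x4.00, shards x1.00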

Output SSTable Sizing

Compaction output SSTables target sizes between:

\[\frac{s_t}{\sqrt{2}} \leq \text{output\_size} \leq s_t \times \sqrt{2}\]

Where \(s_t\) = target_sstable_size.

With default 1 GiB target:

  • Minimum: \(\frac{1024}{\sqrt{2}} \approx 724\) MiB
  • Maximum: \(1024 \times \sqrt{2} \approx 1448\) MiB

SSTables are split at predefined power-of-two shard boundaries, ensuring consistent boundaries across all density levels.
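
The bounds for the default target work out as follows; this is a quick check of the formula above, not Cassandra code:

# Output SSTable size bounds for the default 1 GiB target (sketch of the formula).
import math

target_mib = 1024
low, high = target_mib / math.sqrt(2), target_mib * math.sqrt(2)
print(f"{low:.0f} MiB .. {high:.0f} MiB")   # 724 MiB .. 1448 MiB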


Configuration

CREATE TABLE my_table (
    id uuid PRIMARY KEY,
    data text
) WITH compaction = {
    'class': 'UnifiedCompactionStrategy',

    -- Scaling parameter (strategy behavior)
    -- T = Tiered (STCS-like), number is fanout
    -- L = Leveled (LCS-like), number is fanout
    -- N = Neutral (balanced, neither tiered nor leveled)
    'scaling_parameters': 'T4',

    -- Target SSTable size
    'target_sstable_size': '1GiB',

    -- Base shard count
    'base_shard_count': 4,

    -- Minimum SSTable size (prevents over-sharding small data)
    'min_sstable_size': '100MiB'
};

Configuration Parameters

UCS-Specific Options

| Parameter | Default | Description |
|---|---|---|
| scaling_parameters | T4 | Strategy behavior: T (tiered), L (leveled), or N (balanced), with fanout. Controls the read/write trade-off. Multiple comma-separated values can specify different behavior per level. |
| target_sstable_size | 1 GiB | Target size for output SSTables. Actual sizes may vary between √0.5 and √2 times this value based on sharding calculations. |
| base_shard_count | 4 | Base number of token range shards. Actual shard count scales with data density using the sstable_growth modifier. Must be a power of 2. |
| min_sstable_size | 100 MiB | Minimum SSTable size before sharding applies. Data below this threshold is not split into shards. |
| sstable_growth | 0.333 | Controls how density growth is split between SSTable size and shard count. Range 0-1. A value of 0 keeps SSTable size fixed at the target; 1 prevents splitting beyond the base count; the default 0.333 lets SSTable size grow with the cube root of density. |
| flush_size_override | 0 | Override for expected flush size. When 0, derived automatically from observed flush operations (rounded to whole MB). |
| max_sstables_to_compact | 0 | Maximum SSTables per compaction. A value of 0 defaults to Integer.MAX_VALUE (effectively unlimited). |
| expired_sstable_check_frequency_seconds | 600 | How often (in seconds) to check for fully expired SSTables that can be dropped. |
| unsafe_aggressive_sstable_expiration | false | Drop expired SSTables without tombstone checking. Same semantics as TWCS. |
| overlap_inclusion_method | TRANSITIVE | How to identify overlapping SSTables. TRANSITIVE uses transitive closure; alternatives are available for specific use cases. |
| parallelize_output_shards | true | Enable parallel output shard writing during compaction. Improves throughput on multi-core systems. |
| survival_factor | 1 | Multiplier for calculating space overhead during compaction. Higher values provide more headroom for concurrent compactions. |

Common Compaction Options

| Parameter | Default | Description |
|---|---|---|
| enabled | true | Enables background compaction. |
| tombstone_threshold | 0.2 | Ratio of droppable tombstones that triggers single-SSTable compaction. |
| tombstone_compaction_interval | 86400 | Minimum seconds between tombstone compaction attempts. |
| unchecked_tombstone_compaction | false | Bypasses tombstone compaction eligibility pre-checking. |
| only_purge_repaired_tombstones | false | Only purge tombstones from repaired SSTables. |
| log_all | false | Enables detailed compaction logging. |

Validation Rules

The following constraints are enforced during configuration:

| Parameter | Constraint |
|---|---|
| target_sstable_size | Minimum 1 MiB |
| min_sstable_size | Must be < target_sstable_size × √0.5 (approximately 70% of target) |
| base_shard_count | Must be a positive integer |
| sstable_growth | Must be between 0.0 and 1.0 (inclusive) |
| flush_size_override | If specified, minimum 1 MiB |
| expired_sstable_check_frequency_seconds | Must be positive |
| scaling_parameters | Must match format: N, L[2+], T[2+], or a signed integer |

Scaling Parameters

The scaling_parameters option controls compaction behavior through an internal scaling factor W.

Format

[T|L|N][number]

T = Tiered (STCS-like behavior)
L = Leveled (LCS-like behavior)
N = Balanced (neutral, neither tiered nor leveled)

number = threshold/fanout (must be ≥ 2 for L and T)

Internal Calculation

The scaling parameter is converted to an internal value \(W\):

\[W = \begin{cases} n - 2 & \text{for } T_n \text{ (e.g., } T_4 \rightarrow W = 2\text{)} \\ 2 - n & \text{for } L_n \text{ (e.g., } L_{10} \rightarrow W = -8\text{)} \\ 0 & \text{for } N \text{ (balanced)} \end{cases}\]

From \(W\), the fanout (\(f\)) and threshold (\(t\)) are calculated:

\[f = \begin{cases} 2 - W & \text{if } W < 0 \\ 2 + W & \text{otherwise} \end{cases}\]
\[t = \begin{cases} 2 & \text{if } W \leq 0 \\ 2 + W & \text{otherwise} \end{cases}\]

Examples

| Parameter | \(W\) | \(f\) | \(t\) | Behavior |
|---|---|---|---|---|
| T4 | 2 | 4 | 4 | Tiered: 4 SSTables trigger compaction, \(4\times\) size growth |
| T8 | 6 | 8 | 8 | Very tiered: higher threshold, more write-optimized |
| N | 0 | 2 | 2 | Balanced: minimal fanout and threshold |
| L4 | -2 | 4 | 2 | Leveled: low threshold, more read-optimized |
| L10 | -8 | 10 | 2 | Very leveled: \(10\times\) fanout, aggressive compaction |
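
The conversion is easy to verify. The sketch below parses scaling-parameter strings and reproduces the table values; it is illustrative only, and it also accepts the bare signed-integer form mentioned under the validation rules:

# Sketch of the scaling-parameter conversion (illustrative, not Cassandra source).
def parse_scaling_parameter(p: str):
    """Return (W, fanout, threshold) for 'T<n>', 'L<n>', 'N', or a signed integer."""
    if p == "N":
        w = 0
    elif p[0] == "T":
        w = int(p[1:]) - 2           # T_n -> W = n - 2
    elif p[0] == "L":
        w = 2 - int(p[1:])           # L_n -> W = 2 - n
    else:
        w = int(p)                   # a bare signed integer is W itself
    fanout = 2 - w if w < 0 else 2 + w
    threshold = 2 if w <= 0 else 2 + w
    return w, fanout, threshold

for p in ("T4", "T8", "N", "L4", "L10"):
    print(p, parse_scaling_parameter(p))
# T4 (2, 4, 4), T8 (6, 8, 8), N (0, 2, 2), L4 (-2, 4, 2), L10 (-8, 10, 2)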

Per-Level Configuration

Multiple values can be specified for different levels:

'scaling_parameters': 'T4, T4, L10'
-- Level 0: T4 behavior
-- Level 1: T4 behavior
-- Level 2+: L10 behavior (last value repeats)

This enables hybrid strategies where upper levels are tiered (write-optimized) and lower levels are leveled (read-optimized).


Migration from Other Strategies

From STCS

-- STCS with min_threshold=4 equivalent
ALTER TABLE keyspace.table WITH compaction = {
    'class': 'UnifiedCompactionStrategy',
    'scaling_parameters': 'T4'
};

From LCS

-- LCS equivalent
ALTER TABLE keyspace.table WITH compaction = {
    'class': 'UnifiedCompactionStrategy',
    'scaling_parameters': 'L10',
    'target_sstable_size': '160MiB'
};

Migration Process

  1. UCS takes effect immediately on ALTER
  2. Existing SSTables are not rewritten
  3. New compactions follow UCS rules
  4. Gradual transition as old SSTables compact

# Monitor migration
watch 'nodetool compactionstats && nodetool tablestats keyspace.table'

Tuning UCS

Write-Heavy Workload

ALTER TABLE keyspace.table WITH compaction = {
    'class': 'UnifiedCompactionStrategy',
    'scaling_parameters': 'T8',          -- Higher threshold, less frequent compaction
    'target_sstable_size': '2GiB',       -- Larger SSTables
    'base_shard_count': 4
};

Read-Heavy Workload

ALTER TABLE keyspace.table WITH compaction = {
    'class': 'UnifiedCompactionStrategy',
    'scaling_parameters': 'L10',         -- Leveled for low read amp
    'target_sstable_size': '256MiB',     -- Smaller SSTables for faster compaction
    'base_shard_count': 8                -- More parallelism
};

Large Dataset

ALTER TABLE keyspace.table WITH compaction = {
    'class': 'UnifiedCompactionStrategy',
    'scaling_parameters': 'T4',
    'target_sstable_size': '5GiB',       -- Larger to reduce SSTable count
    'base_shard_count': 16               -- More shards for parallelism
};

High-Core Server

ALTER TABLE keyspace.table WITH compaction = {
    'class': 'UnifiedCompactionStrategy',
    'scaling_parameters': 'T4',
    'base_shard_count': 16,              -- Match available cores
    'target_sstable_size': '1GiB'
};

Monitoring UCS

Key Metrics

| Metric | Healthy | Investigate |
|---|---|---|
| Pending compactions | <20 | >50 |
| SSTable count per shard | Consistent | Highly variable |
| Compaction throughput | Steady | Dropping |

Commands

# Compaction activity
nodetool compactionstats

# Table statistics
nodetool tablestats keyspace.table

# JMX metrics for detailed shard info
# org.apache.cassandra.metrics:type=Table,name=*

JMX Metrics

# Pending compactions
org.apache.cassandra.metrics:type=Compaction,name=PendingTasks

# Per-table compaction metrics
org.apache.cassandra.metrics:type=Table,keyspace=*,scope=*,name=PendingCompactions
org.apache.cassandra.metrics:type=Table,keyspace=*,scope=*,name=LiveSSTableCount

Comparison with Traditional Strategies

| Aspect | STCS | LCS | TWCS | UCS |
|---|---|---|---|---|
| Write amplification | Low | High | Low | Configurable |
| Read amplification | High | Low | Low | Configurable |
| Space amplification | Medium | Low | Low | Low |
| Parallelism | Limited | Limited | Limited | High (sharding) |
| Adaptability | Fixed | Fixed | Fixed | Configurable |
| Cassandra version | All | All | 3.0+ | 5.0+ |

UCS Advantages

  1. Single strategy for multiple workloads: Adjust scaling_parameters instead of switching strategies
  2. Better parallelism: Sharding enables multi-core utilization
  3. Density-based triggering: More intelligent than fixed thresholds
  4. Unified codebase: Simplified maintenance and fewer edge cases

UCS Considerations

  1. Newer strategy: Less production experience than STCS/LCS
  2. Migration complexity: Existing clusters need careful transition
  3. Learning curve: Different configuration model

Overlap Inclusion Methods

The overlap_inclusion_method parameter controls how UCS extends the set of SSTables selected for compaction.

TRANSITIVE (Default)

When an SSTable is selected for compaction, all transitively overlapping SSTables are included:

If A overlaps B and B overlaps C:
  → A, B, and C are all included in the same compaction
  (even if A and C don't directly overlap)

Algorithm: O(n log n) time complexity
Forms minimal disjoint lists where overlapping SSTables belong together

This is the default and recommended setting. It ensures compaction fully resolves overlaps, reducing future read amplification.

SINGLE (Experimental)

SINGLE is implemented for experimentation purposes only and is not recommended for production use with UCS sharding.

Extension occurs only once from the initially selected SSTables:

If A is selected and overlaps B and C:
  → A, B, and C are included
But if B also overlaps D:
  → D is NOT included (no transitive extension)

Use this when compaction scope needs to be limited, at the cost of potentially leaving some overlaps unresolved.

NONE (Experimental)

NONE is implemented for experimentation purposes only and is not recommended for production use with UCS sharding.

No overlap extension occurs. Only directly selected SSTables are compacted:

Only the initially selected bucket is compacted
Overlapping SSTables in other buckets are not included

Not recommended for most use cases as it may leave significant overlaps, degrading read performance.


Implementation Internals

This section documents implementation details from the Cassandra source code.

Level Structure

UCS supports up to 32 levels (MAX_LEVELS = 32). With the minimum fanout of 2 and a 1 MB flush size, 32 levels cover roughly 2^32 MB (about 4 PiB) of data, sufficient for petabyte-scale datasets.

Shard Management

The ShardManager organizes compaction across token ranges:

  1. Boundary calculation: Token range divided into base_shard_count segments
  2. Density tracking: Each shard tracks data density for triggering decisions
  3. Dynamic scaling: Shard count grows based on sstable_growth modifier
  4. Output sharding: Compaction output can be parallelized via ShardTracker

Compaction Candidate Selection

The selection algorithm prioritizes efficiency and coverage:

Selection process:
1. Filter unsuitable SSTables (suspect, early-open)
2. Exclude currently compacting SSTables
3. Organize remaining SSTables by density into levels (up to 32)
4. Form buckets by overlap relationships within each level
5. Find buckets with highest overlap count
6. Among equal candidates:
   - Prioritize LOWEST levels (cover larger token fraction for same work)
   - Random uniform selection within same level (reservoir sampling)
7. Extend selection using overlap_inclusion_method

Level prioritization rationale: Lower levels cover a larger fraction of the token space for the same amount of compaction work, making them more efficient targets.

Major Compaction Behavior

When major compaction is triggered:

  1. Compacts all SSTables that have transitive overlap
  2. Produces base_shard_count concurrent compaction tasks
  3. Each task handles SSTables from one shard
  4. Output is split at shard boundaries appropriate for resulting density

Output Sharding

When parallelize_output_shards is enabled:

  • Compaction tasks can write to multiple output shards simultaneously
  • Output SSTables are split at power-of-two shard boundaries
  • Task IDs use sequences 1+ for parallelized operations (sequence 0 for non-parallelized)
  • Improves throughput on multi-core systems

Overlap Set Formation

The overlap detection algorithm forms a minimal list of overlap sets satisfying three properties:

  1. Non-overlapping SSTables never share a set
  2. Overlapping SSTables exist together in at least one set
  3. SSTables occupy consecutive positions within sets

Algorithm:
1. Sort SSTables by first token
2. Iterate through sorted list
3. Extend current set while SSTables overlap
4. Close set and start new when gap found
5. Result: minimal lists where all overlapping SSTables are grouped

Time complexity: O(n log n)

Example: SSTables A, B, C, D cover tokens 0-3, 2-7, 6-9, and 1-8 respectively.

  • Overlap sets computed: {A, B, D} and {B, C, D}
  • A and C don't overlap, so they are never in the same set
  • A, B, and D all overlap at token 2 → they must be in at least one set together
  • B, C, and D all overlap at token 7 → they must be in at least one set together

The overlap sets determine read amplification: any key lookup touches at most one SSTable per non-overlapping set.
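
The sweep described above can be sketched in a few lines of Python. This is an illustration of the idea rather than the Cassandra implementation; it assumes inclusive (first_token, last_token) ranges and reproduces the A, B, C, D example:

# Sketch of minimal overlap-set construction via a token sweep (illustrative,
# not the Cassandra implementation). Ranges are inclusive (first_token, last_token).
def overlap_sets(sstables):
    events = []
    for name, (first, last) in sstables.items():
        events.append((first, 0, name))   # 0 = open; sorts before a close at the same token
        events.append((last, 1, name))    # 1 = close
    events.sort()

    sets, active, last_was_open = [], set(), False
    for _, kind, name in events:
        if kind == 0:
            active.add(name)
            last_was_open = True
        else:
            if last_was_open:             # active set is at a local maximum: emit it
                sets.append(sorted(active))
            active.remove(name)
            last_was_open = False
    return sets

example = {"A": (0, 3), "B": (2, 7), "C": (6, 9), "D": (1, 8)}
print(overlap_sets(example))   # [['A', 'B', 'D'], ['B', 'C', 'D']]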

Constants Reference

| Constant | Value | Description |
|---|---|---|
| MAX_LEVELS | 32 | Maximum hierarchy depth |
| Default scaling_parameters | T4 | Tiered with fanout/threshold of 4 |
| Default target_sstable_size | 1 GiB | Target output SSTable size |
| Default base_shard_count | 4 | Initial token range divisions |
| Default sstable_growth | 0.333 | SSTable size grows with the cube root of density |
| Default expired SSTable check | 600 seconds | TTL expiration check frequency |
| Output size range | \(s_t/\sqrt{2}\) to \(s_t \times \sqrt{2}\) | Actual output SSTable size bounds |