nodetool setconcurrentcompactors¶
Sets the number of concurrent compactor threads.
Synopsis¶
nodetool [connection_options] setconcurrentcompactors <value>
Description¶
nodetool setconcurrentcompactors changes the number of threads available for concurrent compaction operations at runtime. Each compactor thread can process one compaction task independently, allowing multiple compactions to run simultaneously across different tables or SSTables.
What Is Compaction?
Compaction is Cassandra's background process that merges SSTables, removes deleted data (tombstones), and consolidates data for efficient reads. Without compaction, read performance degrades as the number of SSTables grows.
What Are Concurrent Compactors?¶
The Compaction Process¶
When data is written to Cassandra, it first goes to the memtable (in memory), then gets flushed to SSTables (on disk). Over time, multiple SSTables accumulate for each table. Compaction merges these SSTables into fewer, larger files, purging tombstones and consolidating row fragments along the way.
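You can observe this cycle from the shell. A minimal sketch, using a hypothetical my_keyspace.my_table:
# Force the current memtable to disk, creating a new SSTable
nodetool flush my_keyspace my_table
# SSTable count rises with each flush and drops after compaction merges files
nodetool tablestats my_keyspace.my_table | grep "SSTable count"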
How Concurrent Compactors Work¶
Each compactor is a thread that processes one compaction task at a time. With multiple compactors, Cassandra can run several compaction operations simultaneously, each merging a different set of SSTables.
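The compactor threads appear as the CompactionExecutor pool in thread-pool statistics, so you can see how many are busy at any moment:
# Active and Pending columns reflect busy compactor threads and queued tasks
nodetool tpstats | grep -i compaction
# One line per running compaction, with progress and totals
nodetool compactionstats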
Why This Matters¶
| Scenario | With Few Compactors | With More Compactors |
|---|---|---|
| Heavy write load | Compaction falls behind, SSTable count grows | Keeps up with writes |
| Many tables | Tables compete for compaction time | Multiple tables compacted simultaneously |
| Large SSTables | Single compaction blocks others | Parallel compactions continue |
| Read latency | Degrades as SSTables accumulate | Stays stable |
Arguments¶
| Argument | Description |
|---|---|
| `value` | Number of concurrent compactor threads (required). Must be an integer ≥ 1. |
Default Value Calculation¶
If not explicitly configured, Cassandra calculates the default based on:
concurrent_compactors = min(number_of_data_directories, number_of_cpu_cores)
| System Configuration | Default Compactors |
|---|---|
| 8 cores, 1 disk | 1 |
| 8 cores, 4 disks (JBOD) | 4 |
| 16 cores, 8 disks | 8 |
| 4 cores, 8 disks | 4 |
The rationale:
- Disk-limited: each disk can only perform one compaction efficiently at a time
- CPU-limited: each compaction thread consumes CPU for data processing
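To see what a node actually computed, compare the configured value (commented out in the stock cassandra.yaml) with the effective runtime value:
# No uncommented line here means the automatic calculation is in effect
grep -E "^concurrent_compactors:" /etc/cassandra/cassandra.yaml
# Effective value the node is using right now
nodetool getconcurrentcompactors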
Impact of Changing This Setting¶
Increasing Concurrent Compactors¶
Benefits:
| Aspect | Effect |
|---|---|
| Compaction throughput | Faster - more tasks processed in parallel |
| SSTable count | Lower - compaction keeps up with writes |
| Read latency | Improved - fewer SSTables to merge |
| Compaction backlog | Clears faster |
Costs:
| Resource | Impact |
|---|---|
| CPU usage | Increases - more threads doing work |
| Disk I/O | Increases - more parallel reads/writes |
| Memory | Slight increase - buffers per compaction |
| Read/write latency during compaction | May increase - resource contention |
Decreasing Concurrent Compactors¶
Benefits:
| Aspect | Effect |
|---|---|
| CPU usage | Lower - fewer active threads |
| Disk I/O | Lower - less parallel activity |
| Foreground operations | More resources available |
| Latency during compaction | More predictable |
Costs:
| Resource | Impact |
|---|---|
| Compaction throughput | Decreases - slower processing |
| SSTable accumulation | Risk increases - may fall behind |
| Read latency over time | May degrade - more SSTables |
| Pending compaction tasks | Grows - longer backlog |
When to Increase Compactors¶
Scenario 1: Compaction Backlog Growing¶
Symptoms:
- nodetool compactionstats shows many pending compactions
- SSTable count per table is growing over time
- Read latency slowly increasing
# Check for compaction backlog
nodetool compactionstats
# Sample output showing problem:
# pending tasks: 847
# - my_keyspace.my_table: 245
# - my_keyspace.events: 602
Solution:
# Check current compactors
nodetool getconcurrentcompactors
# Output: 2
# Increase to clear backlog
nodetool setconcurrentcompactors 6
# Monitor progress
watch -n 10 'nodetool compactionstats | head -20'
# After the backlog clears, either keep the higher value or scale back
Scenario 2: High Write Throughput¶
Symptoms:
- Heavy write workload (bulk loading, high ingestion rate)
- SSTables accumulating faster than compaction can merge them
- Write latency spikes during compaction
# During bulk load, temporarily increase compactors
nodetool setconcurrentcompactors 8
# Monitor compaction keeping up
watch 'nodetool tablestats my_keyspace.my_table | grep "SSTable count"'
# After load completes, restore normal value
nodetool setconcurrentcompactors 4
Scenario 3: Many Tables¶
Symptoms:
- Cluster has dozens or hundreds of tables
- Compaction spreads thin across all tables
- Some tables have excessive SSTables
# More compactors allow parallel work on multiple tables
nodetool setconcurrentcompactors 8
Scenario 4: JBOD with Many Disks¶
Symptoms:
- Multiple data directories configured
- Disks are underutilized
- Compaction appears slow despite available I/O capacity
# Check disk count
grep data_file_directories /etc/cassandra/cassandra.yaml
# Match compactors to disk count (or slightly less)
nodetool setconcurrentcompactors 6 # For 8 disks
When to Decrease Compactors¶
Scenario 1: High Latency During Compaction¶
Symptoms:
- Read/write latency spikes when compaction is active
- CPU consistently at 100% during compaction
- Application timeouts during compaction periods
# Reduce to free resources for foreground operations
nodetool setconcurrentcompactors 2
# Combined with throughput limit for more control
nodetool setcompactionthroughput 64 # MB/s
Scenario 2: Resource-Constrained Nodes¶
Symptoms:
- Small instances (2-4 CPU cores)
- Limited memory
- Single disk (not JBOD)
# Minimum compaction overhead
nodetool setconcurrentcompactors 1
Scenario 3: Prioritizing Foreground Operations¶
Symptoms:
- During peak business hours
- When running repairs or streaming
- During rolling restart/upgrade
# Temporarily reduce compaction activity
nodetool setconcurrentcompactors 1
# After maintenance window, restore
nodetool setconcurrentcompactors 4
Examples¶
Check Current Setting¶
nodetool getconcurrentcompactors
Sample output:
Current concurrent compactors: 4
Increase for Backlog Recovery¶
# Double the compactors temporarily
nodetool setconcurrentcompactors 8
Reduce During Peak Hours¶
# Minimize compaction impact
nodetool setconcurrentcompactors 2
Set Based on Hardware¶
#!/bin/bash
# set_compactors_auto.sh
# Get CPU cores
cores=$(nproc)
# Get data directory count
disks=$(grep -A 10 "data_file_directories:" /etc/cassandra/cassandra.yaml | \
    grep "^ *-" | wc -l)
# Guard against a missing or commented-out data_file_directories block
[ "$disks" -lt 1 ] && disks=1
# Calculate appropriate value
recommended=$((cores < disks ? cores : disks))
echo "CPU cores: $cores"
echo "Data directories: $disks"
echo "Recommended compactors: $recommended"
nodetool setconcurrentcompactors $recommended
Temporarily Boost for Maintenance¶
#!/bin/bash
# boost_compaction.sh
NORMAL_COMPACTORS=4
BOOST_COMPACTORS=8
echo "Current compaction stats:"
nodetool compactionstats | head -5
echo ""
echo "Boosting compactors from $NORMAL_COMPACTORS to $BOOST_COMPACTORS..."
nodetool setconcurrentcompactors $BOOST_COMPACTORS
echo ""
echo "Monitoring compaction (Ctrl+C when done)..."
watch -n 5 'nodetool compactionstats | head -10'
# When done, run:
# nodetool setconcurrentcompactors $NORMAL_COMPACTORS
Monitoring Impact¶
Before and After Metrics¶
#!/bin/bash
# monitor_compaction_change.sh
echo "=== Before Change ==="
echo "Concurrent compactors: $(nodetool getconcurrentcompactors)"
echo ""
echo "Compaction stats:"
nodetool compactionstats
echo ""
echo "System load:"
uptime
echo ""
echo "I/O stats (5 second sample):"
iostat -x 1 5 | tail -10
echo ""
echo "Record these values, make the change, then run again to compare."
Watch Compaction Progress¶
# Real-time compaction monitoring
watch -n 2 'nodetool compactionstats'
# With SSTable counts
watch -n 10 'echo "=== Compaction ===" && nodetool compactionstats | head -10 && echo "" && echo "=== SSTable Counts ===" && nodetool tablestats 2>/dev/null | grep -E "Table:|SSTable count"'
Check Resource Usage¶
# CPU usage by Cassandra
top -p $(pgrep -d, -f CassandraDaemon)
# I/O usage
iostat -x 2
# Compaction-specific metrics via JMX
nodetool tpstats | grep -i compaction
Configuration Reference¶
cassandra.yaml Setting¶
# cassandra.yaml
# Number of simultaneous compactions to allow
# Default: min(number of disks, number of cores)
concurrent_compactors: 4
Runtime vs Persistent Configuration¶
| Method | Persistence | Restart Required |
|---|---|---|
| `nodetool setconcurrentcompactors` | Until restart | No |
| `cassandra.yaml` | Permanent | Yes (for initial load) |
Best Practice
Use nodetool setconcurrentcompactors to test changes dynamically, then update cassandra.yaml once the optimal value is determined.
Related cassandra.yaml Settings¶
# Total compaction throughput limit (MB/s, shared across all compactors)
compaction_throughput_mb_per_sec: 64
# Number of concurrent compaction threads
concurrent_compactors: 4
# Per-table compaction options (e.g. STCS min_threshold, LCS sstable_size_in_mb)
# are set in the table schema, not in cassandra.yaml
Interaction with Other Settings¶
With compaction_throughput_mb_per_sec¶
The throughput limit is a total cap shared by all compactor threads, so each compactor receives roughly:
Per-compactor I/O ≈ compaction_throughput_mb_per_sec ÷ concurrent_compactors
| Compactors | Total Throughput Limit | Approx. per Compactor |
|---|---|---|
| 2 | 64 MB/s | ~32 MB/s |
| 4 | 64 MB/s | ~16 MB/s |
| 8 | 64 MB/s | ~8 MB/s |
Adding compactors therefore increases parallelism, not total I/O. Raise the cap if you want throughput to scale with the extra threads:
# More parallelism, plus a larger total I/O budget to match
nodetool setconcurrentcompactors 8
nodetool setcompactionthroughput 256  # total MB/s, shared by all compactors
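A quick sketch for checking the effective per-compactor share on a live node; the grep parsing assumes each command prints its value as the first number in its output:
# Parse current values (output wording varies slightly across versions)
compactors=$(nodetool getconcurrentcompactors | grep -oP '\d+' | head -1)
limit=$(nodetool getcompactionthroughput | grep -oP '\d+' | head -1)
# A limit of 0 means compaction is unthrottled
echo "Approx. per-compactor throughput: $((limit / compactors)) MB/s"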
With Different Compaction Strategies¶
| Strategy | Compactor Impact |
|---|---|
| STCS | More compactors help with multiple concurrent merges |
| LCS | Important - many small compactions benefit from parallelism |
| TWCS | Moderate - time windows reduce concurrent needs |
| UCS | Varies by configuration |
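To check which strategy a table uses (a sketch assuming local cqlsh access and a hypothetical my_keyspace.my_table):
# The compaction map in the table schema names the strategy class
cqlsh -e "DESCRIBE TABLE my_keyspace.my_table" | grep -A 3 "compaction"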
Troubleshooting¶
Compaction Still Falling Behind¶
# Check if limit is compactors or throughput
nodetool compactionstats
# Look at: "Active compaction remaining time"
# If all compactors busy, increase count
nodetool getconcurrentcompactors
nodetool setconcurrentcompactors $(($(nodetool getconcurrentcompactors | grep -oP '\d+') + 2))
# If compactors not fully utilized, check throughput
nodetool getcompactionthroughput
High CPU During Compaction¶
# Reduce compactors
nodetool setconcurrentcompactors 2
# And/or reduce throughput per compactor
nodetool setcompactionthroughput 32
Compaction Not Starting¶
# Check if compaction is disabled
nodetool compactionstats
# Look for "Compaction is currently disabled"
# Enable if needed
nodetool enableautocompaction my_keyspace
# Check compactors > 0
nodetool getconcurrentcompactors
Setting Reverts After Restart¶
# Check cassandra.yaml
grep concurrent_compactors /etc/cassandra/cassandra.yaml
# Update the configuration file (also uncomments the line if needed)
sudo sed -i 's/^#* *concurrent_compactors:.*/concurrent_compactors: 6/' /etc/cassandra/cassandra.yaml
# Or add if not present
echo "concurrent_compactors: 6" | sudo tee -a /etc/cassandra/cassandra.yaml
Inconsistent Settings Across Cluster¶
#!/bin/bash
# check_compactors_cluster.sh
echo "=== Concurrent Compactors Across Cluster ==="
# Get list of node IPs from local nodetool status
for node in $(nodetool status | grep "^UN" | awk '{print $2}'); do
value=$(ssh "$node" "nodetool getconcurrentcompactors" 2>/dev/null | grep -oP '\d+')
echo "$node: $value compactors"
done
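If the values differ, a similar loop can push one value to every up node. A sketch, assuming passwordless SSH and nodetool on each host's PATH:
# Hypothetical remediation: align all up nodes on the same value
TARGET=4
for node in $(nodetool status | grep "^UN" | awk '{print $2}'); do
    ssh "$node" "nodetool setconcurrentcompactors $TARGET"
done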
Sizing Guidelines¶
By Hardware Profile¶
| Profile | CPU Cores | Disks | Recommended Compactors |
|---|---|---|---|
| Small (dev/test) | 2-4 | 1 | 1-2 |
| Medium | 8 | 1-2 | 2-4 |
| Large | 16 | 4 (JBOD) | 4-8 |
| Extra Large | 32+ | 8+ (JBOD) | 8-12 |
By Workload¶
| Workload Type | Recommended Approach |
|---|---|
| Read-heavy | Moderate compactors (keep SSTables low) |
| Write-heavy | Higher compactors (keep up with flushes) |
| Mixed | Balance based on monitoring |
| Bulk loading | Temporarily maximize, then reduce |
Rule of Thumb¶
Recommended compactors = min(CPU_cores, disk_count, 8)
- Rarely beneficial to exceed 8 compactors
- Single disk systems: 1-2 compactors usually sufficient
- Monitor and adjust based on actual performance
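Expressed as a shell computation (a sketch; set disks to the node's actual data-directory count):
# min(CPU cores, data disks, 8)
cores=$(nproc)
disks=1   # single-disk node assumed for this example
recommended=$((cores < disks ? cores : disks))
[ "$recommended" -gt 8 ] && recommended=8
echo "Recommended compactors: $recommended"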
Best Practices¶
Concurrent Compactors Guidelines
- Start with defaults - Cassandra's auto-calculation is reasonable
- Monitor before changing - Understand current compaction behavior
- Change incrementally - Adjust by 1-2 at a time
- Watch resource usage - CPU and I/O impact
- Consider workload patterns - Different times may need different values
- Make permanent - Update cassandra.yaml once the optimal value is found
- Consistent across cluster - All nodes should have same setting
Cautions
- Don't exceed CPU cores - Diminishing returns and resource contention
- Single disk limitation - More compactors won't help with one disk
- Memory impact - Each compaction uses memory buffers
- I/O saturation - Can starve foreground operations
- Testing required - Impact varies by hardware and workload
When to Leave at Default
The auto-calculated default is appropriate when:
- Hardware is well-balanced (cores ≈ disks)
- Workload is steady (not bursty)
- Compaction is keeping up (low pending tasks)
- No latency issues during compaction
Related Commands¶
| Command | Relationship |
|---|---|
| getconcurrentcompactors | View current setting |
| compactionstats | Monitor compaction progress |
| setcompactionthroughput | Control total compaction I/O |
| getcompactionthroughput | View throughput limit |
| enableautocompaction | Enable compaction |
| disableautocompaction | Disable compaction |
| compact | Force manual compaction |
| tablestats | View SSTable counts |