nodetool relocatesstables¶
Relocates SSTables to the correct disk based on the configured disk allocation strategy.
Synopsis¶
nodetool [connection_options] relocatesstables [--jobs <jobs>] [<keyspace> [tables...]]
Description¶
nodetool relocatesstables moves SSTables to their correct disk location according to Cassandra's disk allocation strategy. When Cassandra is configured with multiple data directories (JBOD - Just a Bunch of Disks), it distributes SSTables across these disks. Over time, SSTables can end up on "wrong" disks due to compaction, streaming, or configuration changes. This command corrects those misplacements.
Multi-Disk Configuration Required
This command is only relevant when Cassandra is configured with multiple data directories in cassandra.yaml. With a single data directory, this command has no effect.
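A quick way to confirm that a node actually has more than one data directory configured is to count the entries in cassandra.yaml (the configuration path assumes a package install and may differ on your system):
# Count configured data directories; 1 means relocatesstables is effectively a no-op
grep -A 10 '^data_file_directories' /etc/cassandra/cassandra.yaml | grep -c '^[[:space:]]*-'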
Why Multiple Data Directories?¶
Before understanding relocatesstables, it helps to understand why Cassandra uses multiple data directories:
JBOD (Just a Bunch of Disks) Architecture¶
# cassandra.yaml - Multiple data directories
data_file_directories:
- /mnt/disk1/cassandra/data
- /mnt/disk2/cassandra/data
- /mnt/disk3/cassandra/data
- /mnt/disk4/cassandra/data
Benefits of JBOD:
| Benefit | Description |
|---|---|
| Cost efficiency | No RAID overhead, use raw disk capacity |
| Parallel I/O | Multiple disks = more throughput |
| Failure isolation | One disk failure affects only some SSTables |
| Scalability | Easy to add more disks |
Disk Allocation Strategies¶
Cassandra decides which disk to use for new SSTables based on how it divides data across the configured data directories. Since Cassandra 3.2, each data directory is assigned a contiguous slice of the node's token range, and flushes and compactions write SSTables to the directory that owns their tokens; earlier versions simply favored the disk with the most free space. (Note that disk_access_mode in cassandra.yaml controls how SSTable files are read - mmap versus standard I/O - and has no effect on placement.)
How Cassandra places SSTables:
- By token range - Each data directory owns a portion of the node's token range (Cassandra 3.2 and later)
- By available space - Older versions prefer the disk with the most free space
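To see how a table's SSTables are currently spread across the data directories, a quick check like the following can help (the mount points and keyspace name are illustrative and should be adjusted to your layout):
# Count Data.db files per data disk for one keyspace
find /mnt/disk*/cassandra/data/my_keyspace -name "*-Data.db" 2>/dev/null | cut -d/ -f1-3 | sort | uniq -c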
When SSTables End Up on "Wrong" Disks¶
Several scenarios cause SSTables to be on incorrect disks:
Scenario 1: After Adding New Disks¶
When new disks are added to data_file_directories:
Before: 2 disks (disk1, disk2) - all SSTables here
After: 4 disks (disk1, disk2, disk3, disk4) - new disks empty
Problem: Unbalanced I/O - old disks overloaded, new disks idle
Scenario 2: After Disk Replacement¶
When a failed disk is replaced:
Before: disk1 (100 SSTables), disk2 (100 SSTables), disk3 (100 SSTables)
Event: disk2 fails, replaced with new disk2
After: disk1 (100), disk2 (0 - empty!), disk3 (100)
Problem: New disk2 gets no read traffic, wasted capacity
Scenario 3: After Compaction Anomalies¶
Compaction can create large SSTables on one disk when source SSTables were on multiple disks:
Before Compaction:
disk1: sstable_a (10GB), sstable_b (10GB)
disk2: sstable_c (10GB)
After Compaction:
disk1: sstable_merged (30GB) # All ended up on one disk!
disk2: (empty for this table)
Scenario 4: After Streaming/Repair¶
Streamed data during repair or bootstrap may land on whatever disk has space:
After heavy streaming:
disk1: 40% used
disk2: 95% used # Received most streamed data
disk3: 45% used
Scenario 5: After Configuration Changes¶
Changing data_file_directories order or disk allocation settings:
# Before
data_file_directories:
- /mnt/disk1/data # Table A SSTables here
- /mnt/disk2/data # Table B SSTables here
# After (order changed)
data_file_directories:
- /mnt/disk2/data # Cassandra now expects Table A here
- /mnt/disk1/data # Cassandra now expects Table B here
Arguments¶
| Argument | Description |
|---|---|
| keyspace | Target keyspace name; if omitted, SSTables in all keyspaces are relocated |
| tables | Optional: specific table names. If omitted, relocates all tables in the keyspace |
Options¶
| Option | Description | Default |
|---|---|---|
| --jobs, -j | Number of SSTables to relocate simultaneously; 0 uses all available compaction threads | 2 |
What Happens During Relocation¶
Process Overview¶
1. Cassandra identifies "misplaced" SSTables
└── Compares current disk to expected disk per allocation strategy
2. For each misplaced SSTable:
└── Rewrite its data onto the correct disk (splitting it when the data spans more than one disk's range)
└── Bring the new SSTable(s) online in place of the old one
└── Delete the old SSTable files
3. Operation completes when all SSTables are correctly placed
Files Moved Per SSTable¶
Each SSTable consists of multiple files that are moved together:
my_table-abc123-Data.db # Actual data
my_table-abc123-Index.db # Partition index
my_table-abc123-Filter.db # Bloom filter
my_table-abc123-Statistics.db # SSTable metadata
my_table-abc123-Summary.db # Index summary
my_table-abc123-TOC.txt # List of components
# ... and others depending on version
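Where it helps to see these components grouped per SSTable on disk, something like the following can list them (the paths and the keyspace/table names are illustrative):
# List the component files belonging to each SSTable, grouped by its Data.db file
for data_file in /mnt/disk*/cassandra/data/my_keyspace/users-*/*-Data.db; do
    [ -e "$data_file" ] || continue          # skip if the glob matched nothing
    prefix="${data_file%-Data.db}"
    echo "=== $(basename "$prefix") ==="
    ls -lh "${prefix}"-*
done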
Examples¶
Relocate All Tables in Keyspace¶
# Move all SSTables in my_keyspace to correct disks
nodetool relocatesstables my_keyspace
Relocate Specific Table¶
# Move only the users table
nodetool relocatesstables my_keyspace users
Relocate Multiple Tables¶
# Move specific tables
nodetool relocatesstables my_keyspace users orders products
Parallel Relocation (Faster)¶
# Use 4 parallel jobs for faster relocation
nodetool relocatesstables --jobs 4 my_keyspace
# Maximum parallelism (use with caution)
nodetool relocatesstables --jobs 8 my_keyspace large_table
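The effective parallelism is also bounded by the node's compaction concurrency, so checking the configured value is a reasonable sanity step before raising --jobs (the config path assumes a package install):
# --jobs higher than concurrent_compactors will not add more parallel tasks
grep "concurrent_compactors" /etc/cassandra/cassandra.yaml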
Relocate All Keyspaces¶
#!/bin/bash
# relocate_all.sh - Relocate SSTables for all user keyspaces
for ks in $(nodetool tablestats 2>/dev/null | grep "Keyspace :" | awk '{print $3}' | grep -v "^system"); do
echo "Relocating keyspace: $ks"
nodetool relocatesstables $ks
done
Real-World Scenarios¶
Scenario A: Adding Storage Capacity¶
Situation: Cluster is running low on disk space. Added two new disks to each node.
# 1. Stop Cassandra
sudo systemctl stop cassandra
# 2. Update cassandra.yaml with new data directories
# data_file_directories:
# - /mnt/disk1/cassandra/data
# - /mnt/disk2/cassandra/data
# - /mnt/disk3/cassandra/data # NEW
# - /mnt/disk4/cassandra/data # NEW
# 3. Create directories with correct ownership
sudo mkdir -p /mnt/disk3/cassandra/data /mnt/disk4/cassandra/data
sudo chown -R cassandra:cassandra /mnt/disk3/cassandra /mnt/disk4/cassandra
# 4. Start Cassandra
sudo systemctl start cassandra
# 5. Wait for node to be fully up
sleep 60
# 6. Relocate SSTables to distribute across all disks
nodetool relocatesstables --jobs 2 my_keyspace
# 7. Verify distribution
du -sh /mnt/disk*/cassandra/data/my_keyspace/
Expected result:
Before relocation:
/mnt/disk1: 450GB
/mnt/disk2: 480GB
/mnt/disk3: 0GB (new)
/mnt/disk4: 0GB (new)
After relocation:
/mnt/disk1: 230GB
/mnt/disk2: 240GB
/mnt/disk3: 225GB
/mnt/disk4: 235GB
Scenario B: Disk Failure Recovery¶
Situation: disk2 failed and was replaced. Data was rebuilt via repair but new disk has less data.
# After repair completes, check disk usage
df -h /mnt/disk*/cassandra
# Output shows imbalance:
# /mnt/disk1: 85% used
# /mnt/disk2: 20% used (new disk)
# /mnt/disk3: 82% used
# Relocate to rebalance
nodetool relocatesstables --jobs 2 my_keyspace
# Monitor progress
watch 'df -h /mnt/disk*/cassandra'
Scenario C: Hot Disk Mitigation¶
Situation: One disk is experiencing high I/O because too many SSTables for hot tables ended up there.
# Check which disk has the hot table's SSTables
find /mnt/disk*/cassandra/data/my_keyspace/hot_table-* -name "*Data.db" | \
xargs -I {} dirname {} | sort | uniq -c
# Output:
# 45 /mnt/disk1/cassandra/data/my_keyspace/hot_table-abc123
# 5 /mnt/disk2/cassandra/data/my_keyspace/hot_table-abc123
# 8 /mnt/disk3/cassandra/data/my_keyspace/hot_table-abc123
# Disk1 has too many - relocate to distribute
nodetool relocatesstables my_keyspace hot_table
Scenario D: Post-Upgrade Cleanup¶
Situation: After Cassandra upgrade, SSTable locations may not match new allocation strategy.
# After upgrading Cassandra version
# Run relocate to ensure SSTables are where the new version expects them
nodetool upgradesstables my_keyspace
nodetool relocatesstables my_keyspace
Impact Assessment¶
Resource Usage¶
| Resource | Impact | Notes |
|---|---|---|
| Disk I/O | HIGH | Reads from source disk, writes to target disk |
| CPU | Low | Minimal processing, mostly I/O |
| Memory | Low | Buffered I/O |
| Network | None | Local operation only |
| Disk Space | Temporary overhead | Old and new copies of a relocating SSTable coexist until the move completes |
Impact on Operations¶
| Operation | Impact |
|---|---|
| Reads | Minimal - SSTables remain readable during move |
| Writes | Minimal - New writes go to appropriate disk |
| Compaction | May compete for I/O |
| Streaming | May compete for I/O |
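Because relocation runs through the compaction machinery, the compaction throughput limit generally throttles it as well; temporarily lowering that limit is one way to reduce I/O impact on a busy node (the values shown are examples, and the assumption that relocation honors this limit should be verified on your version):
# Check and temporarily lower the compaction throughput cap (MB/s)
nodetool getcompactionthroughput
nodetool setcompactionthroughput 32
# ... run nodetool relocatesstables ...
nodetool setcompactionthroughput 64   # restore your normal value afterwards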
Temporary Disk Space
Relocation writes the new copy of an SSTable before deleting the original. Ensure sufficient free space on the target disks before running; as a rule of thumb, you need at least as much free space as the largest SSTable being moved.
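A rough way to sanity-check that rule of thumb is to compare the largest single Data.db file against the free space on each data disk (GNU find assumed; paths are illustrative):
# Largest single SSTable data file on the node
find /mnt/disk*/cassandra/data -name "*-Data.db" -printf '%s %p\n' 2>/dev/null | sort -n | tail -1 | awk '{printf "Largest SSTable: %.1f GB (%s)\n", $1/1024/1024/1024, $2}'
# Free space per data disk
df -h /mnt/disk*/cassandra/data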
Monitoring Progress¶
Watch Disk Usage Change¶
# Monitor disk usage during relocation
watch -n 5 'df -h /mnt/disk*/cassandra/data'
Check Relocation Activity¶
# View active compaction/relocation tasks
nodetool compactionstats
# May show as "Relocate sstables" task type
Log File Monitoring¶
# Watch for relocation messages
tail -f /var/log/cassandra/system.log | grep -i "relocat"
Verify Distribution After¶
#!/bin/bash
# check_sstable_distribution.sh
KEYSPACE="$1"
TABLE="$2"
echo "=== SSTable Distribution for $KEYSPACE.$TABLE ==="
for dir in /mnt/disk*/cassandra/data; do
count=$(find "$dir/$KEYSPACE/$TABLE-"* -name "*Data.db" 2>/dev/null | wc -l)
size=$(du -shc "$dir/$KEYSPACE/$TABLE-"* 2>/dev/null | tail -1 | awk '{print $1}')   # -c adds a grand-total line; tail -1 picks it up
disk=$(basename "$(dirname "$(dirname "$dir")")")                                    # /mnt/diskN/cassandra/data -> diskN
echo "$disk: $count SSTables, ${size:-0} total"
done
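Example invocation (the keyspace and table names are illustrative):
chmod +x check_sstable_distribution.sh
./check_sstable_distribution.sh my_keyspace users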
Pre-Relocation Checklist¶
#!/bin/bash
# pre_relocate_check.sh
KEYSPACE="$1"
echo "=== Pre-Relocation Safety Check ==="
# 1. Check disk space on all data directories
echo ""
echo "1. Disk space (need room for temporary copies):"
df -h /mnt/disk*/cassandra/data
# 2. Check current SSTable distribution
echo ""
echo "2. Current SSTable distribution:"
for dir in /mnt/disk*/cassandra/data; do
disk=$(basename "$(dirname "$(dirname "$dir")")")   # /mnt/diskN/cassandra/data -> diskN
count=$(find "$dir/$KEYSPACE" -name "*Data.db" 2>/dev/null | wc -l)
echo " $disk: $count SSTables"
done
# 3. Check for running compactions
echo ""
echo "3. Active compactions (avoid running simultaneously):"
nodetool compactionstats | head -10
# 4. Check cluster health
echo ""
echo "4. Cluster status:"
nodetool status | grep -E "^UN|^DN"
echo ""
echo "=== Review above before proceeding ==="
Best Practices¶
Relocation Guidelines
- Check disk space first - Ensure target disks have sufficient free space
- Run during low-traffic periods - Relocation generates I/O
- Use --jobs carefully - Higher parallelism = more I/O load
- Verify after completion - Check SSTable distribution is balanced
- One node at a time - In production, relocate on nodes sequentially (see the sketch after this list)
- Monitor throughout - Watch disk I/O and space usage
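A minimal sketch of the "one node at a time" guideline, assuming passwordless SSH to each node and a nodes.txt file listing one host per line (both assumptions, not part of any standard tooling):
#!/bin/bash
# rolling_relocate.sh - relocate SSTables on one node at a time
KEYSPACE="$1"
while read -r node; do
    echo "=== Relocating on $node ==="
    ssh -n "$node" "nodetool relocatesstables --jobs 2 $KEYSPACE"
    # Wait for any remaining relocation tasks to drain before moving on
    # (the task may appear as a "Relocate" type; adjust the pattern if needed)
    while ssh -n "$node" "nodetool compactionstats" | grep -qi "relocat"; do
        sleep 30
    done
done < nodes.txt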
Important Considerations
- Don't change data_file_directories while relocating - Can cause confusion
- Avoid during heavy operations - Don't run with repair, major compaction
- Test in staging first - Understand timing and impact
- Have disk space buffer - Need temporary space for file copies
When Relocation May Not Help
- Single data directory - Command has no effect
- Evenly distributed already - May just shuffle without benefit
- Different table sizes - Large tables may still dominate one disk
Troubleshooting¶
Relocation Not Moving Files¶
# Check if SSTables are actually misplaced
nodetool tablestats my_keyspace.my_table | grep -i "sstable"
# Verify data directories are configured
grep "data_file_directories" /etc/cassandra/cassandra.yaml -A 10
Running Out of Disk Space¶
# If relocation fails due to space
# 1. Check which disk is full
df -h /mnt/disk*/cassandra
# 2. Run compaction to reduce SSTable count first
nodetool compact my_keyspace my_table
# 3. Retry relocation
nodetool relocatesstables my_keyspace my_table
Relocation Taking Too Long¶
# Check progress
nodetool compactionstats
# If the node is I/O bound, lower the parallelism
# Rerun sequentially with --jobs 1
nodetool relocatesstables --jobs 1 my_keyspace
# Or defer the relocation to a low-traffic maintenance window
Imbalance After Relocation¶
# If still imbalanced, may need to run compaction first
nodetool compact my_keyspace
# Then relocate again
nodetool relocatesstables my_keyspace
# Some imbalance is normal due to different table sizes
Configuration Reference¶
cassandra.yaml Settings¶
# Multiple data directories (required for relocatesstables)
data_file_directories:
- /mnt/disk1/cassandra/data
- /mnt/disk2/cassandra/data
- /mnt/disk3/cassandra/data
# Disk failure policy (affects JBOD behavior)
disk_failure_policy: stop # or ignore, stop_paranoid, best_effort, die
# Commit log separate from data (recommended)
commitlog_directory: /mnt/ssd/cassandra/commitlog
Disk Failure Policies¶
| Policy | Behavior |
|---|---|
| stop | Stop gossip and client transports, leaving the node in the cluster but unavailable (still inspectable via JMX) |
| die | Shut down gossip and client transports and kill the JVM so the node can be replaced |
| ignore | Ignore fatal disk errors and let requests fail (legacy pre-1.2 behavior) |
| best_effort | Stop using the failed disk and serve requests from SSTables on the remaining disks (may return stale data at consistency level ONE) |
| stop_paranoid | Like stop, but also triggered by single corrupt SSTable errors |
Related Commands¶
| Command | Relationship |
|---|---|
| datapaths | View configured data directories |
| tablestats | View table statistics including SSTable info |
| compact | May help before relocation |
| compactionstats | Monitor relocation progress |
| info | View node information including data directories |