nodetool rebuild_index¶
Rebuilds secondary indexes for a table.
Synopsis¶
nodetool [connection_options] rebuild_index <keyspace> <table> [index_name...]
Description¶
nodetool rebuild_index reconstructs one or more secondary indexes on a table by scanning all data in the base table and regenerating the index entries. This operation is necessary when indexes become corrupted, out of sync with the base data, or after certain recovery scenarios.
Critical: Index Unavailability During Rebuild
The index becomes unavailable for queries during the rebuild process. Any queries that rely on the index will either:
- Return incomplete/partial results
- Fall back to inefficient full table scans (if
ALLOW FILTERINGis used) - Fail with an error
Plan accordingly for production systems. See Application Impact for details.
Arguments¶
| Argument | Description |
|---|---|
keyspace |
Target keyspace name (required) |
table |
Target table name (required) |
index_name |
Optional: specific index names to rebuild. If omitted, rebuilds ALL indexes on the table |
What Happens During Index Rebuild¶
Understanding the rebuild process helps in planning and risk assessment:
Step-by-Step Process¶
1. Index Marked as Building
└── Index queries return incomplete results or fail
2. Full Table Scan Begins
└── Every SSTable is read sequentially
└── Each row's indexed column is extracted
3. New Index Entries Created
└── Index SSTables are written
└── Compaction may occur on index SSTables
4. Index Marked as Built
└── Index queries resume normal operation
└── New index is fully available
Timeline Visualization¶
Time ────────────────────────────────────────────────────────────►
│ Normal │◄──── Index Unavailable ────►│ Normal Operation │
│ Operation │ (Rebuild in Progress) │ (Rebuild Done) │
▲ ▲
│ │
rebuild_index Rebuild Complete
command (logged in system.log)
Application Impact¶
What Happens to Queries During Rebuild¶
| Query Type | Behavior During Rebuild |
|---|---|
| Query using the rebuilding index | Returns incomplete results or fails |
Query using ALLOW FILTERING |
Works but extremely slow (full scan) |
| Queries not using the index | Unaffected |
| Writes to the table | Continue normally, indexed in new index |
Error Messages Applications May See¶
# Common error during rebuild
Unable to execute query: Index 'my_index' is not ready for use
# Or queries may silently return partial results
# (more dangerous - appears to work but data is incomplete)
Application Preparation Checklist¶
Before running rebuild_index in production:
- [ ] Notify application teams - They need to handle index unavailability
- [ ] Implement fallback logic - Applications should handle missing index gracefully
- [ ] Consider circuit breakers - Disable features that depend on the index
- [ ] Plan maintenance window - If possible, rebuild during low-traffic periods
- [ ] Estimate rebuild time - See Duration Estimation
Duration Estimation¶
Rebuild time depends on several factors:
Factors Affecting Duration¶
| Factor | Impact |
|---|---|
| Table size (data volume) | Linear - 2x data = ~2x time |
| Number of SSTables | More SSTables = more I/O overhead |
| Disk speed | SSD vs HDD makes significant difference |
| Concurrent operations | Other compactions/repairs slow rebuild |
| Index type | SASI indexes take longer than regular 2i |
Rough Estimation Formula¶
Estimated Time ≈ (Table Size in GB) × (1-5 minutes per GB)
Examples:
- 10 GB table, SSD: ~10-20 minutes
- 100 GB table, SSD: ~2-4 hours
- 100 GB table, HDD: ~4-8 hours
- 1 TB table: ~1-2 days
Estimates Are Approximate
Actual times vary significantly based on hardware, cluster load, and data characteristics. Always test in a non-production environment first.
Checking Progress¶
# Watch rebuild progress (shows as "Secondary index build")
watch -n 10 'nodetool compactionstats'
# Check system log for progress
tail -f /var/log/cassandra/system.log | grep -i "index"
# Detailed progress
nodetool compactionstats | grep -A 5 "Secondary index"
Sample compactionstats output during rebuild:
pending tasks: 1
- Secondary index build my_keyspace my_table 45678901234 42%
Active compaction remaining time : 0h15m32s
Examples¶
Rebuild All Indexes on a Table¶
# Rebuilds every index on the table
nodetool rebuild_index my_keyspace users
# Equivalent to rebuilding: users_email_idx, users_status_idx, etc.
Rebuild Specific Index¶
# Rebuild only the email index
nodetool rebuild_index my_keyspace users users_email_idx
Rebuild Multiple Specific Indexes¶
# Rebuild two specific indexes
nodetool rebuild_index my_keyspace users users_email_idx users_status_idx
Find Index Names First¶
# List indexes on a table
cqlsh -e "DESCRIBE TABLE my_keyspace.users;" | grep INDEX
# Or query schema directly
cqlsh -e "SELECT index_name FROM system_schema.indexes
WHERE keyspace_name = 'my_keyspace' AND table_name = 'users';"
When to Use¶
Scenario 1: Index Returning Incorrect Results¶
Symptoms:
- Queries using the index return fewer results than expected
- Results don't match direct queries with
ALLOW FILTERING - Index appears to be missing data
# Verify index inconsistency
cqlsh -e "SELECT COUNT(*) FROM my_keyspace.users WHERE email = '[email protected]';"
# Returns 0
cqlsh -e "SELECT COUNT(*) FROM my_keyspace.users WHERE email = '[email protected]' ALLOW FILTERING;"
# Returns 5 (correct)
# Rebuild to fix
nodetool rebuild_index my_keyspace users users_email_idx
Scenario 2: After Restoring from Backup¶
Why needed: Snapshots include index SSTables, but they may not be consistent with the restored data state.
# After restoring a table from snapshot
nodetool refresh my_keyspace users
# Rebuild all indexes to ensure consistency
nodetool rebuild_index my_keyspace users
Scenario 3: After Node Replacement¶
Why needed: New node may have index entries without corresponding base data or vice versa.
# After replacing a node and streaming data
nodetool rebuild_index my_keyspace users
Scenario 4: Upgrading Index Type¶
Why needed: After changing from regular secondary index to SASI or SAI.
# After altering index configuration
nodetool rebuild_index my_keyspace users new_index_name
Scenario 5: Corruption Detected in Logs¶
Symptoms in logs:
ERROR - Corrupt index segment detected for users_email_idx
WARN - Index users_email_idx may be inconsistent
# Rebuild the corrupted index
nodetool rebuild_index my_keyspace users users_email_idx
When NOT to Use¶
Avoid These Situations
Don't use rebuild_index when:
- Index is working correctly - Rebuilding a healthy index wastes resources
- During peak traffic - Causes performance degradation and query failures
- Without notifying application teams - Will cause application errors
- On very large tables without estimation - Could take days
- When simpler fixes exist - Sometimes repair or compaction is sufficient
Safe Execution Workflow¶
Pre-Rebuild Checklist¶
#!/bin/bash
# pre_rebuild_check.sh
KEYSPACE="$1"
TABLE="$2"
INDEX="$3"
echo "=== Pre-Rebuild Safety Check ==="
# 1. Check table size
echo ""
echo "1. Table size estimation:"
nodetool tablestats $KEYSPACE.$TABLE | grep "Space used"
# 2. List indexes
echo ""
echo "2. Indexes on table:"
cqlsh -e "SELECT index_name FROM system_schema.indexes
WHERE keyspace_name = '$KEYSPACE' AND table_name = '$TABLE';"
# 3. Check current cluster load
echo ""
echo "3. Current cluster load:"
nodetool tpstats | head -20
# 4. Check pending compactions
echo ""
echo "4. Pending compactions:"
nodetool compactionstats
# 5. Estimate time
size_bytes=$(nodetool tablestats $KEYSPACE.$TABLE 2>/dev/null | grep "Space used (live)" | awk '{print $4}')
size_gb=$((size_bytes / 1073741824))
echo ""
echo "5. Estimated rebuild time: $((size_gb * 2)) - $((size_gb * 5)) minutes"
echo ""
echo "=== Review above before proceeding ==="
echo "Command to execute: nodetool rebuild_index $KEYSPACE $TABLE $INDEX"
Controlled Rebuild with Monitoring¶
#!/bin/bash
# safe_rebuild_index.sh
KEYSPACE="$1"
TABLE="$2"
INDEX="$3"
echo "=== Safe Index Rebuild ==="
echo "Target: $KEYSPACE.$TABLE ${INDEX:-'(all indexes)'}"
echo ""
# Confirm
read -p "This will make the index unavailable. Continue? (yes/no): " confirm
if [ "$confirm" != "yes" ]; then
echo "Aborted."
exit 1
fi
# Record start time
START_TIME=$(date +%s)
echo ""
echo "Started at: $(date)"
# Start rebuild
echo ""
echo "Initiating rebuild..."
nodetool rebuild_index $KEYSPACE $TABLE $INDEX &
REBUILD_PID=$!
# Monitor progress
echo ""
echo "Monitoring progress (Ctrl+C to stop monitoring, rebuild continues)..."
while kill -0 $REBUILD_PID 2>/dev/null; do
progress=$(nodetool compactionstats 2>/dev/null | grep -i "secondary index" | tail -1)
if [ -n "$progress" ]; then
echo "$(date '+%H:%M:%S') - $progress"
else
echo "$(date '+%H:%M:%S') - Waiting for rebuild to appear in compactionstats..."
fi
sleep 30
done
# Calculate duration
END_TIME=$(date +%s)
DURATION=$((END_TIME - START_TIME))
echo ""
echo "=== Rebuild Complete ==="
echo "Duration: $((DURATION / 60)) minutes $((DURATION % 60)) seconds"
# Verify index is ready
echo ""
echo "Verifying index is ready..."
# Test query (modify for your schema)
echo "Run a test query to verify index is working"
Impact Assessment¶
Resource Usage During Rebuild¶
| Resource | Impact Level | Notes |
|---|---|---|
| Disk I/O | HIGH | Full table scan reads all SSTables |
| CPU | Moderate | Index entry creation and sorting |
| Memory | Low-Moderate | Buffers for reading and writing |
| Network | None | Local operation only |
| Disk Space | Temporary increase | Old + new index until compaction |
Impact on Other Operations¶
| Operation | Affected? | Details |
|---|---|---|
| Normal reads (no index) | Minimal | May compete for I/O |
| Normal writes | Minimal | Writes continue, new data indexed |
| Compaction | Slowed | Competes for resources |
| Repair | Should avoid | Don't run simultaneously |
| Backup/Snapshot | Can proceed | But indexes may be mid-rebuild |
Cluster-Wide Considerations¶
Must Run on Each Node¶
Node-Local Operation
rebuild_index only rebuilds indexes on the node where it's executed. For complete cluster-wide rebuild, run on every node.
#!/bin/bash
# rebuild_index_cluster.sh
KEYSPACE="$1"
TABLE="$2"
INDEX="$3"
DELAY_BETWEEN_NODES=300 # 5 minutes
echo "=== Cluster-Wide Index Rebuild ==="
echo "Target: $KEYSPACE.$TABLE ${INDEX:-'(all indexes)'}"# Get list of node IPs from local nodetool status
nodes=$(nodetool status | grep "^UN" | awk '{print $2}')
for node in $nodes; do
echo ""
echo "$(date): Starting rebuild on $node..."
ssh "$node" "nodetool rebuild_index $KEYSPACE $TABLE $INDEX"
# Wait for completion
while ssh "$node" "nodetool compactionstats 2>/dev/null | grep -qi "secondary index"; do"
echo " Waiting for rebuild to complete on $node..."
sleep 60
done
echo "$(date): Rebuild complete on $node"
# Wait before next node (optional, allows cache warming)
echo "Waiting ${DELAY_BETWEEN_NODES}s before next node..."
sleep $DELAY_BETWEEN_NODES
done
echo ""
echo "=== Cluster-wide rebuild complete ==="
Rolling vs Parallel Rebuild¶
| Approach | Pros | Cons |
|---|---|---|
| Rolling (one at a time) | Lower cluster impact, safer | Takes longer |
| Parallel (all at once) | Faster total time | Higher cluster stress, all indexes unavailable |
Recommendation: Use rolling rebuild for production systems.
Verification After Rebuild¶
Confirm Rebuild Completed¶
# Check logs for completion message
grep -i "index build complete\|finished building" /var/log/cassandra/system.log | tail -5
# Verify no pending index builds
nodetool compactionstats | grep -i "secondary index"
# Should return empty
Test Index Functionality¶
# Run a query that uses the index
cqlsh -e "TRACING ON; SELECT * FROM my_keyspace.users WHERE email = '[email protected]';"
# Verify the trace shows index usage, not full scan
# Look for: "Index scan" rather than "Seq scan"
Compare Results¶
# Query with index
cqlsh -e "SELECT COUNT(*) FROM my_keyspace.users WHERE email = '[email protected]';"
# Query with ALLOW FILTERING (bypasses index)
cqlsh -e "SELECT COUNT(*) FROM my_keyspace.users WHERE email = '[email protected]' ALLOW FILTERING;"
# Results should match
Troubleshooting¶
Rebuild Taking Too Long¶
# Check if rebuild is actually progressing
watch -n 30 'nodetool compactionstats | grep -i index'
# Check for resource contention
nodetool tpstats | grep -i blocked
# Consider pausing other operations
nodetool disableautocompaction my_keyspace my_table
# ... rebuild ...
nodetool enableautocompaction my_keyspace my_table
Rebuild Appears Stuck¶
# Check Cassandra logs for errors
tail -100 /var/log/cassandra/system.log | grep -i "error\|exception\|index"
# Check thread pools for issues
nodetool tpstats
# If truly stuck, may need to restart (use with caution)
# The rebuild will restart from the beginning
Index Still Returning Bad Results After Rebuild¶
# Verify rebuild completed on ALL nodes
for node in $(nodetool status | grep "^UN" | awk '{print $2}'); do
echo "Checking $node..."
ssh "$node" "nodetool compactionstats | grep -i "secondary index" || echo " No active rebuilds""
done
# If already complete everywhere, may need to drop and recreate
cqlsh -e "DROP INDEX my_keyspace.users_email_idx;"
cqlsh -e "CREATE INDEX users_email_idx ON my_keyspace.users (email);"
Out of Disk Space During Rebuild¶
# Rebuild creates new index files before removing old ones
# Need approximately 2x index size temporarily
# Check disk space
df -h /var/lib/cassandra
# If space is tight, rebuild indexes one at a time
# and run compaction between each
nodetool rebuild_index my_keyspace users idx1
nodetool compact my_keyspace users
nodetool rebuild_index my_keyspace users idx2
Best Practices¶
Index Rebuild Guidelines
- Estimate first - Know how long it will take before starting
- Communicate - Notify all stakeholders about index unavailability
- Maintenance window - Prefer low-traffic periods
- One at a time - Rebuild indexes sequentially, not in parallel
- Monitor throughout - Watch compactionstats and logs
- Verify after - Test queries to confirm index is working
- Document - Record when and why rebuilds were performed
Production Precautions
- Never rebuild without planning - Applications will see failures
- Test in staging first - Understand timing and behavior
- Have rollback plan - Know what to do if rebuild fails
- Consider alternatives - Sometimes repair or compaction is enough
- Check disk space - Need ~2x index size temporarily
Alternative Approaches
Before rebuilding, consider if these alternatives might help:
- Run repair - Fixes consistency issues that might affect index
- Run compaction - May resolve some index inconsistencies
- Drop and recreate - Sometimes faster for small tables
- SAI indexes - Consider migrating to Storage Attached Indexes (Cassandra 4.0+)
Related Commands¶
| Command | Relationship |
|---|---|
| compactionstats | Monitor rebuild progress |
| tablestats | Check table size before rebuild |
| repair | Alternative for some consistency issues |
| compact | May help with some index issues |
| disableautocompaction | Reduce resource contention |
| enableautocompaction | Re-enable after rebuild |