nodetool repair¶

Runs anti-entropy repair to synchronize data across replicas, ensuring consistency and preventing data resurrection from expired tombstones.

Synopsis¶

nodetool [connection_options] repair [options] [--] [keyspace [table ...]]

Description¶

nodetool repair compares data between replica nodes using Merkle trees and streams any differences to ensure all replicas hold identical data. Repair is essential for:

Maintaining data consistency
Preventing resurrection of deleted data (zombie data)
Recovering from node failures or network partitions

Comprehensive Repair Documentation

For detailed repair concepts, strategies, and scheduling guidance, see:

Repair Concepts - How repair works
Repair Options Reference - All options explained
Repair Strategies - Implementation approaches
Repair Scheduling - Planning repair cycles

Arguments¶

Argument	Description
`keyspace`	Keyspace to repair. If omitted, repairs all keyspaces; required when naming specific tables
`table`	Specific table(s) to repair. If omitted, repairs all tables

Key Options¶

Option	Description
`-pr, --partitioner-range`	Repair only primary range (recommended)
`--full`	Full repair instead of incremental
`-seq, --sequential`	Repair one node at a time
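`-par, --parallel`	Repair all replicas simultaneously. Only needed on 2.1 and earlier; parallel is the default since 2.2, and later nodetool versions may reject the flag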
`-dcpar, --dc-parallel`	Parallel within DC, sequential across DCs
`-dc, --in-dc`	Repair only within specified datacenter(s)
`-local, --in-local-dc`	Repair only within local datacenter
`-st, --start-token`	Start token for repair range
`-et, --end-token`	End token for repair range
`-j, --job-threads`	Number of repair job threads, typically the number of tables repaired concurrently (default 1, maximum 4)

Common Usage Patterns¶

Primary Range Repair (Recommended)¶

nodetool repair -pr my_keyspace

Always Use -pr

Without -pr, each node repairs all ranges it holds (primary + replica), causing redundant work. With -pr, run repair on every node to cover all ranges exactly once.
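
Because -pr covers only each node's primary ranges, the command has to be issued against every node in turn. A minimal sketch of such a loop follows. The host names, the `repair_cmd` and `repair_all` helpers, and the `DRY_RUN` variable are assumptions introduced for illustration; `-h` is nodetool's standard host connection option.

```shell
#!/usr/bin/env bash
# Sketch: run a primary-range repair against each node, one at a time.
# Host names and DRY_RUN are illustrative assumptions, not nodetool features.

# Build the repair command line for one host and keyspace.
repair_cmd() {
  local host="$1" keyspace="$2"
  echo "nodetool -h ${host} repair -pr ${keyspace}"
}

# Repair every host sequentially; stop at the first failure.
repair_all() {
  local keyspace="$1"; shift
  local host cmd
  for host in "$@"; do
    cmd="$(repair_cmd "$host" "$keyspace")"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "$cmd"            # dry run: only print what would be executed
    else
      $cmd || return 1       # real run: abort if a node's repair fails
    fi
  done
}

# Example (dry run by default):
# repair_all my_keyspace node1 node2 node3
```

Set `DRY_RUN=0` to execute the commands instead of printing them.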

Full vs Incremental Repair¶

# Full repair (the default before 2.2; requires --full on later versions)
nodetool repair --full -pr my_keyspace

# Incremental repair (the default since 2.2)
nodetool repair -pr my_keyspace

Type	Behavior	Use Case
Full	Repairs all data	Recovery, initial sync
Incremental	Repairs only unrepaired data	Regular maintenance

Local Datacenter Only¶

nodetool repair -pr -local my_keyspace

Repairs only with replicas in the same datacenter.

Specific Token Range¶

nodetool repair -st 0 -et 1000000000 my_keyspace

Repairs only the specified token range (subrange repair). The explicit token range already limits the scope, so -pr is not combined with -st/-et.
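
Subrange repair is usually run as a series of smaller ranges rather than one large one. The sketch below splits a token range into equal pieces and prints one command per piece. The `split_range` and `subrange_cmds` helpers are hypothetical. Shell arithmetic is 64-bit signed, so this assumes the span between the start and end tokens stays below 2^63.

```shell
#!/usr/bin/env bash
# Sketch: split a token range into equal subranges and print one repair
# command per subrange. Function and argument names are illustrative.

# Print "start:end" pairs that cover the range in `parts` pieces.
split_range() {
  local start="$1" end="$2" parts="$3"
  local step=$(( (end - start) / parts ))
  local cur="$start" next i=1
  while [ "$i" -le "$parts" ]; do
    if [ "$i" -eq "$parts" ]; then
      next="$end"            # last piece absorbs any rounding remainder
    else
      next=$(( cur + step ))
    fi
    echo "${cur}:${next}"
    cur="$next"
    i=$(( i + 1 ))
  done
}

# Print the subrange repair commands for a keyspace.
subrange_cmds() {
  local keyspace="$1" start="$2" end="$3" parts="$4"
  local pair
  for pair in $(split_range "$start" "$end" "$parts"); do
    echo "nodetool repair -st ${pair%%:*} -et ${pair##*:} ${keyspace}"
  done
}

# Example: subrange_cmds my_keyspace 0 1000000000 4
```

Smaller pieces build smaller Merkle trees, which ties in with the timeout and memory advice under Common Issues.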

When to Use¶

Routine Maintenance¶

gc_grace_seconds Constraint

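A full repair cycle must complete on every node within gc_grace_seconds (default 864000 seconds, or 10 days). Otherwise a replica that missed a delete can reintroduce the data after the tombstone has been purged elsewhere.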

# Run on each node
nodetool repair -pr my_keyspace
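
The constraint comes down to simple arithmetic: nodes repaired one after another, multiplied by the time each takes, must stay under gc_grace_seconds. A rough sketch, assuming a node count and a per-node repair duration that you measure yourself; the `cycle_seconds` and `check_cycle` helpers are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: does one sequential repair cycle fit inside gc_grace_seconds?
# node_count and hours_per_node are assumed inputs measured on your cluster.

GC_GRACE_DEFAULT=864000   # default gc_grace_seconds: 10 days

# Seconds needed to repair every node one after another.
cycle_seconds() {
  local node_count="$1" hours_per_node="$2"
  echo $(( node_count * hours_per_node * 3600 ))
}

# Print "ok" if the cycle fits within gc_grace_seconds, else "too slow".
check_cycle() {
  local node_count="$1" hours_per_node="$2" gc_grace="${3:-$GC_GRACE_DEFAULT}"
  local needed
  needed="$(cycle_seconds "$node_count" "$hours_per_node")"
  if [ "$needed" -le "$gc_grace" ]; then
    echo "ok"
  else
    echo "too slow"
  fi
}

# Example: 6 nodes at 8 hours each need 172800 s, well inside 864000 s.
# check_cycle 6 8
```

The check ignores gaps between runs and failed attempts, so leave generous headroom.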

After Node Recovery¶

After a node was down for extended time:

nodetool repair -pr my_keyspace

After Network Partition¶

If nodes were isolated:

nodetool repair -pr my_keyspace

Before Major Version Upgrade¶

Ensure consistency before upgrading:

nodetool repair --full my_keyspace

When NOT to Use¶

Repair Considerations

Avoid repair:

During high traffic - Significant resource impact
While streaming - Interferes with bootstrap/decommission
With down nodes - Repair will fail or skip ranges
Immediately after bulk load - Wait for compaction
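
The down-node condition can be checked before starting. The sketch below reads `nodetool status` output on stdin and relies on node lines beginning with a two-letter code whose first letter is U (up) or D (down). The `count_down_nodes` and `preflight` helpers are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: refuse to start repair while any node is reported down.
# Reads `nodetool status` text on stdin; node lines start with a two-letter
# code: U/D for up/down, then N/L/J/M for normal/leaving/joining/moving.

# Count node lines whose status letter is D.
count_down_nodes() {
  grep -c -E '^D[NLJM][[:space:]]' || true   # grep exits 1 on zero matches
}

# Print "safe" when no node is down, otherwise "blocked: N down".
preflight() {
  local down
  down="$(count_down_nodes)"
  if [ "$down" -eq 0 ]; then
    echo "safe"
  else
    echo "blocked: ${down} down"
  fi
}

# Example: nodetool status | preflight
```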

Impact Analysis¶

Resource Usage¶

Resource	Impact
Network	High - streams data between nodes
Disk I/O	High - reads SSTables, writes repairs
CPU	Moderate - Merkle tree calculation
Memory	Merkle trees require heap space

Performance Impact¶

During repair, the following operations impact cluster performance:

Operation	Description
Merkle Tree Build	Computes hash trees for data comparison
Data Comparison	Compares trees between replicas
Data Streaming	Streams differing data between nodes

Approximate impact during repair (actual figures vary with data volume, hardware, and throttling):

Metric	Impact
Read latency	+10-30%
Write latency	+5-15%
Network utilization	+20-50%

Monitoring Repair¶

Check Active Repairs¶

nodetool repair_admin list

Shows active incremental repair sessions. repair_admin tracks incremental repair sessions only.

Monitor Progress¶

nodetool netstats

Shows streaming activity from repair.

Check Repair History¶

nodetool repair_admin list --all

Shows completed and failed repairs.

Abort Repair¶

nodetool repair_admin cancel --session <repair_id>

Canceling Repair

Canceled repairs leave data partially synchronized. Restart repair to complete synchronization.

Examples¶

Standard Maintenance Repair¶

# Run on each node sequentially
nodetool repair -pr my_keyspace

Repair Specific Table¶

nodetool repair -pr my_keyspace users

Parallel Repair (Faster)¶

# Parallel is the default since 2.2, so no extra flag is needed
nodetool repair -pr my_keyspace

Multi-DC Repair¶

# Repair with all DCs
nodetool repair -pr my_keyspace

# Repair specific DCs only (-pr is rejected together with -dc)
nodetool repair -dc dc1 -dc dc2 my_keyspace

Verbose Output¶

nodetool repair -pr --trace my_keyspace

Traces the repair session. Trace events are recorded in the system_traces keyspace.

Common Issues¶

Repair Fails with Timeout¶

ERROR: Repair failed with error: Repair job timed out

Solutions:

Reduce repair scope (single table)
Use subrange repair
Increase streaming_socket_timeout_in_ms (on versions that still provide this setting)

Repair Session Already Running¶

ERROR: Repair session already in progress

Check and wait for existing repair:

nodetool repair_admin list

Out of Memory¶

ERROR: java.lang.OutOfMemoryError: Java heap space

Merkle trees consume heap. Solutions:

Reduce repair parallelism
Increase heap size
Use subrange repair

Inconsistent Data After Repair¶

If data still appears inconsistent:

1. Verify repair completed successfully
2. Check all nodes were repaired
3. Run nodetool repair --full for complete sync

Best Practices¶

Repair Guidelines

Use -pr flag - Prevents redundant work
Complete within gc_grace_seconds - Prevent zombies
One node at a time - For sequential strategy
Off-peak hours - Minimize production impact
Monitor progress - Watch for failures
Automate - Use AxonOps for scheduling

Repair Schedule Example¶

Cluster Size	Strategy	Frequency
3-6 nodes	Sequential	Weekly
7-20 nodes	Parallel	Every 3-5 days
21-50 nodes	DC-parallel	Every 2-3 days
More than 50 nodes	Continuous (AxonOps)	Always running

Related Commands¶

Command	Relationship
repair_admin	Manage repair sessions
netstats	Monitor streaming
status	Check node states before repair
scrub	Fix local SSTable corruption