nodetool repair_admin¶

Manages and monitors repair sessions on the cluster.

Synopsis¶

nodetool [connection_options] repair_admin <list | cancel | cleanup> [options]

Description¶

nodetool repair_admin provides administrative control over repair sessions:

list: View active and recent repair sessions
cancel: Stop a running repair
cleanup: Clean up orphaned repair sessions (Cassandra 4.0+)

Essential for managing repairs in production environments.

Subcommands¶

list¶

List repair sessions.

nodetool repair_admin list [--all]

Option	Description
`--all`	Include completed repairs, not just active

cancel¶

Cancel a running repair.

nodetool repair_admin cancel <session_id>

cleanup (4.0+)¶

Clean up orphaned repair metadata.

nodetool repair_admin cleanup

Examples¶

List Active Repairs¶

nodetool repair_admin list

Output:

id                                   state        command          coordinator                  participants                 last_update
a1b2c3d4-e5f6-7890-abcd-ef1234567890 RUNNING      RANGE            192.168.1.101                192.168.1.101,192.168.1.102  2024-01-15T10:30:00Z
b2c3d4e5-f6a7-8901-bcde-f12345678901 RUNNING      VALIDATION       192.168.1.102                192.168.1.102,192.168.1.103  2024-01-15T10:31:00Z

List All Repairs (Including Completed)¶

nodetool repair_admin list --all

Shows history of repairs including completed and failed.

Cancel a Repair¶

nodetool repair_admin cancel a1b2c3d4-e5f6-7890-abcd-ef1234567890

Clean Up Orphaned Sessions (4.0+)¶

nodetool repair_admin cleanup

Output Fields¶

Repair Session Information¶

Field	Description
id	Unique session identifier (UUID)
state	Current state (RUNNING, FAILED, COMPLETED)
command	Repair type (RANGE, VALIDATION, SYNC)
coordinator	Node coordinating the repair
participants	Nodes involved in the repair
last_update	Timestamp of last status update

Session States¶

State	Description
RUNNING	Repair is in progress
COMPLETED	Repair finished successfully
FAILED	Repair encountered an error

When to Use¶

Monitor Active Repairs¶

Before starting maintenance:

nodetool repair_admin list

Check if repairs are already running.

Diagnose Slow Repairs¶

nodetool repair_admin list --all

View repair history to identify patterns.

Cancel Stuck Repairs¶

When a repair is hung or needs to be stopped:

# Find the session ID
nodetool repair_admin list

# Cancel it
nodetool repair_admin cancel <session_id>

Before Topology Changes¶

Ensure no repairs are running before:

Decommission
Adding nodes
Major maintenance

nodetool repair_admin list
# Should show no RUNNING repairs

Canceling Repairs¶

When to Cancel¶

Cancel Considerations

Cancel repairs when:

Repair is stuck (no progress)
Emergency maintenance needed
Repair started by mistake
Repair is impacting production too heavily

How to Cancel¶

# Get session ID
nodetool repair_admin list

# Cancel the session
nodetool repair_admin cancel a1b2c3d4-e5f6-7890-abcd-ef1234567890

After Canceling¶

Canceled repairs leave data partially synchronized:

Data already streamed remains in place
Unprocessed ranges were not repaired
Run repair again later to complete synchronization

Repair States Deep Dive¶

Running Repair¶

id                                   state        command
a1b2c3d4-...                        RUNNING      RANGE

Active repair - do not interfere unless necessary.

Failed Repair¶

id                                   state        command
b2c3d4e5-...                        FAILED       VALIDATION

Investigate Failures

Failed repairs indicate:

Node unreachable
Timeout occurred
Resource exhaustion
SSTable corruption

Check logs for details:

grep -i "repair.*failed\|repair.*error" /var/log/cassandra/system.log

Orphaned Sessions¶

Sessions that didn't clean up properly:

# Clean up orphaned metadata
nodetool repair_admin cleanup

Monitoring Best Practices¶

Before Starting Repair¶

# Check for existing repairs
nodetool repair_admin list

# Verify cluster health
nodetool status

During Repair¶

# Watch progress
watch -n 30 'nodetool repair_admin list'

# Monitor streaming
nodetool netstats

After Repair¶

# Verify completion
nodetool repair_admin list --all | grep COMPLETED

# Check for failures
nodetool repair_admin list --all | grep FAILED

Common Issues¶

Cannot Find Session ID¶

If repair_admin list shows no sessions but repair seems running:

Check other nodes (repair coordinator may be different)
Check nodetool netstats for streaming activity

Cancel Doesn't Work¶

If cancel doesn't stop the repair:

Verify correct session ID
Try from the coordinator node
Check logs for errors
May need to wait for current streaming to complete

Repairs Start Automatically¶

Unexpected repairs may be from:

Scheduled repair tools (AxonOps, Reaper)
Cron jobs
Application-triggered repairs

Check all potential sources.

Integration with Repair Tools¶

AxonOps¶

AxonOps manages repairs automatically:

# View repairs including those managed by AxonOps
nodetool repair_admin list --all

Repairs started by AxonOps appear with distinctive session IDs.

Cassandra Reaper¶

If using Reaper for repair management:

Repairs appear in repair_admin list
Cancel through Reaper UI when possible
Use repair_admin cancel only if needed

Scripting Examples¶

Check for Active Repairs¶

#!/bin/bash
if nodetool repair_admin list | grep -q RUNNING; then
    echo "Repairs are running"
    exit 1
else
    echo "No active repairs"
    exit 0
fi

Cancel All Repairs¶

#!/bin/bash
# Emergency: cancel all running repairs
for session in $(nodetool repair_admin list | grep RUNNING | awk '{print $1}'); do
    echo "Canceling $session"
    nodetool repair_admin cancel "$session"
done

Monitor Repair Progress¶

#!/bin/bash
while true; do
    clear
    echo "=== Repair Status $(date) ==="
    nodetool repair_admin list
    echo ""
    echo "=== Network Activity ==="
    nodetool netstats | head -20
    sleep 30
done

Command	Relationship
repair	Start repair operations
netstats	Monitor streaming during repair
status	Check cluster state
tpstats	Monitor repair thread pools
compactionstats	Validation compactions during repair