
nodetool repair_admin

Manages and monitors repair sessions on the cluster.


Synopsis

nodetool [connection_options] repair_admin <list | cancel | cleanup> [options]

Description

nodetool repair_admin provides administrative control over repair sessions:

  • list: View active and recent repair sessions
  • cancel: Stop a running repair
  • cleanup: Clean up orphaned repair sessions (Cassandra 4.0+)

It is essential for managing repairs in production environments.


Subcommands

list

List repair sessions.

nodetool repair_admin list [--all]

Option  Description
--all   Include completed repairs, not just active ones

cancel

Cancel a running repair.

nodetool repair_admin cancel --session <session_id>

cleanup (4.0+)

Clean up orphaned repair metadata.

nodetool repair_admin cleanup

Examples

List Active Repairs

nodetool repair_admin list

Output:

id                                   state        command          coordinator                  participants                 last_update
a1b2c3d4-e5f6-7890-abcd-ef1234567890 RUNNING      RANGE            192.168.1.101                192.168.1.101,192.168.1.102  2024-01-15T10:30:00Z
b2c3d4e5-f6a7-8901-bcde-f12345678901 RUNNING      VALIDATION       192.168.1.102                192.168.1.102,192.168.1.103  2024-01-15T10:31:00Z
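
For scripting, the two columns that usually matter are the session ID and its state. The following is a minimal sketch, assuming the whitespace-separated layout of the sample output above (header row first, ID in column 1, state in column 2). The function name `repair_ids_in_state` is hypothetical, and column positions and state names may differ between Cassandra versions.

```shell
# Print the IDs of sessions in a given state, reading list output on stdin.
# Assumption: line 1 is the header, column 1 = id, column 2 = state.
repair_ids_in_state() {
    local wanted="$1"
    awk -v wanted="$wanted" 'NR > 1 && $2 == wanted { print $1 }'
}

# Usage:
#   nodetool repair_admin list | repair_ids_in_state RUNNING
```

Comparing the state column exactly avoids false matches that a plain `grep RUNNING` could pick up from other columns.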

List All Repairs (Including Completed)

nodetool repair_admin list --all

Shows history of repairs including completed and failed.

Cancel a Repair

nodetool repair_admin cancel --session a1b2c3d4-e5f6-7890-abcd-ef1234567890

Clean Up Orphaned Sessions (4.0+)

nodetool repair_admin cleanup

Output Fields

Repair Session Information

Field         Description
id            Unique session identifier (UUID)
state         Current state (RUNNING, FAILED, COMPLETED)
command       Repair type (RANGE, VALIDATION, SYNC)
coordinator   Node coordinating the repair
participants  Nodes involved in the repair
last_update   Timestamp of last status update

Session States

State      Description
RUNNING    Repair is in progress
COMPLETED  Repair finished successfully
FAILED     Repair encountered an error

When to Use

Monitor Active Repairs

Before starting maintenance:

nodetool repair_admin list

Check if repairs are already running.

Diagnose Slow Repairs

nodetool repair_admin list --all

View repair history to identify patterns.

Cancel Stuck Repairs

When a repair is hung or needs to be stopped:

# Find the session ID
nodetool repair_admin list

# Cancel it
nodetool repair_admin cancel --session <session_id>

Before Topology Changes

Ensure no repairs are running before:

  • Decommissioning a node
  • Adding nodes
  • Major maintenance

nodetool repair_admin list
# Should show no RUNNING repairs
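
The same check can act as a gate in an automation script that must not proceed while a repair is active. This is a sketch only: the function name `wait_for_repairs`, its default timeout and its polling interval are illustrative, and it assumes the state is the second whitespace-separated column, as in the sample output on this page.

```shell
# Block until no session is RUNNING, or give up after a timeout.
# Arguments: timeout in seconds (default 3600), poll interval (default 30).
# Returns 0 when idle, 1 on timeout, 2 if nodetool itself fails.
wait_for_repairs() {
    local timeout="${1:-3600}" interval="${2:-30}" waited=0 out
    while :; do
        out=$(nodetool repair_admin list) || {
            echo "nodetool failed; cannot determine repair state" >&2
            return 2
        }
        # awk exits 0 only if a RUNNING row exists; otherwise leave the loop
        printf '%s\n' "$out" |
            awk 'NR > 1 && $2 == "RUNNING" { f = 1 } END { exit !f }' || break
        if [ "$waited" -ge "$timeout" ]; then
            echo "Repairs still running after ${timeout}s" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "No active repairs"
}

# Usage:
#   wait_for_repairs 1800 60 && nodetool decommission
```

Capturing the output before parsing it keeps a failed nodetool call from being mistaken for an idle cluster.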

Canceling Repairs

When to Cancel

Cancel Considerations

Cancel repairs when:

  • Repair is stuck (no progress)
  • Emergency maintenance needed
  • Repair started by mistake
  • Repair is impacting production too heavily

How to Cancel

# Get session ID
nodetool repair_admin list

# Cancel the session
nodetool repair_admin cancel --session a1b2c3d4-e5f6-7890-abcd-ef1234567890

After Canceling

Canceled repairs leave data partially synchronized:

  1. Data already streamed remains in place
  2. Unprocessed ranges remain unrepaired
  3. Run repair again later to complete synchronization

Repair States Deep Dive

Running Repair

id                                   state        command
a1b2c3d4-...                        RUNNING      RANGE

Active repair - do not interfere unless necessary.

Failed Repair

id                                   state        command
b2c3d4e5-...                        FAILED       VALIDATION

Investigate Failures

A failed repair typically has one of these causes:

  • Node unreachable
  • Timeout occurred
  • Resource exhaustion
  • SSTable corruption

Check logs for details:

grep -iE 'repair.*(failed|error)' /var/log/cassandra/system.log

Orphaned Sessions

Sessions that did not clean up properly leave orphaned metadata behind:

# Clean up orphaned metadata
nodetool repair_admin cleanup

Monitoring Best Practices

Before Starting Repair

# Check for existing repairs
nodetool repair_admin list

# Verify cluster health
nodetool status

During Repair

# Watch progress
watch -n 30 'nodetool repair_admin list'

# Monitor streaming
nodetool netstats

After Repair

# Verify completion
nodetool repair_admin list --all | grep COMPLETED

# Check for failures
nodetool repair_admin list --all | grep FAILED
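
The two greps above can be combined into a single per-state tally. This is a minimal sketch with a hypothetical helper name, `repair_state_counts`, and it again assumes the state is the second column of the list output.

```shell
# Count sessions per state from `repair_admin list --all` output on stdin.
# Assumption: line 1 is the header and column 2 holds the state.
repair_state_counts() {
    awk 'NR > 1 && NF >= 2 { n[$2]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage:
#   nodetool repair_admin list --all | repair_state_counts
```

A non-zero FAILED count after a repair run is the signal to check the logs as described under Failed Repair.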

Common Issues

Cannot Find Session ID

If repair_admin list shows no sessions but a repair appears to be running:

  • Check other nodes (repair coordinator may be different)
  • Check nodetool netstats for streaming activity
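
Checking other nodes can be scripted by pointing nodetool at each host in turn. The sketch below assumes remote JMX access is enabled and permitted from the machine running it; the function name `list_repairs_on_hosts` and the host addresses in the usage line are hypothetical.

```shell
# Run `repair_admin list` against each host given as an argument and
# label each block of output with the host it came from.
list_repairs_on_hosts() {
    local host
    for host in "$@"; do
        echo "=== $host ==="
        # -h selects the node that nodetool connects to
        nodetool -h "$host" repair_admin list ||
            echo "(could not query $host)" >&2
    done
}

# Usage:
#   list_repairs_on_hosts 192.168.1.101 192.168.1.102 192.168.1.103
```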

Cancel Doesn't Work

If cancel doesn't stop the repair:

  1. Verify that the session ID is correct
  2. Run the cancel from the coordinator node
  3. Check the logs for errors
  4. Wait for in-progress streaming to complete; the cancel may not take effect until it does
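
A quick format check catches truncated or mistyped IDs before they reach nodetool. This is a minimal sketch, assuming session IDs are standard 36-character UUIDs as in the examples on this page; `is_session_id` is a hypothetical helper name.

```shell
# Return 0 if the argument looks like a UUID (8-4-4-4-12 hex digits).
is_session_id() {
    printf '%s\n' "$1" |
        grep -Eiq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Usage:
#   is_session_id "$id" || echo "not a valid session ID: $id" >&2
```

The check validates the format only. An ID that passes must still appear in the output of repair_admin list on the node where the cancel is issued.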

Repairs Start Automatically

Unexpected repairs may originate from:

  • Scheduled repair tools (AxonOps, Reaper)
  • Cron jobs
  • Application-triggered repairs

Check all potential sources.


Integration with Repair Tools

AxonOps

AxonOps manages repairs automatically:

# View repairs including those managed by AxonOps
nodetool repair_admin list --all

Repairs started by AxonOps appear with distinctive session IDs.

Cassandra Reaper

If using Reaper for repair management:

  • Repairs appear in repair_admin list
  • Cancel through Reaper UI when possible
  • Use repair_admin cancel only if needed

Scripting Examples

Check for Active Repairs

#!/bin/bash
if nodetool repair_admin list | grep -q RUNNING; then
    echo "Repairs are running"
    exit 1
else
    echo "No active repairs"
    exit 0
fi

Cancel All Repairs

#!/bin/bash
# Emergency: cancel all running repairs
# Match RUNNING in the state column only, so other columns cannot cause false hits
for session in $(nodetool repair_admin list | awk '$2 == "RUNNING" {print $1}'); do
    echo "Canceling $session"
    nodetool repair_admin cancel --session "$session"
done

Monitor Repair Progress

#!/bin/bash
while true; do
    clear
    echo "=== Repair Status $(date) ==="
    nodetool repair_admin list
    echo ""
    echo "=== Network Activity ==="
    nodetool netstats | head -20
    sleep 30
done

Related Commands

Command          Relationship
repair           Start repair operations
netstats         Monitor streaming during repair
status           Check cluster state
tpstats          Monitor repair thread pools
compactionstats  Validation compactions during repair