nodetool pausehandoff¶
Pauses the delivery of hints to recovering nodes.
Synopsis¶
nodetool [connection_options] pausehandoff
Description¶
nodetool pausehandoff temporarily stops the delivery of stored hints to their target nodes. Unlike disablehandoff, this command does not prevent new hints from being stored—it only pauses the replay of existing hints.
This is useful when hint delivery is causing performance issues on recovering nodes or when controlled hint replay timing is needed.
Pause vs Disable
- pausehandoff: Stops hint delivery only. New hints continue to be stored.
- disablehandoff: Stops both hint storage AND delivery.
Behavior¶
When hint delivery is paused:
- New hints continue to be stored for unavailable replicas
- Existing hints are NOT delivered to recovered nodes
- Hints accumulate until delivery is resumed
- Hint expiration continues (hints may expire before delivery)
What Continues¶
| Feature | Status |
|---|---|
| New hint storage | Continues normally |
| Gossip | Continues normally |
| Writes | Continue normally |
| Read repairs | Continue normally |
What Stops¶
| Feature | Status |
|---|---|
| Hint delivery | Paused |
| Automatic consistency restoration | Suspended |
| Hint replay to recovered nodes | Stopped |
Examples¶
Basic Usage¶
nodetool pausehandoff
Pause and Verify¶
nodetool pausehandoff
# Check hint delivery threads
nodetool tpstats | grep -i hint
When to Use¶
Recovering Node Under Stress¶
When a recovering node is overwhelmed by hint replay:
# 1. Pause hint delivery
nodetool pausehandoff
# 2. Let the node stabilize
# Monitor: nodetool tpstats
# 3. Resume when ready
nodetool resumehandoff
Controlled Recovery Timing¶
When needing to control when hints are delivered:
# Pause during peak hours
nodetool pausehandoff
# Resume during maintenance window
# (later)
nodetool resumehandoff
Troubleshooting Hint Storms¶
When investigating hint-related performance issues:
# 1. Check hint delivery activity
nodetool tpstats | grep -i hint
# 2. Pause if hint delivery is causing issues
nodetool pausehandoff
# 3. Investigate
nodetool listpendinghints
nodetool tablestats system.hints
# 4. Resume when resolved
nodetool resumehandoff
During Node Maintenance¶
Prevent hint delivery to a node under maintenance:
# On coordinator nodes, pause hint delivery
nodetool pausehandoff
# Perform maintenance on target node
# ...
# Resume after maintenance
nodetool resumehandoff
Impact Assessment¶
Immediate Effects¶
| Aspect | Impact |
|---|---|
| Hint delivery | Stops immediately |
| Coordinator load | May increase (hints accumulate) |
| Target node load | Decreases (no hint replay) |
| Disk usage | Hints continue accumulating |
Consistency Implications¶
| Scenario | With Delivery | With Paused Delivery |
|---|---|---|
| Node recovers | Hints replayed immediately | Hints queued, not delivered |
| Multiple writes | Consistency restored quickly | Consistency delayed |
| Hint expiration | Normal | Hints may expire before delivery |
Hint Expiration Risk
While delivery is paused, hints continue to age. If paused too long, hints may expire (default: 3 hours) and be lost. Run repair to restore consistency if hints expire.
Workflow: Controlled Hint Delivery¶
#!/bin/bash
# controlled_hint_delivery.sh
echo "=== Controlled Hint Delivery ==="
# 1. Check current hint status
echo "1. Current pending hints:"
nodetool listpendinghints
# 2. Pause delivery
echo ""
echo "2. Pausing hint delivery..."
nodetool pausehandoff
# 3. Verify paused
echo ""
echo "3. Hint thread activity (should show no active delivery):"
nodetool tpstats | grep -i hint
# 4. Wait for user confirmation
echo ""
read -p "Press Enter to resume hint delivery..."
# 5. Resume delivery
echo ""
echo "5. Resuming hint delivery..."
nodetool resumehandoff
# 6. Monitor delivery
echo ""
echo "6. Monitoring hint delivery:"
watch -n 2 'nodetool tpstats | grep -i hint'
Monitoring While Paused¶
Track Hint Accumulation¶
# Check pending hints periodically
watch -n 60 'nodetool listpendinghints'
Monitor Hints Table Size¶
# Track disk usage by hints
watch -n 60 'nodetool tablestats system.hints | grep "Space used"'
Check Hint Age¶
#!/bin/bash
# Monitor hint accumulation while paused
echo "=== Hint Status (Delivery Paused) ==="
# Pending hints
echo "Pending hints by target:"
nodetool listpendinghints
# Hints table size
echo ""
echo "Hints table size:"
nodetool tablestats system.hints 2>/dev/null | grep -E "Space used|SSTable count"
# Warn about expiration
echo ""
hint_window=$(grep "max_hint_window_in_ms" /etc/cassandra/cassandra.yaml 2>/dev/null | awk '{print $2}')
if [ -n "$hint_window" ]; then
hours=$((hint_window / 3600000))
echo "Hint window: ${hours} hours"
echo "WARNING: Hints older than ${hours} hours will expire!"
fi
Risks and Mitigations¶
Risk: Hint Expiration¶
Cause: Hints expire while delivery is paused
Mitigation:
# Track pause duration
echo "Paused at: $(date)" >> /var/log/cassandra/hint_pause.log
# Set a reminder to resume
echo "nodetool resumehandoff" | at now + 2 hours
# After resuming, check if hints expired and run repair if needed
nodetool resumehandoff
nodetool repair -pr
Risk: Disk Space Exhaustion¶
Cause: Hints accumulate while paused
Mitigation:
# Monitor disk usage
df -h /var/lib/cassandra
# If critical, may need to truncate hints
nodetool truncatehints
# Then run repair to restore consistency
nodetool repair -pr
Risk: Forgot to Resume¶
Symptom: Recovered nodes remain inconsistent
Mitigation:
# Include in monitoring
#!/bin/bash
# check_hint_delivery.sh
active=$(nodetool tpstats 2>/dev/null | grep "HintedHandoff" | awk '{print $2}')
pending=$(nodetool listpendinghints 2>/dev/null | tail -n +2 | awk '{sum+=$2} END {print sum}')
if [ "$pending" -gt 0 ] && [ "$active" -eq 0 ]; then
echo "WARNING: Hints pending but delivery may be paused!"
echo "Pending hints: $pending"
fi
Cluster-Wide Operations¶
Pause on All Nodes¶
#!/bin/bash
# pause_handoff_cluster.sh
echo "Pausing hint delivery cluster-wide..."# Get list of node IPs from local nodetool status
nodes=$(nodetool status | grep "^UN" | awk '{print $2}')
for node in $nodes; do
echo -n "$node: "
ssh "$node" "nodetool pausehandoff 2>/dev/null && echo "paused" || echo "FAILED""
done
echo ""
echo "Verification (hint thread activity):"
for node in $nodes; do
echo "=== $node ==="
ssh "$node" "nodetool tpstats 2>/dev/null | grep -i hint"
done
echo ""
echo "REMEMBER: Resume with 'nodetool resumehandoff' on all nodes"
Resume on All Nodes¶
#!/bin/bash
# resume_handoff_cluster.sh
echo "Resuming hint delivery cluster-wide..."# Get list of node IPs from local nodetool status
nodes=$(nodetool status | grep "^UN" | awk '{print $2}')
for node in $nodes; do
echo -n "$node: "
ssh "$node" "nodetool resumehandoff 2>/dev/null && echo "resumed" || echo "FAILED""
done
echo ""
echo "Hint delivery resumed on all nodes."
Troubleshooting¶
Hints Still Being Delivered After Pause¶
# Verify pause was successful
nodetool tpstats | grep -i hint
# If still active, retry
nodetool pausehandoff
# Check for multiple coordinators
# Each coordinator node needs to pause separately
Cannot Pause¶
# Check JMX connectivity
nodetool info
# Check logs for errors
tail -100 /var/log/cassandra/system.log | grep -i hint
High Hint Accumulation While Paused¶
# Check pending hints
nodetool listpendinghints
# Check hints table size
nodetool tablestats system.hints
# If too many hints, consider:
# 1. Resume delivery to reduce backlog
# 2. Or truncate and repair if hints will expire anyway
Comparison: Pause vs Disable¶
| Aspect | pausehandoff | disablehandoff |
|---|---|---|
| New hint storage | Continues | Stops |
| Hint delivery | Stops | Stops |
| Coordinator disk impact | Continues growing | Stops growing |
| Recovery without repair | Yes (resume) | No (repair required) |
| Use case | Temporary relief | Emergency only |
Choose pausehandoff When:¶
- Recovering node is overwhelmed
- Need temporary performance relief
- Plan to resume delivery soon
- Want to preserve hints for later delivery
Choose disablehandoff When:¶
- Disk space emergency on coordinator
- Node will be down longer than hint window
- Need to completely stop hint-related activity
Best Practices¶
Pause Guidelines
- Prefer pause over disable - Pause preserves hints for later delivery
- Set time limits - Don't leave paused indefinitely (hints expire)
- Monitor accumulation - Watch hint count and disk usage while paused
- Document the reason - Log why and when you paused
- Plan for resume - Know when you'll resume delivery
- Consider hint window - Resume before hints start expiring
Pause Duration Limits
The default hint window is 3 hours. If delivery is paused longer than this, hints for down nodes will expire and be lost. Either:
- Resume delivery before hints expire
- Plan for repair after resuming
- Adjust
max_hint_window_in_msif longer pause needed
Related Commands¶
| Command | Relationship |
|---|---|
| resumehandoff | Resume hint delivery |
| enablehandoff | Enable hint storage |
| disablehandoff | Disable hint storage and delivery |
| statushandoff | Check handoff status |
| listpendinghints | List pending hints |
| truncatehints | Remove all hints |