Cluster Management Operations¶
This guide covers operational procedures for managing Cassandra cluster topology: adding capacity, removing nodes, replacing failed hardware, and maintaining cluster health.
No Single Point of Failure
Cassandra's peer-to-peer architecture enables online topology changes. Every node knows the cluster state through gossip, and token ranges determine data ownership. Topology changes trigger automatic data streaming in the background while the cluster continues serving traffic.
Understanding Topology Changes¶
How Cassandra Handles Topology Changes¶
When cluster membership changes, Cassandra automatically redistributes data:
Token Ownership and Streaming¶
| Scenario | Data Movement | Cleanup Required |
|---|---|---|
| Add node | Existing nodes stream data to new node | Yes, on existing nodes |
| Remove node (decommission) | Departing node streams to remaining nodes | No |
| Remove node (dead) | Remaining nodes stream among themselves | No |
| Replace node | Remaining nodes stream to replacement | No |
Always Run Cleanup After Adding Nodes
After adding nodes, existing nodes retain copies of data they no longer own. Run nodetool cleanup on each existing node to reclaim disk space. Schedule this during low-traffic periods as cleanup involves reading and rewriting SSTables.
Cluster Operations Overview¶
| Operation | When to Use | Command/Method | Impact |
|---|---|---|---|
| Add node | Scaling capacity | Start new node with proper config | Streaming, ~hours |
| Decommission | Graceful removal (node up) | `nodetool decommission` | Streaming, ~hours |
| Remove node | Forced removal (node down) | `nodetool removenode` | Streaming, ~hours |
| Replace node | Hardware replacement | `replace_address_first_boot` | Streaming, ~hours |
| Move token | Rebalancing (rarely needed) | `nodetool move` | Streaming |
| Cleanup | After adding nodes | `nodetool cleanup` | I/O intensive |
| Assassinate | Last resort (stuck node) | `nodetool assassinate` | Immediate |
Adding Nodes¶
Pre-flight Checklist¶
Before adding a node, verify:
- [ ] Same Cassandra version as existing cluster
- [ ] Same JDK version and vendor
- [ ] Adequate disk space (check existing node usage)
- [ ] Network connectivity to all existing nodes (storage port 7000, native port 9042)
- [ ] Firewall rules configured
- [ ] NTP synchronized across all nodes
- [ ] Seed nodes identified (use 2-3 existing stable nodes)
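These checks can be scripted. A rough sketch, run from the new node; the seed IPs and data path are placeholders for illustration:
SEEDS="10.0.1.1 10.0.1.2"        # placeholder seed IPs
# Version checks (compare output against the existing cluster)
cassandra -v
java -version
# Disk space on the data volume
df -h /var/lib/cassandra
# Connectivity to storage (7000) and native (9042) ports on each seed
for seed in $SEEDS; do
  nc -zv "$seed" 7000
  nc -zv "$seed" 9042
done
# Clock synchronization
timedatectl | grep -i synchronized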
Adding a Single Node¶
Step 1: Prepare the new node
# Install Cassandra (same version as cluster)
# Configure cassandra.yaml
cluster_name: 'Production' # Must match existing cluster
num_tokens: 256 # Must match existing nodes (Cassandra 4.0+ defaults to 16)
seeds: "10.0.1.1,10.0.1.2" # 2-3 existing nodes, NOT the new node
listen_address: 10.0.1.10 # This node's IP
rpc_address: 10.0.1.10 # Or 0.0.0.0
endpoint_snitch: GossipingPropertyFileSnitch
auto_bootstrap: true # Default, ensures data streaming
Step 2: Configure rack and datacenter
# /etc/cassandra/cassandra-rackdc.properties
dc=dc1
rack=rack1
Step 3: Start the node
sudo systemctl start cassandra
# Monitor bootstrap progress
watch -n 5 'nodetool netstats | head -30'
# Check joining status
nodetool status
# New node shows UJ (Up, Joining) during bootstrap
Step 4: Verify bootstrap completion
# Node should show UN (Up, Normal)
nodetool status
# Verify data distribution
nodetool ring | grep <new_node_ip>
# Check no streaming in progress
nodetool netstats
Step 5: Run cleanup on existing nodes
# On EACH existing node (not the new one)
# Schedule during low-traffic period
nodetool cleanup
# Monitor progress
nodetool compactionstats
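A minimal sketch of orchestrating the cleanup across all pre-existing nodes, one at a time; the host list and SSH access are assumptions for illustration:
# Hypothetical list of the nodes that existed before the addition
EXISTING_NODES="10.0.1.1 10.0.1.2 10.0.1.3"
for host in $EXISTING_NODES; do
  echo "Running cleanup on $host"
  ssh "$host" 'nodetool cleanup'          # blocks until cleanup on this node finishes
  ssh "$host" 'nodetool compactionstats'  # confirm no cleanup compactions remain
done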
Adding Multiple Nodes¶
Never Bootstrap Multiple Nodes Simultaneously
Adding multiple nodes at once can overwhelm the cluster with streaming operations. Add nodes sequentially, waiting for each to complete bootstrap before starting the next.
Sequential addition procedure:
# Add first node
# Wait for bootstrap complete (UN status)
# Add second node
# Wait for bootstrap complete
# ... repeat ...
# Run cleanup on ALL original nodes after all additions complete
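The "wait for bootstrap complete" step can be automated by polling nodetool status; a sketch, where the node IP is a placeholder:
NEW_NODE="10.0.1.10"   # placeholder IP of the node just started
# Poll until the new node reports UN (Up, Normal)
until nodetool status | grep "$NEW_NODE" | grep -q '^UN'; do
  echo "Waiting for $NEW_NODE to finish bootstrapping (UJ)..."
  sleep 300
done
echo "$NEW_NODE is UN; safe to start the next node"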
Recommended timing:
| Cluster Size | Wait Between Additions |
|---|---|
| < 10 nodes | 1-2 hours |
| 10-50 nodes | 2-4 hours |
| 50+ nodes | 4-8 hours or overnight |
Bootstrap Performance Tuning¶
For a faster bootstrap (at the cost of cluster performance):
# cassandra.yaml on NEW node only
stream_throughput_outbound_megabits_per_sec: 400 # Default 200
stream_entire_sstables: true # Cassandra 4.0+
# On existing nodes, temporarily increase streaming
nodetool setstreamthroughput 400
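Since the increase is meant to be temporary, restore the previous throughput once the bootstrap completes, for example back to the 200 Mb/s default shown above:
# After bootstrap completes, restore the default streaming throughput
nodetool setstreamthroughput 200
# Confirm the current value
nodetool getstreamthroughput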
Troubleshooting Bootstrap¶
Bootstrap stuck or slow:
# Check streaming status
nodetool netstats
# Check for errors in logs
tail -f /var/log/cassandra/system.log | grep -i stream
# If stuck, check network connectivity
nc -zv <seed_node> 7000
Bootstrap failed:
# Clear data and retry
sudo systemctl stop cassandra
rm -rf /var/lib/cassandra/data/*
rm -rf /var/lib/cassandra/commitlog/*
rm -rf /var/lib/cassandra/saved_caches/*
sudo systemctl start cassandra
Removing Nodes¶
Graceful Decommission (Node is Healthy)¶
Use decommission when the node is up and responsive:
# On the node being removed
nodetool decommission
# Monitor streaming progress (from another node)
nodetool netstats
watch 'nodetool status'
# Node transitions: UN → UL (Leaving) → removed from ring
Decommission takes time based on data volume:
| Data per Node | Approximate Time |
|---|---|
| 100 GB | 1-2 hours |
| 500 GB | 4-8 hours |
| 1 TB+ | 8-24 hours |
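These figures can be sanity-checked against the node's load and the configured stream throughput; a back-of-the-envelope sketch assuming the 200 Mb/s default:
DATA_GB=500          # from the Load column of nodetool status
THROUGHPUT_MBPS=200  # stream throughput in megabits per second
# hours ≈ (GB × 8000 Mb/GB) / (Mb/s × 3600 s/h)
echo "scale=1; $DATA_GB * 8000 / ($THROUGHPUT_MBPS * 3600)" | bc
# => ~5.5 hours of pure streaming; real decommissions take longer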
Cannot Cancel Decommission
Once started, decommission cannot be safely cancelled. The node will stream all its data before leaving the ring. Plan accordingly.
Forced Removal (Node is Down)¶
When a node is unrecoverable and cannot be decommissioned:
# Get the host ID of the dead node
nodetool status
# Note the host ID (UUID) of the DN (Down, Normal) node
# Remove from any surviving node
nodetool removenode <host_id>
# Monitor progress
nodetool removenode status
# If removenode hangs (>1 hour with no progress)
nodetool removenode force
When to use force:
- removenode stuck for an extended period
- Other nodes show repeated connection failures to dead node
- Streaming progress at 0% for >30 minutes
Assassinate (Last Resort)¶
Use only when removenode fails completely:
# Only if removenode force doesn't work
nodetool assassinate <node_ip>
Assassinate Risks
Assassinate immediately removes the node from gossip without data redistribution. Use only when the node's data is recoverable through repair from other replicas.
Post-Removal Verification¶
# Verify node removed
nodetool status
# Dead node should not appear
# Check cluster health
nodetool describecluster
# Run full repair to ensure data consistency
nodetool repair -full
Replacing Nodes¶
Node replacement is used when hardware fails and the replacement node should take over the dead node's token ranges directly, whether it keeps the old IP address or comes up on new hardware with a different one.
Replace with Same IP Address¶
Step 1: Ensure dead node is recognized as down
# From any live node
nodetool status
# Dead node should show DN (Down, Normal)
Step 2: Prepare replacement node
# Same configuration as dead node
# cassandra.yaml: same settings
# Add JVM option for replacement
# In jvm.options or jvm-server.options (Cassandra 4.0+)
-Dcassandra.replace_address_first_boot=<dead_node_ip>
Step 3: Start replacement
sudo systemctl start cassandra
# Monitor replacement streaming
nodetool netstats
Step 4: Remove JVM option and restart
# After replacement complete (node shows UN)
# Remove the replace_address_first_boot option
# Restart is optional but recommended
sudo systemctl restart cassandra
Replace with Different IP Address¶
# Use replace_address_first_boot with the OLD (dead) node's IP
-Dcassandra.replace_address_first_boot=<dead_node_ip>
# Configure cassandra.yaml with NEW IP
listen_address: <new_node_ip>
rpc_address: <new_node_ip>
Replace with Different Host ID¶
Starting with Cassandra 4.0, the host ID can be used instead of the IP:
# Get dead node's host ID
nodetool status
# On new node
-Dcassandra.replace_address_first_boot=<dead_node_host_id>
Replacement vs Removenode + Bootstrap¶
| Approach | Pros | Cons |
|---|---|---|
| Replace | Faster (streams from replicas) | Must match token count |
| Remove + Add | Clean slate | Two streaming operations |
Use replacement when:
- Hardware failure (node will not return)
- Same token configuration
- Need faster recovery
Use remove + add when:
- Changing token configuration
- Datacenter restructuring
- Clean start preferred
Scaling Operations¶
Scaling Up (Adding Capacity)¶
Calculate nodes needed:
Current: 6 nodes, 80% disk utilization
Target: 50% utilization for growth headroom
New node count = 6 × (80/50) = 9.6 → 10 nodes
Add 4 nodes
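The same calculation as a small shell sketch, using the example figures above (integer ceiling via bc):
CURRENT_NODES=6
CURRENT_UTIL=80   # percent disk utilization today
TARGET_UTIL=50    # desired post-expansion utilization
# ceil(current_nodes * current_util / target_util)
echo "($CURRENT_NODES * $CURRENT_UTIL + $TARGET_UTIL - 1) / $TARGET_UTIL" | bc
# => 10 total nodes, i.e. add 4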
Scaling procedure:
- Add nodes one at a time (see Adding Nodes)
- Wait for each bootstrap to complete
- Run cleanup on original nodes after all additions
- Verify data distribution with `nodetool status`
Scaling Down (Reducing Capacity)¶
Verify data fits on remaining nodes:
# Check current disk usage
nodetool status
# Ensure remaining nodes have capacity
# Rule: Post-removal utilization < 70%
Scale-down procedure:
- Decommission nodes one at a time
- Wait for each decommission to complete
- Verify data integrity after each removal
Replication Factor Constraint
The remaining cluster must keep at least RF nodes per datacenter. Dropping below RF leaves some replicas with nowhere to be placed and removes all failure headroom: one further node outage can make QUORUM reads and writes fail.
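Before decommissioning, it is worth confirming the replication settings of every keyspace; a minimal check via cqlsh, where the host is a placeholder:
# List replication settings for all keyspaces (run against any live node)
cqlsh 10.0.1.1 -e "SELECT keyspace_name, replication FROM system_schema.keyspaces;"
# Verify each datacenter still has at least RF nodes after the planned removals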
Multi-Datacenter Operations¶
Adding a New Datacenter¶
Step 1: Configure replication for existing keyspaces
-- Update keyspace to include new DC (before adding nodes)
ALTER KEYSPACE my_keyspace WITH replication = {
'class': 'NetworkTopologyStrategy',
'dc1': 3,
'dc2': 3 -- New datacenter
};
Step 2: Add nodes to new datacenter
# cassandra-rackdc.properties on new nodes
dc=dc2
rack=rack1
# cassandra.yaml - use seeds from BOTH datacenters
seeds: "dc1-node1,dc1-node2,dc2-node1"
Step 3: Rebuild data in new datacenter
# On EACH new node in dc2
nodetool rebuild dc1
# This streams data from dc1 to populate dc2
Removing a Datacenter¶
Step 1: Update replication to exclude DC
ALTER KEYSPACE my_keyspace WITH replication = {
'class': 'NetworkTopologyStrategy',
'dc1': 3
-- dc2 removed
};
-- For ALL keyspaces including system_auth, system_distributed
Step 2: Decommission all nodes in the DC
# Decommission each node in dc2
nodetool decommission
Monitoring Topology Operations¶
Key Metrics During Topology Changes¶
| Metric | Source | Alert Threshold |
|---|---|---|
| Streaming progress | `nodetool netstats` | Stalled >30 min |
| Pending compactions | `nodetool compactionstats` | >100 sustained |
| Heap usage | JMX | >80% |
| Disk usage | OS | >85% |
Streaming Progress Monitoring¶
# Detailed streaming status
nodetool netstats
# Watch for progress
watch -n 30 'nodetool netstats | grep -A 20 "Streaming"'
# Estimate completion
# Progress shown as bytes transferred / total bytes
Health Verification Commands¶
# Cluster membership
nodetool status
# Schema agreement (should show single version)
nodetool describecluster
# Token distribution
nodetool ring
# Gossip state
nodetool gossipinfo
AxonOps Cluster Management¶
Managing cluster topology through command-line tools requires careful coordination, monitoring, and expertise. AxonOps provides operational automation that simplifies and safeguards these procedures.
Automated Node Operations¶
AxonOps provides:
- Visual cluster topology: Real-time view of all nodes, their status, and token distribution
- Guided node operations: Step-by-step wizards for adding, removing, and replacing nodes
- Pre-flight validation: Automatic checks before topology changes (disk space, network, version compatibility)
- Progress monitoring: Real-time streaming progress with ETA and throughput metrics
- Automatic cleanup scheduling: Post-addition cleanup orchestrated across nodes
Topology Change Safeguards¶
- Impact analysis: Projected data movement and time estimates before operations
- Rollback guidance: Clear procedures if operations need to be aborted
- Audit logging: Complete history of who performed what topology changes
- Alerting: Notifications for stalled operations or failures
Multi-Datacenter Management¶
- Cross-DC visibility: Unified view of all datacenters
- Coordinated operations: Manage topology changes across datacenters
- Replication monitoring: Verify data consistency across DCs
See the AxonOps documentation for detailed configuration and usage guides.
Troubleshooting¶
Node Won't Join Cluster¶
Symptoms: New node starts but doesn't appear in nodetool status
Checks:
# Verify cluster name matches
grep cluster_name /etc/cassandra/cassandra.yaml
# Check seed connectivity
nc -zv <seed_ip> 7000
# Check gossip
nodetool gossipinfo
# Review logs for errors
tail -100 /var/log/cassandra/system.log | grep -i error
Common causes:
| Cause | Solution |
|---|---|
| Cluster name mismatch | Fix cassandra.yaml, clear data directory |
| Firewall blocking 7000 | Open port 7000 between all nodes |
| Seeds unreachable | Verify seed IPs and connectivity |
| Schema disagreement | Wait for agreement or restart seeds |
Decommission Stuck¶
Symptoms: Node stays in UL (Leaving) state for an extended time
# Check streaming progress
nodetool netstats
# If no progress, check for streaming errors
grep -i stream /var/log/cassandra/system.log | tail -50
# Check target nodes are healthy
nodetool status
Resolution:
- If target nodes are overloaded, reduce concurrent streaming
- If there are network issues, resolve the connectivity first
- As a last resort, stop the node and use `nodetool removenode` from another node
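If the receiving nodes are overloaded, one hedged option is to throttle streaming on those nodes while the decommission continues, then restore the previous value afterwards:
# On the overloaded receiving nodes: lower streaming throughput (Mb/s)
nodetool setstreamthroughput 100
# Restore the default (or your previous setting) once the decommission finishes
nodetool setstreamthroughput 200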
Streaming Failures¶
Symptoms: Topology operation fails with streaming errors
# Check for failed streams
nodetool netstats | grep -i failed
# Common causes:
# - Network timeouts (increase streaming_socket_timeout_in_ms)
# - Disk space exhaustion
# - Memory pressure
Mitigation:
# cassandra.yaml - increase timeouts
streaming_socket_timeout_in_ms: 86400000 # 24 hours
stream_throughput_outbound_megabits_per_sec: 200 # Reduce if overwhelming
Best Practices¶
Planning Topology Changes¶
- Schedule during low-traffic periods: Streaming competes with client requests
- Monitor throughout: Watch netstats, logs, and metrics
- One operation at a time: Never overlap topology changes
- Have rollback plan: Know how to recover if operation fails
Capacity Management¶
- Maintain headroom: Keep disk utilization below 50% for operational flexibility
- Plan for failures: Size cluster to survive losing a rack
- Regular assessment: Review capacity quarterly
Documentation¶
- Record all changes: Date, operator, reason, outcome
- Maintain runbooks: Step-by-step procedures for common operations
- Post-mortems: Document issues and resolutions
Related Documentation¶
- Repair Operations - Post-topology-change repair
- Backup & Restore - Data protection during changes
- Maintenance - Rolling restarts and cleanup
- Architecture: Gossip - How nodes discover each other
- Architecture: Node Lifecycle - Node states and transitions