
Cluster Management Operations

This guide covers operational procedures for managing Cassandra cluster topology: adding capacity, removing nodes, replacing failed hardware, and maintaining cluster health.

No Single Point of Failure

Cassandra's peer-to-peer architecture enables online topology changes. Every node knows the cluster state through gossip, and token ranges determine data ownership. Topology changes trigger automatic data streaming in the background while the cluster continues serving traffic.


Understanding Topology Changes

How Cassandra Handles Topology Changes

When cluster membership changes, Cassandra automatically redistributes data: ownership of the affected token ranges shifts to the new topology, and the corresponding data is streamed between nodes in the background.


Token Ownership and Streaming

| Scenario | Data Movement | Cleanup Required |
|----------|---------------|------------------|
| Add node | Existing nodes stream data to new node | Yes, on existing nodes |
| Remove node (decommission) | Departing node streams to remaining nodes | No |
| Remove node (dead) | Remaining nodes stream among themselves | No |
| Replace node | Remaining nodes stream to replacement | No |
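
A quick way to see the resulting ownership split is the effective ownership column in nodetool status; a minimal check, assuming a keyspace named my_keyspace:

# Show per-node effective ownership for one keyspace (my_keyspace is an example name)
nodetool status my_keyspace
# The "Owns (effective)" column reflects each node's share of the token ranges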

Always Run Cleanup After Adding Nodes

After adding nodes, existing nodes retain copies of data they no longer own. Run nodetool cleanup on each existing node to reclaim disk space. Schedule this during low-traffic periods as cleanup involves reading and rewriting SSTables.


Cluster Operations Overview

| Operation | When to Use | Command/Method | Impact |
|-----------|-------------|----------------|--------|
| Add node | Scaling capacity | Start new node with proper config | Streaming (~hours) |
| Decommission | Graceful removal (node up) | nodetool decommission | Streaming (~hours) |
| Remove node | Forced removal (node down) | nodetool removenode | Streaming (~hours) |
| Replace node | Hardware replacement | replace_address_first_boot | Streaming (~hours) |
| Move token | Rebalancing (rarely needed) | nodetool move | Streaming |
| Cleanup | After adding nodes | nodetool cleanup | I/O intensive |
| Assassinate | Last resort (stuck node) | nodetool assassinate | Immediate |

Adding Nodes

Pre-flight Checklist

Before adding a node, verify:

  • [ ] Same Cassandra version as existing cluster
  • [ ] Same JDK version and vendor
  • [ ] Adequate disk space (check existing node usage)
  • [ ] Network connectivity to all existing nodes (storage port 7000, native port 9042; see the spot-check below)
  • [ ] Firewall rules configured
  • [ ] NTP synchronized across all nodes
  • [ ] Seed nodes identified (use 2-3 existing stable nodes)
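
For the connectivity items above, a quick spot-check from the new host before starting Cassandra (the IPs below are examples):

# Spot-check storage (7000) and native (9042) ports on each existing node
for ip in 10.0.1.1 10.0.1.2 10.0.1.3; do
  for port in 7000 9042; do
    nc -zv -w 3 "$ip" "$port"
  done
done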

Adding a Single Node

Step 1: Prepare the new node

# Install Cassandra (same version as cluster)
# Configure cassandra.yaml

cluster_name: 'Production'           # Must match existing cluster
num_tokens: 256                      # Must match existing nodes (Cassandra 4.0+ defaults to 16)
seeds: "10.0.1.1,10.0.1.2"          # 2-3 existing nodes, NOT the new node
listen_address: 10.0.1.10           # This node's IP
rpc_address: 10.0.1.10              # Or 0.0.0.0 (then broadcast_rpc_address must also be set)
endpoint_snitch: GossipingPropertyFileSnitch
auto_bootstrap: true                 # Default, ensures data streaming

Step 2: Configure rack and datacenter

# /etc/cassandra/cassandra-rackdc.properties
dc=dc1
rack=rack1

Step 3: Start the node

sudo systemctl start cassandra

# Monitor bootstrap progress
watch -n 5 'nodetool netstats | head -30'

# Check joining status
nodetool status
# New node shows UJ (Up, Joining) during bootstrap

Step 4: Verify bootstrap completion

# Node should show UN (Up, Normal)
nodetool status

# Verify data distribution
nodetool ring | grep <new_node_ip>

# Check no streaming in progress
nodetool netstats
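
You can also confirm the new node landed in the intended datacenter and rack (a quick check; output labels may vary slightly between versions):

# Verify the snitch assigned the expected DC and rack
nodetool info | grep -E 'Data Center|Rack'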

Step 5: Run cleanup on existing nodes

# On EACH existing node (not the new one)
# Schedule during low-traffic period
nodetool cleanup

# Monitor progress
nodetool compactionstats
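
Cleanup behaves like a compaction, so it can be scoped and throttled to limit impact; a minimal sketch, assuming a keyspace named my_keyspace:

# Restrict cleanup to two compaction threads and a single keyspace (my_keyspace is an example)
nodetool cleanup -j 2 my_keyspace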

Adding Multiple Nodes

Never Bootstrap Multiple Nodes Simultaneously

Adding multiple nodes at once can overwhelm the cluster with streaming operations. Add nodes sequentially, waiting for each to complete bootstrap before starting the next.

Sequential addition procedure:

# Add first node
# Wait for bootstrap complete (UN status)
# Add second node
# Wait for bootstrap complete
# ... repeat ...
# Run cleanup on ALL original nodes after all additions complete
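
A minimal sketch of the wait step, polling until the joining node leaves the UJ state (NEW_NODE is a placeholder IP):

# Poll until the new node is no longer UJ (Up, Joining)
NEW_NODE=10.0.1.10   # placeholder for the new node's IP
while nodetool status | grep "$NEW_NODE" | grep -q 'UJ'; do
  echo "$(date) still bootstrapping..."
  sleep 300
done
echo "Bootstrap complete; safe to add the next node"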

Recommended timing:

| Cluster Size | Wait Between Additions |
|--------------|------------------------|
| < 10 nodes | 1-2 hours |
| 10-50 nodes | 2-4 hours |
| 50+ nodes | 4-8 hours or overnight |

Bootstrap Performance Tuning

For faster bootstrap (at cost of cluster performance):

# cassandra.yaml on NEW node only
stream_throughput_outbound_megabits_per_sec: 400  # Default 200
stream_entire_sstables: true                       # Cassandra 4.0+
# On existing nodes, temporarily increase streaming
nodetool setstreamthroughput 400
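
Remember to restore the default once bootstrap finishes, otherwise the higher cap remains in effect until the next restart (a sketch; 200 is the default noted above):

# On existing nodes, restore the default throttle after bootstrap completes
nodetool setstreamthroughput 200
# Confirm the active value
nodetool getstreamthroughput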

Troubleshooting Bootstrap

Bootstrap stuck or slow:

# Check streaming status
nodetool netstats

# Check for errors in logs
tail -f /var/log/cassandra/system.log | grep -i stream

# If stuck, check network connectivity
nc -zv <seed_node> 7000

Bootstrap failed:

# On the NEW node only: clear its data and retry the bootstrap
sudo systemctl stop cassandra
rm -rf /var/lib/cassandra/data/*
rm -rf /var/lib/cassandra/commitlog/*
rm -rf /var/lib/cassandra/saved_caches/*
sudo systemctl start cassandra

Removing Nodes

Graceful Decommission (Node is Healthy)

Use decommission when the node is up and responsive:

# On the node being removed
nodetool decommission

# Monitor streaming progress (from another node)
nodetool netstats
watch 'nodetool status'

# Node transitions: UN → UL (Leaving) → removed from ring

Decommission takes time based on data volume:

| Data per Node | Approximate Time |
|---------------|------------------|
| 100 GB | 1-2 hours |
| 500 GB | 4-8 hours |
| 1 TB+ | 8-24 hours |

Cannot Cancel Decommission

Once started, decommission cannot be safely cancelled. The node will stream all its data before leaving the ring. Plan accordingly.

Forced Removal (Node is Down)

When a node is unrecoverable and cannot be decommissioned:

# Get the host ID of the dead node
nodetool status
# Note the host ID (UUID) of the DN (Down, Normal) node

# Remove from any surviving node
nodetool removenode <host_id>

# Monitor progress
nodetool removenode status

# If removenode hangs (>1 hour with no progress), force completion of the pending removal
nodetool removenode force

When to use force:

  • removenode stuck for extended period
  • Other nodes show repeated connection failures to dead node
  • Streaming progress at 0% for >30 minutes

Assassinate (Last Resort)

Use only when removenode fails completely:

# Only if removenode force doesn't work
nodetool assassinate <node_ip>

Assassinate Risks

Assassinate immediately removes the node from gossip without data redistribution. Use only when the node's data is recoverable through repair from other replicas.

Post-Removal Verification

# Verify node removed
nodetool status
# Dead node should not appear

# Check cluster health
nodetool describecluster

# Run full repair to ensure data consistency
nodetool repair -full

Replacing Nodes

Node replacement is used when hardware fails and a new machine must take over the dead node's token ranges, whether it reuses the old IP address or comes up on new hardware with a different one.

Replace with Same IP Address

Step 1: Ensure dead node is recognized as down

# From any live node
nodetool status
# Dead node should show DN (Down, Normal)

Step 2: Prepare replacement node

# Same configuration as dead node
# cassandra.yaml: same settings

# Add JVM option for replacement
# In jvm.options or jvm-server.options (Cassandra 4.0+)
-Dcassandra.replace_address_first_boot=<dead_node_ip>
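
If your installation reads JVM flags from cassandra-env.sh rather than the jvm options files, the same property can be appended there; a sketch:

# Alternative: add the replacement flag via cassandra-env.sh
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=<dead_node_ip>"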

Step 3: Start replacement

sudo systemctl start cassandra

# Monitor replacement streaming
nodetool netstats

Step 4: Remove JVM option and restart

# After replacement completes (node shows UN)
# Remove the replace_address_first_boot option; it is only honoured on a node's first boot
# A restart afterwards is optional but recommended

sudo systemctl restart cassandra

Replace with Different IP Address

# Use replace_address_first_boot with the OLD (dead) node's IP
-Dcassandra.replace_address_first_boot=<dead_node_ip>

# Configure cassandra.yaml with NEW IP
listen_address: <new_node_ip>
rpc_address: <new_node_ip>

Replace with Different Host ID

Starting with Cassandra 4.1, the dead node can also be identified by its host ID rather than its IP address:

# Get dead node's host ID
nodetool status

# On new node
-Dcassandra.replace_node_first_boot=<dead_node_host_id>

Replacement vs Removenode + Bootstrap

| Approach | Pros | Cons |
|----------|------|------|
| Replace | Faster (streams from replicas) | Must match token count |
| Remove + Add | Clean slate | Two streaming operations |

Use replacement when:

  • Hardware failure where a like-for-like replacement will take over the node's role
  • Same token configuration
  • Need faster recovery

Use remove + add when:

  • Changing token configuration
  • Datacenter restructuring
  • Clean start preferred

Scaling Operations

Scaling Up (Adding Capacity)

Calculate nodes needed:

Current: 6 nodes, 80% disk utilization
Target: 50% utilization for growth headroom

New node count = 6 × (80/50) = 9.6 → 10 nodes
Add 4 nodes
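
The same calculation as a shell one-liner, useful for plugging in your own figures (values are from the example above):

# Ceiling of nodes * current_util / target_util using integer arithmetic
nodes=6; current_util=80; target_util=50
echo $(( (nodes * current_util + target_util - 1) / target_util ))   # prints 10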

Scaling procedure:

  1. Add nodes one at a time (see Adding Nodes)
  2. Wait for each bootstrap to complete
  3. Run cleanup on original nodes after all additions
  4. Verify data distribution with nodetool status

Scaling Down (Reducing Capacity)

Verify data fits on remaining nodes:

# Check current disk usage
nodetool status

# Ensure remaining nodes have capacity
# Rule: Post-removal utilization < 70%
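
A rough projection of post-removal utilization, assuming data redistributes evenly (example figures; compare the result against the 70% rule above):

# Projected utilization after removing nodes, assuming even redistribution
nodes_now=8; nodes_after=6; util_now=45   # example figures
echo "scale=1; $util_now * $nodes_now / $nodes_after" | bc   # prints 60.0 (percent)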

Scale-down procedure:

  1. Decommission nodes one at a time
  2. Wait for each decommission to complete
  3. Verify data integrity after each removal

Replication Factor Constraint

The remaining cluster must have at least RF nodes per datacenter. Dropping below RF leaves fewer live replicas than the replication factor demands: writes at consistency level ALL become impossible, and QUORUM is left with no tolerance for any further failure.


Multi-Datacenter Operations

Adding a New Datacenter

Step 1: Configure replication for existing keyspaces

-- Update keyspace to include new DC (before adding nodes)
ALTER KEYSPACE my_keyspace WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3,
    'dc2': 3  -- New datacenter
};
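
The replicated system keyspaces (system_auth, system_distributed, system_traces) usually need the same change so authentication and tracing work in the new DC; a sketch using cqlsh (adjust RF values and credentials to your environment):

# Replicate system_auth to the new DC as well (example RF values)
cqlsh -e "ALTER KEYSPACE system_auth WITH replication = {
  'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3 };"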

Step 2: Add nodes to new datacenter

# cassandra-rackdc.properties on new nodes
dc=dc2
rack=rack1

# cassandra.yaml - use seeds from BOTH datacenters
seeds: "dc1-node1,dc1-node2,dc2-node1"
auto_bootstrap: false    # New DC nodes should not bootstrap; data arrives via nodetool rebuild

Step 3: Rebuild data in new datacenter

# On EACH new node in dc2
nodetool rebuild dc1

# This streams data from dc1 to populate dc2

Removing a Datacenter

Step 1: Update replication to exclude DC

ALTER KEYSPACE my_keyspace WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3
    -- dc2 removed
};

-- For ALL keyspaces including system_auth, system_distributed

Step 2: Decommission all nodes in the DC

# Decommission each node in dc2
nodetool decommission

Monitoring Topology Operations

Key Metrics During Topology Changes

| Metric | Source | Alert Threshold |
|--------|--------|-----------------|
| Streaming progress | nodetool netstats | Stalled >30 min |
| Pending compactions | nodetool compactionstats | >100 sustained |
| Heap usage | JMX | >80% |
| Disk usage | OS | >85% |
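
A quick spot-check for the disk threshold in the table (a sketch using standard tools; /var/lib/cassandra is the common default data path):

# Warn if the Cassandra data filesystem exceeds 85% full
usage=$(df --output=pcent /var/lib/cassandra | tail -1 | tr -d ' %')
[ "$usage" -gt 85 ] && echo "WARNING: data disk at ${usage}%"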

Streaming Progress Monitoring

# Detailed streaming status
nodetool netstats

# Watch for progress
watch -n 30 'nodetool netstats | grep -A 20 "Streaming"'

# Estimate completion
# Progress shown as bytes transferred / total bytes

Health Verification Commands

# Cluster membership
nodetool status

# Schema agreement (should show single version)
nodetool describecluster

# Token distribution
nodetool ring

# Gossip state
nodetool gossipinfo

AxonOps Cluster Management

Managing cluster topology through command-line tools requires careful coordination, monitoring, and expertise. AxonOps provides operational automation that simplifies and safeguards these procedures.

Automated Node Operations

AxonOps provides:

  • Visual cluster topology: Real-time view of all nodes, their status, and token distribution
  • Guided node operations: Step-by-step wizards for adding, removing, and replacing nodes
  • Pre-flight validation: Automatic checks before topology changes (disk space, network, version compatibility)
  • Progress monitoring: Real-time streaming progress with ETA and throughput metrics
  • Automatic cleanup scheduling: Post-addition cleanup orchestrated across nodes

Topology Change Safeguards

  • Impact analysis: Projected data movement and time estimates before operations
  • Rollback guidance: Clear procedures if operations need to be aborted
  • Audit logging: Complete history of who performed what topology changes
  • Alerting: Notifications for stalled operations or failures

Multi-Datacenter Management

  • Cross-DC visibility: Unified view of all datacenters
  • Coordinated operations: Manage topology changes across datacenters
  • Replication monitoring: Verify data consistency across DCs

See the AxonOps documentation for detailed configuration and usage guides.


Troubleshooting

Node Won't Join Cluster

Symptoms: New node starts but doesn't appear in nodetool status

Checks:

# Verify cluster name matches
grep cluster_name /etc/cassandra/cassandra.yaml

# Check seed connectivity
nc -zv <seed_ip> 7000

# Check gossip
nodetool gossipinfo

# Review logs for errors
tail -100 /var/log/cassandra/system.log | grep -i error

Common causes:

| Cause | Solution |
|-------|----------|
| Cluster name mismatch | Fix cassandra.yaml, clear data directory |
| Firewall blocking 7000 | Open port 7000 between all nodes |
| Seeds unreachable | Verify seed IPs and connectivity |
| Schema disagreement | Wait for agreement or restart seeds |

Decommission Stuck

Symptoms: Node stays in UL (Leaving) state for extended time

# Check streaming progress
nodetool netstats

# If no progress, check for streaming errors
grep -i stream /var/log/cassandra/system.log | tail -50

# Check target nodes are healthy
nodetool status

Resolution:

  • If target nodes are overloaded, reduce concurrent streaming
  • If network issues, resolve connectivity
  • As last resort, stop the node and use nodetool removenode from another node

Streaming Failures

Symptoms: Topology operation fails with streaming errors

# Check for failed streams
nodetool netstats | grep -i failed

# Common causes:
# - Network timeouts (increase streaming_socket_timeout_in_ms)
# - Disk space exhaustion
# - Memory pressure

Mitigation:

# cassandra.yaml - increase timeouts
streaming_socket_timeout_in_ms: 86400000           # 24 hours (Cassandra 3.x; removed in 4.0)
stream_throughput_outbound_megabits_per_sec: 200   # Reduce if streaming is overwhelming the cluster

Best Practices

Planning Topology Changes

  1. Schedule during low-traffic periods: Streaming competes with client requests
  2. Monitor throughout: Watch netstats, logs, and metrics
  3. One operation at a time: Never overlap topology changes
  4. Have rollback plan: Know how to recover if operation fails

Capacity Management

  1. Maintain headroom: Keep disk utilization below 50% for operational flexibility
  2. Plan for failures: Size cluster to survive losing a rack
  3. Regular assessment: Review capacity quarterly

Documentation

  1. Record all changes: Date, operator, reason, outcome
  2. Maintain runbooks: Step-by-step procedures for common operations
  3. Post-mortems: Document issues and resolutions