Node Lifecycle¶
This section describes the complete lifecycle of a Cassandra node from initial cluster join through eventual removal. Understanding these state transitions is essential for operational management, troubleshooting, and capacity planning.
Lifecycle State Machine¶
Cassandra nodes progress through a well-defined set of states during their lifecycle, summarized in the table below.
State Definitions¶
| State | Gossip STATUS | Description |
|---|---|---|
| STARTING | - | Node process started, not yet joined the cluster |
| JOINING | BOOT | Bootstrapping, streaming data from existing nodes |
| NORMAL | NORMAL | Fully operational, serving client requests |
| LEAVING | LEAVING | Decommissioning, streaming data to remaining nodes |
| MOVING | MOVING | Token reassignment in progress |
| LEFT | LEFT | Removed from cluster, no longer participating |
| DOWN | - | Failure detected (local determination, not gossiped) |
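The table above implies a small state machine. The sketch below is a simplified model inferred from the operations described in this section, not an API exposed by Cassandra; it can help when reasoning about which operations are legal from a given state.

```python
# Illustrative lifecycle model (assumption: transitions inferred from this page,
# not Cassandra source code). DOWN is omitted because it is a local
# failure-detector verdict, not a gossiped lifecycle state.
VALID_TRANSITIONS = {
    "STARTING": {"JOINING", "NORMAL"},  # NORMAL directly if auto_bootstrap is false / first node
    "JOINING":  {"NORMAL"},             # bootstrap streaming completes
    "NORMAL":   {"LEAVING", "MOVING"},  # decommission or token move begins
    "LEAVING":  {"LEFT"},               # decommission completes
    "MOVING":   {"NORMAL"},             # token move completes
    "LEFT":     set(),                  # terminal: node no longer in the ring
}

def can_transition(current: str, target: str) -> bool:
    """Return True if this simplified lifecycle model allows current -> target."""
    return target in VALID_TRANSITIONS.get(current, set())

assert can_transition("NORMAL", "LEAVING")
assert not can_transition("LEFT", "NORMAL")
```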
Joining the Cluster¶
Bootstrap Process¶
Bootstrap is the process by which a new node joins an existing cluster and receives its share of data.
Bootstrap Configuration¶
```yaml
# cassandra.yaml - Bootstrap settings

# Enable/disable automatic bootstrap.
# Set to false for the first node in a cluster or when using replace_address.
auto_bootstrap: true

# Number of virtual nodes (vnodes) per physical node
num_tokens: 16

# Initial token (only if num_tokens = 1, not recommended)
# initial_token:

# Use algorithmic, RF-aware token allocation for the named keyspace
# instead of random assignment (optional, gives better distribution)
allocate_tokens_for_keyspace: <keyspace_name>
```
Bootstrap Streaming¶
During bootstrap, the new node streams data from existing replicas:
| Aspect | Description |
|---|---|
| Source selection | Prefers local datacenter, least-loaded nodes |
| Range calculation | Based on new node's tokens and RF |
| Parallelism | Concurrent streams from multiple sources |
| Resume capability | Can resume after failures (Cassandra 4.0+) |
Streaming source selection follows this algorithm (a code sketch follows the list):

1. Identify all ranges the new node should own.
2. For each range, identify the replica set.
3. Select a replica based on:
    - Prefer the same datacenter
    - Prefer the least-loaded node
    - Avoid nodes already streaming
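A minimal sketch of this selection heuristic is shown below. The `Node` and `pick_source` names are hypothetical, not Cassandra internals; the real logic lives in the streaming subsystem.

```python
# Hedged sketch of the source-selection heuristic above (illustrative only).
from dataclasses import dataclass

@dataclass
class Node:
    address: str
    datacenter: str
    load_bytes: int          # approximate data held, e.g. as reported by nodetool status
    active_streams: int = 0  # sessions already streaming to the joining node

def pick_source(replicas: list[Node], local_dc: str) -> Node:
    """Choose one replica to stream a range from: prefer the local DC,
    then nodes not already streaming, then the least-loaded node."""
    def score(n: Node):
        return (
            n.datacenter != local_dc,  # False (0) sorts first: same-DC preferred
            n.active_streams,          # fewer concurrent streams preferred
            n.load_bytes,              # least-loaded preferred
        )
    return min(replicas, key=score)

replicas = [
    Node("10.0.1.2", "dc1", 400 * 2**30),
    Node("10.0.2.2", "dc2", 100 * 2**30),
    Node("10.0.1.3", "dc1", 300 * 2**30, active_streams=1),
]
print(pick_source(replicas, local_dc="dc1").address)  # 10.0.1.2
```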
Bootstrap Duration¶
Bootstrap time depends on:
| Factor | Impact |
|---|---|
| Data volume | Linear with total data size |
| Network bandwidth | Limited by stream_throughput_outbound_megabits_per_sec |
| Number of ranges | More ranges = more coordination overhead |
| Compaction during streaming | May delay completion |
Estimation formula:
Time ≈ (Data per node × RF) / Stream throughput
Example: (500 GB × 3) / 200 Mbps ≈ 60,000 seconds ≈ 16.7 hours
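The same estimate as a short calculation. The formula is a rough lower bound; real bootstrap time also depends on streaming parallelism, compaction, and protocol overhead.

```python
# Worked example of the rough estimation formula above (illustrative only).
data_per_node_gb = 500        # data the new node will receive, per the example
replication_factor = 3
stream_throughput_mbps = 200  # megabits per second (stream throughput cap)

total_megabits = data_per_node_gb * replication_factor * 8 * 1000  # GB -> Mb
seconds = total_megabits / stream_throughput_mbps
print(f"~{seconds / 3600:.1f} hours")  # ~16.7 hours
```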
Monitoring Bootstrap¶
```bash
# Check bootstrap progress
nodetool netstats

# Sample output during bootstrap:
# Mode: JOINING
# /10.0.1.2
#     Receiving 234 files, 45GB total. Already received 156 files, 30GB.

# Check node status
nodetool status
# Shows UJ (Up Joining) during bootstrap

# View detailed streaming (human-readable sizes)
nodetool netstats -H
```
Normal Operation¶
NORMAL State¶
A node in NORMAL state:
- Owns specific token ranges
- Serves as replica for data within RF
- Accepts client read/write requests
- Participates in gossip protocol
- Reports metrics and health status
Token Ownership¶
Each node owns one or more ranges of the token ring; a partition is placed on the node whose token is the next one at or after the partition's token, wrapping around the ring. A simplified illustration follows.
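The sketch below uses made-up tokens and addresses; real clusters use the Murmur3 token space and `num_tokens` vnodes per node.

```python
# Toy model of token-range ownership on the ring (illustrative only).
from bisect import bisect_left

# Sorted (token, owner) pairs: each node's vnodes are points on the ring.
ring = sorted([
    (100, "10.0.1.1"), (400, "10.0.1.2"), (700, "10.0.1.3"),
    (250, "10.0.1.1"), (550, "10.0.1.2"), (850, "10.0.1.3"),
])
tokens = [t for t, _ in ring]

def primary_owner(partition_token: int) -> str:
    """The primary replica owns the first vnode token at or after the
    partition's token, wrapping around the ring."""
    i = bisect_left(tokens, partition_token) % len(ring)
    return ring[i][1]

print(primary_owner(300))  # 10.0.1.2 (next vnode token on the ring is 400)
print(primary_owner(900))  # 10.0.1.1 (wraps around to token 100)
```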
Operational Health Indicators¶
| Indicator | Healthy | Warning | Critical |
|---|---|---|---|
| Gossip state | NORMAL | JOINING/MOVING | DOWN |
| Pending compactions | < 20 | 20-100 | > 100 |
| Dropped messages | 0 | < 1% | > 1% |
| GC pause | < 500ms | 500ms-1s | > 1s |
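For monitoring automation, the thresholds in this table can be encoded directly. The helper below is a hypothetical example (the metric names and signature are made up, not an existing Cassandra or nodetool interface).

```python
# Hedged helper applying the document's thresholds to a set of node metrics.
def classify(pending_compactions: int, dropped_pct: float, gc_pause_ms: float) -> str:
    """Return 'healthy', 'warning', or 'critical' per the table above."""
    if pending_compactions > 100 or dropped_pct > 1.0 or gc_pause_ms > 1000:
        return "critical"
    if pending_compactions >= 20 or dropped_pct > 0.0 or gc_pause_ms >= 500:
        return "warning"
    return "healthy"

print(classify(pending_compactions=5, dropped_pct=0.0, gc_pause_ms=120))   # healthy
print(classify(pending_compactions=45, dropped_pct=0.2, gc_pause_ms=300))  # warning
```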
Leaving the Cluster¶
Decommission Process¶
Decommission is the orderly removal of a node, ensuring all data is transferred to remaining nodes.
Decommission Command¶
```bash
# Initiate decommission (run on the node being removed)
nodetool decommission

# This command:
# - Blocks until complete
# - Cannot be cancelled once started
# - Requires that RF can still be satisfied after removal
```
Decommission Prerequisites¶
| Requirement | Verification |
|---|---|
| Sufficient remaining capacity | Check disk usage on remaining nodes |
| Replication factor satisfied | N - 1 ≥ RF for all keyspaces |
| No pending repairs | nodetool repair_admin list |
| Gossip healthy | nodetool gossipinfo shows all nodes |
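These checks can be scripted before initiating a decommission. The sketch below is one possible pre-check, assuming `nodetool` is on the PATH; parsing of `nodetool status` output is best-effort and may need adjusting for your Cassandra version.

```python
# Hedged pre-check: confirm enough UN (Up/Normal) nodes remain after removing
# one node, relative to the highest replication factor in use.
import subprocess

def live_normal_nodes() -> int:
    out = subprocess.run(["nodetool", "status"], capture_output=True, text=True, check=True)
    # Node lines in `nodetool status` start with a two-letter state code, e.g. "UN".
    return sum(1 for line in out.stdout.splitlines() if line.startswith("UN"))

def safe_to_decommission(max_rf: int) -> bool:
    """True if (live UN nodes - 1) still satisfies the largest keyspace RF."""
    return live_normal_nodes() - 1 >= max_rf

if __name__ == "__main__":
    print("OK to decommission" if safe_to_decommission(max_rf=3) else "Do NOT decommission")
```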
Data Redistribution¶
During decommission, data for each range the leaving node replicates is streamed to the node that becomes its new replica owner, as sketched below.
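The toy example below uses a four-node ring with RF = 2: recomputing the replica sets without the leaving node shows which node gains each range. This is illustrative only, not Cassandra's actual replication-strategy code.

```python
# Illustrative redistribution sketch (not Cassandra internals).
from bisect import bisect_left

ring = [(100, "n1"), (300, "n2"), (500, "n3"), (700, "n4")]  # token -> node (toy ring)

def replicas(token, members, rf=2):
    """Walk the ring clockwise from `token`, collecting `rf` distinct nodes."""
    tokens = [t for t, _ in members]
    i = bisect_left(tokens, token) % len(members)
    out = []
    while len(out) < rf:
        node = members[i % len(members)][1]
        if node not in out:
            out.append(node)
        i += 1
    return out

leaving = "n2"
remaining = [entry for entry in ring if entry[1] != leaving]

for token, _ in ring:
    gained = [n for n in replicas(token, remaining) if n not in replicas(token, ring)]
    if gained:
        print(f"range ending at token {token}: {leaving} streams it to {gained[0]}")
```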
Monitoring Decommission¶
```bash
# Check decommission progress
nodetool netstats

# Sample output:
# Mode: LEAVING
# /10.0.1.4
#     Sending 456 files, 89GB total. Already sent 234 files, 45GB.

# Node status shows UL (Up Leaving)
nodetool status
```
Token Movement¶
Move Operation¶
The move operation reassigns a node's token position without removing it from the cluster:
```bash
# Move node to a new token (single-token nodes only)
nodetool move <new_token>

# Not recommended with vnodes (num_tokens > 1)
```
Move Limitations
Token movement is generally discouraged with vnodes. For capacity rebalancing, add/remove nodes instead of moving tokens.
Failure Scenarios¶
Node Failure Detection¶
When a node fails, the rest of the cluster detects the failure through gossip and responds as described below.
Response to Node Failure¶
| Response | Description |
|---|---|
| Hints stored | Coordinator stores writes destined for failed node |
| Reads rerouted | Failed node excluded from read replica selection |
| Writes continue | If CL satisfied by remaining replicas |
| No automatic replacement | Manual intervention required |
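The write-path behaviour in this table can be summarized in a few lines of simplified Python. This is a model of coordinator behaviour for illustration, not actual Cassandra code; the function name and signature are made up.

```python
# Simplified coordinator model: a write succeeds if enough live replicas remain
# for the consistency level, and hints are recorded for down replicas.
def handle_write(replica_status: dict, required_acks: int) -> str:
    """replica_status maps replica address -> True (up) / False (down);
    required_acks is the consistency level, e.g. 2 for QUORUM with RF=3."""
    live = [n for n, up in replica_status.items() if up]
    down = [n for n, up in replica_status.items() if not up]
    if len(live) < required_acks:
        return "UnavailableException: not enough live replicas for the consistency level"
    hints = {n: "hint stored on coordinator" for n in down}  # replayed when the node returns
    return f"write acknowledged by {live}; hints recorded for {list(hints)}"

print(handle_write({"10.0.1.1": True, "10.0.1.2": True, "10.0.1.3": False}, required_acks=2))
```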
Node Recovery¶
When a failed node recovers:
- Gossip reconnection: Node contacts seeds/peers
- State synchronization: Receives current cluster state
- Hint replay: Receives stored hints from other nodes
- Gradual inclusion: Phi detector marks as UP after successful communication
```bash
# After node restart, verify recovery
nodetool status      # Should show UN (Up Normal)
nodetool gossipinfo  # Verify gossip state
nodetool tpstats     # Check for hint replay activity
```
Operational Commands¶
Status Commands¶
```bash
# View cluster status
nodetool status

# Status output interpretation:
# UN = Up Normal (healthy)
# UJ = Up Joining (bootstrapping)
# UL = Up Leaving (decommissioning)
# UM = Up Moving (token movement)
# DN = Down Normal (failed)

# Detailed gossip state
nodetool gossipinfo

# Ring ownership
nodetool ring
```
State Transition Commands¶
| Command | Effect | Use Case |
|---|---|---|
| `nodetool decommission` | NORMAL → LEAVING → LEFT | Orderly node removal |
| `nodetool removenode <host_id>` | Force remove dead node | Node permanently failed |
| `nodetool assassinate <ip>` | Force remove stuck node | Emergency only |
| `nodetool move <token>` | Change token assignment | Rebalancing (not recommended with vnodes) |
Force Operations¶
Destructive Operations
The following commands can cause data loss if used incorrectly.
```bash
# Force remove a dead node (use when the node is unrecoverable)
nodetool removenode <host_id>

# Force remove a stuck node (emergency only)
nodetool assassinate <ip_address>

# These commands:
# - Do NOT stream data
# - May require subsequent repair
# - Should be a last resort
```
Best Practices¶
Before Any Topology Change¶
- Verify cluster health: `nodetool status` shows all nodes UN
- Check pending operations: no ongoing repairs or bootstraps
- Ensure sufficient capacity: Remaining nodes can handle load
- Backup if critical: Consider snapshots before major changes
During Bootstrap¶
- Schedule during low-traffic periods
- Monitor streaming progress
- Watch for compaction backlog on existing nodes
During Decommission¶
- Verify RF nodes will remain
- Allow sufficient time for completion
- Do not force-kill the process
After Any Change¶
- Verify `nodetool status` shows the expected state
- Run repair on affected ranges if needed
- Monitor for any performance degradation
Related Documentation¶
- Gossip Protocol - State propagation and failure detection
- Seeds and Discovery - Bootstrap discovery process
- Node Replacement - Handling permanently failed nodes
- Data Streaming - Bootstrap and decommission data transfer