Adding Nodes¶

Adding nodes (bootstrapping) expands cluster capacity by introducing new nodes that automatically receive a portion of existing data.

Overview¶

When a new node joins a Cassandra cluster:

Node contacts seed nodes to learn cluster topology
Gossip propagates the new node information
Existing nodes stream data to the new node
New node becomes operational after bootstrap completes

Prerequisites¶

Hardware Requirements¶

Same or compatible hardware as existing nodes
Sufficient disk space for expected data share
Network connectivity to all existing nodes
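The "expected data share" above can be approximated with simple arithmetic. The sketch below assumes data spreads evenly across nodes and that the total is the sum of the Load column from `nodetool status` (which already includes replicas). The function name `expected_share_gb` is hypothetical, and real clusters with uneven token ownership will deviate from this figure.

```shell
# Rough per-node share after adding one node, assuming an even spread.
# $1 = total data in GB (sum of the Load column across the datacenter)
# $2 = number of nodes before the addition
expected_share_gb() {
    total_gb=$1
    current_nodes=$2
    echo $(( total_gb / (current_nodes + 1) ))
}

# Example: 3000 GB spread over 5 nodes, adding a 6th
expected_share_gb 3000 5   # prints 500
```

Leave headroom beyond this figure, since compaction needs temporary disk space on top of the live data.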

Software Requirements¶

Same Cassandra version as existing cluster
Matching JVM version
Consistent configuration (snitch, partitioner)

Network Requirements¶

Port	Purpose
7000	Internode communication (gossip)
7001	Internode communication (SSL)
9042	CQL native transport
7199	JMX monitoring

Configuration¶

cassandra.yaml Settings¶

# Must match existing cluster
cluster_name: 'ProductionCluster'

# Seed nodes (2-3 existing nodes)
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.1.1,10.0.1.2"

# New node's address
listen_address: 10.0.1.10
rpc_address: 10.0.1.10

# Enable bootstrap (default)
auto_bootstrap: true

# Match existing cluster settings
endpoint_snitch: GossipingPropertyFileSnitch
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
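Because a mismatch in these settings can stop the node from joining, comparing the new node's file with a copy from an existing node before the first start can help. This is a minimal sketch: `yaml_value` and `check_matching_settings` are hypothetical helper names, and the parsing only handles simple top-level `key: value` lines without trailing comments.

```shell
# Print the value of a top-level cassandra.yaml key (first match only).
# $1 = key, $2 = file
yaml_value() {
    sed -n "s/^$1:[[:space:]]*//p" "$2" | head -n 1 | tr -d "'\""
}

# Compare the settings that must match between two cassandra.yaml files.
# $1 = file copied from an existing node, $2 = the new node's file
# Prints each mismatch and returns non-zero if any setting differs.
check_matching_settings() {
    rc=0
    for key in cluster_name endpoint_snitch partitioner; do
        a=$(yaml_value "$key" "$1")
        b=$(yaml_value "$key" "$2")
        if [ "$a" != "$b" ]; then
            echo "MISMATCH $key: existing='$a' new='$b'"
            rc=1
        fi
    done
    return $rc
}
```

For example, `check_matching_settings existing-node.yaml /etc/cassandra/cassandra.yaml` prints nothing and returns 0 when the three settings agree.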

For Multi-Datacenter¶

Configure datacenter and rack in cassandra-rackdc.properties:

dc=dc1
rack=rack1

Bootstrap Procedure¶

Step 1: Verify Cluster Health¶

On any existing node:

nodetool status

All nodes should show UN (Up/Normal).
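On larger clusters, scanning the status output by eye is error-prone. The sketch below filters `nodetool status` output read from standard input and lists every node whose state is not UN. The function name `nodes_not_un` is hypothetical, and it assumes the usual layout in which each node line starts with a two-letter state code followed by the address.

```shell
# Read `nodetool status` output on stdin and print "<state> <address>"
# for every node that is not UN. Returns non-zero if any such node exists.
# State codes: first letter U/D (up/down), second N/L/J/M
# (normal/leaving/joining/moving).
nodes_not_un() {
    awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { print $1, $2; bad = 1 }
         END { exit bad + 0 }'
}
```

Typical use would be `nodetool status | nodes_not_un && echo "all nodes UN"`.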

Step 2: Verify No Ongoing Operations¶

nodetool compactionstats
nodetool netstats

Avoid bootstrapping during heavy compaction or repairs.
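To turn "heavy compaction" into a number, the pending task count can be pulled out of the `nodetool compactionstats` output. This sketch assumes the output contains a line of the form `pending tasks: N`. The function name `pending_compactions` is hypothetical, and what counts as too many pending tasks depends on the cluster.

```shell
# Read `nodetool compactionstats` output on stdin and print the pending
# task count, or "unknown" if no "pending tasks:" line is present.
pending_compactions() {
    awk -F': *' '/^pending tasks:/ { print $2; found = 1; exit }
                 END { if (!found) print "unknown" }'
}
```

For example, `nodetool compactionstats | pending_compactions` prints `0` on an idle node.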

Step 3: Start New Node¶

# Clear any existing data (if reusing hardware)
sudo rm -rf /var/lib/cassandra/data/*
sudo rm -rf /var/lib/cassandra/commitlog/*
sudo rm -rf /var/lib/cassandra/saved_caches/*

# Start Cassandra
sudo systemctl start cassandra

Step 4: Monitor Bootstrap¶

From any node:

# Watch node status
nodetool status

New node shows as UJ (Up/Joining) during bootstrap.

On new node:

# Monitor streaming progress
nodetool netstats

# Watch logs
tail -f /var/log/cassandra/system.log

Step 5: Verify Completion¶

nodetool status

New node should show UN (Up/Normal).
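Since bootstrap can take hours, polling for the UN state is often more practical than re-running the command by hand. The following is a sketch only: `wait_for_un` and the `STATUS_CMD` override are hypothetical names introduced here, with `STATUS_CMD` defaulting to `nodetool status`.

```shell
# Poll until the node with address $1 reports state UN, or give up.
# $2 = seconds between polls, $3 = maximum number of polls
# STATUS_CMD (hypothetical hook) defaults to `nodetool status`.
wait_for_un() {
    addr=$1
    interval=$2
    tries=$3
    while [ "$tries" -gt 0 ]; do
        state=$(${STATUS_CMD:-nodetool status} | awk -v a="$addr" '$2 == a { print $1 }')
        if [ "$state" = "UN" ]; then
            return 0
        fi
        echo "node $addr state: ${state:-not visible yet}"
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}
```

For example, `wait_for_un 10.0.1.10 60 720` checks once a minute for up to 12 hours and returns non-zero if the node never reaches UN.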

Step 6: Run Cleanup¶

After bootstrap, run cleanup on existing nodes to remove data that moved to the new node:

# On each existing node (one at a time)
nodetool cleanup
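The one-at-a-time rule can be enforced with a small loop that stops at the first failure. This is a sketch under stated assumptions: `rolling_cleanup` and the `RUNNER` variable are hypothetical names, `RUNNER` defaults to `ssh`, and the loop relies on `nodetool cleanup` returning only after the cleanup has finished on that host.

```shell
# Run `nodetool cleanup` on each host in turn, stopping at the first failure.
# Arguments: the addresses of the existing nodes.
# RUNNER (hypothetical hook) is the command used to reach a host; default ssh.
rolling_cleanup() {
    for host in "$@"; do
        echo "cleanup starting on $host"
        ${RUNNER:-ssh} "$host" nodetool cleanup || {
            echo "cleanup failed on $host, stopping" >&2
            return 1
        }
    done
}
```

Setting `RUNNER=echo` gives a dry run that only prints the commands, for example `RUNNER=echo rolling_cleanup 10.0.1.1 10.0.1.2`.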

Monitoring Bootstrap¶

Progress Indicators¶

# Streaming sessions
nodetool netstats

# Bootstrap status in logs
grep -i bootstrap /var/log/cassandra/system.log | tail -20

Streaming Throughput¶

# Check current throughput
nodetool getstreamthroughput

# Increase if needed (value in megabits per second; 0 disables throttling)
nodetool setstreamthroughput 400

Time Estimation¶

Total Cluster Data	Expected Duration
< 100 GB	30 min - 1 hour
100 GB - 500 GB	1-4 hours
500 GB - 2 TB	4-12 hours
> 2 TB	12+ hours

Factors affecting duration:

Network bandwidth
Disk I/O speed
Number of existing nodes
Stream throughput settings
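For a figure tailored to a specific cluster, a back-of-the-envelope estimate can be computed from the amount of data the new node must receive and the effective streaming rate. The sketch below assumes the rate is expressed in megabits per second and stays constant. The function name `estimate_stream_minutes` is hypothetical, and the result ignores disk I/O limits and post-streaming work such as index builds.

```shell
# Estimate streaming time in whole minutes.
# $1 = data the new node must receive, in GB (decimal)
# $2 = effective streaming rate into the new node, in megabits per second
estimate_stream_minutes() {
    data_gb=$1
    rate_mbps=$2
    # 1 GB = 8000 megabits; divide by the rate for seconds, then by 60
    echo $(( data_gb * 8000 / rate_mbps / 60 ))
}

# Example: 500 GB at 400 megabits per second
estimate_stream_minutes 500 400   # prints 166
```

Treat the result as a lower bound: the throughput cap applies per source node, so the real rate depends on how many nodes stream at once and on what the network can sustain.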

Adding Multiple Nodes¶

Sequential Addition (Recommended)¶

# Add nodes one at a time
# 1. Bootstrap node A, wait for UN status
# 2. Bootstrap node B, wait for UN status
# 3. Run cleanup on all original nodes

Concurrent Addition (Advanced)¶

Multiple nodes can bootstrap simultaneously if:

Sufficient network bandwidth
Adequate disk I/O on source nodes
Tokens are manually assigned to prevent overlap

Troubleshooting¶

Bootstrap Fails to Start¶

Check connectivity:

nc -zv seed-node1 7000
nc -zv seed-node1 9042

Check configuration:

grep -E "cluster_name|seeds" /etc/cassandra/cassandra.yaml

Check logs:

grep -i "error\|failed" /var/log/cassandra/system.log | tail -50

Bootstrap Slow or Stalled¶

Check streaming:

nodetool netstats

Increase throughput:

nodetool setstreamthroughput 400

Check source node disk:

# On source nodes
iostat -x 1 5

Bootstrap Fails Mid-Way¶

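If streaming was interrupted but the Cassandra process is still running, try resuming before starting over (Cassandra 2.2 and later):

nodetool bootstrap resume

Otherwise, clear and retry:

sudo systemctl stop cassandra
sudo rm -rf /var/lib/cassandra/data/*
sudo rm -rf /var/lib/cassandra/commitlog/*
sudo rm -rf /var/lib/cassandra/saved_caches/*
sudo systemctl start cassandra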

Wrong Token Assignment¶

With vnodes (default), tokens are automatically assigned. If tokens appear wrong:

# Check token distribution
nodetool ring

# If the distribution is badly skewed, decommission the new node (run on that node)
nodetool decommission
# Then stop Cassandra, clear its data directories, and bootstrap again
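Before decommissioning, it can help to quantify how uneven ownership actually is. The sketch below reads `nodetool status` output from standard input and reports the smallest and largest values in the Owns column. The function name `owns_spread` is hypothetical, and it assumes ownership appears as a percentage field on each node line. Effective ownership is usually only meaningful when `nodetool status` is given a keyspace name.

```shell
# Read `nodetool status` output on stdin and summarise the Owns column.
# Node lines are recognised by their two-letter state code; the ownership
# value is taken to be the field that looks like "33.3%".
owns_spread() {
    awk '$1 ~ /^[UD][NLJM]$/ {
             for (i = 3; i <= NF; i++) if ($i ~ /^[0-9.]+%$/) {
                 v = $i + 0
                 if (n == 0 || v < min) min = v
                 if (n == 0 || v > max) max = v
                 n++
             }
         }
         END { if (n) printf "min=%.1f%% max=%.1f%% nodes=%d\n", min, max, n }'
}
```

For example, `nodetool status my_keyspace | owns_spread` gives a one-line summary; a wide gap between the minimum and maximum suggests the skew is worth investigating.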

Post-Bootstrap Tasks¶

1. Run Cleanup¶

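On each existing node, one at a time. Pass a keyspace name (my_keyspace is a placeholder) to limit the work to that keyspace: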

nodetool cleanup my_keyspace

2. Update Monitoring¶

Add new node to monitoring systems
Update alerting configuration
Add to backup schedules

3. Update Seed List (Optional)¶

If adding the node as a seed, do so only after bootstrap completes (a node that lists itself as a seed skips bootstrap):

# On all nodes, update cassandra.yaml
seeds: "existing-seed1,existing-seed2,new-node"

4. Run Repair¶

After cluster stabilizes:

nodetool repair -pr

Best Practices¶

Practice	Reason
Add one node at a time	Prevents overload
Run during low traffic	Reduces impact
Monitor throughout	Catch issues early
Run cleanup after	Reclaim space
Update documentation	Track cluster changes

Command	Purpose
`nodetool status`	Cluster status
`nodetool netstats`	Streaming status
`nodetool cleanup`	Remove old data
`nodetool ring`	Token distribution
`nodetool decommission`	Remove node

Cluster Management Overview - Cluster operations
Decommission Node - Removing nodes
Replace Dead Node - Replacing failed nodes