Partitioning¶
Partitioning determines how data is distributed across nodes in a Cassandra cluster. Each row is assigned to a node based on its partition key, which is hashed to produce a token value. The token determines which node owns the data.
The Problem with Modulo Hashing¶
Traditional hash-based distribution uses modulo arithmetic:
node = hash(key) % number_of_nodes
This approach fails catastrophically when the cluster size changes:
Before: 3 nodes
hash("user:123") % 3 = 1 → Node 1
hash("user:456") % 3 = 2 → Node 2
hash("user:789") % 3 = 0 → Node 0
After adding 1 node: 4 nodes
hash("user:123") % 4 = 3 → Node 3 ← MOVED
hash("user:456") % 4 = 2 → Node 2 ← Same
hash("user:789") % 4 = 1 → Node 1 ← MOVED
Result: ~75% of data must move when adding one node
In a large cluster, this is operationally catastrophic
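The effect is easy to reproduce. The sketch below (illustrative Python, not Cassandra code) hashes 10,000 synthetic keys, places them by modulo, and counts how many land on a different node when a fourth node is added; MD5 is used purely as a stable, well-distributed hash:

```python
# Illustrative only: place keys with modulo arithmetic and count how many move
# when the node count changes from 3 to 4.
import hashlib

def node_for(key: str, num_nodes: int) -> int:
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest, "big") % num_nodes

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(node_for(k, 3) != node_for(k, 4) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved going from 3 to 4 nodes")  # expect ~75%
```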
Consistent Hashing¶
Consistent hashing, introduced by Karger et al. in "Consistent Hashing and Random Trees" (1997), solves the data-movement problem by mapping both keys and nodes onto a shared circular hash space.
The Hash Ring¶
Instead of using modulo arithmetic, consistent hashing places everything on a circular number line (a "ring") where the maximum value wraps around to zero:
Both nodes and data keys are hashed to positions on this ring. A key is owned by the first node encountered when walking clockwise from the key's position.
Nodes and Keys on the Ring¶
Consider three nodes placed on the ring at positions 1000, 4000, and 7000:
Finding the owner: The key user:123 hashes to position 2500. Walking clockwise from 2500, Node B (at 4000) is the first node encountered. Therefore, Node B owns this key.
Each node owns all keys between itself and the previous node (counter-clockwise):
| Node | Position | Owns Keys in Range |
|---|---|---|
| A | 1000 | 7001 → 1000 (wrapping through 0) |
| B | 4000 | 1001 → 4000 |
| C | 7000 | 4001 → 7000 |
Adding a Node: Minimal Data Movement¶
When Node D joins at position 5500, only keys in the range 4001–5500 move to the new node. All other keys remain on their original nodes:
| Key | Position | Before (3 nodes) | After (4 nodes) | Moved? |
|---|---|---|---|---|
| key at 2500 | 2500 | Node B | Node B | No |
| key at 5000 | 5000 | Node C | Node D | Yes |
| key at 6000 | 6000 | Node C | Node C | No |
| key at 8000 | 8000 | Node A | Node A | No |
Only keys in the range that Node D now owns (4001–5500) move. Everything else stays in place.
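A minimal ring makes this concrete. The sketch below is illustrative Python only, with node positions hand-picked to match the tables above rather than derived from a hash; it implements the clockwise lookup and shows that only the key at 5000 changes owner when Node D joins at 5500:

```python
# Toy consistent-hash ring; positions mirror the walkthrough above.
from bisect import bisect_left, insort

class Ring:
    def __init__(self):
        self.positions = []   # sorted node positions on the ring
        self.owner_at = {}    # position -> node name

    def add_node(self, name, position):
        insort(self.positions, position)
        self.owner_at[position] = name

    def owner(self, key_position):
        # First node at or clockwise after the key, wrapping back to the start.
        idx = bisect_left(self.positions, key_position)
        if idx == len(self.positions):
            idx = 0
        return self.owner_at[self.positions[idx]]

ring = Ring()
for name, pos in [("A", 1000), ("B", 4000), ("C", 7000)]:
    ring.add_node(name, pos)

keys = [2500, 5000, 6000, 8000]
before = {k: ring.owner(k) for k in keys}
ring.add_node("D", 5500)                       # Node D joins at position 5500
for k in keys:
    changed = "moved" if before[k] != ring.owner(k) else "stayed"
    print(f"key at {k}: Node {before[k]} -> Node {ring.owner(k)} ({changed})")
# Only the key at 5000 (inside the 4001-5500 range now owned by D) moves.
```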
Data Movement Comparison¶
| Operation | Modulo Hashing | Consistent Hashing |
|---|---|---|
| Add 1 node to 3-node cluster | ~75% of data moves | ~25% of data moves |
| Add 1 node to 10-node cluster | ~90% of data moves | ~10% of data moves |
| Add 1 node to 100-node cluster | ~99% of data moves | ~1% of data moves |
Formula: going from N to N + 1 nodes, modulo hashing relocates roughly N/(N+1) of the data, while consistent hashing moves only the new node's share, roughly 1/(N+1). For a 3-node cluster that is ~75% versus ~25%; the larger the cluster, the greater the advantage.
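The table above can be reproduced directly from these two formulas (Python used here purely for the arithmetic):

```python
# Fraction of keys that move when one node is added to an N-node cluster.
for n in (3, 10, 100):
    modulo = n / (n + 1)        # nearly every key maps to a different node
    consistent = 1 / (n + 1)    # only the new node's share of the ring moves
    print(f"{n:>3} -> {n + 1} nodes: modulo ~{modulo:.0%}, consistent ~{consistent:.0%}")
```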
Partition Key to Token¶
Every row has a partition key that the partitioner hashes to a token:
-- See the token for a partition key
SELECT token(user_id), user_id, name FROM users;
-- Output:
system.token(user_id) | user_id | name
---------------------------+--------------------------------------+-------
-8750879223671532049 | 550e8400-e29b-41d4-a716-446655440000 | Alice
-3485723948573892001 | 6ba7b810-9dad-11d1-80b4-00c04fd430c8 | Bob
2749583759385739583 | 7ba7b810-9dad-11d1-80b4-00c04fd430c8 | Carol
Partitioners¶
The partitioner determines how partition keys are mapped to token values on the ring. The choice of partitioner affects data distribution, query capabilities, and cluster behavior.
Murmur3Partitioner (Default)¶
The default and recommended partitioner since Cassandra 1.2 (CASSANDRA-3772). It uses the MurmurHash3 algorithm (Appleby, A., 2008), a non-cryptographic hash function designed for high performance and excellent distribution properties.
Hash function: MurmurHash3 (128-bit, uses lower 64 bits)
Token range: -2^63 to +2^63 - 1
Distribution: Uniform random distribution
| Characteristic | Description |
|---|---|
| Distribution | Excellent—keys distribute evenly regardless of input patterns |
| Range queries | Not supported across partition keys |
| Hot spots | Rare—even sequential keys distribute randomly |
| Token range | -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 |
How Murmur3 distributes data:
Sequential input keys: Resulting tokens (random distribution):
user:1 → -7,509,452,495,886,106,294
user:2 → 3,248,973,570,287,400,012
user:3 → -1,042,345,729,384,756,129
user:4 → 8,127,364,501,928,374,650
Even though keys are sequential, tokens are randomly distributed
across the entire ring. This prevents hot spots from sequential inserts.
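This scattering can be approximated with the third-party `mmh3` Python package (`pip install mmh3`). The tokens printed below will not match Cassandra's exactly, since Cassandra hashes the serialized partition key with its own MurmurHash3 x64 variant, but the behavior is the same: sequential keys land at effectively random 64-bit positions.

```python
# Approximation of Murmur3 token scattering; not Cassandra's exact token values.
import mmh3

for i in range(1, 5):
    key = f"user:{i}"
    token, _ = mmh3.hash64(key)   # first 64-bit half of the 128-bit hash, signed
    print(f"{key} -> {token:>22,}")
```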
RandomPartitioner (Legacy)¶
The original default partitioner, now superseded by Murmur3Partitioner.
Hash function: MD5
Token range: 0 to 2^127 - 1
Distribution: Uniform random distribution
| Characteristic | Description |
|---|---|
| Distribution | Good—similar to Murmur3 |
| Performance | Slower than Murmur3 (MD5 is cryptographic) |
| Token format | Large integers (harder to read) |
| Status | Legacy—use Murmur3Partitioner for new clusters |
Why Murmur3 replaced Random:
| Aspect | RandomPartitioner | Murmur3Partitioner |
|---|---|---|
| Hash function | MD5 (cryptographic) | MurmurHash3 (non-cryptographic) |
| Performance | Slower | 3-5× faster |
| Token size | 128-bit (large numbers) | 64-bit (readable numbers) |
| Purpose | Cryptographic security | Speed and distribution |
Cryptographic properties of MD5 are unnecessary for partitioning—only uniform distribution matters.
ByteOrderedPartitioner (Special Use Cases)¶
Preserves the byte ordering of partition keys, enabling range scans across partitions.
Hash function: None (uses raw bytes)
Token range: Raw byte sequences
Distribution: Reflects data distribution
| Characteristic | Description |
|---|---|
| Distribution | Reflects input data patterns—can cause severe hot spots |
| Range queries | Supported across partition keys |
| Hot spots | Common—sequential keys go to same node |
| Use cases | Time-series with range scans, ordered data access |
Warning: ByteOrderedPartitioner can cause severe operational problems:
Sequential writes (e.g., time-series with timestamp keys):
2024-01-15T10:00:00 → Node A
2024-01-15T10:00:01 → Node A
2024-01-15T10:00:02 → Node A
2024-01-15T10:00:03 → Node A
...
2024-01-15T11:00:00 → Node A (still!)
All current writes hit ONE node while others sit idle.
This node becomes a bottleneck and may fail under load.
With Murmur3Partitioner:
2024-01-15T10:00:00 → Node C (random)
2024-01-15T10:00:01 → Node A (random)
2024-01-15T10:00:02 → Node B (random)
2024-01-15T10:00:03 → Node C (random)
Writes distribute evenly across all nodes.
When ByteOrderedPartitioner might be appropriate:
- Range scans across partition keys are required (rare in Cassandra data models)
- Data is pre-distributed by key design
- Full understanding of operational implications
Recommendation: Use Murmur3Partitioner. If range scans are needed, model data with clustering columns instead.
Partitioner Configuration¶
The partitioner is set at cluster creation and cannot be changed without rebuilding the cluster.
# cassandra.yaml
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
| Partitioner Class | Use Case |
|---|---|
| `org.apache.cassandra.dht.Murmur3Partitioner` | Default, recommended for all new clusters |
| `org.apache.cassandra.dht.RandomPartitioner` | Legacy clusters only |
| `org.apache.cassandra.dht.ByteOrderedPartitioner` | Special cases requiring range scans |
Partitioner cannot be changed after data is written. Tokens computed by one partitioner are meaningless to another. Changing partitioners requires:
- Create new cluster with desired partitioner
- Migrate all data using `sstableloader` or application-level ETL
- Switch applications to the new cluster
- Decommission old cluster
Viewing Token Information¶
Check Token Distribution¶
# View token ring
nodetool ring
# View token ownership for a keyspace
nodetool status my_keyspace
# Describe ring for a keyspace
nodetool describering my_keyspace
Find Replicas for a Key¶
# Show which nodes hold replicas for a partition key
nodetool getendpoints my_keyspace my_table 'partition_key_value'
# Output: List of IP addresses that hold this partition
# 10.0.1.1
# 10.0.1.2
# 10.0.1.3
Query Token Values¶
-- Get token for a partition key
SELECT token(user_id) FROM users WHERE user_id = 550e8400-e29b-41d4-a716-446655440000;
-- Find rows in a token range (advanced use)
SELECT * FROM users WHERE token(user_id) > -1000000 AND token(user_id) < 1000000;
Virtual Nodes (vnodes)¶
Virtual nodes fundamentally change how Cassandra distributes token ownership. Instead of each physical node owning one contiguous token range, each node owns many small, non-contiguous ranges scattered across the ring.
The Problem with Single Tokens¶
In early Cassandra (pre-1.2), each node owned exactly one token, representing its position on the ring:
Single-token architecture (3 nodes):
Token Ring: -9×10^18 ────────────────────────────────────── +9×10^18
│ │ │ │
Node A Node B Node C (wrap)
│ │ │
Range: -9 to -3 -3 to +3 +3 to +9 (×10^18)
Each node owns exactly 1/3 of the ring (one contiguous range).
This architecture created significant operational challenges:
Problem 1: Adding nodes required manual token calculation
Adding Node D to a 3-node cluster:
Before:
Node A: -9×10^18 to -3×10^18 (33.3%)
Node B: -3×10^18 to +3×10^18 (33.3%)
Node C: +3×10^18 to +9×10^18 (33.3%)
To achieve even distribution, must recalculate ALL tokens:
Node A: -9×10^18 to -4.5×10^18 (25%)
Node B: -4.5×10^18 to 0 (25%)
Node C: 0 to +4.5×10^18 (25%)
Node D: +4.5×10^18 to +9×10^18 (25%)
This required:
1. Manual token calculation for even split
2. Moving data on EVERY existing node
3. Careful coordination to avoid data loss
Problem 2: Streaming bottlenecks
Node B fails in a 6-node cluster:
A ─── B ─── C
│ │
F ─── E ─── D
With single tokens:
- Only Node A and Node C stream data (B's neighbors)
- A and C become bottlenecks
- Other nodes (D, E, F) sit idle during recovery
Recovery time: Limited by slowest of 2 nodes
Problem 3: Uneven hardware utilization
Assigning proportional load to nodes with different capacities required complex token weight calculations that were error-prone and difficult to maintain.
How Virtual Nodes Work¶
Virtual nodes solve these problems by assigning multiple tokens to each physical node:
Virtual node architecture (3 nodes, num_tokens=4):
Token Ring: -9×10^18 ────────────────────────────────────── +9×10^18
A   B   C   A   B   A   C   B   C   A   B   C
│ │ │ │ │ │ │ │ │ │ │ │
Each letter represents one vnode (token range)
Node A owns tokens at positions: -8, -2, +1, +7 (×10^18)
Node B owns tokens at positions: -6, -1, +4, +8 (×10^18)
Node C owns tokens at positions: -4, +2, +5, +9 (×10^18)
Total: Each node still owns ~33% of data, but in 4 scattered pieces.
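The scattering is easy to simulate. The sketch below (illustrative Python, not Cassandra code, with random token positions rather than real hashes) gives each of three nodes four tokens and prints the slices each node ends up owning; with only a few tokens the totals are only roughly equal, which is exactly why more tokens or the allocator described later are needed:

```python
# Each node's ownership becomes several non-contiguous slices of the ring.
import random

RING = 2**64
random.seed(7)

tokens = sorted((random.randrange(RING), node) for node in "ABC" for _ in range(4))
owned = {node: [] for node in "ABC"}
for i, (pos, node) in enumerate(tokens):
    width = (pos - tokens[i - 1][0]) % RING      # range (previous token, pos]
    owned[node].append(width / RING)

for node, slices in owned.items():
    pieces = " + ".join(f"{s:.1%}" for s in slices)
    print(f"Node {node}: {pieces} = {sum(slices):.1%} of the ring")
```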
History and Origins¶
Virtual nodes were introduced in CASSANDRA-3768, released in Cassandra 1.2 (January 2013). The feature was inspired by Amazon Dynamo's approach to consistent hashing, which used virtual nodes from its inception.
Evolution of defaults:
| Version | Default num_tokens | Token Assignment | Notes |
|---|---|---|---|
| Pre-1.2 | 1 (single token) | Manual calculation | Required token planning spreadsheets |
| 1.2 - 2.x | 256 | Random | Maximized evenness, high overhead |
| 3.0 - 3.x | 256 | Random or allocator | Introduced allocate_tokens_for_keyspace |
| 4.0+ | 16 | Optimized allocator | allocate_tokens_for_local_replication_factor |
| 5.0+ | 16 | Optimized allocator | Further allocator improvements |
Why 256 was chosen initially:
The original num_tokens default of 256 was based on probability theory. With random token assignment, statistical evenness improves with more tokens. Analysis showed:
- 256 tokens achieved ~1% standard deviation in load distribution
- Diminishing returns beyond 256
- Acceptable memory overhead at the time
Why 16 replaced 256:
CASSANDRA-13701 demonstrated that the new token allocator could achieve equivalent or better distribution with far fewer tokens:
| Metric | 256 random tokens | 16 allocated tokens |
|---|---|---|
| Load distribution std dev | ~1% | <1% |
| Gossip state per node | ~8 KB | ~0.5 KB |
| Token metadata memory | High | 16× smaller |
| Repair coordination | Complex | Simpler |
Streaming Benefits in Detail¶
The primary operational advantage of vnodes is parallelized streaming:
Scenario: Replacing a failed node
6-node cluster, RF=3, replacing Node B:
With single tokens (num_tokens=1):
┌─────────────────────────────────────────────────┐
│ Node A ═══════════════════════════════════════> │ New Node
│ Node C ═══════════════════════════════════════> │ (replaces B)
│ Node D (idle) │
│ Node E (idle) │
│ Node F (idle) │
└─────────────────────────────────────────────────┘
Streams from: 2 nodes (B's ring neighbors)
Bottleneck: Slowest of A or C
Time: ~6 hours (example)
With vnodes (num_tokens=16):
┌─────────────────────────────────────────────────┐
│ Node A ════════════════════> │
│ Node C ════════════════════> │ New Node
│ Node D ════════════════════> │ (replaces B)
│ Node E ════════════════════> │
│ Node F ════════════════════> │
└─────────────────────────────────────────────────┘
Streams from: ALL 5 nodes in parallel
Bottleneck: Aggregate cluster throughput
Time: ~1-2 hours (example)
Why this works:
With vnodes, Node B's data is scattered across many token ranges. Each range has different replica sets:
Node B's vnode at token -6×10^18:
Replicas: B, C, D → Stream from C or D
Node B's vnode at token -1×10^18:
Replicas: B, A, F → Stream from A or F
Node B's vnode at token +4×10^18:
Replicas: B, E, A → Stream from E or A
Result: All nodes participate in recovery, not just ring neighbors.
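The sketch below illustrates why, using a simplified Python model: RF=3 replicas are the next distinct nodes walking clockwise, with no rack awareness. With 16 vnodes per node, almost every surviving peer holds a replica of at least one of the failed node's ranges and can therefore act as a streaming source:

```python
# Simplified replica placement: owner plus next distinct nodes clockwise (no racks).
import random

RING = 2**64
random.seed(3)

ring = sorted((random.randrange(RING), node) for node in "ABCDEF" for _ in range(16))

def replicas(idx, rf=3):
    """The rf distinct nodes found walking clockwise from this token."""
    found = []
    i = idx
    while len(found) < rf:
        node = ring[i % len(ring)][1]
        if node not in found:
            found.append(node)
        i += 1
    return found

# Which peers hold replicas of ranges that involve node B?
sources = set()
for idx in range(len(ring)):
    reps = replicas(idx)
    if "B" in reps:
        sources.update(n for n in reps if n != "B")

print("Nodes that can stream B's data:", ", ".join(sorted(sources)))
# With vnodes this is typically every other node, not just B's ring neighbors.
```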
Impact on Repairs¶
Virtual nodes significantly affect repair operations:
Repair coordination complexity:
Single token: Each node has 1 token range to repair
- Simple: repair range -3×10^18 to +3×10^18
Vnodes (num_tokens=16): Each node has 16 token ranges
- Must coordinate repairs across 16 separate ranges
- Each range may have different replica sets
- Subrange repair becomes important
Repair parallelism:
| Repair Type | Single Token | vnodes |
|---|---|---|
| Full repair | Sequential by range | Parallel across ranges |
| Incremental repair | One range per node | Multiple ranges per node |
| Subrange repair | Not needed | Essential for large datasets |
Recommended repair approach with vnodes:
# Cassandra 4.0+: Use -pr (primary range) to avoid redundant work
nodetool repair -pr keyspace_name
# For large clusters: Use subrange repairs
nodetool repair -st START_TOKEN -et END_TOKEN keyspace_name
Impact on Compaction¶
Vnodes affect SSTable organization and compaction behavior:
Single token:
All SSTables contain data from one contiguous range
Compaction is straightforward
Vnodes:
SSTables may contain data from multiple token ranges
Local token boundaries affect SSTable splitting
More token ranges = more potential SSTable fragments
With Cassandra 4.0+, the reduced num_tokens (16 vs 256) significantly decreases this fragmentation.
Token Allocation Algorithms¶
Random Assignment (Legacy)¶
Prior to Cassandra 3.0, tokens were assigned using random number generation:
// Simplified algorithm: each token is drawn uniformly at random
Set<Long> tokens = new HashSet<>();
for (int i = 0; i < num_tokens; i++) {
    tokens.add(randomTokenInRange(MIN_TOKEN, MAX_TOKEN));
}
Problems with random assignment:
Example: 3 nodes, num_tokens=4, random assignment
Ideal: Each node owns 33.3% of token space
Actual (random):
Node A: 28% (unlucky distribution)
Node B: 41% (got clustered tokens)
Node C: 31% (slightly under)
With 256 tokens, variance decreases but never disappears.
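A small Monte Carlo sketch (illustrative Python, not Cassandra's implementation) shows the same pattern: the spread of primary ownership shrinks as num_tokens grows, but random assignment never removes the imbalance entirely:

```python
# Measure ownership spread under purely random token assignment.
import random
import statistics

RING = 2**64

def mean_ownership_stddev(num_nodes, num_tokens, trials=200):
    spreads = []
    for _ in range(trials):
        tokens = sorted((random.randrange(RING), node)
                        for node in range(num_nodes) for _ in range(num_tokens))
        share = [0.0] * num_nodes
        for i, (pos, node) in enumerate(tokens):
            share[node] += ((pos - tokens[i - 1][0]) % RING) / RING
        spreads.append(statistics.pstdev(share))
    return statistics.mean(spreads)

for num_tokens in (1, 4, 16, 256):
    stddev = mean_ownership_stddev(num_nodes=3, num_tokens=num_tokens)
    print(f"num_tokens={num_tokens:>3}: ownership std dev ~{stddev:.2%}")
```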
Replication-Aware Allocator (Cassandra 3.0+)¶
CASSANDRA-7032 introduced an allocator that considers replica placement:
# cassandra.yaml
allocate_tokens_for_keyspace: my_keyspace
How it works:
- Examine existing token distribution in the cluster
- For each new token, calculate the resulting replica balance
- Choose token positions that optimize overall balance
- Consider rack and datacenter placement
Limitation: Requires an existing keyspace to analyze, problematic for new clusters.
Optimized Allocator (Cassandra 4.0+)¶
CASSANDRA-13701 introduced a new allocator that uses replication factor directly:
# cassandra.yaml
allocate_tokens_for_local_replication_factor: 3
Advantages:
- Works without existing keyspace
- Computes mathematically optimal token placement
- Achieves near-perfect distribution with only 16 tokens
- Considers datacenter-local replication
How the Algorithm Works:
The optimized allocator solves a mathematical optimization problem: given the existing token distribution and the target replication factor, find token positions for the new node that minimize ownership imbalance across all replicas.
Step 1: Define the optimization goal
The allocator aims to equalize "replica-weighted ownership" rather than simple token range ownership. With RF=3, each token range is replicated to 3 nodes. The goal is that each node handles an equal share of:
- Primary replicas (data it owns)
- Secondary replicas (data it stores as backup)
Ideal state (6 nodes, RF=3):
Each node should own:
- 1/6 of primary ranges
- 1/6 of secondary ranges
- 1/6 of tertiary ranges
Total replicated data per node: (1/6) × 3 = 50% of total data
Step 2: Build a token-to-replica mapping
For each potential token position, the allocator calculates which nodes would hold replicas:
Token position T:
- Primary: Node owning T (clockwise lookup)
- Secondary: Next node clockwise (considering rack)
- Tertiary: Next node clockwise after secondary
The allocator evaluates thousands of candidate positions to find
optimal placements that balance all three replica levels.
Step 3: Account for rack awareness
When racks are configured, replicas must be placed on different racks when possible. The allocator incorporates this constraint:
3 racks, RF=3:
Token T → Primary on Rack1
→ Secondary on Rack2 (skip nodes on Rack1)
→ Tertiary on Rack3 (skip nodes on Rack1, Rack2)
The allocator ensures token placement doesn't create
rack-level hotspots where one rack holds more replicas.
Step 4: Iterative optimization
For a node joining with num_tokens=16, the allocator:
for each token slot (1 to 16):
    for each candidate position on the ring:
        calculate resulting ownership distribution
        calculate replica balance across all nodes
        score = ownership_variance + replica_imbalance
    select position with lowest score
    add token to new node's token set
    update ring state for next iteration
Step 5: Minimize future disruption
The allocator also considers how the new token placement affects future node additions. Positions are preferred that:
- Leave room for additional nodes to join evenly
- Don't create token clusters that would require major rebalancing later
Example: Adding a 4th node to a 3-node cluster
Before (3 nodes, num_tokens=16, RF=3):
Node A: 16 tokens, owns 33.3% primary
Node B: 16 tokens, owns 33.3% primary
Node C: 16 tokens, owns 33.3% primary
Adding Node D with optimized allocator:
1. Allocator analyzes 48 existing tokens (16 × 3)
2. For each of D's 16 tokens, finds position that:
- Takes equal share from A, B, and C
- Balances replica distribution
- Respects rack placement
After:
Node A: 16 tokens, owns 25.0% primary
Node B: 16 tokens, owns 25.0% primary
Node C: 16 tokens, owns 25.0% primary
Node D: 16 tokens, owns 25.0% primary
Data movement: Only ~25% of data moves (to new node)
Replica balance: Each node holds ~75% of total data (RF=3/4 nodes)
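A rough feel for this process can be had from the greedy sketch below (illustrative Python only). It is far simpler than Cassandra's actual allocator: it balances primary ownership alone, ignores racks and replica levels, and scores random candidate positions; the helper names and parameters are invented for the example. Even so, it reproduces the shape of the example above, with the new node ending up with roughly an even share of the ring:

```python
# Greedy, heavily simplified allocator-style placement (primary ownership only).
import random

RING = 2**64

def primary_ownership(tokens, owner_of):
    """Fraction of the ring each node owns as primary replica."""
    share = {}
    ordered = sorted(tokens)
    for i, tok in enumerate(ordered):
        width = (tok - ordered[i - 1]) % RING or RING
        share[owner_of[tok]] = share.get(owner_of[tok], 0) + width / RING
    return share

def allocate(tokens, owner_of, new_node, num_tokens=16, candidates=64):
    """Pick each of the new node's tokens to minimize ownership variance."""
    for _ in range(num_tokens):
        best_var, best_pos = None, None
        for _ in range(candidates):
            pos = random.randrange(RING)
            if pos in owner_of:
                continue
            trial = primary_ownership(tokens + [pos], {**owner_of, pos: new_node})
            mean = sum(trial.values()) / len(trial)
            var = sum((s - mean) ** 2 for s in trial.values())
            if best_var is None or var < best_var:
                best_var, best_pos = var, pos
        tokens.append(best_pos)
        owner_of[best_pos] = new_node
    return tokens

# Start with 3 nodes holding 16 random tokens each, then add node D via the allocator.
random.seed(1)
owner_of = {random.randrange(RING): node for node in "ABC" for _ in range(16)}
tokens = list(owner_of)

allocate(tokens, owner_of, "D")
for node, share in sorted(primary_ownership(tokens, owner_of).items()):
    print(f"Node {node}: ~{share:.1%} primary ownership")
```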
Comparison with random allocation:
| Metric | Random (256 tokens) | Optimized (16 tokens) |
|---|---|---|
| Ownership std dev | 1-3% | <0.5% |
| Replica balance | Variable | Near-perfect |
| Token count | 256 | 16 |
| Memory overhead | High | Low |
| Gossip size | ~8KB/node | ~0.5KB/node |
Multi-datacenter behavior:
When allocate_tokens_for_local_replication_factor is set, the allocator optimizes within each datacenter independently:
# Node in dc1 with RF=3 in dc1, RF=2 in dc2
allocate_tokens_for_local_replication_factor: 3
# The allocator only considers dc1's replication factor
# dc2 replication is handled by dc2's nodes
Each datacenter maintains its own balanced token distribution, with cross-DC replication handled by NetworkTopologyStrategy at the keyspace level.
Configuration Reference¶
# cassandra.yaml - Complete vnode configuration
# Number of tokens per node
# Cassandra 4.0+ recommendation: 16
# Legacy clusters: 256
num_tokens: 16
# Token allocator (choose one, Cassandra 4.0+)
# Option 1: Use replication factor directly (recommended)
allocate_tokens_for_local_replication_factor: 3
# Option 2: Reference existing keyspace (Cassandra 3.0+)
# allocate_tokens_for_keyspace: my_keyspace
# Initial token (only for num_tokens: 1)
# Used for manual single-token assignment
# initial_token: -3074457345618258602
Viewing Token Information¶
List all tokens for a node:
nodetool info
# Shows: Token(s): [list of tokens]
nodetool ring
# Shows all tokens in the cluster with ownership percentages
Example nodetool ring output:
Datacenter: dc1
=================
Address Rack Status State Load Owns Token
-9223372036854775808
10.0.1.1 rack1 Up Normal 15.2 GiB 33.3% -9223372036854775808
10.0.1.1 rack1 Up Normal 15.2 GiB 33.3% -6148914691236517206
10.0.1.1 rack1 Up Normal 15.2 GiB 33.3% -3074457345618258604
10.0.1.2 rack2 Up Normal 14.8 GiB 33.3% -2
10.0.1.2 rack2 Up Normal 14.8 GiB 33.3% 3074457345618258600
...
Check token distribution balance:
nodetool describering keyspace_name
# Shows token ranges and their endpoints
nodetool status keyspace_name
# Shows ownership percentage per node (should be ~equal)
Recommended Configuration by Scenario¶
| Scenario | num_tokens | Allocator Setting | Rationale |
|---|---|---|---|
| New cluster, Cassandra 4.0+ | 16 | `allocate_tokens_for_local_replication_factor: 3` | Optimal balance with minimal overhead |
| New cluster, Cassandra 3.x | 32 | `allocate_tokens_for_keyspace: system_auth` | Good distribution without the 4.0 allocator |
| Existing cluster, upgrading | Keep current | Keep current | Changing requires node replacement |
| Single-token (advanced) | 1 | Set `initial_token` manually | Maximum control, complex operations |
| Heterogeneous hardware | Proportional | Per-node configuration | More tokens on larger nodes |
| Very large cluster (500+) | 4-8 | `allocate_tokens_for_local_replication_factor` | Reduce coordination overhead |
Heterogeneous Hardware Configuration¶
For clusters with nodes of varying capacity, assign more vnodes to more powerful nodes:
# Large node (2× capacity): cassandra.yaml
num_tokens: 32
# Standard node: cassandra.yaml
num_tokens: 16
# Result: Large node owns ~2× the data
Considerations:
- All nodes should use the same allocator setting
- Ratio should reflect actual capacity difference
- Monitor load distribution after deployment
- Repair and streaming times will differ by node
Migration and Operational Considerations¶
Changing num_tokens on existing nodes:
Changing num_tokens after a node has joined the cluster has no effect. The tokens are assigned at bootstrap time and persist in the system tables.
Strategies for changing token configuration:
| Strategy | Process | Downtime | Risk |
|---|---|---|---|
| Rolling replacement | Replace nodes one by one with new config | None | Low |
| Parallel replacement | Replace multiple nodes simultaneously | Partial | Medium |
| New cluster migration | Build new cluster, migrate data | Application switch | Low |
Rolling replacement process:
# 1. Decommission old node (streams data out)
nodetool decommission
# 2. Bootstrap new node with new num_tokens
# (new cassandra.yaml with updated num_tokens)
# Node joins and streams data in
# 3. Repeat for each node in the cluster
Troubleshooting¶
Uneven load distribution:
# Check ownership percentages
nodetool status my_keyspace
# Expected: All nodes within 5% of (100/num_nodes)%
# Problem: One node showing 45% when expected 33%
Causes and solutions:
| Symptom | Cause | Solution |
|---|---|---|
| Large ownership variance | Random token assignment | Replace nodes with allocator enabled |
| One node much higher | Hot partition | Redesign partition key |
| Consistent skew | Token misconfiguration | Check allocator settings |
Token collision (rare):
If two nodes are assigned the same token:
# Check for duplicate tokens (the token is the last whitespace-separated column)
nodetool ring | awk '$NF ~ /^-?[0-9]+$/ {print $NF}' | sort | uniq -d
# Resolution: Replace one node with new token assignment
Single Token vs vnodes Comparison¶
| Aspect | Single Token (num_tokens=1) | vnodes (num_tokens=16+) |
|---|---|---|
| Token assignment | Manual calculation required | Automatic |
| Adding nodes | Must recalculate all tokens | Just bootstrap |
| Removing nodes | Must recalculate all tokens | Just decommission |
| Streaming parallelism | Limited to ring neighbors | All nodes participate |
| Recovery time | Slower (2-node bottleneck) | Faster (parallel) |
| Repair complexity | Simple (one range) | Complex (many ranges) |
| Memory overhead | Minimal | ~1KB per 16 tokens per node |
| Gossip overhead | Minimal | Proportional to num_tokens |
| Operational expertise | High (token math) | Low (automated) |
When single token might still be preferred:
- Extremely large clusters (1000+ nodes) where gossip overhead matters
- Specialized workloads requiring precise token control
- Organizations with deep Cassandra expertise and automation
Related Documentation¶
- Distributed Data Overview - How partitioning, replication, and consistency work together
- Replication - How partitions are copied across nodes
- Data Modeling - Designing partition keys for even distribution