nodetool rangekeysample¶

Displays a sample of partition keys from each token range owned by the node.

Synopsis¶

nodetool [connection_options] rangekeysample

Description¶

nodetool rangekeysample returns a sample of partition keys from each token range owned by the local node. The samples are obtained from the SSTable index files, providing a quick way to see representative partition keys without scanning all data.

How It Works¶

Cassandra maintains partition key samples in memory for each SSTable, derived from the SSTable index. These samples are used internally for:

Estimating partition counts
Calculating data distribution statistics
Optimizing read operations

The rangekeysample command exposes these samples, showing actual partition key values that exist in each token range on the node.

What the Output Represents¶

Each line in the output represents a sampled partition key. The keys shown are:

Actual partition keys from SSTables on the local node
Distributed across token ranges the node owns
A statistical sample, not an exhaustive list
Representative of data distribution patterns

Output Format¶

<partition_key_1>
<partition_key_2>
<partition_key_3>
...

Example Output¶

For a table with UUID partition keys:

550e8400-e29b-41d4-a716-446655440000
6ba7b810-9dad-11d1-80b4-00c04fd430c8
6ba7b811-9dad-11d1-80b4-00c04fd430c8
7c9e6679-7425-40de-944b-e07fc1f90ae7
...

For a table with text partition keys:

user_12345
user_23456
user_34567
order_98765
order_87654
...

For composite partition keys, the output shows the combined key representation.

Arguments¶

This command takes no arguments. It samples keys from all keyspaces and tables on the node.

Examples¶

Basic Usage¶

nodetool rangekeysample

Save Samples to File¶

nodetool rangekeysample > /tmp/key_samples.txt

Count Sample Size¶

nodetool rangekeysample | wc -l

View First 20 Samples¶

nodetool rangekeysample | head -20

Filter for Specific Key Patterns¶

# Find samples matching a pattern
nodetool rangekeysample | grep "user_"

# Find samples starting with specific prefix
nodetool rangekeysample | grep "^order"

Use Cases¶

Investigating Data Distribution¶

Examine what partition keys exist on a specific node to understand data placement:

# Sample keys on each node to compare distribution
for node in node1 node2 node3; do
    echo "=== $node ==="
    ssh "$node" "nodetool rangekeysample" | wc -l
done

Uneven sample counts may indicate data skew or hot spots.

Identifying Partition Key Patterns¶

Discover what types of partition keys exist in the cluster:

# Get unique prefixes to understand key naming patterns
nodetool rangekeysample | cut -c1-10 | sort | uniq -c | sort -rn | head -20

Validating Data After Migration¶

After migrating data, verify that expected partition keys are present:

# Check if specific key patterns exist
nodetool rangekeysample | grep -c "expected_prefix"

Debugging Hot Partitions¶

When investigating potential hot partitions, sample keys to identify candidates:

# Sample keys and cross-reference with known hot partition patterns
nodetool rangekeysample > samples.txt
# Compare with application logs showing slow queries

Estimating Partition Count¶

While not exact, the sample count gives a rough indication of partition density:

# Samples per node
nodetool rangekeysample | wc -l
# Higher counts suggest more partitions

Pre-Migration Analysis¶

Before cluster migration or expansion, understand current key distribution:

# Document current key samples for comparison after migration
nodetool rangekeysample > pre_migration_samples_$(hostname).txt

Understanding the Sample¶

Sample Size¶

The number of keys returned depends on:

Total partitions on the node
SSTable count per table
Sampling interval configured in Cassandra (default samples every 128th key)
Index entries in each SSTable

Sampling Rate¶

Cassandra's SSTable index sampling interval is configured in cassandra.yaml:

# Default: sample 1 key per 128 partitions
index_summary_resize_interval_in_minutes: 60
index_summary_capacity_in_mb: 0  # Auto-calculated based on heap

Interpreting Results¶

Observation	Possible Meaning
Few samples	Node has few partitions or few SSTables
Many samples	Node stores many partitions
Patterns in keys	Application key design visible
No output	Node may have no data or SSTables

Limitations¶

Important Considerations

Not exhaustive - Only returns sampled keys, not all partition keys
Local node only - Shows keys from the node where command is run
All tables combined - Cannot filter by keyspace or table
Point-in-time - Represents data at execution time
No token information - Does not show which token range each key belongs to
Memory-based - Samples are from index summaries held in memory

Getting Complete Key Lists¶

For exhaustive partition key lists (not just samples), use CQL:

-- Warning: This can be expensive on large tables
SELECT DISTINCT token(partition_key), partition_key
FROM keyspace.table;

Or use sstablekeys tool for offline analysis:

# List all keys in an SSTable
sstablekeys /var/lib/cassandra/data/keyspace/table-uuid/nb-1-big-Data.db

Combining with Other Commands¶

With Token Ring Information¶

# Compare key samples with token ranges
echo "=== Token Ranges ==="
nodetool ring | head -20

echo ""
echo "=== Key Samples ==="
nodetool rangekeysample | head -20

With Table Statistics¶

# Correlate samples with partition counts
echo "=== Estimated Partitions ==="
nodetool tablestats my_keyspace.my_table | grep "Number of partitions"

echo ""
echo "=== Sample Count ==="
nodetool rangekeysample | wc -l

Across All Nodes¶

#!/bin/bash
# collect_key_samples.sh - Gather samples from all nodes

OUTPUT_DIR="/tmp/key_samples_$(date +%Y%m%d)"
mkdir -p $OUTPUT_DIR

# Get list of node IPs from local nodetool status
nodes=$(nodetool status | grep "^UN" | awk '{print $2}')

for node in $nodes; do
    echo "Collecting from $node..."
    ssh "$node" "nodetool rangekeysample" > "$OUTPUT_DIR/samples_$node.txt"
    count=$(wc -l < "$OUTPUT_DIR/samples_$node.txt")
    echo "  $count samples collected"
done

echo ""
echo "Samples saved to $OUTPUT_DIR"

# Summary
echo ""
echo "=== Sample Counts by Node ==="
wc -l $OUTPUT_DIR/samples_*.txt

Troubleshooting¶

Empty Output¶

If the command returns no output:

# Check if node has data
nodetool tablestats | grep "Space used"

# Check if SSTables exist
ls /var/lib/cassandra/data/*/*/*.db | head

# Node may need compaction to generate index summaries
nodetool compactionstats

Very Few Samples¶

Few samples may indicate:

Low partition count
Few SSTables (data mostly in memtables)
Recent node with limited data

# Force flush to create SSTables
nodetool flush

# Then re-sample
nodetool rangekeysample | wc -l

Command Hangs¶

If the command takes too long:

# May indicate memory pressure or large index summaries
# Check JMX connectivity
nodetool info

# Check for memory issues
nodetool gcstats

Best Practices¶

Usage Guidelines

Use for exploration - Helpful for understanding data, not production monitoring
Combine with other tools - Cross-reference with ring, tablestats, getendpoints
Sample all nodes - For complete picture, gather from entire cluster
Consider timing - Run after compaction for most accurate representation
Save for comparison - Store samples before and after major changes

Command	Relationship
ring	View token ring and ownership
getendpoints	Find which nodes store a specific key
describering	Detailed ring information
tablestats	Table statistics including partition estimates
status	Cluster status and data load per node