nodetool profileload¶
Profiles read and write operations on a table for a specified duration to identify hot partitions and access patterns.
Synopsis¶
nodetool [connection_options] profileload <keyspace> <table> <duration>
Description¶
nodetool profileload samples operations on a specific table for the given duration and produces a report showing the most frequently accessed partitions. This is essential for identifying hot partitions—partitions that receive disproportionately high traffic compared to others.
Why Hot Partitions Matter¶
In Cassandra, data is distributed across nodes based on partition keys. Ideally, traffic should be evenly distributed across all partitions and nodes. Hot partitions cause problems:
| Problem | Impact |
|---|---|
| Uneven load | One node handles disproportionate traffic while others are idle |
| Latency spikes | Hot partition queries queue up, increasing response times |
| Resource exhaustion | CPU, memory, or disk I/O saturated on specific nodes |
| Scaling limitations | Adding nodes doesn't help if one partition is the bottleneck |
| Compaction pressure | Frequent updates to hot partitions increase compaction load |
How Profiling Works¶
When profileload runs:
- Sampling begins - Cassandra starts tracking operations on the specified table
- Operations recorded - Each read and write operation's partition key is sampled
- Duration elapses - Sampling continues for the specified time
- Report generated - Statistics compiled and displayed showing top accessed partitions
The profiling uses sampling (not exhaustive tracking) to minimize overhead.
Arguments¶
| Argument | Description |
|---|---|
| keyspace | Target keyspace name |
| table | Target table name |
| duration | Sampling duration in milliseconds |
Output Format¶
Table: my_keyspace.my_table
Samples: 10000
Duration: 60000 ms
Top 10 Partitions:
Partition Key Reads Writes Total
user_12345 4523 1245 5768
user_98765 3211 892 4103
user_55555 2105 445 2550
order_2024_01_15 1892 234 2126
user_00001 1456 312 1768
...
Read/Write Ratio: 78% reads, 22% writes
Output Fields¶
| Field | Description |
|---|---|
| Partition Key | The partition key value receiving traffic |
| Reads | Number of read operations sampled for this partition |
| Writes | Number of write operations sampled for this partition |
| Total | Combined read and write count |
Examples¶
Basic 30-Second Profile¶
nodetool profileload my_keyspace users 30000
2-Minute Profile for Detailed Analysis¶
nodetool profileload my_keyspace events 120000
Profile During Peak Hours¶
# Run during known high-traffic period
nodetool profileload ecommerce orders 300000 # 5 minutes
Profile on Remote Node¶
ssh 192.168.1.100 "nodetool profileload my_keyspace my_table 60000"
Save Output for Analysis¶
nodetool profileload my_keyspace my_table 60000 > profile_report.txt
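A saved report can be post-processed with standard text tools. A minimal sketch, assuming the report layout shown under Output Format above (partition rows have exactly four fields, with the total in the fourth):
# Rank sampled partitions by the Total column (field 4), highest first.
# Assumes the report format shown above; adjust the field test if your
# Cassandra version formats the report differently.
awk 'NF == 4 && $4 ~ /^[0-9]+$/' profile_report.txt | sort -k4 -nr | head -5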
Use Case Scenarios¶
Scenario 1: Investigating Latency Spikes¶
Symptom: Application reports intermittent slow queries on the users table.
Investigation:
# Profile during a period when latency issues occur
nodetool profileload myapp users 60000
Analysis: If output shows one partition with 80% of traffic, that's the hot partition causing queuing and latency.
Resolution options:
- Redesign the partition key to distribute load
- Add a caching layer for hot data
- Consider time-bucketing for time-series data
Scenario 2: Capacity Planning¶
Symptom: Planning to scale out, need to understand current access patterns.
Investigation:
# Profile each major table
nodetool profileload myapp users 120000
nodetool profileload myapp orders 120000
nodetool profileload myapp events 120000
Analysis: Identify which tables have even distribution vs. hot spots. Tables with hot partitions won't benefit from horizontal scaling without schema changes.
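The three commands above can be wrapped in a loop that saves one report per table for side-by-side comparison (a sketch; the table names are the illustrative ones from this scenario):
# Profile each major table in sequence, one report file per table.
for t in users orders events; do
  nodetool profileload myapp "$t" 120000 > "profile_${t}.txt"
done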
Scenario 3: Validating Data Model Design¶
Symptom: New table deployed, want to verify access patterns match expectations.
Investigation:
# Profile during normal operation
nodetool profileload myapp new_table 300000
Analysis: Compare actual access patterns with design assumptions. If certain partitions are hotter than expected, the data model may need adjustment.
Scenario 4: Debugging Uneven Node Load¶
Symptom: nodetool status shows one node with significantly higher load.
Investigation:
# Profile on the hot node
ssh hot_node "nodetool profileload myapp main_table 60000"
# Compare with other nodes
ssh other_node "nodetool profileload myapp main_table 60000"
Analysis: Determine whether specific partition keys are causing the imbalance. A given partition always maps to the same replica node(s) via the token ring, so a hot partition concentrates load on exactly those nodes.
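To confirm which replicas own a suspected hot key, nodetool getendpoints maps a partition key to its replica nodes; if the listed endpoints include the overloaded node, the hot partition explains the imbalance (the key below is an illustrative value taken from a profile report):
# List the replica nodes that own the partition key 'user_admin'
nodetool getendpoints myapp main_table user_admin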
Scenario 5: Time-Series Data Evaluation¶
Symptom: Time-series table using date as partition key, suspecting today's partition is overwhelmed.
Investigation:
nodetool profileload metrics sensor_readings 60000
Analysis: If output shows 2024-01-15 (today) with 95% of traffic, the current day's partition is hot. Consider time-bucketing (hourly partitions) or a different partitioning strategy.
Scenario 6: Pre-Production Load Testing Validation¶
Symptom: Running load tests, need to verify traffic distribution.
Investigation:
# During load test
nodetool profileload testks test_table 60000
Analysis: Verify load test is generating realistic, distributed traffic patterns. Uneven distribution in testing means the load test isn't representative.
Impact on Cluster Performance¶
Overhead Assessment¶
| Aspect | Impact |
|---|---|
| CPU | Minimal - sampling-based, not exhaustive |
| Memory | Low - maintains counters for sampled partitions |
| Disk I/O | None - operates in memory only |
| Query latency | Negligible - sampling adds microseconds per operation |
| Network | None - local operation only |
Safe for Production
profileload is designed to be safe for production use. The sampling approach ensures minimal overhead even on high-traffic tables. However, avoid running multiple concurrent profiles on the same table.
Recommendations¶
| Scenario | Duration | Notes |
|---|---|---|
| Quick check | 30 seconds | Sufficient for high-traffic tables |
| Standard analysis | 1-2 minutes | Good balance of data and time |
| Comprehensive profiling | 5 minutes | For detailed analysis or low-traffic tables |
| Peak period analysis | Match peak duration | Capture full peak behavior |
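For recurring peak-period analysis, profiling can be scheduled. A minimal sketch using cron (the schedule, keyspace, and log path are assumptions; note that % must be escaped in crontab entries):
# Profile the orders table for 5 minutes at 14:00 daily (an assumed peak)
# and save a dated report for historical comparison.
0 14 * * * nodetool profileload ecommerce orders 300000 > /var/log/cassandra/profile_orders_$(date +\%Y\%m\%d).txt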
Interpreting Results¶
Healthy Distribution¶
Top 10 Partitions:
Partition Key Reads Writes Total
user_12345 234 45 279
user_98765 228 52 280
user_55555 241 38 279
user_44444 225 51 276
user_33333 239 42 281
Interpretation: Top partitions have similar access counts. This indicates even distribution—good data model design.
Hot Partition Detected¶
Top 10 Partitions:
Partition Key Reads Writes Total
user_admin 45230 12450 57680 ← Hot!
user_98765 321 89 410
user_55555 210 44 254
user_44444 189 31 220
Interpretation: user_admin receives 100x more traffic than others. This is a severe hot partition requiring immediate attention.
Write-Heavy Partition¶
Top 10 Partitions:
Partition Key Reads Writes Total
metrics_2024_01_15 234 89450 89684 ← Write-heavy
metrics_2024_01_14 12000 45 12045
Interpretation: Today's partition is receiving heavy writes while yesterday's is read-only. Common in time-series data—consider finer time bucketing.
Addressing Hot Partitions¶
Once identified, hot partitions can be addressed through several strategies:
Data Model Changes¶
-- Before: Single partition per user
CREATE TABLE user_events (
    user_id text,
    event_time timestamp,
    event_data text,
    PRIMARY KEY (user_id, event_time)
);

-- After: Add time bucket to distribute load
CREATE TABLE user_events_v2 (
    user_id text,
    time_bucket text,    -- e.g., '2024-01-15_14' for hourly buckets
    event_time timestamp,
    event_data text,
    PRIMARY KEY ((user_id, time_bucket), event_time)
);
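With the bucketed schema, the application derives time_bucket from the event timestamp at write time. A sketch via cqlsh (the keyspace name and literal values are assumptions):
# Compute the current hourly bucket and write an event into it.
bucket=$(date +%Y-%m-%d_%H)
cqlsh -e "INSERT INTO myapp.user_events_v2 (user_id, time_bucket, event_time, event_data) VALUES ('user_12345', '$bucket', toTimestamp(now()), 'page_view');"
Reads then target a specific (user_id, time_bucket) pair, so traffic for an active user spreads across hourly partitions instead of piling onto one.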
Application-Level Caching¶
If profiling shows a read-heavy hot partition, placing a cache in front of Cassandra for the frequently accessed data can absorb most of the repeated reads.
Rate Limiting¶
If the hot partition is caused by a misbehaving client, application-level rate limiting on the offending key can protect the cluster while the root cause is addressed.
Comparing with Related Commands¶
| Command | Purpose | Scope | Overhead |
|---|---|---|---|
| profileload | Detailed partition access profiling | Single table | Low |
| toppartitions | Real-time top partition view | Single table | Low |
| tablestats | General table statistics | All tables | None |
| tablehistograms | Latency distribution | Single table | None |
When to Use Each¶
- profileload - Detailed investigation of a specific table's access patterns
- toppartitions - Quick check of current hot partitions
- tablestats - Overview of table health and metrics
- tablehistograms - Understanding latency distribution
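A typical triage sequence chains these from cheapest to most detailed (keyspace and table names are illustrative):
# 1. Overview: confirm the table sees significant traffic
nodetool tablestats myapp.users
# 2. Quick look at what is hot right now (where toppartitions is available)
nodetool toppartitions myapp users 10000
# 3. Sustained profile for the full access-pattern picture
nodetool profileload myapp users 60000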
Cluster-Wide Profiling¶
To understand cluster-wide patterns, profile across all nodes:
#!/bin/bash
# profile_all_nodes.sh
KEYSPACE="$1"
TABLE="$2"
DURATION="${3:-60000}"
if [ -z "$KEYSPACE" ] || [ -z "$TABLE" ]; then
echo "Usage: $0 <keyspace> <table> [duration_ms]"
exit 1
fi
OUTPUT_DIR="/tmp/profiles_$(date +%Y%m%d_%H%M%S)"
mkdir -p "$OUTPUT_DIR"

# Get list of node IPs from local nodetool status
nodes=$(nodetool status | grep "^UN" | awk '{print $2}')
echo "Profiling $KEYSPACE.$TABLE for ${DURATION}ms on all nodes..."
echo "Output directory: $OUTPUT_DIR"
echo ""
# Start profiling on all nodes in parallel
for node in $nodes; do
  echo "Starting profile on $node..."
  ssh "$node" "nodetool profileload $KEYSPACE $TABLE $DURATION" > "$OUTPUT_DIR/profile_$node.txt" &
done
# Wait for all to complete
wait
echo ""
echo "Profiling complete. Results:"
echo ""
# Display summary from each node
for node in $nodes; do
  echo "=== $node ==="
  head -20 "$OUTPUT_DIR/profile_$node.txt"
  echo ""
done
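Invoking the script (assuming it is saved as profile_all_nodes.sh and passwordless SSH to each node is configured):
chmod +x profile_all_nodes.sh
./profile_all_nodes.sh myapp main_table 120000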
Best Practices¶
Profiling Guidelines
- Profile during representative periods - Run during normal load, not idle times
- Use appropriate duration - Longer for low-traffic tables, shorter for high-traffic
- Profile multiple times - Compare peak vs. off-peak patterns
- Profile all nodes - Hot partitions may only appear on specific nodes
- Correlate with symptoms - Run when latency issues occur
- Document findings - Save output for historical comparison
- Act on results - Profiling is only useful if hot partitions are addressed
Considerations
- Results are samples, not exact counts
- Profile duration should match workload patterns
- One profile may not capture intermittent issues
- Hot partitions identified are node-local (partition may be hot on one replica)
Troubleshooting¶
No Output or Empty Results¶
# Verify table has traffic
nodetool tablestats my_keyspace.my_table | grep -i "count"
# Ensure duration is sufficient
# Try longer duration for low-traffic tables
nodetool profileload my_keyspace my_table 300000
Command Times Out¶
# Reduce duration
nodetool profileload my_keyspace my_table 30000
# Check JMX connectivity
nodetool info
Results Don't Match Expected Patterns¶
# Profile on specific node that's showing issues
ssh problem_node "nodetool profileload my_keyspace my_table 60000"
# Verify correct table
nodetool tablestats my_keyspace.my_table
Related Commands¶
| Command | Relationship |
|---|---|
| toppartitions | Real-time view of hot partitions |
| tablestats | Table statistics and metrics |
| tablehistograms | Latency histograms for table operations |
| proxyhistograms | Coordinator-level latency metrics |
| status | Node load distribution |