Cassandra Performance Tuning Guide¶
Out of the box, Cassandra is configured to not crash—not to perform well. The default heap is small, compaction is throttled, and the OS settings are whatever your distribution ships with. These defaults make sense for trying Cassandra on a laptop, not for production.
Proper tuning happens at multiple layers: JVM (heap size and garbage collector), OS (disable swap, disable transparent huge pages, raise file descriptor limits), and Cassandra configuration (compaction throughput, thread pools, memtable sizing). Getting these right can mean 10x better throughput or 5x lower latency.
This guide covers what to tune, how to tune it, and how to measure whether it worked.
Measure Before Tuning
Always establish baseline metrics before making changes. Tune one parameter at a time and measure the impact. Changes that improve one workload may degrade another.
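A minimal baseline-capture sketch, assuming nodetool is on the PATH and the output directory is writable; run it before and after each change so comparisons are made against recorded numbers rather than memory:

```bash
#!/bin/bash
# capture-baseline.sh - snapshot key metrics around a tuning change (illustrative)
STAMP=$(date +%Y%m%d-%H%M%S)
OUT="/tmp/cassandra-baseline-$STAMP"
mkdir -p "$OUT"

nodetool proxyhistograms > "$OUT/proxyhistograms.txt"   # coordinator read/write latency
nodetool tpstats         > "$OUT/tpstats.txt"           # thread pools and dropped messages
nodetool compactionstats > "$OUT/compactionstats.txt"   # pending compactions
nodetool gcstats         > "$OUT/gcstats.txt"           # GC pause summary
nodetool info            > "$OUT/info.txt"              # heap usage and cache hit rates

echo "Baseline written to $OUT"
```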
Performance Overview¶
Performance Layers¶
Tuning spans four layers: hardware, the operating system, the JVM, and Cassandra configuration. Each layer is covered in the sections that follow.
Key Performance Metrics¶
| Metric | Good | Warning | Critical |
|---|---|---|---|
| Read Latency p99 | < 10ms | 10-100ms | > 100ms |
| Write Latency p99 | < 5ms | 5-50ms | > 50ms |
| CPU Utilization | < 50% | 50-70% | > 70% |
| Heap Usage | < 60% | 60-80% | > 80% |
| Pending Compactions | < 10 | 10-50 | > 50 |
| GC Pause Time | < 200ms | 200-500ms | > 500ms |
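A quick command-line spot check against several of these thresholds (a sketch; exact output labels vary slightly between Cassandra versions):

```bash
# Heap usage and cache hit rates
nodetool info | grep -Ei 'heap|cache'

# Pending compactions
nodetool compactionstats | grep -i pending

# GC pause counts and durations since the last call
nodetool gcstats
```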
Hardware Recommendations¶
CPU¶
| Workload Type | Recommended | Cores | Notes |
|---|---|---|---|
| Development | Any modern | 2-4 | Testing only |
| Light production | Intel Xeon / AMD EPYC | 8-16 | Small clusters |
| Heavy production | High-freq Xeon / EPYC | 16-32 | Write-heavy |
| Enterprise | Latest gen server CPUs | 32+ | Large partitions |
Key considerations:
- Clock speed matters more than core count for most workloads
- Compression and compaction benefit from multiple cores
- Avoid CPU throttling (disable power-saving modes)
Memory¶
| Use Case | Recommended RAM | Heap Size |
|---|---|---|
| Development | 8GB | 2-4GB |
| Production (light) | 32GB | 8GB |
| Production (standard) | 64GB | 16-24GB |
| Production (heavy) | 128GB+ | 31GB max |
Memory allocation:
```
Total RAM = JVM Heap + Off-Heap + OS Page Cache + Headroom

Example (64GB server):
- JVM Heap: 16GB
- Off-Heap (memtables, bloom filters, etc.): 8GB
- OS Page Cache: 32GB
- Headroom: 8GB
```
Storage¶
| Storage Type | IOPS | Latency | Best For |
|---|---|---|---|
| HDD (7.2K) | 100-150 | 5-15ms | Archive only |
| SSD (SATA) | 30K-100K | 0.1-0.5ms | Development |
| SSD (NVMe) | 100K-500K | 0.02-0.1ms | Production |
| NVMe (Enterprise) | 500K-1M+ | < 0.02ms | High performance |
Storage guidelines:
- Minimum 2TB per node recommended
- Separate commit log from data (if possible)
- Plan for 50% headroom for compaction
- RAID: use RAID 0 or JBOD (Cassandra handles replication)
Network¶
| Environment | Bandwidth | Latency |
|---|---|---|
| Same rack | 10 Gbps | < 0.5ms |
| Same DC | 10 Gbps | < 1ms |
| Cross-DC | 1+ Gbps | < 100ms |
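To sanity-check these figures between two nodes, a basic round-trip and bandwidth test is usually enough (a sketch; 10.0.0.2 is a placeholder peer address, and iperf3 must be running in server mode on that peer):

```bash
# Round-trip latency to a peer node
ping -c 20 10.0.0.2

# Available bandwidth to the peer (requires `iperf3 -s` on 10.0.0.2)
iperf3 -c 10.0.0.2
```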
JVM Tuning Quick Reference¶
Heap Sizing Rules¶
```
# jvm11-server.options
# Rule: Heap should be 1/4 of RAM, max 31GB (for compressed oops)
-Xms16G
-Xmx16G

# For servers with 64GB RAM:
# -Xms16G -Xmx16G (recommended)

# For servers with 128GB+ RAM:
# -Xms31G -Xmx31G (max for compressed oops)
```
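To confirm the heap flags actually took effect on a running node (a sketch; assumes the process is found by matching the CassandraDaemon main class):

```bash
# The -Xms/-Xmx values the JVM was started with
ps -o command= -p "$(pgrep -f CassandraDaemon)" | tr ' ' '\n' | grep -E '^-Xm[sx]'

# Current heap usage versus the configured maximum
nodetool info | grep -i heap
```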
G1GC Configuration (Recommended)¶
```
# Use G1GC (default in modern Cassandra)
-XX:+UseG1GC

# Pause time target (balance throughput vs latency)
-XX:MaxGCPauseMillis=500

# Heap occupancy trigger
-XX:InitiatingHeapOccupancyPercent=70

# String deduplication (saves memory)
-XX:+UseStringDeduplication

# GC logging
-Xlog:gc*:file=/var/log/cassandra/gc.log:time,uptime:filecount=10,filesize=10M
```
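With the unified GC logging configured above, pause times can be skimmed straight from the log while tuning (a sketch; the exact line format depends on the JDK version and the decorators configured):

```bash
# Most recent young/full collection pauses
grep -E 'Pause (Young|Full)' /var/log/cassandra/gc.log | tail -20

# Longest recorded pauses (approximate; relies on the trailing "<n>ms" field)
grep -E 'Pause (Young|Full)' /var/log/cassandra/gc.log | grep -Eo '[0-9]+\.[0-9]+ms' | sort -n | tail -5
```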
ZGC Configuration (JDK 21+, Low Latency)¶
```
# Use ZGC for sub-millisecond pauses
-XX:+UseZGC

# Generational ZGC (requires JDK 21 or newer)
-XX:+ZGenerational

# Soft max heap (allows expansion under pressure)
-XX:SoftMaxHeapSize=28G    # With -Xmx31G
```
Off-Heap Memory¶
```yaml
# cassandra.yaml
# Memtables can use off-heap memory
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048

# File system cache is critical - do not starve it!
```
OS Tuning Quick Reference¶
Essential sysctl Settings¶
```bash
# /etc/sysctl.d/99-cassandra.conf

# Disable swap usage
vm.swappiness = 1

# Increase max memory map areas
vm.max_map_count = 1048575

# Network buffers
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 40960
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Connection handling
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
```

```bash
# Apply changes
sudo sysctl -p /etc/sysctl.d/99-cassandra.conf
```
File Descriptor Limits¶
```
# /etc/security/limits.d/cassandra.conf
cassandra soft memlock unlimited
cassandra hard memlock unlimited
cassandra soft nofile 1048576
cassandra hard nofile 1048576
cassandra soft nproc 32768
cassandra hard nproc 32768
cassandra soft as unlimited
cassandra hard as unlimited
```
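After restarting Cassandra, verify the limits against the running process rather than trusting the file (a sketch; assumes a single Cassandra process matched by its main class):

```bash
# Effective limits of the running Cassandra process
grep -E 'open files|processes|locked memory|address space' /proc/"$(pgrep -f CassandraDaemon)"/limits
```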
Disable Swap¶
```bash
# Temporary
sudo swapoff -a

# Permanent - edit /etc/fstab
# Comment out swap entries
```
Disable Transparent Huge Pages (THP)¶
```bash
# Temporary
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# Permanent - add to /etc/rc.local or a systemd service (see below)
```
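One way to persist the THP setting across reboots is a small oneshot systemd unit (a sketch; the unit name is illustrative, and `cassandra.service` assumes a package-managed install):

```bash
sudo tee /etc/systemd/system/disable-thp.service > /dev/null <<'EOF'
[Unit]
Description=Disable Transparent Huge Pages for Cassandra
Before=cassandra.service

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/defrag'

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now disable-thp.service
```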
I/O Scheduler¶
```bash
# Check current scheduler
cat /sys/block/sda/queue/scheduler

# Set to none/noop for SSDs (temporary; see below for a persistent udev rule)
echo none > /sys/block/sda/queue/scheduler

# For NVMe, the default is typically correct
```
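To make the scheduler choice survive reboots, a udev rule is one option (a sketch; the rule file name is illustrative and the match targets non-rotational sd* devices only):

```bash
sudo tee /etc/udev/rules.d/60-io-scheduler.rules > /dev/null <<'EOF'
# Use the "none" scheduler for SSDs (non-rotational sd* devices)
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
EOF

sudo udevadm control --reload-rules
sudo udevadm trigger
```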
Query Optimization¶
Efficient Query Patterns¶
Do - Partition key queries:
```cql
-- Fast: Uses partition key
SELECT * FROM users WHERE user_id = ?;

-- Fast: Partition key with clustering range
SELECT * FROM messages
WHERE user_id = ?
  AND sent_at >= '2024-01-01'
  AND sent_at < '2024-02-01';
```
Avoid - Full table scans:
```cql
-- Slow: No partition key (requires ALLOW FILTERING)
SELECT * FROM users WHERE age > 30 ALLOW FILTERING;

-- Slow: Large IN clause
SELECT * FROM users WHERE user_id IN (?, ?, ?, ..., ?);  -- 100+ values
```
Prepared Statements¶
```java
// Good: Prepare once, execute many
PreparedStatement prepared = session.prepare(
    "SELECT * FROM users WHERE user_id = ?");
for (UUID userId : userIds) {
    BoundStatement bound = prepared.bind(userId);
    session.execute(bound);
}

// Bad: unprepared query built by string concatenation
// (re-parsed on every execution and open to CQL injection)
for (UUID userId : userIds) {
    session.execute(
        "SELECT * FROM users WHERE user_id = " + userId);
}
```
Benefits: - Query parsed once - Reduced network traffic - Better server-side caching - Protection against CQL injection
Pagination for Large Results¶
```java
// Use automatic paging
SimpleStatement statement = SimpleStatement.builder("SELECT * FROM large_table")
    .setPageSize(1000)
    .build();
ResultSet rs = session.execute(statement);
for (Row row : rs) {
    // Process row
    // Driver automatically fetches the next page
}

// Or manual paging with state
ByteBuffer pagingState = null;
do {
    SimpleStatement stmt = SimpleStatement.builder("SELECT * FROM large_table")
        .setPageSize(1000)
        .setPagingState(pagingState)
        .build();
    ResultSet page = session.execute(stmt);
    pagingState = page.getExecutionInfo().getPagingState();
    for (Row row : page.currentPage()) {
        // Process row
    }
} while (pagingState != null);
```
When to Use Batches¶
Good use of batches (atomicity across denormalized tables, or multiple writes to the same partition):
```cql
-- Atomic updates to denormalized tables
BEGIN BATCH
  INSERT INTO users (user_id, email) VALUES (?, ?);
  INSERT INTO users_by_email (email, user_id) VALUES (?, ?);
APPLY BATCH;
```
Bad use of batches (different partitions):
```cql
-- DON'T use batches as a performance optimization
BEGIN BATCH
  INSERT INTO logs (log_id, message) VALUES (uuid(), 'msg1');
  INSERT INTO logs (log_id, message) VALUES (uuid(), 'msg2');
  INSERT INTO logs (log_id, message) VALUES (uuid(), 'msg3');
  -- ... 100 more inserts to different partitions
APPLY BATCH;
-- This is SLOWER than individual inserts!
```
Compaction Tuning¶
Strategy Selection¶
| Strategy | Best For | Write Amp | Read Amp | Space Amp |
|---|---|---|---|---|
| STCS | Write-heavy, general | Low | High | High |
| LCS | Read-heavy, updates | High | Low | Low |
| TWCS | Time-series, TTL | Low | Low | Low |
| UCS | Universal (5.0+) | Configurable | Configurable | Configurable |
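To check which strategy a table is currently using before changing anything (a sketch; my_keyspace.my_table is a placeholder):

```bash
# Show the compaction settings in the table definition
cqlsh -e "DESCRIBE TABLE my_keyspace.my_table" | grep -i compaction
```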
Compaction Throughput¶
```yaml
# cassandra.yaml
# Increase for faster compaction (uses more I/O)
compaction_throughput_mb_per_sec: 64    # Default: 64

# More compaction threads
concurrent_compactors: 4                # Default: based on disks

# Limit compaction during peak hours at runtime:
# nodetool setcompactionthroughput 32
```
Monitor Compaction¶
```bash
# Check pending compactions
nodetool compactionstats

# Check compaction history
nodetool compactionhistory

# Metrics to watch:
# - PendingTasks
# - TotalCompactionsCompleted
# - BytesCompacted
```
Caching Configuration¶
Key Cache¶
```yaml
# cassandra.yaml
# Partition key locations cache
key_cache_size_in_mb: 100       # Auto-sized by default

# Save interval
key_cache_save_period: 14400    # seconds
```
Row Cache (Use Carefully)¶
```yaml
# Generally NOT recommended for production
row_cache_size_in_mb: 0    # Disabled by default

# If used, set per-table:
# ALTER TABLE ks.table WITH caching = {'rows_per_partition': '100'};
```
Chunk Cache (Cassandra 4.0+)¶
The chunk cache is an off-heap cache for compressed data chunks. It is auto-configured based on available memory and normally needs no manual tuning.
Monitoring Cache Effectiveness¶
```bash
# Check cache hit rates
nodetool info | grep "Cache"

# Per-table cache stats
nodetool tablestats keyspace.table | grep -i cache
```
Benchmarking with cassandra-stress¶
Basic Read/Write Test¶
```bash
# Write 1M rows
cassandra-stress write n=1000000 -rate threads=50

# Read 1M rows
cassandra-stress read n=1000000 -rate threads=50

# Mixed workload (50% read, 50% write)
cassandra-stress mixed ratio\(write=1,read=1\) n=1000000 -rate threads=50
```
Custom Schema Test¶
```yaml
# stress_profile.yaml
keyspace: test_ks
table: test_table

columnspec:
  - name: id
    size: uniform(1..10)
    population: uniform(1..1000000)
  - name: data
    size: gaussian(100..500, 200, 50)

insert:
  partitions: fixed(1)
  batchtype: UNLOGGED

queries:
  read:
    cql: SELECT * FROM test_table WHERE id = ?
    fields: samerow
```

```bash
# Run with profile
cassandra-stress user profile=stress_profile.yaml \
  ops\(insert=1,read=3\) n=1000000 -rate threads=100
```
Results Analysis¶
```
Results:
Op rate                   : 15,234 op/s   [READ: 11,425 op/s, WRITE: 3,809 op/s]
Partition rate            : 15,234 pk/s   [READ: 11,425 pk/s, WRITE: 3,809 pk/s]
Row rate                  : 15,234 row/s  [READ: 11,425 row/s, WRITE: 3,809 row/s]
Latency mean              : 3.2 ms    [READ: 2.8 ms, WRITE: 4.5 ms]
Latency median            : 2.1 ms    [READ: 1.9 ms, WRITE: 3.2 ms]
Latency 95th percentile   : 8.5 ms    [READ: 7.2 ms, WRITE: 12.1 ms]
Latency 99th percentile   : 15.3 ms   [READ: 12.8 ms, WRITE: 21.5 ms]
Latency 99.9th percentile : 45.2 ms   [READ: 38.1 ms, WRITE: 62.3 ms]
Latency max               : 125.4 ms  [READ: 98.2 ms, WRITE: 125.4 ms]
```
Performance Checklist¶
Before Production¶
- [ ] Hardware meets requirements (SSD, adequate RAM)
- [ ] Heap sized appropriately (max 31GB)
- [ ] G1GC or ZGC configured
- [ ] Swap disabled
- [ ] THP disabled
- [ ] File limits increased
- [ ] sysctl tuned
- [ ] Network latency < 1ms within DC
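Several of the host-level items above can be verified with a short script (a sketch; paths are standard Linux locations, and the process lookup assumes the CassandraDaemon main class):

```bash
#!/bin/bash
# preflight-check.sh - verify host-level tuning before go-live (illustrative)

echo "Active swap devices (should be empty):"
swapon --show

echo "Transparent huge pages (should show [never]):"
cat /sys/kernel/mm/transparent_hugepage/enabled

echo "Kernel settings:"
sysctl vm.swappiness vm.max_map_count

echo "Open file limit of the Cassandra process:"
grep 'open files' /proc/"$(pgrep -f CassandraDaemon)"/limits
```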
Data Model¶
- [ ] Partition sizes < 100MB
- [ ] No unbounded partition growth
- [ ] Queries use partition key
- [ ] No ALLOW FILTERING in production
- [ ] Appropriate compaction strategy per table
Application¶
- [ ] Using prepared statements
- [ ] Connection pooling configured
- [ ] Appropriate consistency levels
- [ ] Pagination for large results
- [ ] Token-aware routing enabled
Monitoring¶
- [ ] Latency metrics tracked
- [ ] GC pauses monitored
- [ ] Pending compactions alerting
- [ ] Disk usage alerts
- [ ] Dropped messages monitoring
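Dropped messages (the last item above) can be spot-checked from the command line while alerting is being wired up (a sketch; any counter that keeps growing deserves investigation):

```bash
# Dropped message counters per verb (READ, MUTATION, ...) since the node started
nodetool tpstats | grep -A 20 -i 'message type'
```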
Read Performance Tuning¶
Diagnosing Slow Reads¶
```bash
# Step 1: Identify affected tables
nodetool tablestats <keyspace> | grep -A 15 "Table:"

# Step 2: Check read latency breakdown
nodetool proxyhistograms

# Step 3: Check for tombstone issues
nodetool tablestats <keyspace>.<table> | grep -i tombstone

# Step 4: Check SSTable count
nodetool tablestats <keyspace>.<table> | grep "SSTable count"

# Step 5: Check bloom filter effectiveness
nodetool tablestats <keyspace>.<table> | grep -i bloom
```
Common Read Performance Issues¶
High SSTable Count¶
Symptoms: Read latency increases over time, many SSTables per read
Diagnosis:
```bash
nodetool tablestats <keyspace>.<table> | grep "SSTable count"
# Healthy: <10 for STCS, <100 total for LCS
```
Solutions:
```bash
# Force compaction to reduce SSTable count
nodetool compact <keyspace> <table>
```

```cql
-- Adjust compaction strategy if needed
ALTER TABLE <keyspace>.<table>
WITH compaction = {
  'class': 'LeveledCompactionStrategy',
  'sstable_size_in_mb': 160
};
```
Excessive Tombstones¶
Symptoms: Read latency spikes, "TombstoneOverwhelmingException" in logs
Diagnosis:
```bash
nodetool tablestats <keyspace>.<table> | grep -i tombstone
# Warning if tombstones_per_read > 1000
```
Solutions:
```cql
-- Reduce gc_grace_seconds if repair runs frequently
ALTER TABLE <keyspace>.<table>
WITH gc_grace_seconds = 86400;  -- 1 day instead of the default 10 days
```

```bash
# Force major compaction to purge tombstones
nodetool compact <keyspace> <table>
```
Poor Cache Hit Rates¶
Symptoms: High disk reads, low cache hit rates
Diagnosis:
```bash
nodetool info | grep -i cache
# Key cache hit rate should be >80%
```
Solutions:
```yaml
# cassandra.yaml - increase key cache
key_cache_size_in_mb: 100    # Default: auto (5% of heap, capped at 100MB)

# Row caching is enabled per table (use sparingly), e.g.:
# ALTER TABLE ks.table WITH caching = {'rows_per_partition': '100'};
```
Read Path Optimization Summary¶
| Optimization | Impact | Configuration |
|---|---|---|
| Increase key cache | Fewer index lookups | key_cache_size_in_mb |
| Use prepared statements | Reduced parsing | Application code |
| Token-aware routing | Reduced coordinator hops | Driver configuration |
| Appropriate consistency | Fewer replicas read | Application code |
| Compression | Faster disk reads | Table compression settings |
Write Performance Tuning¶
Diagnosing Slow Writes¶
```bash
# Step 1: Check write latency
nodetool proxyhistograms

# Step 2: Check memtable status
nodetool tpstats | grep -i memtable

# Step 3: Check commit log disk
df -h /var/lib/cassandra/commitlog
iostat -x 1 5

# Step 4: Check pending mutations
nodetool tpstats | grep -i mutation

# Step 5: Check if compaction is overwhelming the node
nodetool compactionstats
```
Common Write Performance Issues¶
Commit Log Contention¶
Symptoms: Write latency spikes, commit log disk at 100% utilization
Solutions:
```yaml
# cassandra.yaml - use a separate disk for the commit log
commitlog_directory: /mnt/commitlog    # SSD recommended

# Adjust sync mode
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000     # Default

# Or for durability-critical workloads
commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 2
```
Memtable Flush Bottleneck¶
Symptoms: Memtable flush taking too long, high memory pressure
Solutions:
```yaml
# cassandra.yaml
memtable_cleanup_threshold: 0.33    # Flush the largest memtable when memtables reach 33% of the configured memtable space
memtable_flush_writers: 4           # Increase for more flush parallelism
```
Compaction Falling Behind¶
Symptoms: Growing pending compactions, disk space increasing
Solutions:
```yaml
# cassandra.yaml - increase compaction throughput
compaction_throughput_mb_per_sec: 64    # Default 64, increase if disk allows

# Increase concurrent compactors
concurrent_compactors: 4                # Default: min(num_cpus, disk_count)
```

```bash
# Runtime adjustment
nodetool setcompactionthroughput 128
```
Write Path Optimization Summary¶
| Optimization | Impact | Configuration |
|---|---|---|
| Separate commit log disk | Reduced write latency | commitlog_directory |
| Increase memtable size | Fewer flushes | memtable_heap_space_in_mb |
| Tune commit log sync | Latency vs durability | commitlog_sync |
| Batch writes | Amortized overhead | Application batching |
| Use UNLOGGED batches | Reduced coordinator work | For same-partition writes |
Troubleshooting Performance¶
Systematic Investigation¶
```bash
#!/bin/bash
# performance-investigation.sh

echo "=== System Resources ==="
top -bn1 | head -20
free -h
df -h /var/lib/cassandra

echo -e "\n=== Cassandra Status ==="
nodetool status
nodetool tpstats | grep -v "^$"

echo -e "\n=== Latency Histograms ==="
nodetool proxyhistograms

echo -e "\n=== Compaction Status ==="
nodetool compactionstats

echo -e "\n=== GC Stats ==="
nodetool gcstats

echo -e "\n=== Recent Errors ==="
tail -50 /var/log/cassandra/system.log | grep -i error
```
Common Issues Quick Reference¶
| Symptom | First Check | Common Cause |
|---|---|---|
| High read latency | nodetool tablestats | Tombstones, SSTable count |
| High write latency | iostat, commit log disk | Disk saturation |
| Request timeouts | nodetool tpstats | Thread pool exhaustion |
| Memory pressure | nodetool info | Heap too small, large partitions |
| Cluster imbalance | nodetool status | Uneven token distribution |
AxonOps Performance Management¶
Identifying and resolving performance issues requires correlating metrics, analyzing query patterns, and understanding the impact of configuration changes. AxonOps provides tools that simplify performance management.
Performance Analytics¶
AxonOps provides:
- Slow query identification: Automatic detection and ranking of slow queries
- Query pattern analysis: Identify inefficient access patterns
- Hot partition detection: Find partitions causing load imbalance
- Historical comparison: Compare current performance to baselines
Capacity Planning¶
- Growth forecasting: Predict when capacity will be exhausted
- Trend analysis: Identify gradual performance degradation
- What-if modeling: Simulate impact of configuration changes
- Right-sizing recommendations: Optimize resource allocation
Configuration Management¶
- Configuration drift detection: Identify nodes with different settings
- Change tracking: Audit log of all configuration changes
- Impact analysis: Correlate configuration changes with performance
- Rollback guidance: Quickly identify when changes caused issues
See the AxonOps documentation for performance management features.
Next Steps¶
- Monitoring Guide - Monitoring cluster health
- Compaction Management - Compaction tuning details
- Maintenance - Maintenance for performance
- Architecture: Read Path - Understanding read performance
- Architecture: Write Path - Understanding write performance