Cassandra Diagnosis Guide

A systematic approach to diagnosing Cassandra issues, from cluster-wide health checks down to individual query traces.

Diagnosis Framework (SDRR)

[Diagram: SDRR diagnosis framework]

Quick Diagnostic Commands

Cluster Health

# Overall status
nodetool status

# Cluster information
nodetool describecluster

# Node information
nodetool info
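On a large cluster it can help to reduce the status output to only the nodes that need attention. The following is a minimal sketch, assuming the usual two-letter state codes in the first column of nodetool status (UN, DN, UJ, and so on); the function name unhealthy_nodes is hypothetical.

```shell
# List nodes whose state is anything other than UN (Up/Normal).
# Reads `nodetool status` output on stdin and prints "<state> <address>".
# Usage: nodetool status | unhealthy_nodes
unhealthy_nodes() {
  awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { print $1, $2 }'
}
```

Empty output means every listed node reported Up/Normal at the time of the call.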

Performance

# Thread pools
nodetool tpstats

# Request latencies
nodetool proxyhistograms

# Table statistics
nodetool tablestats <keyspace>

# Compaction status
nodetool compactionstats
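The thread pool output is long, and usually only pools with pending or blocked tasks matter. A sketch of a filter follows, assuming the column order Pool Name, Active, Pending, Completed, Blocked, All time blocked; the function name busy_pools is hypothetical.

```shell
# Show thread pools that currently have pending or blocked tasks.
# Reads `nodetool tpstats` output on stdin.
# Usage: nodetool tpstats | busy_pools
busy_pools() {
  awk '
    /^Message type/ { done = 1 }          # ignore the dropped-message section
    !done && $2 ~ /^[0-9]+$/ && ($3 + 0 > 0 || $5 + 0 > 0) {
      print $1, "pending=" $3, "blocked=" $5
    }
  '
}
```

Pools that stay in this output across repeated runs are worth investigating; a single non-zero sample can be transient.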

Resource Usage

# Disk usage
df -h /var/lib/cassandra
nodetool status  # Shows Load per node

# Memory
nodetool info | grep "Heap Memory"
free -h

# CPU
top -b -n 1 | head -20
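The disk check can be turned into a simple threshold test. This is a sketch only: it assumes POSIX-format output from df -P, and both the function name disk_over_threshold and the default of 80 percent are illustrative choices, not recommended values.

```shell
# Print mount points whose usage is at or above a percentage threshold.
# Reads `df -P` output on stdin; $1 = threshold in percent (assumed default: 80).
# Usage: df -P /var/lib/cassandra | disk_over_threshold 80
disk_over_threshold() {
  awk -v limit="${1:-80}" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)                   # "85%" -> "85"
      if (use + 0 >= limit + 0) print $6, use "%"
    }
  '
}
```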

Common Diagnostic Scenarios

High Latency

# 1. Check coordinator latency
nodetool proxyhistograms

# 2. Check table-level latency
nodetool tablehistograms <keyspace> <table>

# 3. Check for pending tasks
nodetool tpstats

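# 4. Check GC activity (case-insensitive match covers both "GC pause" in
#    Java 8 G1 logs and "Pause Young" in Java 11+ unified logs)
grep -i "pause" /var/log/cassandra/gc.log | tail -20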

# 5. Check compaction
nodetool compactionstats
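When GC is a suspect, listing only the long pauses is often more useful than the last 20 log lines. The sketch below assumes the JDK unified logging format (Java 9 and later), where each pause line ends with a duration such as 350.100ms; the function name long_gc_pauses and the 200 ms default are hypothetical.

```shell
# Print GC pause lines whose duration is at or above a threshold.
# Reads a unified-logging GC log on stdin; $1 = threshold in ms (assumed default: 200).
# Usage: long_gc_pauses 200 < /var/log/cassandra/gc.log
long_gc_pauses() {
  awk -v limit="${1:-200}" '
    /Pause/ && $NF ~ /^[0-9.]+ms$/ {
      ms = $NF
      sub(/ms$/, "", ms)                  # "350.100ms" -> "350.100"
      if (ms + 0 >= limit + 0) print
    }
  '
}
```

Java 8 GC logs use a different layout and would need a different pattern.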

Node Down

# 1. Check service status
systemctl status cassandra

# 2. Check logs
tail -200 /var/log/cassandra/system.log

# 3. Check disk space
df -h /var/lib/cassandra

# 4. Check OOM killer
dmesg | grep -i killed

# 5. Check gossip (from another node)
nodetool gossipinfo

High CPU

# 1. Identify the busiest threads in the Cassandra process
top -H -p $(pgrep -f CassandraDaemon)

# 2. Check compaction
nodetool compactionstats

# 3. Check thread pools
nodetool tpstats

# 4. Take a thread dump (run as the OS user that owns the Cassandra process)
jstack $(pgrep -f CassandraDaemon) > /tmp/threads.txt
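To connect a hot thread from top -H to its stack in the thread dump, the decimal thread ID has to be matched against the nid field of the dump. A minimal sketch follows, assuming jstack prints nid in hexadecimal (as JDK 8 and 11 do); the function name thread_stack is hypothetical.

```shell
# Print the stack of one thread from a jstack dump.
# $1 = decimal thread ID from the PID column of `top -H`, $2 = thread dump file.
# Usage: thread_stack 6699 /tmp/threads.txt
thread_stack() {
  awk -v pat="nid=$(printf '0x%x' "$1") " '
    index($0, pat) { show = 1 }           # header line of the matching thread
    /^$/           { show = 0 }           # a blank line ends the stack
    show
  ' "$2"
}
```

The trailing space in the pattern keeps nid=0x1a2 from also matching nid=0x1a2b.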

Memory Issues

# 1. Check heap usage
nodetool info | grep "Heap Memory"

# 2. Check GC logs
tail -100 /var/log/cassandra/gc.log

# 3. Check off-heap
nodetool info | grep "Off Heap"

# 4. Check system memory
free -h
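The heap line from nodetool info reports used and total heap in MB; a percentage is easier to compare across nodes. This sketch assumes the line has the form "Heap Memory (MB) : used / total"; the function name heap_used_pct is hypothetical.

```shell
# Print heap usage as a percentage with one decimal place.
# Reads `nodetool info` output on stdin.
# Usage: nodetool info | heap_used_pct
heap_used_pct() {
  awk -F'[:/]' '
    /^Heap Memory/ && $3 + 0 > 0 {        # skips the "Off Heap Memory" line
      printf "%.1f\n", ($2 / $3) * 100
    }
  '
}
```

A single reading says little on its own, because heap usage rises and falls with each GC cycle; repeated samples are more informative.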

Diagnostic Tools

Tool      Purpose
nodetool  Cluster management and metrics
cqlsh     Query testing and tracing
jstack    Thread dumps
jmap      Memory analysis
jstat     GC statistics
iostat    Disk I/O
sar       System activity

Query Tracing

-- Enable tracing
TRACING ON;

-- Run the query with a literal value (cqlsh does not accept ? bind markers)
SELECT * FROM my_table WHERE id = <id>;

-- Review trace output for:
-- - Coordinator activity
-- - Replica responses
-- - Read repair
-- - Tombstone scanning

Next Steps