Skip to content

Common Cassandra Errors

Reference guide for frequently encountered Cassandra errors, their causes, and solutions.


Error Categories

Timeout Errors

Errors occurring when operations exceed configured time limits.

Error Default Timeout Common Cause
ReadTimeoutException 5000ms Slow disk, large partitions, tombstones
WriteTimeoutException 2000ms Overloaded nodes, disk issues
RangeSliceTimeoutException 10000ms Large range scans
TruncateException 60000ms Large table truncation

Availability Errors

Errors related to replica availability.

Error Cause Solution
UnavailableException Insufficient replicas alive Check node status, reduce consistency level
NoHostAvailableException Cannot reach any coordinator Check network, verify cluster is running
WriteFailureException Replica write failed Check failing node logs
ReadFailureException Replica read failed Check failing node logs

Data Errors

Errors related to data or schema.

Error Cause Solution
InvalidQueryException CQL syntax or semantic error Fix query syntax
InvalidRequestException Invalid request parameters Check request parameters
TombstoneOverwhelmingException Too many tombstones Fix data model, run compaction
SyntaxException CQL syntax error Check CQL syntax

Authentication/Authorization Errors

Security-related errors.

Error Cause Solution
AuthenticationException Invalid credentials Verify username/password
UnauthorizedException Insufficient permissions Grant required permissions

Quick Diagnosis

Check Cluster Health

# Node status
nodetool status

# Thread pools (look for blocked/dropped)
nodetool tpstats

# Compaction status
nodetool compactionstats

Check Logs

# Recent errors
grep -i "error\|exception" /var/log/cassandra/system.log | tail -50

# Specific error
grep "TimeoutException" /var/log/cassandra/system.log | tail -20

Check Table Health

# Table statistics
nodetool tablestats my_keyspace.my_table

# Key metrics to check:
# - SSTable count (high = needs compaction)
# - Average tombstones per read (high = data model issue)
# - Partition size (large = data model issue)

Error Resolution Workflow

Error Occurs
    │
    ▼
┌─────────────────────────────────────┐
│  1. Identify error type             │
│     - Check client exception        │
│     - Check Cassandra logs          │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│  2. Check cluster health            │
│     - nodetool status               │
│     - nodetool tpstats              │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│  3. Check specific component        │
│     - Table stats for data issues   │
│     - Compaction for performance    │
│     - Logs for stack traces         │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│  4. Apply fix                       │
│     - See specific error page       │
│     - Follow playbook if available  │
└─────────────────────────────────────┘

Detailed Error Guides

Timeout Errors

Playbooks for Complex Issues

For issues requiring multi-step resolution, see the Troubleshooting Playbooks.


Prevention

Monitoring

Set up alerts for early warning:

Metric Warning Threshold Critical Threshold
Read latency p99 > 100ms > 500ms
Write latency p99 > 50ms > 200ms
Pending compactions > 20 > 100
Dropped messages > 0 > 10/min
SSTable count > 20 per table > 50 per table

Best Practices

  1. Monitor proactively - Don't wait for errors
  2. Run repairs regularly - Weekly for most workloads
  3. Keep compaction healthy - Monitor pending tasks
  4. Size partitions correctly - Aim for < 100MB per partition
  5. Avoid tombstone accumulation - Use TTLs and proper deletion patterns