Slow Queries¶

Slow queries manifest as high latency, timeouts, or degraded application performance. This playbook helps identify and resolve query performance issues.

Symptoms¶

High read/write latencies in nodetool proxyhistograms
Query timeouts from client applications
Specific queries consistently slow
"Slow query" warnings in logs
User-reported application slowness

Diagnosis¶

Step 1: Check Overall Latencies¶

# Coordinator latencies
nodetool proxyhistograms

# Per-table latencies
nodetool tablehistograms my_keyspace my_table

What to look for: - p99 latency > 100ms for reads - p99 latency > 50ms for writes - Large gap between p50 and p99 (inconsistent performance)

Step 2: Enable Slow Query Logging¶

# cassandra.yaml
slow_query_log_timeout_in_ms: 500

Then check logs:

grep "slow query" /var/log/cassandra/debug.log | tail -50

Step 3: Trace Specific Queries¶

TRACING ON;
SELECT * FROM my_table WHERE ...;
TRACING OFF;

What to look for in trace: - Time spent in each phase - Number of SSTables read - Tombstones scanned - Partitions touched

Step 4: Check Table Health¶

nodetool tablestats my_keyspace.my_table

Problem indicators: - High SSTable count (> 20) - High tombstones per slice - Large partition sizes - Low key cache hit rate

Step 5: Check for Hotspots¶

# Top partitions by read/write activity
nodetool toppartitions my_keyspace my_table 10000

Resolution¶

Query Anti-Pattern: Full Table Scan¶

Problem:

SELECT * FROM users;  -- Scans entire cluster

Solution:

-- Add WHERE clause on partition key
SELECT * FROM users WHERE user_id = ?;

-- Or use pagination
SELECT * FROM users LIMIT 100;

Query Anti-Pattern: ALLOW FILTERING¶

Problem:

SELECT * FROM users WHERE email = '[email protected]' ALLOW FILTERING;

Solution:

-- Create secondary index
CREATE INDEX ON users (email);

-- Or create materialized view
CREATE MATERIALIZED VIEW users_by_email AS
  SELECT * FROM users
  WHERE email IS NOT NULL AND user_id IS NOT NULL
  PRIMARY KEY (email, user_id);

Query Anti-Pattern: IN with Many Values¶

Problem:

SELECT * FROM orders WHERE order_id IN (uuid1, uuid2, ..., uuid100);

Solution:

-- Use async parallel queries from application
-- Or batch into smaller groups
SELECT * FROM orders WHERE order_id IN (uuid1, uuid2, uuid3);

Query Anti-Pattern: Range Queries on Clustering Columns¶

Problem:

SELECT * FROM events WHERE user_id = ? AND event_time > '2024-01-01';
-- Scans potentially millions of rows

Solution:

-- Add LIMIT
SELECT * FROM events WHERE user_id = ? AND event_time > '2024-01-01' LIMIT 1000;

-- Or redesign for bounded queries
SELECT * FROM events WHERE user_id = ? AND day = '2024-01-15';

Data Model Issue: Large Partitions¶

See Large Partition Issues.

# Check partition sizes
nodetool tablestats my_keyspace.my_table | grep partition

Data Model Issue: Tombstone Accumulation¶

See Tombstone Accumulation.

# Check tombstone counts
nodetool tablestats my_keyspace.my_table | grep tombstone

Infrastructure Issue: Compaction Backlog¶

# Check pending compactions
nodetool compactionstats

# If backlog exists
nodetool compact my_keyspace my_table

Infrastructure Issue: Insufficient Resources¶

# Check CPU
top -p $(pgrep -f CassandraDaemon)

# Check disk I/O
iostat -x 1 5

# Check thread pools
nodetool tpstats

Query Optimization Checklist¶

Check	Good	Bad	Fix
Partition key in WHERE	Yes	No	Add partition key filter
ALLOW FILTERING	Not used	Used	Create index or view
IN clause size	< 10 values	> 100 values	Parallel queries
Result set size	LIMIT used	No LIMIT	Add LIMIT
Table SSTable count	< 20	> 50	Run compaction
Tombstones per read	< 100	> 1000	Fix data model
Key cache hit rate	> 90%	< 50%	Increase cache

Recovery¶

Verify Improvement¶

# Check latencies after fix
nodetool tablehistograms my_keyspace my_table

# Trace query again
TRACING ON;
<your query>;
TRACING OFF;

Monitor Going Forward¶

Set up alerts on: - p99 read latency > 100ms - p99 write latency > 50ms - Slow query log entries

Prevention¶

Review queries before production - Check execution plans
Monitor query latencies - Alert on degradation
Design data model for queries - Don't retrofit
Use prepared statements - Reduce parsing overhead
Implement client-side caching - Reduce load for hot data
Run regular compaction - Keep SSTable counts low

Command	Purpose
`nodetool proxyhistograms`	Overall latencies
`nodetool tablehistograms`	Per-table latencies
`nodetool tablestats`	Table health metrics
`nodetool toppartitions`	Identify hot partitions
`TRACING ON/OFF`	Query tracing

CQL Reference - Query syntax and options
Data Modeling - Design patterns
Large Partition Issues - Partition sizing
Tombstone Accumulation - Tombstone issues

Slow Queries¶

Symptoms¶

Diagnosis¶

Step 1: Check Overall Latencies¶

Step 2: Enable Slow Query Logging¶

Step 3: Trace Specific Queries¶

Step 4: Check Table Health¶

Step 5: Check for Hotspots¶

Resolution¶

Query Anti-Pattern: Full Table Scan¶

Query Anti-Pattern: ALLOW FILTERING¶

Query Anti-Pattern: IN with Many Values¶

Query Anti-Pattern: Range Queries on Clustering Columns¶

Data Model Issue: Large Partitions¶

Data Model Issue: Tombstone Accumulation¶

Infrastructure Issue: Compaction Backlog¶

Infrastructure Issue: Insufficient Resources¶

Query Optimization Checklist¶

Recovery¶

Verify Improvement¶

Monitor Going Forward¶

Prevention¶

Related Commands¶

Related Documentation¶