Thread Pools¶
The thread_pools virtual table shows the current state of all Cassandra thread pools, providing visibility into request processing capacity and backpressure.
Overview¶
Cassandra uses a staged event-driven architecture (SEDA), with separate thread pools for different operation types. Monitoring these pools reveals bottlenecks and capacity issues.
SELECT name, active_tasks, pending_tasks, blocked_tasks, completed_tasks
FROM system_views.thread_pools;
Equivalent nodetool command: nodetool tpstats
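The same data can be pulled programmatically. Below is a minimal sketch using the DataStax Python driver (`pip install cassandra-driver`); the contact point and port are placeholders, and because virtual tables are node-local, the results describe only the node you connect to.

```python
# Minimal sketch using the DataStax Python driver (cassandra-driver).
# The contact point and port are placeholders; add authentication as needed.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"], port=9042)
session = cluster.connect()

rows = session.execute(
    "SELECT name, active_tasks, pending_tasks, blocked_tasks, completed_tasks "
    "FROM system_views.thread_pools"
)

print(f"{'Pool':<30} {'Active':>7} {'Pending':>8} {'Blocked':>8} {'Completed':>12}")
for row in rows:
    print(f"{row.name:<30} {row.active_tasks:>7} {row.pending_tasks:>8} "
          f"{row.blocked_tasks:>8} {row.completed_tasks:>12}")

cluster.shutdown()
```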
Schema¶
VIRTUAL TABLE system_views.thread_pools (
name text PRIMARY KEY,
active_tasks int,
active_tasks_limit int,
blocked_tasks bigint,
blocked_tasks_all_time bigint,
completed_tasks bigint,
pending_tasks int
)
| Column | Type | Description |
|---|---|---|
| `name` | text | Thread pool name |
| `active_tasks` | int | Currently executing tasks |
| `active_tasks_limit` | int | Maximum concurrent tasks (pool size) |
| `pending_tasks` | int | Tasks queued waiting for execution |
| `blocked_tasks` | bigint | Tasks currently blocked due to backpressure |
| `blocked_tasks_all_time` | bigint | Total blocked tasks since node startup |
| `completed_tasks` | bigint | Total completed tasks since startup |
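Because `completed_tasks` and `blocked_tasks_all_time` are cumulative counters, rates have to be derived by sampling the table twice and taking deltas. A hedged sketch, reusing the `session` from the example above (the 10-second interval is arbitrary):

```python
import time

QUERY = "SELECT name, completed_tasks FROM system_views.thread_pools"

def snapshot(session):
    # Map each pool to its cumulative completed_tasks counter.
    return {row.name: row.completed_tasks for row in session.execute(QUERY)}

def completion_rates(session, interval_s=10):
    first = snapshot(session)
    time.sleep(interval_s)
    second = snapshot(session)
    # Tasks completed per second over the sampling interval, per pool.
    return {name: (count - first.get(name, 0)) / interval_s
            for name, count in second.items()}

# Example usage:
# for name, rate in sorted(completion_rates(session).items()):
#     print(f"{name:<30} {rate:8.1f} tasks/s")
```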
Key Thread Pools¶
Request Processing¶
| Pool Name | Purpose | Warning Signs |
|---|---|---|
| `Native-Transport-Requests` | CQL client requests | pending > 1000 indicates client backpressure |
| `RequestResponseStage` | Inter-node request/response | blocked > 0 indicates network issues |
Read/Write Operations¶
| Pool Name | Purpose | Warning Signs |
|---|---|---|
| `ReadStage` | Local read operations | pending > 100 indicates disk bottleneck |
| `MutationStage` | Local write operations | pending > 0 or blocked > 0 is critical |
| `CounterMutationStage` | Counter write operations | Same as MutationStage |
| `ViewMutationStage` | Materialized view updates | pending > 0 indicates MV lag |
Storage Operations¶
| Pool Name | Purpose | Warning Signs |
|---|---|---|
| `MemtableFlushWriter` | Memtable to SSTable flush | pending > 0 indicates flush backpressure |
| `MemtablePostFlush` | Post-flush cleanup | Should stay near zero |
| `CompactionExecutor` | Compaction tasks | pending > 100 indicates compaction falling behind |
| `ValidationExecutor` | Repair validation | High during repair operations |
Cluster Communication¶
| Pool Name | Purpose | Warning Signs |
|---|---|---|
| `GossipStage` | Gossip protocol | pending > 0 indicates network issues |
| `AntiEntropyStage` | Repair coordination | Active during repairs |
| `MigrationStage` | Schema changes | Usually idle |
| `HintsDispatcher` | Hint delivery | Active when catching up offline nodes |
Background Tasks¶
| Pool Name | Purpose | Warning Signs |
|---|---|---|
| `SecondaryIndexManagement` | Index maintenance | Should be low |
| `CacheCleanupExecutor` | Cache eviction | Usually idle |
| `InternalResponseStage` | Internal coordination | Should be low |
| `Sampler` | Query sampling | Always low |
Monitoring Queries¶
Critical Health Check¶
-- Find pools with queued work; repeat with blocked_tasks > 0 to find
-- blocked pools (CQL cannot combine the two filters with OR)
SELECT name, active_tasks, pending_tasks, blocked_tasks
FROM system_views.thread_pools
WHERE pending_tasks > 0 ALLOW FILTERING;
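The same check can be done client-side, which avoids the filtering restrictions entirely. A sketch reusing the `session` from the Overview example:

```python
# Fetch every pool and filter in the client, flagging queued or blocked work.
rows = session.execute(
    "SELECT name, active_tasks, pending_tasks, blocked_tasks "
    "FROM system_views.thread_pools"
)
problems = [r for r in rows if r.pending_tasks > 0 or r.blocked_tasks > 0]
for r in problems:
    print(f"{r.name}: active={r.active_tasks} "
          f"pending={r.pending_tasks} blocked={r.blocked_tasks}")
```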
Pool Utilization¶
-- Check pool capacity usage
SELECT
name,
active_tasks,
active_tasks_limit,
CAST(active_tasks AS double) / active_tasks_limit * 100 AS utilization_pct,
pending_tasks
FROM system_views.thread_pools
WHERE active_tasks_limit > 0 ALLOW FILTERING;
Historical Blocked Tasks¶
-- Pools that have experienced blocking
SELECT name, blocked_tasks_all_time, completed_tasks,
CAST(blocked_tasks_all_time AS double) / completed_tasks * 100 AS block_rate_pct
FROM system_views.thread_pools
WHERE blocked_tasks_all_time > 0 ALLOW FILTERING;
Request Path Health¶
-- Monitor critical request path pools
SELECT name, active_tasks, pending_tasks, blocked_tasks
FROM system_views.thread_pools
WHERE name IN (
'Native-Transport-Requests',
'ReadStage',
'MutationStage',
'RequestResponseStage',
'MemtableFlushWriter',
'CompactionExecutor'
);
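To watch these pools continuously, the query can be polled on an interval. A sketch assuming the `session` from the Overview example; the pool list and 15-second interval are illustrative:

```python
import time

REQUEST_PATH_POOLS = (
    "Native-Transport-Requests", "ReadStage", "MutationStage",
    "RequestResponseStage", "MemtableFlushWriter", "CompactionExecutor",
)

# Build the IN clause from the pool names above (name is the partition key).
POOL_QUERY = (
    "SELECT name, active_tasks, pending_tasks, blocked_tasks "
    "FROM system_views.thread_pools WHERE name IN ("
    + ", ".join("'%s'" % p for p in REQUEST_PATH_POOLS) + ")"
)

def poll_request_path(session, interval_s=15):
    while True:
        for row in session.execute(POOL_QUERY):
            if row.pending_tasks > 0 or row.blocked_tasks > 0:
                print(f"WARN {row.name}: pending={row.pending_tasks} "
                      f"blocked={row.blocked_tasks}")
        time.sleep(interval_s)
```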
Interpreting Results¶
Healthy State¶
name | active_tasks | pending_tasks | blocked_tasks
----------------------------+--------------+---------------+---------------
Native-Transport-Requests | 45 | 0 | 0
ReadStage | 12 | 0 | 0
MutationStage | 8 | 0 | 0
CompactionExecutor | 2 | 3 | 0
MemtableFlushWriter | 0 | 0 | 0
Warning State¶
name | active_tasks | pending_tasks | blocked_tasks
----------------------------+--------------+---------------+---------------
Native-Transport-Requests | 128 | 500 | 0 ← Client backpressure
ReadStage | 32 | 150 | 0 ← Disk bottleneck
MutationStage | 32 | 0 | 0
CompactionExecutor | 4 | 200 | 0 ← Compaction behind
MemtableFlushWriter | 2 | 5 | 0 ← Flush pressure
Critical State¶
name | active_tasks | pending_tasks | blocked_tasks
----------------------------+--------------+---------------+---------------
Native-Transport-Requests | 128 | 5000 | 50 ← CRITICAL
MutationStage | 32 | 100 | 10 ← CRITICAL
MemtableFlushWriter | 2 | 20 | 5 ← CRITICAL
Alerting Rules¶
Blocked Tasks Alert (Critical)¶
-- Any blocked task is critical
SELECT name, blocked_tasks, blocked_tasks_all_time
FROM system_views.thread_pools
WHERE blocked_tasks > 0 ALLOW FILTERING;
Action: Immediate investigation required. Blocked tasks indicate the system cannot keep up with load.
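A sketch of this rule as a callable check; `alert` is a placeholder for whatever notification hook your monitoring uses, and `session` is assumed from the Overview example.

```python
def check_blocked_tasks(session, alert):
    rows = session.execute(
        "SELECT name, blocked_tasks, blocked_tasks_all_time "
        "FROM system_views.thread_pools"
    )
    for row in rows:
        if row.blocked_tasks > 0:
            # Any currently blocked task is treated as critical.
            alert(f"CRITICAL: {row.name} has {row.blocked_tasks} blocked tasks "
                  f"({row.blocked_tasks_all_time} blocked since startup)")

# Example: check_blocked_tasks(session, alert=print)
```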
High Pending Tasks Alert¶
-- Sustained pending tasks; evaluate the per-pool thresholds client-side
-- (MutationStage > 10, ReadStage > 100, CompactionExecutor > 50, MemtableFlushWriter > 2)
SELECT name, pending_tasks
FROM system_views.thread_pools
WHERE name IN ('MutationStage', 'ReadStage', 'CompactionExecutor', 'MemtableFlushWriter');
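A sketch of the per-pool evaluation, with each pool given its own limit. The threshold values mirror the comment above and are starting points rather than hard rules; `session` and `alert` are assumed as in the previous sketch.

```python
# Per-pool pending_tasks thresholds; tune for your workload.
PENDING_THRESHOLDS = {
    "MutationStage": 10,
    "ReadStage": 100,
    "CompactionExecutor": 50,
    "MemtableFlushWriter": 2,
}

def check_pending_tasks(session, alert):
    rows = session.execute(
        "SELECT name, pending_tasks FROM system_views.thread_pools"
    )
    for row in rows:
        threshold = PENDING_THRESHOLDS.get(row.name)
        if threshold is not None and row.pending_tasks > threshold:
            alert(f"WARNING: {row.name} pending_tasks={row.pending_tasks} "
                  f"exceeds threshold {threshold}")
```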
Flush Backpressure Alert¶
-- Memtable flush falling behind
SELECT name, pending_tasks
FROM system_views.thread_pools
WHERE name IN ('MemtableFlushWriter', 'MemtablePostFlush', 'MemtableReclaimMemory')
  AND pending_tasks > 0 ALLOW FILTERING;
Action: Check disk I/O, consider increasing memtable_flush_writers.
Troubleshooting¶
MutationStage Blocked¶
Symptoms:
- MutationStage shows blocked_tasks > 0
- Write latency spikes
Common Causes:
1. Memtable flush backpressure (check MemtableFlushWriter; see the sketch below)
2. Commit log sync bottleneck
3. Disk I/O saturation
Resolution:
- Check disk utilization: iostat -x 1
- Verify commit log on fast disk
- Consider increasing concurrent_writes
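For common cause 1, a diagnostic sketch that correlates MutationStage with MemtableFlushWriter, assuming the `session` from the Overview example:

```python
def diagnose_mutation_blocking(session):
    pools = {row.name: row for row in session.execute(
        "SELECT name, pending_tasks, blocked_tasks "
        "FROM system_views.thread_pools"
    )}
    mutation = pools.get("MutationStage")
    flush = pools.get("MemtableFlushWriter")
    if mutation is None or mutation.blocked_tasks == 0:
        print("MutationStage is not blocked")
    elif flush and (flush.pending_tasks > 0 or flush.blocked_tasks > 0):
        # Flush is also backed up: points at flush/disk I/O pressure.
        print("MutationStage blocked and MemtableFlushWriter backed up: "
              "check disk I/O and memtable flush settings")
    else:
        # Flush looks clear: look at the commit log and raw disk saturation.
        print("MutationStage blocked but flush looks clear: "
              "check commit log sync and disk saturation")
```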
ReadStage High Pending¶
Symptoms:
- ReadStage shows high pending_tasks
- Read latency increases
Common Causes:
1. Disk I/O bottleneck
2. Large partitions causing slow reads
3. Tombstone scanning
4. Cold cache causing excessive disk reads
Resolution:
- Check tombstones_per_read for tombstone issues
- Review partition sizes
- Verify key cache hit ratio
- Consider adding read capacity
CompactionExecutor Backlog¶
Symptoms:
- CompactionExecutor shows pending_tasks > 100
- Disk usage growing
- Read latency increasing
Common Causes:
1. Write rate exceeding compaction throughput
2. Large SSTables taking long to compact
3. Insufficient compaction threads
Resolution:
- Check nodetool compactionstats for details
- Consider increasing concurrent_compactors
- Review compaction strategy settings
- Verify disk throughput capacity
Configuration Tuning¶
Thread pool sizes can be adjusted in cassandra.yaml:
# Read/write stages
concurrent_reads: 32 # ReadStage size
concurrent_writes: 32 # MutationStage size
concurrent_counter_writes: 32
# Compaction
concurrent_compactors: 4
# Memtable flush
memtable_flush_writers: 2
# Native transport
native_transport_max_threads: 128
Tuning Considerations
Increasing thread pool sizes:
- Consumes more memory per thread
- May increase contention under load
- Should be tested before production deployment
Default values are appropriate for most workloads.
Related Documentation¶
- Virtual Tables Overview - Introduction to virtual tables
- Metrics Tables - Latency monitoring
- Performance Tuning - Optimization strategies
- nodetool tpstats - Command-line equivalent