Thread Pools

The thread_pools virtual table shows the current state of all Cassandra thread pools, providing visibility into request processing capacity and backpressure.


Overview

Cassandra uses a staged event-driven architecture (SEDA), with a separate thread pool for each type of operation. Monitoring these pools reveals bottlenecks and capacity issues.

SELECT name, active_tasks, pending_tasks, blocked_tasks, completed_tasks
FROM system_views.thread_pools;

Equivalent nodetool command: nodetool tpstats


Schema

VIRTUAL TABLE system_views.thread_pools (
    name text PRIMARY KEY,
    active_tasks int,
    active_tasks_limit int,
    blocked_tasks bigint,
    blocked_tasks_all_time bigint,
    completed_tasks bigint,
    pending_tasks int
)

 Column                 | Type   | Description
------------------------+--------+---------------------------------------------
 name                   | text   | Thread pool name
 active_tasks           | int    | Currently executing tasks
 active_tasks_limit     | int    | Maximum concurrent tasks (pool size)
 pending_tasks          | int    | Tasks queued waiting for execution
 blocked_tasks          | bigint | Tasks currently blocked due to backpressure
 blocked_tasks_all_time | bigint | Total blocked tasks since node startup
 completed_tasks        | bigint | Total completed tasks since startup

Key Thread Pools

Request Processing

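 Pool Name                 | Purpose                     | Warning Signs
---------------------------+-----------------------------+----------------------------------------------
 Native-Transport-Requests | CQL client requests         | pending > 1000 indicates client backpressure
 RequestResponseStage      | Inter-node request/response | blocked > 0 indicates network issues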

Read/Write Operations

 Pool Name            | Purpose                   | Warning Signs
----------------------+---------------------------+-----------------------------------------
 ReadStage            | Local read operations     | pending > 100 indicates disk bottleneck
 MutationStage        | Local write operations    | pending > 0, blocked > 0 critical
 CounterMutationStage | Counter write operations  | Same as MutationStage
 ViewMutationStage    | Materialized view updates | pending > 0 indicates MV lag

Storage Operations

 Pool Name           | Purpose                   | Warning Signs
---------------------+---------------------------+---------------------------------------------------
 MemtableFlushWriter | Memtable to SSTable flush | pending > 0 indicates flush backpressure
 MemtablePostFlush   | Post-flush cleanup        | Should stay near zero
 CompactionExecutor  | Compaction tasks          | pending > 100 indicates compaction falling behind
 ValidationExecutor  | Repair validation         | High during repair operations

Cluster Communication

 Pool Name        | Purpose             | Warning Signs
------------------+---------------------+---------------------------------------
 GossipStage      | Gossip protocol     | pending > 0 indicates network issues
 AntiEntropyStage | Repair coordination | Active during repairs
 MigrationStage   | Schema changes      | Usually idle
 HintsDispatcher  | Hint delivery       | Active when catching up offline nodes

Background Tasks

 Pool Name                | Purpose               | Warning Signs
--------------------------+-----------------------+---------------
 SecondaryIndexManagement | Index maintenance     | Should be low
 CacheCleanupExecutor     | Cache eviction        | Usually idle
 InternalResponseStage    | Internal coordination | Should be low
 Sampler                  | Query sampling        | Always low

Monitoring Queries

Critical Health Check

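-- Find pools with problems.
-- CQL has no OR operator, so read every pool (the table is small) and keep
-- the rows where pending_tasks > 0 or blocked_tasks > 0 on the client side.
SELECT name, active_tasks, pending_tasks, blocked_tasks
FROM system_views.thread_pools;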
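
The same check can be applied on the client after fetching every row. The following is a minimal Python sketch, assuming each row is available as a dictionary keyed by column name; the function name `find_problem_pools` and the sample values are illustrative and not part of Cassandra.

```python
def find_problem_pools(rows):
    """Return rows with queued or blocked work, most severe first."""
    flagged = [
        row for row in rows
        if row["pending_tasks"] > 0 or row["blocked_tasks"] > 0
    ]
    # Blocked tasks outrank pending tasks when ordering the result.
    return sorted(
        flagged,
        key=lambda row: (row["blocked_tasks"], row["pending_tasks"]),
        reverse=True,
    )


# Illustrative rows shaped like the result of the query above.
sample_rows = [
    {"name": "ReadStage", "active_tasks": 32, "pending_tasks": 150, "blocked_tasks": 0},
    {"name": "MutationStage", "active_tasks": 32, "pending_tasks": 100, "blocked_tasks": 10},
    {"name": "GossipStage", "active_tasks": 0, "pending_tasks": 0, "blocked_tasks": 0},
]

for row in find_problem_pools(sample_rows):
    print(row["name"], row["pending_tasks"], row["blocked_tasks"])
```

Sorting by blocked tasks first reflects the priority used elsewhere on this page, where any blocked task is treated as more serious than a queue of pending ones.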

Pool Utilization

-- Check pool capacity usage.
-- ALLOW FILTERING permits the restriction on the non-key column active_tasks_limit.
SELECT
    name,
    active_tasks,
    active_tasks_limit,
    CAST(active_tasks AS double) / active_tasks_limit * 100 AS utilization_pct,
    pending_tasks
FROM system_views.thread_pools
WHERE active_tasks_limit > 0
ALLOW FILTERING;
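
Utilization is the ratio of active tasks to the pool limit, so it can also be computed on the client, which makes the zero-limit guard explicit. A small Python sketch follows; the helper names and the 80% cutoff are assumptions chosen for illustration, not Cassandra defaults.

```python
def utilization_pct(active_tasks, active_tasks_limit):
    """Percentage of the pool limit in use, or None when the limit is not positive."""
    if active_tasks_limit <= 0:
        return None
    return 100.0 * active_tasks / active_tasks_limit


def busiest_pools(rows, min_pct=80.0):
    """Return (name, utilization) pairs at or above min_pct, highest first."""
    result = []
    for row in rows:
        pct = utilization_pct(row["active_tasks"], row["active_tasks_limit"])
        if pct is not None and pct >= min_pct:
            result.append((row["name"], pct))
    return sorted(result, key=lambda pair: pair[1], reverse=True)


# Illustrative rows; the values are made up.
sample_rows = [
    {"name": "Native-Transport-Requests", "active_tasks": 128, "active_tasks_limit": 128},
    {"name": "ReadStage", "active_tasks": 28, "active_tasks_limit": 32},
    {"name": "MutationStage", "active_tasks": 8, "active_tasks_limit": 32},
]

for name, pct in busiest_pools(sample_rows):
    print(f"{name}: {pct:.1f}%")
```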

Historical Blocked Tasks

-- Pools that have experienced blocking.
-- ALLOW FILTERING permits the restriction on the non-key column blocked_tasks_all_time.
SELECT name, blocked_tasks_all_time, completed_tasks,
       CAST(blocked_tasks_all_time AS double) / completed_tasks * 100 AS block_rate_pct
FROM system_views.thread_pools
WHERE blocked_tasks_all_time > 0
ALLOW FILTERING;

Request Path Health

-- Monitor critical request path pools
SELECT name, active_tasks, pending_tasks, blocked_tasks
FROM system_views.thread_pools
WHERE name IN (
    'Native-Transport-Requests',
    'ReadStage',
    'MutationStage',
    'RequestResponseStage',
    'MemtableFlushWriter',
    'CompactionExecutor'
);

Interpreting Results

Healthy State

 name                       | active_tasks | pending_tasks | blocked_tasks
----------------------------+--------------+---------------+---------------
 Native-Transport-Requests  |           45 |             0 |             0
 ReadStage                  |           12 |             0 |             0
 MutationStage              |            8 |             0 |             0
 CompactionExecutor         |            2 |             3 |             0
 MemtableFlushWriter        |            0 |             0 |             0

Warning State

 name                       | active_tasks | pending_tasks | blocked_tasks
----------------------------+--------------+---------------+---------------
 Native-Transport-Requests  |          128 |           500 |             0  ← Client backpressure
 ReadStage                  |           32 |           150 |             0  ← Disk bottleneck
 MutationStage              |           32 |             0 |             0
 CompactionExecutor         |            4 |           200 |             0  ← Compaction behind
 MemtableFlushWriter        |            2 |             5 |             0  ← Flush pressure

Critical State

 name                       | active_tasks | pending_tasks | blocked_tasks
----------------------------+--------------+---------------+---------------
 Native-Transport-Requests  |          128 |          5000 |            50  ← CRITICAL
 MutationStage              |           32 |           100 |            10  ← CRITICAL
 MemtableFlushWriter        |            2 |            20 |             5  ← CRITICAL
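
The three states above can be approximated with a small classifier. This Python sketch treats any blocked task as critical and compares pending tasks against the per-pool thresholds from the High Pending Tasks Alert section; the fallback threshold for unlisted pools and the function name `classify` are assumptions made for illustration.

```python
# Per-pool pending thresholds, taken from the High Pending Tasks Alert rule.
PENDING_THRESHOLDS = {
    "MutationStage": 10,
    "ReadStage": 100,
    "CompactionExecutor": 50,
    "MemtableFlushWriter": 2,
}

# Assumption: fallback for pools without an explicit threshold.
DEFAULT_PENDING_THRESHOLD = 100


def classify(row):
    """Return 'critical', 'warning', or 'healthy' for one thread_pools row."""
    if row["blocked_tasks"] > 0:
        return "critical"
    threshold = PENDING_THRESHOLDS.get(row["name"], DEFAULT_PENDING_THRESHOLD)
    if row["pending_tasks"] > threshold:
        return "warning"
    return "healthy"


# Rows copied from the example outputs above.
examples = [
    {"name": "CompactionExecutor", "pending_tasks": 3, "blocked_tasks": 0},
    {"name": "MemtableFlushWriter", "pending_tasks": 5, "blocked_tasks": 0},
    {"name": "MutationStage", "pending_tasks": 100, "blocked_tasks": 10},
]

for row in examples:
    print(row["name"], classify(row))
```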

Alerting Rules

Blocked Tasks Alert (Critical)

-- Any blocked task is critical.
-- ALLOW FILTERING permits the restriction on the non-key column blocked_tasks.
SELECT name, blocked_tasks, blocked_tasks_all_time
FROM system_views.thread_pools
WHERE blocked_tasks > 0
ALLOW FILTERING;

Action: Immediate investigation required. Blocked tasks indicate the system cannot keep up with load.
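
Because blocked_tasks is a point-in-time value, a poll can miss short bursts of blocking, while blocked_tasks_all_time keeps counting since node startup. One possible approach is to compare that counter between two polls. The Python sketch below assumes each poll has been reduced to a dictionary mapping pool name to blocked_tasks_all_time; the function name `newly_blocked` is illustrative.

```python
def newly_blocked(previous, current):
    """Map pool name -> increase in blocked_tasks_all_time between two polls."""
    deltas = {}
    for name, total in current.items():
        before = previous.get(name)
        if before is None or total < before:
            # New pool, or the counter went backwards (for example after a
            # node restart): there is no usable baseline for this poll.
            continue
        if total > before:
            deltas[name] = total - before
    return deltas


# Illustrative counters from two consecutive polls.
previous_poll = {"MutationStage": 10, "ReadStage": 0}
current_poll = {"MutationStage": 25, "ReadStage": 0, "GossipStage": 3}

print(newly_blocked(previous_poll, current_poll))
```

Skipping pools without a baseline avoids reporting the whole lifetime total as a new spike on the first poll.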

High Pending Tasks Alert

-- Sustained pending tasks.
-- CQL has no OR operator, so read the four pools by primary key and compare
-- each row against its own threshold on the client side:
--   MutationStage > 10, ReadStage > 100,
--   CompactionExecutor > 50, MemtableFlushWriter > 2
SELECT name, pending_tasks
FROM system_views.thread_pools
WHERE name IN (
    'MutationStage',
    'ReadStage',
    'CompactionExecutor',
    'MemtableFlushWriter'
);
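
A single poll cannot tell a momentary queue from a sustained one. A hedged Python sketch of one way to require several consecutive polls over threshold before alerting is shown below; the class name `SustainedPendingDetector` and the window of three polls are assumptions, and each poll is assumed to be a dictionary mapping pool name to pending_tasks.

```python
from collections import deque


class SustainedPendingDetector:
    """Flags pools whose pending_tasks exceeded a threshold in every recent poll."""

    def __init__(self, thresholds, window=3):
        self.thresholds = thresholds
        self.window = window
        # One bounded history of over/under-threshold flags per pool.
        self.history = {name: deque(maxlen=window) for name in thresholds}

    def observe(self, pending_by_pool):
        """Record one poll and return the pools over threshold for the whole window."""
        sustained = []
        for name, threshold in self.thresholds.items():
            samples = self.history[name]
            samples.append(pending_by_pool.get(name, 0) > threshold)
            if len(samples) == self.window and all(samples):
                sustained.append(name)
        return sustained


# Thresholds from the alert rule above; the polled values are made up.
detector = SustainedPendingDetector({"ReadStage": 100, "MutationStage": 10})
polls = [
    {"ReadStage": 150, "MutationStage": 0},
    {"ReadStage": 120, "MutationStage": 20},
    {"ReadStage": 101, "MutationStage": 20},
]
for poll in polls:
    print(detector.observe(poll))
```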

Flush Backpressure Alert

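-- Memtable flush falling behind.
-- LIKE requires a SASI index, so the Memtable pools are listed by name instead.
SELECT name, pending_tasks
FROM system_views.thread_pools
WHERE name IN ('MemtableFlushWriter', 'MemtablePostFlush')
  AND pending_tasks > 0
ALLOW FILTERING;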

Action: Check disk I/O, consider increasing memtable_flush_writers.


Troubleshooting

MutationStage Blocked

Symptoms:
- MutationStage shows blocked_tasks > 0
- Write latency spikes

Common Causes:
1. Memtable flush backpressure (check MemtableFlushWriter)
2. Commit log sync bottleneck
3. Disk I/O saturation

Resolution:
- Check disk utilization: iostat -x 1
- Verify commit log on fast disk
- Consider increasing concurrent_writes

ReadStage High Pending

Symptoms:
- ReadStage shows high pending_tasks
- Read latency increases

Common Causes:
1. Disk I/O bottleneck
2. Large partitions causing slow reads
3. Tombstone scanning
4. Cold cache causing excessive disk reads

Resolution:
- Check tombstones_per_read for tombstone issues
- Review partition sizes
- Verify key cache hit ratio
- Consider adding read capacity

CompactionExecutor Backlog

Symptoms:
- CompactionExecutor shows pending_tasks > 100
- Disk usage growing
- Read latency increasing

Common Causes:
1. Write rate exceeding compaction throughput
2. Large SSTables taking long to compact
3. Insufficient compaction threads

Resolution:
- Check nodetool compactionstats for details
- Consider increasing concurrent_compactors
- Review compaction strategy settings
- Verify disk throughput capacity


Configuration Tuning

Thread pool sizes can be adjusted in cassandra.yaml:

# Read/write stages
concurrent_reads: 32      # ReadStage size
concurrent_writes: 32     # MutationStage size
concurrent_counter_writes: 32

# Compaction
concurrent_compactors: 4

# Memtable flush
memtable_flush_writers: 2

# Native transport
native_transport_max_threads: 128

Tuning Considerations

Increasing thread pool sizes:
- Consumes more memory per thread
- May increase contention under load
- Should be tested before production deployment

Default values are appropriate for most workloads.