AxonOps Thread Pools Dashboard Metrics Mapping

This document maps the metrics used in the AxonOps Thread Pools dashboard.

Dashboard Overview

The Thread Pools dashboard monitors Cassandra's internal thread pools, which execute operations such as reads, writes, compactions, and repairs. Thread pool behavior shows where work is queuing up or stalling, which makes it central to identifying performance bottlenecks and tuning Cassandra.

Metrics Mapping

Thread Pool Metrics

Dashboard Metric	Description	Attributes
`cas_ThreadPools_internal`	Internal thread pool metrics	`scope` (pool name), `key` (metric type), `dc`, `rack`, `host_id`

Metric Keys (Types)

Key	Description
`ActiveTasks`	Number of tasks currently being executed
`PendingTasks`	Number of tasks waiting in the queue
`CompletedTasks`	Total number of completed tasks (cumulative)
`TotalBlockedTasks`	Total number of tasks that were blocked because the queue was saturated (cumulative)
`CurrentlyBlockedTasks`	Number of tasks currently blocked because the queue is saturated

Common Thread Pool Scopes

Scope	Purpose
`MutationStage`	Handles write operations (applying mutations)
`ReadStage`	Handles local read operations
`RequestResponseStage`	Handles responses to requests sent to other nodes
`CompactionExecutor`	Handles compaction tasks
`ValidationExecutor`	Handles validation compactions (repairs)
`GossipStage`	Handles gossip messages between nodes
`AntiEntropyStage`	Handles anti-entropy repairs
`MigrationStage`	Handles schema migrations
`MemtableFlushWriter`	Writes memtables to disk
`MemtablePostFlush`	Cleans up the commit log after a memtable is flushed
`HintsDispatcher`	Handles hint delivery (hinted handoff)

Query Examples

Active Tasks

```
sum(cas_ThreadPools_internal{scope=~'$scope',key='ActiveTasks',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy)
```

Pending Tasks

```
sum(cas_ThreadPools_internal{scope=~'$scope',key='PendingTasks',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy)
```

Completed Tasks Rate

```
sum(cas_ThreadPools_internal{axonfunction='rate',scope=~'$scope',key='CompletedTasks',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy)
```
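The `axonfunction='rate'` label turns the cumulative `CompletedTasks` counter into a per-second rate. This document does not specify how AxonOps computes it, so the sketch below assumes a Prometheus-style calculation: the difference between consecutive samples divided by the elapsed seconds, with any drop in the counter (for example after a node restart) treated as a reset to zero. The sample values are made up.

```python
from typing import List, Tuple

# Each sample is (unix_timestamp_seconds, cumulative_counter_value).
Sample = Tuple[float, float]


def per_second_rates(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Convert cumulative counter samples into per-second rates.

    Assumption: Prometheus-style reset handling, where a decrease in the
    counter means it restarted from 0 (e.g. after a Cassandra restart).
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order timestamps
        delta = v1 - v0
        if delta < 0:
            delta = v1  # counter reset: count only what accumulated since restart
        rates.append((t1, delta / elapsed))
    return rates


# CompletedTasks sampled every 60 s, with a restart before the last sample.
completed = [(0, 1000), (60, 4000), (120, 7000), (180, 600)]
for ts, rate in per_second_rates(completed):
    print(f"t={ts:>4}s  {rate:.1f} tasks/s")
```

The same calculation applies to `TotalBlockedTasks`, the other cumulative key.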

Total Blocked Tasks Rate

```
sum(cas_ThreadPools_internal{axonfunction='rate',scope=~'$scope',key='TotalBlockedTasks',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy)
```

Currently Blocked Tasks

```
sum(cas_ThreadPools_internal{scope=~'$scope',key='CurrentlyBlockedTasks',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy)
```
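All five queries share one template: they differ only in `key` and in whether `axonfunction='rate'` is added, which happens for the cumulative keys. The helper below is hypothetical and only reproduces the query strings listed above; the `$scope`, `$dc`, `$rack`, `$host_id`, and `$groupBy` variables are left in place for the dashboard to substitute.

```python
# Cumulative counters are plotted as rates; the other keys are gauges.
CUMULATIVE_KEYS = {"CompletedTasks", "TotalBlockedTasks"}
GAUGE_KEYS = {"ActiveTasks", "PendingTasks", "CurrentlyBlockedTasks"}


def thread_pool_query(key: str) -> str:
    """Build the dashboard query for one metric key (hypothetical helper)."""
    if key not in CUMULATIVE_KEYS | GAUGE_KEYS:
        raise ValueError(f"unknown ThreadPools key: {key}")
    selectors = []
    if key in CUMULATIVE_KEYS:
        selectors.append("axonfunction='rate'")
    selectors += [
        "scope=~'$scope'",
        f"key='{key}'",
        "dc=~'$dc'",
        "rack=~'$rack'",
        "host_id=~'$host_id'",
    ]
    # Doubled braces emit literal { and } around the joined selectors.
    return f"sum(cas_ThreadPools_internal{{{','.join(selectors)}}}) by ($groupBy)"


for k in ["ActiveTasks", "CompletedTasks"]:
    print(thread_pool_query(k))
```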

Panel Organization

For each selected thread pool ($scope), the dashboard shows:

Active Tasks - Line chart showing currently executing tasks
Pending Tasks - Line chart showing queued tasks waiting for execution
Completed Tasks Rate by $groupBy - Line chart showing task completion rate
Total Blocked Tasks Rate - Line chart showing rate of tasks being blocked
Currently Blocked Tasks - Line chart showing the number of tasks currently blocked

Filters

Data center (dc) - Filter by data center
Rack (rack) - Filter by rack
Node (host_id) - Filter by specific node
Pool (scope) - Select one or more thread pools to monitor
groupBy - Dynamic grouping by scope, dc, rack, or host_id
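The filters fill the `$dc`, `$rack`, `$host_id`, and `$scope` variables, which the queries apply with `=~` regex matchers. The sketch below assumes PromQL semantics, where `=~` must match the whole label value, and assumes a multi-value selection is joined with `|` (as Grafana does for multi-value variables); AxonOps's internal behavior may differ, and the host IDs are made up.

```python
import re
from typing import Dict, List

Labels = Dict[str, str]


def matches(labels: Labels, selectors: Dict[str, str]) -> bool:
    """Return True if every regex selector fully matches its label value.

    Assumption: PromQL-style `=~`, i.e. the pattern is implicitly anchored.
    A missing label is treated as the empty string, as in PromQL.
    """
    return all(
        re.fullmatch(pattern, labels.get(name, "")) is not None
        for name, pattern in selectors.items()
    )


series: List[Labels] = [
    {"scope": "ReadStage", "key": "PendingTasks", "dc": "dc1", "rack": "r1", "host_id": "a1"},
    {"scope": "MutationStage", "key": "PendingTasks", "dc": "dc1", "rack": "r2", "host_id": "b2"},
    {"scope": "ReadStage", "key": "PendingTasks", "dc": "dc2", "rack": "r1", "host_id": "c3"},
]

# Two pools selected in the Pool filter, one data center, all racks and nodes.
selected = {"scope": "ReadStage|MutationStage", "dc": "dc1", "rack": ".*", "host_id": ".*"}
for s in series:
    if matches(s, selected):
        print(s["scope"], s["host_id"])
```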

Important Thread Pools to Monitor

MutationStage

Handles all write operations
A high pending task count indicates a write bottleneck
Blocked tasks suggest memtable pressure

ReadStage

Handles all read operations
Sustained pending tasks indicate read latency issues
concurrent_reads may need tuning

CompactionExecutor

Manages compaction operations
A high pending task count means compactions are falling behind
A compaction backlog increases disk space usage and degrades read performance, since reads must merge more SSTables

MemtableFlushWriter

Flushes memtables to disk
Blocked tasks usually indicate disk I/O problems
Critical for write performance: if flushes fall behind, memtable memory fills up and writes stall

Performance Indicators

Healthy Patterns

Low or zero pending tasks
No currently blocked tasks
Steady completed task rate
Active tasks below the configured pool size

Warning Signs

Consistently growing pending tasks
Frequent increases in blocked tasks
Active tasks pinned at the maximum pool size
Sudden drops in completion rate
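Several of the healthy patterns and warning signs above can be expressed as simple checks over a window of samples for one pool. The function name and the rules are illustrative assumptions, not AxonOps alert definitions: "consistently growing" is taken to mean never decreasing with a net increase, and `max_pool_size` is assumed to be the pool's configured size.

```python
from typing import List


def warning_signs(
    pending: List[float],
    currently_blocked: List[float],
    active: List[float],
    max_pool_size: int,
) -> List[str]:
    """Flag warning signs for one thread pool over a window of samples.

    The rules are illustrative assumptions, not AxonOps alert definitions.
    """
    signs = []
    # "Consistently growing": never decreases and ends higher than it started.
    never_drops = all(b >= a for a, b in zip(pending, pending[1:]))
    if len(pending) >= 2 and never_drops and pending[-1] > pending[0]:
        signs.append("pending tasks growing")
    # Healthy pools show no currently blocked tasks at all.
    if any(v > 0 for v in currently_blocked):
        signs.append("tasks blocked")
    # Active tasks pinned at the pool size for the whole window.
    if active and all(v >= max_pool_size for v in active):
        signs.append("pool saturated")
    return signs


print(warning_signs(pending=[0, 4, 9, 15], currently_blocked=[0, 0, 2, 3],
                    active=[32, 32, 32, 32], max_pool_size=32))
print(warning_signs(pending=[0, 0, 1, 0], currently_blocked=[0, 0, 0, 0],
                    active=[5, 8, 6, 7], max_pool_size=32))
```

Sudden drops in completion rate are left out here; they are easier to judge against the rate sketch for cumulative counters and against other nodes.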

Tuning Considerations

Thread Pool Sizing:

Pool sizes for the main stages are configured in cassandra.yaml
Balance concurrency against resource usage
Consider CPU core count and workload type

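Common Adjustments:

concurrent_reads: Number of ReadStage threads; tune for read-heavy workloads
concurrent_writes: Number of MutationStage threads; tune for write-heavy workloads
concurrent_compactors: Number of CompactionExecutor threads; tune for compaction throughput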
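The comments in the default cassandra.yaml give rule-of-thumb starting points for these settings: `concurrent_reads` of 16 × the number of data drives, `concurrent_writes` of 8 × the number of CPU cores, and `concurrent_compactors` defaulting to the smaller of disks and cores, clamped between 2 and 8 (raised to the core count on SSDs). The helper below is hypothetical and only encodes those comments, which may vary between Cassandra versions; treat its output as a starting point to validate against the ReadStage, MutationStage, and CompactionExecutor panels.

```python
def starting_point(cores: int, data_drives: int, ssd: bool = False) -> dict:
    """Rule-of-thumb values from the default cassandra.yaml comments (hypothetical helper)."""
    if cores < 1 or data_drives < 1:
        raise ValueError("cores and data_drives must be positive")
    if ssd:
        # cassandra.yaml suggests raising compactors to the core count on SSDs.
        compactors = cores
    else:
        # Smaller of disks and cores, with a minimum of 2 and a maximum of 8.
        compactors = max(2, min(8, min(data_drives, cores)))
    return {
        "concurrent_reads": 16 * data_drives,
        "concurrent_writes": 8 * cores,
        "concurrent_compactors": compactors,
    }


print(starting_point(cores=16, data_drives=2))
```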

Monitoring Strategy:

Watch for sustained pending tasks
Monitor blocked tasks for resource contention
Compare completion rates across nodes
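One way to compare completion rates across nodes is to flag nodes whose rate for a pool deviates strongly from the cluster median. The 50% tolerance and the host IDs below are illustrative assumptions; a node well below its peers may be overloaded or unhealthy, while one well above may be receiving an uneven share of traffic.

```python
from statistics import median
from typing import Dict, List


def outlier_nodes(rates: Dict[str, float], tolerance: float = 0.5) -> List[str]:
    """Return host_ids whose rate differs from the median by more than tolerance.

    `rates` maps host_id to CompletedTasks per second for one thread pool.
    `tolerance` is a fraction of the median (0.5 means +/-50%), an assumed value.
    """
    if not rates:
        return []
    mid = median(rates.values())
    if mid == 0:
        # With a zero median, any node doing work stands out.
        return sorted(h for h, r in rates.items() if r > 0)
    return sorted(h for h, r in rates.items() if abs(r - mid) > tolerance * mid)


read_stage = {"host-a": 410.0, "host-b": 395.0, "host-c": 120.0, "host-d": 402.0}
print(outlier_nodes(read_stage))
```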

Grouping and Aggregation

The groupBy variable allows flexible analysis:

By scope: Compare thread pools
By dc: Data-center-level patterns
By rack: Rack-level distribution
By host_id: Individual node behavior
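The `sum(...) by ($groupBy)` wrapper in each query collapses all matching series into one series per value of the chosen label. A minimal sketch of that aggregation at a single timestamp, assuming PromQL-style `sum by` semantics and made-up sample values:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Labels = Dict[str, str]


def sum_by(samples: List[Tuple[Labels, float]], group_by: str) -> Dict[str, float]:
    """Sum sample values that share the same value of the group_by label."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, value in samples:
        # Series lacking the label fall into the "" group, as in PromQL.
        totals[labels.get(group_by, "")] += value
    return dict(totals)


pending = [
    ({"scope": "ReadStage", "dc": "dc1", "rack": "r1", "host_id": "a1"}, 3.0),
    ({"scope": "ReadStage", "dc": "dc1", "rack": "r2", "host_id": "b2"}, 5.0),
    ({"scope": "MutationStage", "dc": "dc2", "rack": "r1", "host_id": "c3"}, 1.0),
]
print(sum_by(pending, "dc"))     # one total per data center
print(sum_by(pending, "scope"))  # one total per thread pool
```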

Units and Display

Task counts: Displayed in short number format
Rates: Tasks per second
Legend: Shows the groupBy value for each series
Time series: Real-time and historical trends