AxonOps Thread Pools Dashboard Metrics Mapping
This document maps the metrics used in the AxonOps Thread Pools dashboard.
Dashboard Overview
The Thread Pools dashboard monitors Cassandra's internal thread pools that handle various operations like reads, writes, compactions, and repairs. Understanding thread pool behavior is crucial for identifying performance bottlenecks and tuning Cassandra for optimal performance.
Metrics Mapping
Thread Pool Metrics
| Dashboard Metric | Description | Attributes |
|---|---|---|
| `cas_ThreadPools_internal` | Internal thread pool metrics | `scope` (pool name), `key` (metric type), `dc`, `rack`, `host_id` |
Metric Keys (Types)
| Key | Description |
|---|---|
| `ActiveTasks` | Number of tasks currently being executed |
| `PendingTasks` | Number of tasks waiting in the queue |
| `CompletedTasks` | Total number of completed tasks (cumulative) |
| `TotalBlockedTasks` | Total number of tasks that were blocked (cumulative) |
| `CurrentlyBlockedTasks` | Number of tasks currently blocked |
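Because CompletedTasks and TotalBlockedTasks are cumulative, their raw values mostly reflect how long the node has been up; the per-second change is usually the more useful signal. The Python sketch below shows one way to derive such a rate from two samples of a cumulative counter. The function name and the reset handling are illustrative assumptions, not a description of how AxonOps computes its rate function internally.

```python
from typing import Optional


def per_second_rate(prev_value: float, prev_ts: float,
                    curr_value: float, curr_ts: float) -> Optional[float]:
    """Return tasks per second between two samples of a cumulative counter.

    Timestamps are in seconds. Returns None when no rate can be derived.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        # Samples are duplicated or out of order.
        return None
    delta = curr_value - prev_value
    if delta < 0:
        # Assumption: a counter that goes backwards means the node restarted
        # and the counter was reset, so this interval is skipped.
        return None
    return delta / elapsed


# Two CompletedTasks samples taken 60 seconds apart.
rate = per_second_rate(1000, 0.0, 1600, 60.0)
```

Skipping the interval after a reset avoids reporting a large negative or misleading rate right after a node restart.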
Common Thread Pool Scopes
| Scope | Purpose |
|---|---|
| `MutationStage` | Handles write operations |
| `ReadStage` | Handles read operations |
| `RequestResponseStage` | Handles request/response messaging |
| `CompactionExecutor` | Handles compaction tasks |
| `ValidationExecutor` | Handles validation tasks (repairs) |
| `GossipStage` | Handles gossip protocol |
| `AntiEntropyStage` | Handles anti-entropy repairs |
| `MigrationStage` | Handles schema migrations |
| `MemtableFlushWriter` | Handles memtable flush operations |
| `MemtablePostFlush` | Handles post-flush operations |
| `HintsDispatcher` | Handles hint delivery |
Query Examples
Active Tasks
sum(cas_ThreadPools_internal{scope=~'$scope',key='ActiveTasks',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy)
Pending Tasks
sum(cas_ThreadPools_internal{scope=~'$scope',key='PendingTasks',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy)
Completed Tasks Rate
sum(cas_ThreadPools_internal{axonfunction='rate',scope=~'$scope',key='CompletedTasks',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy)
Total Blocked Tasks Rate
sum(cas_ThreadPools_internal{axonfunction='rate',scope=~'$scope',key='TotalBlockedTasks',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy)
Currently Blocked Tasks
sum(cas_ThreadPools_internal{scope=~'$scope',key='CurrentlyBlockedTasks',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy)
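All five queries above share one shape and differ only in the `key` value and in whether `axonfunction='rate'` is present. The Python sketch below makes that pattern explicit by filling in the dashboard variables by hand. The helper `build_query` is hypothetical, and using `.*` as the match-all default is an assumption about how an unfiltered selection would be expressed.

```python
def build_query(key: str, scope: str = ".*", dc: str = ".*", rack: str = ".*",
                host_id: str = ".*", group_by: str = "scope",
                rate: bool = False) -> str:
    """Assemble a Thread Pools query string in the shape shown above."""
    selectors = []
    if rate:
        # Cumulative keys (CompletedTasks, TotalBlockedTasks) are shown as rates.
        selectors.append("axonfunction='rate'")
    selectors += [
        f"scope=~'{scope}'",
        f"key='{key}'",
        f"dc=~'{dc}'",
        f"rack=~'{rack}'",
        f"host_id=~'{host_id}'",
    ]
    labels = ",".join(selectors)
    return f"sum(cas_ThreadPools_internal{{{labels}}}) by ({group_by})"


# Pending write tasks, one series per node.
pending_writes = build_query("PendingTasks", scope="MutationStage", group_by="host_id")

# Completion rate of reads, one series per data center.
read_rate = build_query("CompletedTasks", scope="ReadStage", group_by="dc", rate=True)
```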
Panel Organization
For each selected thread pool ($scope), the dashboard shows:
- Active Tasks - Line chart showing currently executing tasks
- Pending Tasks - Line chart showing queued tasks waiting for execution
- Completed Tasks Rate by $groupBy - Line chart showing task completion rate
- Total Blocked Tasks Rate - Line chart showing rate of tasks being blocked
- Currently Blocked Tasks - Line chart showing the number of tasks blocked at each point in time
Filters
- data center (`dc`) - Filter by data center
- rack - Filter by rack
- node (`host_id`) - Filter by specific node
- Pool (scope) - Select specific thread pool(s) to monitor
- groupBy - Dynamic grouping (scope, dc, rack, host_id)
Important Thread Pools to Monitor
MutationStage
- Handles all write operations
- High pending tasks indicate a write bottleneck
- Blocked tasks suggest memtable pressure
ReadStage
- Handles all read operations
- Pending tasks indicate read latency issues
- May need to tune concurrent_reads
CompactionExecutor
- Manages compaction operations
- High pending tasks mean compactions are falling behind
- Affects disk space and read performance
MemtableFlushWriter
- Flushes memtables to disk
- Blocked tasks indicate disk I/O issues
- Critical for write performance
Performance Indicators
Healthy Patterns
- Low or zero pending tasks
- No currently blocked tasks
- Steady completed task rate
- Active tasks within thread pool size
Warning Signs
- Consistently growing pending tasks
- Frequent blocked tasks
- Active tasks at maximum pool size
- Sudden drops in completion rate
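Two of the warning signs above, consistently growing pending tasks and active tasks at the maximum pool size, can be checked mechanically. The Python sketch below is one possible reading of them. The function names, the five-sample window, and the definition of "growing" (never decreasing and ending higher than it started) are assumptions chosen for illustration, not thresholds defined by AxonOps or Cassandra.

```python
from typing import Sequence


def pending_is_growing(samples: Sequence[float], min_points: int = 5) -> bool:
    """True if the last `min_points` samples never decrease and end higher than they start."""
    if len(samples) < min_points:
        return False  # not enough history to call it a trend
    window = samples[-min_points:]
    non_decreasing = all(later >= earlier for earlier, later in zip(window, window[1:]))
    return non_decreasing and window[-1] > window[0]


def pool_is_saturated(active_tasks: float, pool_size: int) -> bool:
    """True if every thread in the pool is busy."""
    return pool_size > 0 and active_tasks >= pool_size


# PendingTasks for one pool on one node, oldest sample first.
backlog = [0, 1, 3, 3, 7, 12]   # steadily climbing
bursty = [5, 0, 4, 0, 6, 0]     # spikes that drain between samples
```

A bursty series that keeps draining to zero is treated as healthy here, which matches the distinction between occasional queuing and a sustained backlog.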
Tuning Considerations
Thread Pool Sizing:
- Configured in cassandra.yaml
- Balance between concurrency and resource usage
- Consider CPU cores and workload type
Common Adjustments:
- `concurrent_reads`: For read-heavy workloads
- `concurrent_writes`: For write-heavy workloads
- `concurrent_compactors`: For compaction throughput
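As a rough starting point for these three settings, the comments in the stock cassandra.yaml suggest sizing reads by drive count and writes by core count. The Python sketch below encodes those rules of thumb as best understood here; the function name is hypothetical, and the multipliers and bounds should be verified against the cassandra.yaml shipped with your Cassandra version before being applied.

```python
from typing import Dict


def suggested_settings(num_drives: int, num_cores: int) -> Dict[str, int]:
    """Rule-of-thumb starting values, not tuned recommendations."""
    return {
        # Reads tend to be bound by disk, so scale with the number of drives.
        "concurrent_reads": 16 * num_drives,
        # Writes are rarely I/O bound, so scale with the number of cores.
        "concurrent_writes": 8 * num_cores,
        # Smaller of drives and cores, kept between 2 and 8.
        "concurrent_compactors": max(2, min(8, num_drives, num_cores)),
    }


# Example: a node with 2 data drives and 8 cores.
settings = suggested_settings(num_drives=2, num_cores=8)
```

These values are only a baseline; the pending and blocked task panels on this dashboard are what indicate whether a given pool actually needs more or fewer threads.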
Monitoring Strategy:
- Watch for sustained pending tasks
- Monitor blocked tasks for resource contention
- Compare completion rates across nodes
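Comparing completion rates across nodes amounts to looking for nodes that fall well below their peers. The Python sketch below flags such nodes against the cluster median. The function name, the node names, and the 50% tolerance are illustrative assumptions; a suitable tolerance depends on how evenly load is distributed in the cluster.

```python
from statistics import median
from typing import Dict, List


def lagging_nodes(rates: Dict[str, float], tolerance: float = 0.5) -> List[str]:
    """Return host_ids whose completion rate is below `tolerance` times the median."""
    if not rates:
        return []
    baseline = median(rates.values())
    return sorted(host for host, rate in rates.items() if rate < tolerance * baseline)


# Completed Tasks Rate for MutationStage, grouped by host_id (hypothetical values).
write_rates = {"node-a": 1000.0, "node-b": 980.0, "node-c": 300.0}
slow = lagging_nodes(write_rates)
```

The median is used instead of the mean so that a single very slow or very busy node does not shift the baseline it is being compared against.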
Grouping and Aggregation
The groupBy variable allows flexible analysis:
- By `scope`: Compare different thread pools
- By `dc`: Data center level patterns
- By `rack`: Rack level distribution
- By `host_id`: Individual node behavior
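To make the effect of groupBy concrete, the Python sketch below reproduces the `sum(...) by (label)` aggregation over a handful of labelled samples. The sample values and the `sum_by` helper are hypothetical; the sketch only illustrates how the same data collapses differently depending on the chosen label.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Sample = Tuple[Dict[str, str], float]  # (labels, value)


def sum_by(samples: Iterable[Sample], label: str) -> Dict[str, float]:
    """Sum sample values per distinct value of `label`."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, value in samples:
        totals[labels[label]] += value
    return dict(totals)


# PendingTasks samples for two pools on two nodes in one data center.
samples = [
    ({"scope": "ReadStage", "dc": "dc1", "host_id": "h1"}, 4.0),
    ({"scope": "ReadStage", "dc": "dc1", "host_id": "h2"}, 0.0),
    ({"scope": "MutationStage", "dc": "dc1", "host_id": "h1"}, 1.0),
    ({"scope": "MutationStage", "dc": "dc1", "host_id": "h2"}, 2.0),
]

by_scope = sum_by(samples, "scope")    # one series per thread pool
by_host = sum_by(samples, "host_id")   # one series per node
```

Grouping by `scope` shows which pool is backing up, while grouping by `host_id` shows which node carries the backlog; the totals are the same in both views.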
Units and Display
- Task Counts: Displayed as short numbers
- Rates: Tasks per second
- Legend: Shows the groupBy dimension
- Time Series: Real-time and historical trends