# AxonOps Table Dashboard Metrics Mapping
This document maps the metrics used in the AxonOps Table dashboard.
## Dashboard Overview

The Table dashboard provides comprehensive table-level metrics, covering coordinator and replica statistics, latency, throughput, data distribution, and performance indicators. It helps identify table-specific issues and optimization opportunities.
## Metrics Mapping

### Data Distribution Metrics

| Dashboard Metric | Description | Attributes |
|---|---|---|
| `cas_Table_LiveDiskSpaceUsed` | Live data size per table | keyspace, scope (table), function=Count, dc, rack, host_id |
### Coordinator Metrics (Table-level)

| Dashboard Metric | Description | Attributes |
|---|---|---|
| `cas_Table_CoordinatorReadLatency` | Read latency at the coordinator level | keyspace, scope (table), function (percentiles/Count), axonfunction (rate), dc, rack, host_id |
| `cas_Table_CoordinatorWriteLatency` | Write latency at the coordinator level | keyspace, scope (table), function (percentiles/Count), axonfunction (rate), dc, rack, host_id |
| `cas_Table_CoordinatorScanLatency` | Range scan latency at the coordinator | keyspace, scope (table), function (percentiles/Count), axonfunction (rate), dc, rack, host_id |
### Replica Metrics (Local)

| Dashboard Metric | Description | Attributes |
|---|---|---|
| `cas_Table_ReadLatency` | Local read latency | keyspace, scope (table), function (percentiles/Count), axonfunction (rate), dc, rack, host_id |
| `cas_Table_WriteLatency` | Local write latency | keyspace, scope (table), function (percentiles/Count), axonfunction (rate), dc, rack, host_id |
| `cas_Table_RangeLatency` | Local range query latency | keyspace, scope (table), function (percentiles/Count), axonfunction (rate), dc, rack, host_id |
### Partition and SSTable Metrics

| Dashboard Metric | Description | Attributes |
|---|---|---|
| `cas_Table_MeanPartitionSize` | Average partition size | keyspace, scope (table), dc, rack, host_id |
| `cas_Table_MaxPartitionSize` | Maximum partition size | keyspace, scope (table), dc, rack, host_id |
| `cas_Table_EstimatedPartitionCount` | Estimated number of partitions | keyspace, scope (table), dc, rack, host_id |
| `cas_Table_LiveSSTableCount` | Number of live SSTables | keyspace, scope (table), dc, rack, host_id |
| `cas_Table_SSTablesPerReadHistogram` | SSTables accessed per read | keyspace, scope (table), function (percentiles), dc, rack, host_id |
### Performance Indicators

| Dashboard Metric | Description | Attributes |
|---|---|---|
| `cas_Table_TombstoneScannedHistogram` | Tombstones scanned per query | keyspace, scope (table), function (percentiles), dc, rack, host_id |
| `cas_Table_SpeculativeRetries` | Speculative retry attempts | keyspace, scope (table), function=Count, axonfunction (rate), dc, rack, host_id |
| `cas_Table_BloomFilterFalseRatio` | Bloom filter false positive ratio | keyspace, scope (table), dc, rack, host_id |
| `cas_Table_BloomFilterDiskSpaceUsed` | Disk space used by bloom filters | keyspace, scope (table), dc, rack, host_id |
### Memory Metrics

| Dashboard Metric | Description | Attributes |
|---|---|---|
| `cas_Table_AllMemtablesHeapSize` | Heap memory used by memtables | keyspace, scope (table), dc, rack, host_id |
| `cas_Table_AllMemtablesOffHeapSize` | Off-heap memory used by memtables | keyspace, scope (table), dc, rack, host_id |
### JVM Garbage Collection Metrics

| Dashboard Metric | Description | Attributes |
|---|---|---|
| `jvm_GarbageCollector_G1_Young_Generation` | G1 young generation GC | function (CollectionTime/CollectionCount), axonfunction (rate), dc, rack, host_id |
| `jvm_GarbageCollector_Shenandoah_Cycles` | Shenandoah GC cycles | function (CollectionCount), axonfunction (rate), dc, rack, host_id |
| `jvm_GarbageCollector_Shenandoah_Pauses` | Shenandoah GC pauses | function (CollectionTime), axonfunction (rate), dc, rack, host_id |
| `jvm_GarbageCollector_ZGC` | ZGC collection time and count | function (CollectionTime/CollectionCount), axonfunction (rate), dc, rack, host_id |
## Query Examples

### Tables Overview Section

```
// Table Data Size Distribution (Pie Chart)
sum by (keyspace,scope) (cas_Table_LiveDiskSpaceUsed{function='Count',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'})

// Coordinator Table Reads Distribution (Pie Chart)
sum by (keyspace,scope) (cas_Table_CoordinatorReadLatency{axonfunction='rate',dc=~'$dc',rack=~'$rack',host_id=~'$host_id',function='Count'})

// Coordinator Table Writes Distribution (Pie Chart)
sum by (keyspace, scope) (cas_Table_CoordinatorWriteLatency{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',axonfunction='rate',function=~'Count'})
```
### Coordinator Table Statistics

```
// Max Coordinator Read Latency
max(cas_Table_CoordinatorReadLatency{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',function='$percentile',keyspace=~'$keyspace',scope=~'$scope'}) by ($groupBy,keyspace,scope)

// Max Coordinator Write Latency
max(cas_Table_CoordinatorWriteLatency{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',function='$percentile',keyspace=~'$keyspace',scope=~'$scope'}) by ($groupBy,keyspace,scope)

// Max Coordinator Range Read Latency
max(cas_Table_CoordinatorScanLatency{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',function=~'$percentile',keyspace=~'$keyspace',scope=~'$scope'}) by ($groupBy,keyspace,scope)

// Total Coordinator Reads/sec
sum(cas_Table_CoordinatorReadLatency{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',axonfunction='rate',function=~'Count',keyspace=~'$keyspace',scope=~'$scope'}) by ($groupBy,keyspace,scope)

// Average Coordinator Range Reads/sec
avg(cas_Table_CoordinatorScanLatency{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',axonfunction='rate',function=~'Count',keyspace=~'$keyspace',scope=~'$scope'}) by ($groupBy,keyspace,scope)

// Total Coordinator Writes/sec
sum(cas_Table_CoordinatorWriteLatency{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',axonfunction='rate',function='Count',keyspace=~'$keyspace',scope=~'$scope'}) by ($groupBy,keyspace,scope)
```
### Table Replica Statistics

```
// Average Replica Read Latency
avg(cas_Table_ReadLatency{scope=~'$scope',scope!='',keyspace=~'$keyspace',function='$percentile',function!='Min|Max',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy,keyspace,scope)

// Average Replica Range Read Latency
avg(cas_Table_RangeLatency{scope=~'$scope',scope!='',keyspace=~'$keyspace',function='$percentile',function!='Min|Max',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy,keyspace,scope)

// Average Replica Write Latency
avg(cas_Table_WriteLatency{scope=~'$scope',scope!='',keyspace=~'$keyspace',function='$percentile',function!='Min|Max',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy,keyspace,scope)

// Total Replica Reads/sec
sum(cas_Table_ReadLatency{axonfunction='rate',scope=~'$scope',scope!='',keyspace=~'$keyspace',function='Count',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy,keyspace,scope)

// Total Replica Range Reads/sec
sum(cas_Table_RangeLatency{axonfunction='rate',scope=~'$scope',scope!='',keyspace=~'$keyspace',function='Count',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy,keyspace,scope)

// Total Replica Writes/sec
sum(cas_Table_WriteLatency{axonfunction='rate',scope=~'$scope',scope!='',keyspace=~'$keyspace',function='Count',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by ($groupBy,keyspace,scope)
```
### Latency Causes

```
// Average Mean Partition Size
avg(cas_Table_MeanPartitionSize{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',scope=~'$scope',scope!=''}) by ($groupBy,keyspace,scope)

// Total Estimated Partitions
sum(cas_Table_EstimatedPartitionCount{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',keyspace='$keyspace',scope='$scope'}) by ($groupBy,keyspace,scope)

// Max Partition Size
max(cas_Table_MaxPartitionSize{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',scope=~'$scope',scope!=''}) by ($groupBy,keyspace,scope)

// SSTables Per Read
cas_Table_SSTablesPerReadHistogram{scope=~'$scope',scope!='',keyspace=~'$keyspace',function='$percentile',function!='Min|Max',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}

// Max Live SSTables
max(cas_Table_LiveSSTableCount{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',scope=~'$scope',scope!=''}) by ($groupBy,keyspace,scope)

// Tombstones Scanned
cas_Table_TombstoneScannedHistogram{scope=~'$scope',scope!='',keyspace=~'$keyspace',function='$percentile',function!='Count|Min|Max',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}

// Speculative Retries
cas_Table_SpeculativeRetries{axonfunction='rate',scope=~'$scope',scope!='',keyspace=~'$keyspace',function='Count',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}

// Bloom Filter False Ratio
cas_Table_BloomFilterFalseRatio{scope=~'$scope',scope!='',keyspace=~'$keyspace',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}

// Max Bloom Filter Disk
max(cas_Table_BloomFilterDiskSpaceUsed{dc=~'$dc',rack=~'$rack',host_id=~'$host_id',keyspace='$keyspace',scope='$scope'}) by ($groupBy,keyspace,scope)
```
### Memory Statistics

```
// Total Table Heap Memory
sum(cas_Table_AllMemtablesHeapSize{dc=~'$dc',rack='$rack',host_id=~'$host_id',keyspace='$keyspace',scope='$scope'}) by ($groupBy,keyspace,scope)

// Total Table Off-Heap Memory
sum(cas_Table_AllMemtablesOffHeapSize{dc=~'$dc',rack='$rack',host_id=~'$host_id',keyspace='$keyspace',scope='$scope'}) by ($groupBy,keyspace,scope)

// GC Duration - G1 YoungGen
jvm_GarbageCollector_G1_Young_Generation{axonfunction='rate',function='CollectionTime',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}

// GC Count per sec - G1 YoungGen
jvm_GarbageCollector_G1_Young_Generation{axonfunction='rate',function='CollectionCount',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}
```
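The dashboard's Shenandoah and ZGC panels are not reproduced above. Based on the attribute mapping in the JVM Garbage Collection Metrics table, they presumably follow the same pattern as the G1 queries; the two examples below are inferred from that table, not copied from the dashboard:

```
// GC Duration - Shenandoah (assumed, mirroring the G1 pattern above)
jvm_GarbageCollector_Shenandoah_Pauses{axonfunction='rate',function='CollectionTime',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}

// GC Count per sec - ZGC (assumed, mirroring the G1 pattern above)
jvm_GarbageCollector_ZGC{axonfunction='rate',function='CollectionCount',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}
```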
## Panel Organization

### Tables Overview

- **Table Data Size % Distribution** - Pie chart showing relative data size per table
- **Coordinator Table Reads % Distribution** - Read request distribution by table
- **Coordinator Table Writes % Distribution** - Write request distribution by table
### Coordinator Table Statistics

- **Max Coordinator Table Read Latency** - Maximum read latency at the coordinator
- **Max Coordinator Table Write Latency** - Maximum write latency at the coordinator
- **Max Coordinator Table Range Read Latency** - Maximum range query latency
- **Total Coordinator Table Reads/Sec** - Read throughput at the coordinator
- **Average Coordinator Table Range Reads/Sec** - Range query throughput
- **Total Coordinator Table Writes/Sec** - Write throughput at the coordinator
### Table Replica Statistics

- **Average Replica Read Latency** - Local read latency
- **Average Replica Range Read Latency** - Local range query latency
- **Average Replica Write Latency** - Local write latency
- **Total Replica Reads/sec** - Local read throughput
- **Total Replica Table Range Reads/sec** - Local range query throughput
- **Total Replica Writes/sec** - Local write throughput
### Latency Causes

- **Average Mean Table Partition Size** - Average partition size indicator
- **Total Estimated Table Partitions Count** - Partition count per table
- **Max Table Partition Size** - Largest partition (hotspot indicator)
- **SSTables Per Read** - SSTable access efficiency
- **Max Live SSTables per Table** - SSTable count (compaction indicator)
- **Tombstones Scanned per Table** - Tombstone impact on reads
- **SpeculativeRetries By Node For Table Reads** - Retry attempts
- **Bloom Filter False Positive Ratio** - Filter efficiency
- **Max Table Bloom Filter Disk** - Bloom filter storage
### Memory Statistics

- **Total Table Heap Memory** - Memtable heap usage
- **Total Table Off-Heap Memory** - Memtable off-heap usage
- **GC duration - G1 YoungGen** - Young generation GC time
- **GC Count per sec - G1 YoungGen** - Young generation GC frequency
- **GC duration - Shenandoah** - Shenandoah GC time
- **GC Count per sec - Shenandoah** - Shenandoah GC frequency
- **GC duration - ZGC** - ZGC time
- **GC Count per sec - ZGC** - ZGC frequency
## Filters

- **data center** (`dc`) - Filter by data center
- **rack** - Filter by rack
- **node** (`host_id`) - Filter by specific node
- **groupBy** - Dynamic grouping (dc, rack, host_id)
- **percentile** - Select latency percentile (50th, 75th, 95th, 98th, 99th, 999th)
- **keyspace** - Filter by keyspace
- **table** (`scope`) - Filter by table
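To see how these variables land in a query, here is the Total Replica Reads/sec query with hypothetical selections substituted. The values (`dc1`, `my_ks`, `my_table`, groupBy = dc) are illustrative only, and `.*` is an assumption for how the dashboard encodes an "all" selection:

```
// Hypothetical substitution: $dc=dc1, $keyspace=my_ks, $scope=my_table, $groupBy=dc
sum(cas_Table_ReadLatency{axonfunction='rate',scope=~'my_table',scope!='',keyspace=~'my_ks',function='Count',dc=~'dc1',rack=~'.*',host_id=~'.*'}) by (dc,keyspace,scope)
```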
## Understanding Table Metrics

### Coordinator vs Replica Metrics

- **Coordinator**: metrics from the node coordinating the request
- **Replica**: metrics from the nodes storing the data
- Coordinator latency includes network and replica time
- Replica latency is local operation time only
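If the query dialect supports PromQL-style arithmetic between series (an assumption; none of the dashboard queries above use it), the coordination overhead can be sketched by subtracting replica latency from coordinator latency. `99thPercentile` is an assumed value for the `function` label; the dashboard itself uses a `$percentile` variable:

```
// Sketch: approximate coordination overhead (network + slowest replica)
// by subtracting local replica read latency from coordinator read latency.
// '99thPercentile' is an assumed function label value.
avg(cas_Table_CoordinatorReadLatency{function='99thPercentile',keyspace=~'$keyspace',scope=~'$scope',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by (keyspace,scope)
-
avg(cas_Table_ReadLatency{function='99thPercentile',keyspace=~'$keyspace',scope=~'$scope',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by (keyspace,scope)
```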
### Performance Indicators

**Partition Size:**
- Large partitions (>100MB) cause performance issues
- Monitor max size for hotspot detection
- Mean size indicates data distribution
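As a quick hotspot check, the Max Partition Size query from the Latency Causes section can be turned into a threshold filter, assuming the dialect supports PromQL-style comparison operators (an assumption):

```
// Sketch: tables whose largest partition exceeds 100 MB (values are in bytes)
max(cas_Table_MaxPartitionSize{dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by (keyspace,scope) > 100000000
```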
**SSTable Metrics:**
- High SSTable count impacts read performance
- More SSTables = more files to check
- Indicates compaction strategy effectiveness
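A minimal sketch for tracking this, built on the SSTables Per Read histogram; `99thPercentile` is an assumed `function` value:

```
// Sketch: worst-case SSTables touched per read, per table; values
// persistently above single digits suggest compaction is falling behind
max(cas_Table_SSTablesPerReadHistogram{scope=~'$scope',scope!='',keyspace=~'$keyspace',function='99thPercentile',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by (keyspace,scope)
```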
**Tombstones:**
- Deleted data markers
- High tombstone counts slow reads
- Indicates need for compaction or TTL review
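The same pattern applies to tombstones, again assuming `99thPercentile` as the `function` value:

```
// Sketch: worst-case tombstones scanned per read, per table
max(cas_Table_TombstoneScannedHistogram{scope=~'$scope',scope!='',keyspace=~'$keyspace',function='99thPercentile',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by (keyspace,scope)
```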
**Bloom Filters:**
- Probabilistic data structure for SSTable lookups
- False positive ratio should be <0.1
- Higher ratios mean unnecessary SSTable reads
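Assuming PromQL-style comparison operators are available, the ratio can be turned into a simple alert-style check:

```
// Sketch: tables whose bloom filter false positive ratio exceeds 0.1
avg(cas_Table_BloomFilterFalseRatio{scope=~'$scope',scope!='',keyspace=~'$keyspace',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by (keyspace,scope) > 0.1
```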
**Speculative Retries:**
- Proactive retry mechanism for slow reads
- High rates indicate inconsistent performance
- May need tuning or investigation
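The retry rate can be charted with the same selector the dashboard uses for its Speculative Retries panel, aggregated per table rather than per node:

```
// Sketch: speculative retry rate per table; compare against
// Total Replica Reads/sec to gauge how often reads are retried
sum(cas_Table_SpeculativeRetries{axonfunction='rate',scope=~'$scope',scope!='',keyspace=~'$keyspace',function='Count',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by (keyspace,scope)
```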
## Best Practices

### Table Design

- **Partition Size**: keep partitions under 100 MB, ideally under 10 MB
- **Even Distribution**: avoid hotspots
- **Appropriate TTL**: manage tombstone creation
- **Compression**: choose based on workload
### Performance Monitoring

**Latency Percentiles:**
- p50: Median performance
- p99: Tail latency
- Large p50-p99 gap indicates issues
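A sketch of the gap itself, assuming the dialect supports arithmetic between series and that `50thPercentile`/`99thPercentile` are valid `function` values (both assumptions):

```
// Sketch: p99 - p50 gap for replica reads; a widening gap points to
// uneven partitions, tombstones, or GC pauses
avg(cas_Table_ReadLatency{function='99thPercentile',keyspace=~'$keyspace',scope=~'$scope',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by (keyspace,scope)
-
avg(cas_Table_ReadLatency{function='50thPercentile',keyspace=~'$keyspace',scope=~'$scope',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by (keyspace,scope)
```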
**Throughput Balance:**
- Even distribution across tables
- Identify heavy tables
- Plan capacity accordingly
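Heavy tables can be surfaced directly, assuming a PromQL-style `topk()` is available in the query dialect:

```
// Sketch: the five tables receiving the most coordinator writes
topk(5, sum(cas_Table_CoordinatorWriteLatency{axonfunction='rate',function='Count',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by (keyspace,scope))
```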
**Resource Usage:**
- Monitor memory per table
- Track GC impact
- Balance heap/off-heap usage
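A sketch of the combined memtable footprint per table, assuming PromQL-style vector addition is supported:

```
// Sketch: total memtable memory per table (heap + off-heap)
sum(cas_Table_AllMemtablesHeapSize{keyspace=~'$keyspace',scope=~'$scope',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by (keyspace,scope)
+
sum(cas_Table_AllMemtablesOffHeapSize{keyspace=~'$keyspace',scope=~'$scope',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}) by (keyspace,scope)
```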
## Troubleshooting

**High Latency:**
- Check partition sizes
- Review SSTable counts
- Monitor tombstones
- Verify bloom filter efficiency
**Memory Issues:**
- Check memtable sizes
- Monitor GC frequency
- Review flush thresholds
**Throughput Problems:**
- Analyze coordinator distribution
- Check speculative retries
- Review consistency levels
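One way to analyze coordinator distribution is to break a single table's read rate out per node; a strongly skewed result suggests hot partitions or uneven client-side load balancing. This is a sketch, not a dashboard panel (note it deliberately omits the `host_id` filter so all nodes are compared):

```
// Sketch: coordinator reads/sec per node for the selected table(s)
sum(cas_Table_CoordinatorReadLatency{axonfunction='rate',function='Count',keyspace=~'$keyspace',scope=~'$scope',dc=~'$dc',rack=~'$rack'}) by (host_id)
```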
## Units and Display

- **Latency**: microseconds
- **Throughput**: operations per second (rps/wps)
- **Size**: bytes (binary units)
- **Ratio**: decimal/percentage
- **Count**: short format (absolute numbers)
- **GC**: milliseconds / count per second
**Legend Format:**

- Overview: `$keyspace $scope`
- Details: `$groupBy - $keyspace $scope`
- Node-specific: `$dc - $host_id`
## Notes

- The `scope!=''` filter excludes entries with empty table names
- `function!='Min|Max'` excludes the Min/Max extremes when selecting percentiles
- Coordinator metrics show client-facing performance
- Replica metrics show storage-layer performance
- GC metrics help correlate latency with JVM behavior
- Some queries use special operations like `diff` or specific rack filtering