AxonOps Kafka Requests Dashboard Metrics Mapping
Overview
The Kafka Requests Dashboard provides comprehensive monitoring of request rates, processing times, and message conversions across your Kafka cluster. It helps identify performance bottlenecks, track request patterns, and monitor client compatibility issues.
Metrics Mapping
| Dashboard Metric | Description | Attributes |
|---|---|---|
| Request Rate Metrics | | |
| `kaf_RequestMetrics_RequestsPerSec` | Rate of requests per second by type | `request={type}` |
| `kaf_BrokerTopicMetrics_TotalProduceRequestsPerSec` | Produce requests per second per topic | `topic={topic}` |
| `kaf_BrokerTopicMetrics_TotalFetchRequestsPerSec` | Fetch requests per second per topic | `topic={topic}` |
| Request Timing Metrics | | |
| `kaf_RequestMetrics_TotalTimeMs` | Total time to process requests | `request={type}` |
| `kaf_RequestMetrics_RequestQueueTimeMs` | Time requests spend in queue | `request={type}` |
| Message Conversion Metrics | | |
| `kaf_BrokerTopicMetrics_FetchMessageConversionsPerSec` | Rate of message conversions during fetch | - |
| `kaf_BrokerTopicMetrics_ProduceMessageConversionsPerSec` | Rate of message conversions during produce | - |
| Client Metrics | | |
| `kaf_socket_server_metrics_` (`function='connections'`) | Client connections by version | `listener={listener}`, `clientSoftwareName={name}`, `clientSoftwareVersion={version}` |
Query Examples
Request Rates
// Total requests per second per broker
sum(kaf_RequestMetrics_RequestsPerSec{axonfunction='rate',function='Count',rack=~'$rack',host_id=~'$host_id'}) by (host_id)
// Produce requests per second
sum(kaf_RequestMetrics_RequestsPerSec{axonfunction='rate',function='Count',request='Produce',rack=~'$rack',host_id=~'$host_id'}) by (host_id)
// Fetch consumer requests per second
sum(kaf_RequestMetrics_RequestsPerSec{axonfunction='rate',function='Count',request='FetchConsumer',rack=~'$rack',host_id=~'$host_id'}) by (host_id)
// Metadata requests per second
sum(kaf_RequestMetrics_RequestsPerSec{axonfunction='rate',function='Count',request='Metadata',rack=~'$rack',host_id=~'$host_id'}) by (host_id)
Topic-Level Request Rates
// Produce requests per topic
sum(kaf_BrokerTopicMetrics_TotalProduceRequestsPerSec{axonfunction='rate',function='Count',rack=~'$rack',host_id=~'$host_id', topic!=''}) by (topic)
// Fetch requests per topic
sum(kaf_BrokerTopicMetrics_TotalFetchRequestsPerSec{axonfunction='rate',function='Count',rack=~'$rack',host_id=~'$host_id',topic=~'$topic', topic!=''}) by (topic)
Request Processing Times
// Produce request total time
kaf_RequestMetrics_TotalTimeMs{request='Produce',function=~'$percentile',rack=~'$rack',host_id=~'$host_id'}
// Fetch request total time
kaf_RequestMetrics_TotalTimeMs{request='Fetch',function=~'$percentile',rack=~'$rack',host_id=~'$host_id'}
// Fetch follower request total time
kaf_RequestMetrics_TotalTimeMs{request='FetchFollower',function=~'$percentile',rack=~'$rack',host_id=~'$host_id'}
Request Queue Times
// Fetch request queue time
kaf_RequestMetrics_RequestQueueTimeMs{request='Fetch',function=~'$percentile',rack=~'$rack',host_id=~'$host_id'}
// Fetch follower request queue time
kaf_RequestMetrics_RequestQueueTimeMs{request='FetchFollower',function=~'$percentile',rack=~'$rack',host_id=~'$host_id'}
Message Conversions
// Fetch message conversions per second
sum(kaf_BrokerTopicMetrics_FetchMessageConversionsPerSec{axonfunction='rate',rack=~'$rack',host_id=~'$host_id'})
// Produce message conversions per second
kaf_BrokerTopicMetrics_ProduceMessageConversionsPerSec{function='MeanRate',rack=~'$rack',host_id=~'$host_id'}
// Client version distribution
sum(kaf_socket_server_metrics_{function='connections',rack=~'$rack',host_id=~'$host_id'}) by (clientSoftwareVersion, clientSoftwareName)
Panel Organization
Overview Section
- Empty row for spacing/organization
Requests
- Total Request Per Sec
- Metadata Request Per Sec
- Produce Request Per Sec
- Fetch Request Per Sec
- Produce request per sec per topic
- Fetch request per sec per topic
Request Times
- Produce Time
- Fetch Time
- FetchFollower Time
Request Queues
- Request Queue Fetch Follower Requests Time
- Request Queue Fetch Requests Time
Message Conversion
- Number of produced message conversion
- Number of consumed message conversion
- Client version repartition
Filters
- `rack`: Filter by rack location
- `host_id`: Filter by specific host/broker
- `topic`: Filter by specific topic(s)
- `percentile`: Select percentile for latency metrics (50th, 95th, 99th, etc.)
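The filters map onto the `$rack`, `$host_id`, `$topic`, and `$percentile` placeholders used in the query examples. As a rough sketch of how such placeholders could be expanded outside the dashboard, the Python below substitutes filter values into one of the queries. The `expand` helper, the `.*` default for an unset filter, the `|` join for multi-value filters, and the sample values (`99thPercentile`, `broker-1`, `broker-2`) are all assumptions for illustration, not documented AxonOps behaviour.

```python
from string import Template

# Query text copied from the "Request Processing Times" examples.
QUERY = Template(
    "kaf_RequestMetrics_TotalTimeMs{request='Produce',function=~'$percentile',"
    "rack=~'$rack',host_id=~'$host_id'}"
)


def expand(template, **filters):
    """Substitute filter values into a query template (hypothetical helper).

    Assumption: a multi-value filter becomes a regex alternation (a|b),
    and an unset (empty) filter matches everything (.*).
    """
    values = {}
    for name, selected in filters.items():
        values[name] = "|".join(selected) if selected else ".*"
    return template.substitute(values)


query = expand(
    QUERY,
    percentile=["99thPercentile"],      # illustrative label value
    rack=[],                            # no rack selected
    host_id=["broker-1", "broker-2"],   # illustrative host IDs
)
print(query)
```

Because every filter is matched with `=~`, leaving a filter unset in this sketch simply widens the match rather than changing the shape of the query.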
Best Practices
Request Rate Monitoring
- Monitor total request rates for capacity planning
- High metadata request rates may indicate client issues, such as clients that reconnect or refresh metadata in a loop
- Balance request rates across brokers
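One way to put a number on "balanced" is to compare the busiest broker against the average. The sketch below assumes the per-broker rates have already been read from the "Total requests per second per broker" query; the `imbalance_ratio` helper and the sample figures are hypothetical.

```python
from statistics import mean


def imbalance_ratio(rates_by_host):
    """Return the busiest broker's rate divided by the mean rate.

    1.0 means perfectly even load; larger values mean one broker
    handles a disproportionate share of requests.
    """
    rates = list(rates_by_host.values())
    if not rates:
        return 1.0
    avg = mean(rates)
    if avg == 0:
        return 1.0
    return max(rates) / avg


# Illustrative requests-per-second values keyed by host_id.
sample = {"broker-1": 1200.0, "broker-2": 800.0, "broker-3": 1000.0}
ratio = imbalance_ratio(sample)
print(f"imbalance ratio: {ratio:.2f}")
```

What ratio counts as a problem depends on the cluster; partition leadership skew is a common cause worth checking when the ratio stays high.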
Request Timing Analysis
- Monitor 99th percentile for worst-case scenarios
- High total time indicates processing bottlenecks
- Compare request types to identify slow operations
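Comparing request types can be as simple as ranking them by their 99th percentile total time. The sketch below assumes the values have already been collected per request type from `kaf_RequestMetrics_TotalTimeMs`; the `slowest_requests` helper and the sample numbers are hypothetical.

```python
def slowest_requests(p99_by_request, top=3):
    """Return the `top` request types with the highest p99 total time."""
    ranked = sorted(p99_by_request.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Illustrative p99 total times in milliseconds.
sample = {
    "Produce": 12.5,
    "FetchConsumer": 510.0,
    "FetchFollower": 505.0,
    "Metadata": 3.2,
}
ranking = slowest_requests(sample, top=2)
for request, p99_ms in ranking:
    print(f"{request}: {p99_ms} ms")
```

When reading such a ranking, keep in mind that fetch requests wait on the broker by design: with the Kafka defaults of 500 ms for `fetch.max.wait.ms` and `replica.fetch.wait.max.ms`, fetch total times around that value on a quiet topic are expected rather than a sign of slowness.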
Queue Time Monitoring
- High queue times indicate that the request handler thread pool is saturated
- Consider increasing request handler threads (`num.io.threads`)
- Queue time should be minimal compared to total time
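To make "minimal compared to total time" concrete, the queue time can be expressed as a share of the total time for the same request type and percentile. This is a minimal sketch: the `queue_share` helper, the sample values, and the 20% alert threshold are assumptions chosen for illustration, not recommended limits.

```python
def queue_share(queue_ms, total_ms):
    """Return the fraction of total request time spent waiting in the queue."""
    if total_ms <= 0:
        return 0.0
    return queue_ms / total_ms


# Illustrative values for one request type at the same percentile:
# kaf_RequestMetrics_RequestQueueTimeMs and kaf_RequestMetrics_TotalTimeMs.
share = queue_share(queue_ms=12.0, total_ms=48.0)

ALERT_THRESHOLD = 0.20  # hypothetical threshold, tune per cluster
if share > ALERT_THRESHOLD:
    print(f"queue time is {share:.0%} of total time; handler threads may be saturated")
```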
Message Conversion Impact
- Message conversions add significant broker CPU and memory overhead, and down-conversion on fetch prevents zero-copy transfer
- High conversion rates suggest client version mismatches
- Update clients to match broker message format version
Client Version Management
- Monitor client version distribution
- Identify and upgrade outdated clients
- Ensure compatibility with broker version
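Identifying outdated clients can be sketched as a filter over the client version distribution. The code below assumes each row carries the `clientSoftwareName` and `clientSoftwareVersion` attributes plus a connection count under a hypothetical `value` key; the helpers, the row shape, and the minimum version are illustrative assumptions. Minimums are kept per client name because different client libraries use unrelated version schemes.

```python
from collections import defaultdict


def parse_version(text):
    """Turn '2.8.1' or '3.4.0-SNAPSHOT' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits:
            parts.append(int(digits))
    return tuple(parts)


def outdated_clients(rows, minimums):
    """Sum connections per (name, version) for versions below the minimum.

    Clients with no entry in `minimums` are skipped rather than guessed at.
    """
    totals = defaultdict(int)
    for row in rows:
        name = row["clientSoftwareName"]
        version = row["clientSoftwareVersion"]
        minimum = minimums.get(name)
        if minimum is not None and parse_version(version) < minimum:
            totals[(name, version)] += row["value"]
    return dict(totals)


# Illustrative rows, e.g. one per listener.
rows = [
    {"clientSoftwareName": "apache-kafka-java", "clientSoftwareVersion": "3.6.1", "value": 40},
    {"clientSoftwareName": "apache-kafka-java", "clientSoftwareVersion": "2.0.0", "value": 3},
    {"clientSoftwareName": "apache-kafka-java", "clientSoftwareVersion": "2.0.0", "value": 2},
    {"clientSoftwareName": "librdkafka", "clientSoftwareVersion": "1.9.2", "value": 7},
]
flagged = outdated_clients(rows, minimums={"apache-kafka-java": (2, 8)})
print(flagged)
```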
Performance Tuning
- Adjust `num.network.threads` for high request rates
- Tune `num.io.threads` for I/O operations
- Monitor and adjust `queued.max.requests`
Troubleshooting
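- High produce times: Check replication settings (with `acks=all`, produce time includes waiting for in-sync replicas)
- High fetch times: Review consumer configurations (`fetch.min.bytes` and `fetch.max.wait.ms` make the broker hold fetch requests deliberately)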
- Message conversions: Align client/broker versions