AxonOps Kafka Connect Overview Dashboard Metrics Mapping
Overview
The Kafka Connect Overview Dashboard monitors Kafka Connect clusters: worker health, connector status, task distribution, rebalancing activity, and network metrics. Use it to assess the overall health and performance of your Kafka Connect deployment.
Metrics Mapping
| Dashboard Metric | Description | Attributes |
|---|---|---|
| Worker Metrics | | |
| con_connect_worker_metrics_ (function='connectorfailedtaskcount') | Number of failed tasks | connector={connector} |
| con_connect_worker_metrics_ (function='task_count') | Total number of tasks | - |
| con_connect_worker_rebalance_metrics_ (function='rebalancing') | Rebalancing status (0 or 1) | - |
| con_connect_worker_rebalance_metrics_ (function='rebalance_avg_time_ms') | Average rebalance time | - |
| Coordinator Metrics | | |
| con_connect_coordinator_metrics_ (function='assigned_connectors') | Number of assigned connectors | - |
| con_connect_coordinator_metrics_ (function='assigned_tasks') | Number of assigned tasks | - |
| con_connect_coordinator_metrics_ (function='failed_rebalance_total') | Total failed rebalances | - |
| con_connect_coordinator_metrics_ (function='last_heartbeat_seconds_ago') | Seconds since last heartbeat | - |
| con_connect_coordinator_metrics_ (function='join_total') | Total join operations | - |
| con_connect_coordinator_metrics_ (function='sync_rate') | Group sync rate | - |
| con_connect_coordinator_metrics_ (function='rebalance_total') | Total rebalances | - |
| Connection Metrics | | |
| con_connect_metrics_ (function='connection_count') | Active connection count | - |
| con_connect_metrics_ (function='connection_close_total') | Total connections closed | - |
| con_connect_metrics_ (function='connection_creation_total') | Total connections created | - |
| con_connect_metrics_ (function='failed_authentication_total') | Total authentication failures | - |
| Network Metrics | | |
| con_connect_metrics_ (function='request_total') | Total requests | - |
| con_connect_metrics_ (function='response_total') | Total responses | - |
| con_connect_metrics_ (function='incoming_byte_rate') | Incoming bytes per second | - |
| con_connect_metrics_ (function='outgoing_byte_rate') | Outgoing bytes per second | - |
| con_connect_metrics_ (function='network_io_rate') | Network I/O rate | - |
| con_connect_metrics_ (function='iotime_total') | Total I/O time | - |
| con_connect_metrics_ (function='io_waittime_total') | Total I/O wait time | - |
Query Examples
Worker Health
// Failed task count
con_connect_worker_metrics_{function="connectorfailedtaskcount",type='kafka', node_type='connect'}
// Total task count
sum(con_connect_worker_metrics_{function='task_count',type='kafka', node_type='connect'})
// Rebalancing status
con_connect_worker_rebalance_metrics_{function="rebalancing",type='kafka', node_type='connect'}
Coordinator Metrics
// Assigned connectors per worker
con_connect_coordinator_metrics_{function="assigned_connectors",type='kafka', node_type='connect',client_id=~'$client_id'}
// Assigned tasks per worker
con_connect_coordinator_metrics_{function="assigned_tasks",type='kafka', node_type='connect',client_id=~'$client_id'}
// Failed rebalances
con_connect_coordinator_metrics_{function="failed_rebalance_total",type='kafka', node_type='connect',client_id=~'$client_id'}
// Heartbeat monitoring
con_connect_coordinator_metrics_{function="last_heartbeat_seconds_ago",type='kafka', node_type='connect',client_id=~'$client_id'}
Connection Management
// Active connections
con_connect_metrics_{function="connection_count",type='kafka', node_type='connect',client_id=~'$client_id'}
// Connection creation/close rates
con_connect_metrics_{function="connection_creation_total",type='kafka', node_type='connect',client_id=~'$client_id'}
con_connect_metrics_{function="connection_close_total",type='kafka', node_type='connect',client_id=~'$client_id'}
// Authentication failures
con_connect_metrics_{function="failed_authentication_total",type='kafka', node_type='connect'}
Network Performance
// Request/Response tracking
con_connect_metrics_{function="request_total",type='kafka', node_type='connect'}
con_connect_metrics_{function="response_total",type='kafka', node_type='connect'}
// Byte rates
con_connect_metrics_{function="incoming_byte_rate",type='kafka', node_type='connect'}
con_connect_metrics_{function="outgoing_byte_rate",type='kafka', node_type='connect'}
// I/O performance
con_connect_metrics_{axonfunction='rate',function="iotime_total",type='kafka', node_type='connect'}
con_connect_metrics_{axonfunction='rate',function="io_waittime_total",type='kafka', node_type='connect'} / 1000
Rebalancing Metrics
// Average rebalance time
con_connect_worker_rebalance_metrics_{function="rebalance_avg_time_ms",type='kafka', node_type='connect'}
// Total rebalances
con_connect_coordinator_metrics_{function="rebalance_total",type='kafka', node_type='connect',client_id=~'$client_id'}
// Join operations
con_connect_coordinator_metrics_{function="join_total",type='kafka', node_type='connect',client_id=~'$client_id'}
// Sync rate
con_connect_coordinator_metrics_{function="sync_rate",type='kafka', node_type='connect',client_id=~'$client_id'}
Panel Organization
Overview Section
- Empty row for spacing/organization
Overview
- Connector Workers (counter)
- Connectors Rebalancing (counter)
- Connector Tasks Failed (counter)
Coordinator Metrics
- Assigned Connectors
- Assigned Tasks
- Failed Rebalances
- Last Heartbeat (Seconds ago)
- Rebalances
- Joins
- Sync Rate
Connect Metrics
- Connections
- Connection Close
- Connection Creations
- Rebalance average time
- Requests vs Response
- Failed Authentication
- Incoming Byte Rate
- Outgoing Byte Rate
- Network IO Rate
- IO Time
- IO Wait Time
Filters
- host_id: Filter by specific Connect worker node
- client_id: Filter by specific client ID
Best Practices
Worker Health Monitoring
- Monitor the failed task count; it should stay at 0
- Track total tasks for capacity planning
- Watch for frequent rebalancing
Task Distribution
- Ensure even distribution of connectors/tasks
- Monitor for workers with no assignments
- Check for imbalanced workloads
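The distribution checks above can be sketched in a few lines of Python. This is only an illustration, assuming you have already exported the latest `assigned_tasks` value per worker into a dictionary. The function name `distribution_report`, the worker names, and the 25% tolerance are hypothetical and not part of AxonOps.

```python
from statistics import mean
from typing import Dict, List, Union


def distribution_report(
    assigned_tasks: Dict[str, int], tolerance: float = 0.25
) -> Dict[str, Union[float, List[str]]]:
    """Flag idle and overloaded workers.

    assigned_tasks maps a worker identifier (for example its client_id)
    to the latest assigned_tasks value. tolerance is a hypothetical
    threshold: a worker counts as overloaded when it holds more than
    (1 + tolerance) times the mean number of tasks.
    """
    if not assigned_tasks:
        return {"mean": 0.0, "idle": [], "overloaded": []}

    avg = mean(assigned_tasks.values())
    # Workers with no assignments at all.
    idle = sorted(w for w, n in assigned_tasks.items() if n == 0)
    # Workers holding noticeably more than the average share.
    overloaded = sorted(
        w for w, n in assigned_tasks.items() if n > avg * (1 + tolerance)
    )
    return {"mean": avg, "idle": idle, "overloaded": overloaded}


report = distribution_report(
    {"connect-1": 4, "connect-2": 4, "connect-3": 0, "connect-4": 8}
)
print(report)
```

The same function can be applied to `assigned_connectors` values. The right tolerance depends on how many tasks each connector runs, so treat the default as a starting point only.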
Rebalancing Analysis
- Frequent rebalances indicate instability
- High rebalance time impacts availability
- Monitor failed rebalances for issues
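To make "frequent" concrete, one option is to turn the `rebalance_total` and `failed_rebalance_total` counters into a rate over a fixed window. The sketch below assumes the samples were exported as plain lists of counter readings, and that a counter restarts from zero when a worker restarts. Both are assumptions, and the helper names are hypothetical.

```python
from typing import Dict, Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Total increase of a counter across consecutive samples.

    If a sample is lower than the previous one, the counter is assumed
    to have restarted from zero, so the new value is taken as the increase.
    """
    increase = 0.0
    for prev, curr in zip(samples, samples[1:]):
        increase += (curr - prev) if curr >= prev else curr
    return increase


def rebalance_summary(
    rebalance_total: Sequence[float],
    failed_rebalance_total: Sequence[float],
    window_minutes: float,
) -> Dict[str, float]:
    """Summarise rebalance activity over a window of window_minutes."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    total = counter_increase(rebalance_total)
    failed = counter_increase(failed_rebalance_total)
    return {
        "rebalances_per_hour": total * 60 / window_minutes,
        "failed_ratio": failed / total if total else 0.0,
    }


# Hypothetical readings taken over a 30-minute window.
print(rebalance_summary([10, 10, 12, 13], [1, 1, 1, 2], window_minutes=30))
```

What counts as too many rebalances per hour depends on how often connectors are deployed or reconfigured, so any alert threshold should be chosen per environment.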
Heartbeat Monitoring
- High heartbeat lag indicates worker issues
- Set alerts for heartbeat timeouts
- Correlate with worker failures
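A heartbeat alert boils down to comparing `last_heartbeat_seconds_ago` against a threshold. The following sketch assumes the latest value per worker is available as a dictionary. The function name and the 10-second default are hypothetical, and the threshold should be chosen relative to the session timeout configured on your workers.

```python
from typing import Dict, List


def stale_workers(
    last_heartbeat_seconds_ago: Dict[str, float], warn_after: float = 10.0
) -> List[str]:
    """Return workers whose last heartbeat is older than warn_after seconds.

    warn_after is an illustrative default, not an AxonOps or Kafka setting.
    """
    return sorted(
        worker
        for worker, age in last_heartbeat_seconds_ago.items()
        if age > warn_after
    )


# Hypothetical readings keyed by worker name.
print(stale_workers({"connect-1": 1.2, "connect-2": 14.0, "connect-3": 3.0}))
```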
Connection Management
- Monitor connection churn rate
- High authentication failures indicate security issues
- Track connection count for capacity
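Connection churn can be estimated from the first and last readings of `connection_creation_total` and `connection_close_total` over an interval. This is a minimal sketch under the assumption that neither counter reset during the interval. The function name and the sample values are hypothetical.

```python
from typing import Dict, Sequence


def connection_churn(
    creation_total: Sequence[float],
    close_total: Sequence[float],
    interval_seconds: float,
) -> Dict[str, float]:
    """Per-second creation and close rates plus the net connection change.

    Each sequence holds counter readings ordered by time; only the first
    and last reading are used, so a counter reset in between would skew
    the result.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    created = creation_total[-1] - creation_total[0]
    closed = close_total[-1] - close_total[0]
    return {
        "created_per_sec": created / interval_seconds,
        "closed_per_sec": closed / interval_seconds,
        "net_change": created - closed,
    }


# Hypothetical readings taken 60 seconds apart.
print(connection_churn([100, 160], [90, 120], interval_seconds=60))
```

High creation and close rates with a net change near zero suggest churn, whereas a steadily positive net change points to growth in the connection count.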
Network Performance
- Monitor byte rates for throughput
- Check request/response balance
- High I/O wait time indicates bottlenecks
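One way to check the request/response balance is to compare how much `request_total` and `response_total` grew in each sampling interval. The sketch below assumes both counters were sampled at the same timestamps and did not reset. The function name is hypothetical.

```python
from typing import List, Sequence


def request_response_gap(
    request_total: Sequence[float], response_total: Sequence[float]
) -> List[float]:
    """Per-interval difference between new requests and new responses.

    A value of 0 means every request sent in that interval was matched by
    a response; a persistently positive value suggests requests are
    going unanswered or are being answered late.
    """
    req_deltas = [b - a for a, b in zip(request_total, request_total[1:])]
    resp_deltas = [b - a for a, b in zip(response_total, response_total[1:])]
    return [req - resp for req, resp in zip(req_deltas, resp_deltas)]


# Hypothetical counter readings at three consecutive sample times.
print(request_response_gap([100, 150, 210], [100, 150, 200]))
```

A single non-zero interval is usually just in-flight requests crossing a sample boundary, so it is the trend over several intervals that matters.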
Troubleshooting
- Failed tasks: Check connector logs
- Rebalancing issues: Review worker health
- Authentication failures: Verify credentials
- Network issues: Check Kafka broker connectivity