AxonOps Kafka Connect Workers Dashboard Metrics Mapping

Overview

The Kafka Connect Workers Dashboard monitors individual Connect workers. It tracks connector and task lifecycle, startup/shutdown events, and rebalancing activity. Use it to check worker health and to identify problems with connector deployment and task management.

Metrics Mapping

Dashboard Metric | Description | Attributes

Worker Overview Metrics
con_connect_worker_metrics_ (function='task_count') | Total number of tasks on worker | -
con_connect_worker_metrics_ (function='connector_count') | Total number of connectors on worker | -

Connector Lifecycle Metrics
con_connect_worker_metrics_ (function='connector_failed_task_count') | Failed tasks per connector | connector={connector}
con_connect_worker_metrics_ (function='connector_startup_attempts_total') | Total connector startup attempts | -
con_connect_worker_metrics_ (function='connector_startup_failure_total') | Failed connector startup attempts | -
con_connect_worker_metrics_ (function='connector_startup_success_total') | Successful connector startup attempts | -
con_connect_worker_metrics_ (function='connector_total_task_count') | Total tasks per connector | connector={connector}

Task State Metrics
con_connect_worker_metrics_ (function='connector_paused_task_count') | Paused tasks per connector | connector={connector}
con_connect_worker_metrics_ (function='connector_destroyed_task_count') | Destroyed tasks per connector | connector={connector}
con_connect_worker_metrics_ (function='connector_running_task_count') | Running tasks per connector | connector={connector}

Task Lifecycle Metrics
con_connect_worker_metrics_ (function='task_startup_attempts_total') | Total task startup attempts | -
con_connect_worker_metrics_ (function='task_startup_failure_total') | Failed task startup attempts | -
con_connect_worker_metrics_ (function='task_startup_success_total') | Successful task startup attempts | -

Rebalance Metrics
con_connect_worker_rebalance_metrics_ (function='rebalance_avg_time_ms') | Average rebalance time | -
con_connect_worker_rebalance_metrics_ (function='completed_rebalances_total') | Total completed rebalances | -

Query Examples

Worker Overview

// Total task count
con_connect_worker_metrics_{function="task_count",type='kafka', node_type='connect'}

// Total connector count
con_connect_worker_metrics_{function="connector_count",type='kafka', node_type='connect'}

Connector Lifecycle

// Connector count by host
sum(con_connect_worker_metrics_{function="connector_count",type='kafka', node_type='connect'}) by (host_id)

// Failed tasks by connector
con_connect_worker_metrics_{function="connector_failed_task_count",type='kafka', node_type='connect'}

// Connector startup attempts (total, failed, successful)
con_connect_worker_metrics_{function='connector_startup_attempts_total',type='kafka', node_type='connect'}
con_connect_worker_metrics_{function='connector_startup_failure_total',type='kafka', node_type='connect'}
con_connect_worker_metrics_{function='connector_startup_success_total',type='kafka', node_type='connect'}

Task States

// Running tasks by connector
sum(con_connect_worker_metrics_{function="connector_running_task_count",type='kafka', node_type='connect', connector='$connector'}) by (connector)

// Paused tasks by connector
sum(con_connect_worker_metrics_{function="connector_paused_task_count",type='kafka', node_type='connect', connector='$connector'}) by (connector)

// Failed tasks by connector
sum(con_connect_worker_metrics_{function="connector_failed_task_count",type='kafka', node_type='connect', connector='$connector'}) by (connector)

// Destroyed tasks by connector
sum(con_connect_worker_metrics_{function="connector_destroyed_task_count",type='kafka', node_type='connect', connector='$connector'}) by (connector)

Task Lifecycle

// Task startup attempts (total, failed, successful)
con_connect_worker_metrics_{function='task_startup_attempts_total',type='kafka', node_type='connect'}
con_connect_worker_metrics_{function='task_startup_failure_total',type='kafka', node_type='connect'}
con_connect_worker_metrics_{function='task_startup_success_total',type='kafka', node_type='connect'}

// Total tasks per connector
con_connect_worker_metrics_{function="connector_total_task_count",type='kafka', node_type='connect', connector='$connector'}

Rebalancing

// Average rebalance time
con_connect_worker_rebalance_metrics_{function="rebalance_avg_time_ms",type='kafka', node_type='connect'}

// Completed rebalances
con_connect_worker_rebalance_metrics_{function="completed_rebalances_total",type='kafka', node_type='connect'}
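
All of the queries above share the same fixed labels (type='kafka', node_type='connect') and differ only in the metric name, the function label, and an optional connector filter. If you generate these queries from a script, a small helper along the following lines may reduce copy-and-paste errors. This is only a sketch: `build_selector` is a hypothetical name, not part of AxonOps, and the output spacing differs slightly from the hand-written queries.

```python
def build_selector(metric: str, function: str, **labels: str) -> str:
    """Return a selector string in the style of the queries in this document."""
    # type and node_type are the fixed labels used by every query shown here.
    parts = [f'function="{function}"', "type='kafka'", "node_type='connect'"]
    # Extra labels (for example connector) are appended in the order given.
    parts += [f"{name}='{value}'" for name, value in labels.items()]
    return f"{metric}{{{', '.join(parts)}}}"


# Example: running tasks for the connector selected by the dashboard filter.
print(build_selector(
    "con_connect_worker_metrics_",
    "connector_running_task_count",
    connector="$connector",
))
```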

Panel Organization

Overview Section

  • Empty row for spacing/organization

Workers

  • Connector Count (counter)
  • Task Count (counter)

Worker Metrics

  • Connector Count (time series)
  • Connector Startup
  • Connector Failed
  • Connector Task Count
  • Connector Task Startup by Host
  • Connector Paused Tasks
  • Connector Destroyed

Worker Tasks

  • Connector Running Tasks
  • Connector Failed Tasks
  • Connector Destroyed Tasks

Rebalance Metrics

  • Connector Rebalances (duplicate panels)
  • Connectors Avg Rebalance Time

Filters

  • host_id: Filter by specific Connect worker node

  • connector: Filter by specific connector name

Best Practices

Worker Health Monitoring

  • Monitor task and connector counts per worker
  • Check that tasks and connectors are evenly distributed across workers
  • Track failed task counts to catch connector problems early
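
One way to put a number on "balanced distribution" is to compare the spread of per-worker task counts with their mean. The sketch below assumes you have already exported the task_count value for each host_id into a dictionary. The function name, the sample values, and the choice of (max - min) / mean as the measure are all illustrative assumptions.

```python
def task_imbalance(tasks_by_host: dict[str, float]) -> float:
    """Return (max - min) / mean of per-worker task counts.

    0.0 means every worker runs the same number of tasks.
    """
    counts = list(tasks_by_host.values())
    if not counts:
        return 0.0
    mean = sum(counts) / len(counts)
    if mean == 0:
        return 0.0
    return (max(counts) - min(counts)) / mean


# Hypothetical task_count samples keyed by host_id.
sample = {"worker-1": 12, "worker-2": 10, "worker-3": 2}
print(f"imbalance: {task_imbalance(sample):.2f}")
```

What counts as an acceptable value depends on the cluster, so choose any alert threshold from your own baseline.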

Connector Lifecycle

  • Compare connector startup successes against failures
  • A high failure rate usually points to a configuration issue
  • Track connector count changes over time

Task State Management

  • The running task count should match the number of tasks you expect for each connector
  • Paused tasks may indicate manual intervention
  • Failed tasks require investigation
  • Destroyed tasks indicate connector removal
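
The checks above can be sketched as a small review function. It assumes you have read the four per-connector task state counts from the metrics listed earlier and that you know how many running tasks to expect. `TaskStates` and `review_task_states` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskStates:
    """Per-connector task counts, one field per task state metric."""
    running: int
    paused: int
    failed: int
    destroyed: int


def review_task_states(states: TaskStates, expected_running: int) -> list[str]:
    """Return one finding per state that needs attention (empty if all is well)."""
    findings = []
    if states.failed > 0:
        findings.append(f"{states.failed} failed task(s): investigate")
    if states.paused > 0:
        findings.append(f"{states.paused} paused task(s): confirm the pause was intentional")
    if states.destroyed > 0:
        findings.append(f"{states.destroyed} destroyed task(s): confirm the removal was planned")
    if states.running != expected_running:
        findings.append(f"running={states.running}, expected={expected_running}")
    return findings


# Hypothetical values for a connector expected to run 4 tasks.
for finding in review_task_states(TaskStates(running=2, paused=1, failed=1, destroyed=0), 4):
    print(finding)
```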

Startup Monitoring

  • Compare startup attempts vs successes
  • High failure rates suggest configuration problems
  • Monitor both connector and task startups
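
Because the startup metrics are running totals, a failure rate is more informative when computed over a time window than over the whole lifetime of the worker. The sketch below takes two samples of the attempts and failure totals and returns the share of attempts in between that failed. It works the same way for the connector_startup_* and task_startup_* metrics. The function name and the sample numbers are assumptions for illustration.

```python
from typing import Optional


def startup_failure_ratio(
    attempts_before: float,
    attempts_after: float,
    failures_before: float,
    failures_after: float,
) -> Optional[float]:
    """Return failures / attempts between two samples of the running totals.

    Returns None when there were no new attempts in the window, or when a
    total went backwards (for example after a worker restart).
    """
    attempts = attempts_after - attempts_before
    failures = failures_after - failures_before
    if attempts <= 0 or failures < 0:
        return None
    return failures / attempts


# Hypothetical samples: 10 new attempts in the window, 2 of them failed.
print(startup_failure_ratio(10, 20, 1, 3))
```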

Rebalancing Analysis

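  • Frequent rebalances reduce availability, because tasks can be stopped and reassigned during each one
  • Long rebalance times extend the period in which tasks are unavailable
  • Watch rebalance count and rebalance time after adding or removing workers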
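
To judge whether rebalances are "frequent", it can help to convert the completed_rebalances_total running total into a rate. The sketch below assumes a list of (timestamp, value) samples exported from that metric, oldest first. The function name and sample data are hypothetical.

```python
def rebalances_per_hour(samples: list[tuple[float, float]]) -> float:
    """Estimate rebalances per hour from (unix_seconds, total) samples, oldest first."""
    if len(samples) < 2:
        return 0.0
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    elapsed = t1 - t0
    # Ignore windows with no elapsed time or where the total went backwards.
    if elapsed <= 0 or c1 < c0:
        return 0.0
    return (c1 - c0) * 3600 / elapsed


# Hypothetical samples taken 30 minutes apart.
print(rebalances_per_hour([(0, 5), (1800, 8)]))
```

A short burst of rebalances right after a planned scaling change is expected; a sustained non-zero rate with no membership changes is the case worth investigating.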

Troubleshooting

  • Failed connectors: Check logs and configuration
  • Paused tasks: Verify whether the pause was intentional or the result of an error
  • Startup failures: Review connector configs
  • Destroyed tasks: Confirm that the removal was planned

Capacity Planning

  • Monitor task distribution across workers
  • Plan worker scaling based on task counts
  • Balance connectors for even resource usage
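
As a rough starting point for scaling decisions, the total task count can be divided by a per-worker target. The sketch below assumes you have chosen a maximum number of tasks per worker from your own load testing; that limit and the function name are assumptions, not values provided by the dashboard.

```python
import math


def workers_needed(total_tasks: int, max_tasks_per_worker: int) -> int:
    """Return the smallest worker count that keeps each worker at or below the limit."""
    if max_tasks_per_worker <= 0:
        raise ValueError("max_tasks_per_worker must be positive")
    return math.ceil(total_tasks / max_tasks_per_worker)


# Hypothetical example: 45 tasks with a limit of 20 tasks per worker.
print(workers_needed(45, 20))
```

Task count is only a proxy for load, since tasks differ in throughput and memory use, so treat the result as a lower bound rather than a sizing rule.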