AxonOps Kafka Connect Tasks Dashboard Metrics Mapping¶
Overview¶
The Kafka Connect Tasks Dashboard provides detailed monitoring of individual connector tasks, including task performance, error tracking, and sink-specific metrics. This dashboard helps identify task-level issues and optimize connector performance.
Metrics Mapping¶
| Dashboard Metric | Description | Attributes |
|---|---|---|
| Task Performance Metrics | ||
con_connector_task_metrics_ (function='running_ratio') |
Ratio of time task is running vs paused | connector={connector}, task={task} |
con_connector_task_metrics_ (function='batch_size_avg') |
Average batch size processed | connector={connector}, task={task} |
con_connector_task_metrics_ (function='offset_commit_success_percentage') |
Percentage of successful offset commits | connector={connector}, task={task} |
con_connector_task_metrics_ (function='offset_commit_avg_time_ms') |
Average time for offset commits | connector={connector}, task={task} |
con_connector_task_metrics_ (function='offset_commit_max_time_ms') |
Maximum time for offset commits | connector={connector}, task={task} |
| Task Error Metrics | ||
con_task_error_metrics_ (function='deadletterqueue_produce_failures') |
Failed attempts to produce to DLQ | connector={connector}, task={task} |
con_task_error_metrics_ (function='total_record_errors') |
Total number of record-level errors | connector={connector}, task={task} |
con_task_error_metrics_ (function='total_record_failures') |
Total number of record failures | connector={connector}, task={task} |
con_task_error_metrics_ (function='total_records_skipped') |
Total number of skipped records | connector={connector}, task={task} |
con_task_error_metrics_ (function='total_retries') |
Total number of retry attempts | connector={connector}, task={task} |
| Sink Task Metrics | ||
con_sink_task_metrics_ (function='partition_count') |
Number of partitions assigned to task | connector={connector}, task={task} |
con_sink_task_metrics_ (function='sink_record_read_total') |
Total records read from Kafka | connector={connector}, task={task} |
con_sink_task_metrics_ (function='sink_record_active_count') |
Number of records being processed | connector={connector}, task={task} |
con_sink_task_metrics_ (function='sink_record_send_total') |
Total records sent to sink | connector={connector}, task={task} |
Query Examples¶
Task Performance¶
// Running ratio per task
sum(con_connector_task_metrics_{function="running_ratio",type='kafka', node_type='connect', connector='$connector', task='$task'}) by (connector,task)
// Average batch size
sum(con_connector_task_metrics_{function="batch_size_avg",type='kafka', node_type='connect', connector='$connector', task='$task'}) by (connector,task)
// Offset commit success rate
sum(con_connector_task_metrics_{function="offset_commit_success_percentage",type='kafka', node_type='connect', connector='$connector', task='$task'}) by (connector,task) * 100
Offset Commit Times¶
// Average commit time
sum(con_connector_task_metrics_{function="offset_commit_avg_time_ms",type='kafka', node_type='connect', connector='$connector', task='$task'}) by (connector,task)
// Maximum commit time
sum(con_connector_task_metrics_{function="offset_commit_max_time_ms",type='kafka', node_type='connect', connector='$connector', task='$task'}) by (connector,task)
Error Tracking¶
// DLQ produce failures
sum(con_task_error_metrics_{function="deadletterqueue_produce_failures",type='kafka', node_type='connect', connector='$connector', task='$task'})
// Total record errors
sum(con_task_error_metrics_{function="total_record_errors",type='kafka', node_type='connect'})
// Total record failures
sum(con_task_error_metrics_{function="total_record_failures",type='kafka', node_type='connect'})
// Records skipped
sum(con_task_error_metrics_{function="total_records_skipped",type='kafka', node_type='connect'})
// Total retries
sum(con_task_error_metrics_{function="total_retries",type='kafka', node_type='connect'})
Sink Task Metrics¶
// Partition count per sink task
sum(con_sink_task_metrics_{function="partition_count",type='kafka', node_type='connect', connector='$connector', task='$task'}) by (connector,task)
// Records read rate
sum(con_sink_task_metrics_{axonfunction="rate",function="sink_record_read_total",type='kafka', node_type='connect', connector='$connector', task='$task'}) by (connector,task)
// Active record count
sum(con_sink_task_metrics_{function="sink_record_active_count",type='kafka', node_type='connect', connector='$connector', task='$task'}) by (connector,task)
// Records sent rate
sum(con_sink_task_metrics_{axonfunction="rate", function="sink_record_send_total",type='kafka', node_type='connect', connector='$connector', task='$task'}) by (connector,task)
Panel Organization¶
Overview Section
- Empty row for spacing/organization
Tasks Metrics
- Connector Tasks Batch Size
- Connector Task Running Ratio
- Connector Task Commit Success %
- Connector Task Commit Avg vs Max time
Task Error Metrics
- Deadletter Produce Failures (duplicate panels)
- Record Errors
- Record Failures
- Record Skipped
- Total Retries
Sink Task Metrics
- Sink Task Record Active Count
- Sink Task Record Read
- Sink Task Partition Count
- Sink Task Record Send
Filters¶
-
host_id: Filter by specific Connect worker node
-
connector: Filter by specific connector name
-
task: Filter by specific task ID
Best Practices¶
Task Performance Monitoring
- Running ratio should be close to 1.0 for active tasks
- Monitor batch sizes for throughput optimization
- Low commit success rate indicates processing issues
Offset Commit Analysis
- High commit times indicate performance issues
- Compare average vs max times for outliers
- Frequent commit failures suggest configuration issues
Error Management
- Monitor DLQ failures for error handling issues
- Track record errors vs failures vs skipped
- High retry counts indicate transient issues
Sink Task Optimization
- Balance partition assignment across tasks
- Monitor active record count for backpressure
- Compare read vs send rates for processing lag
Troubleshooting
- Low running ratio: Check for task pauses/failures
- High error rates: Review connector configuration
- DLQ failures: Check DLQ topic permissions
- Commit failures: Verify offset storage configuration
Performance Tuning
- Adjust batch sizes for optimal throughput
- Tune commit intervals based on latency requirements
- Configure appropriate retry policies
- Monitor partition assignment balance
Capacity Planning
- Track record processing rates
- Monitor active record counts for memory usage
- Plan task scaling based on partition count