Skip to content

Kafka Producer Error Handling

Proper error handling ensures reliable message delivery. This document covers error types, timeout configuration, and recovery patterns.


Error Classification

uml diagram

Type Producer Behavior Application Action
Retriable Automatic retry Wait or handle timeout
Non-Retriable Fail immediately Fix data or config
Fatal Cannot recover Close producer

Timeout Configuration

# Total time to deliver message
delivery.timeout.ms=120000

# Per-request timeout
request.timeout.ms=30000

# Retry backoff
retry.backoff.ms=100
retry.backoff.max.ms=1000

uml diagram


Handling Patterns

Callback-Based

producer.send(record, (metadata, exception) -> {
    if (exception == null) {
        log.info("Sent to {} @ {}", metadata.partition(), metadata.offset());
        return;
    }

    if (exception instanceof RetriableException) {
        // Already retried and failed
        log.error("Retriable error exhausted", exception);
        deadLetterQueue.send(record);
    } else if (exception instanceof SerializationException) {
        // Bad data
        log.error("Serialization failed", exception);
        metrics.increment("serialization_errors");
    } else {
        // Other error
        log.error("Send failed", exception);
        alertService.notify(exception);
    }
});

Synchronous with Retry

int maxRetries = 3;
for (int attempt = 0; attempt < maxRetries; attempt++) {
    try {
        RecordMetadata metadata = producer.send(record).get();
        log.info("Sent to {} @ {}", metadata.partition(), metadata.offset());
        break;
    } catch (ExecutionException e) {
        Throwable cause = e.getCause();
        if (cause instanceof RetriableException && attempt < maxRetries - 1) {
            log.warn("Retry {} of {}", attempt + 1, maxRetries);
            Thread.sleep(100 * (attempt + 1));
        } else {
            throw e;
        }
    }
}

Common Errors

Error Cause Solution
TimeoutException Broker slow or unreachable Check broker health, increase timeout
NotEnoughReplicasException ISR too small Wait for replicas to catch up
RecordTooLargeException Message > max.request.size Increase limit or reduce message size
SerializationException Serializer failure Fix serialization logic
ProducerFencedException Another producer with same txn.id Ensure single instance

Dead Letter Queue Pattern

public void sendWithDLQ(ProducerRecord<String, String> record) {
    producer.send(record, (metadata, exception) -> {
        if (exception != null) {
            // Send to DLQ
            ProducerRecord<String, String> dlqRecord = new ProducerRecord<>(
                "dead-letter-queue",
                record.key(),
                record.value()
            );
            dlqRecord.headers()
                .add("original-topic", record.topic().getBytes())
                .add("error", exception.getMessage().getBytes());

            dlqProducer.send(dlqRecord);
        }
    });
}

Best Practices

Practice Rationale
Always handle callbacks Detect delivery failures
Use appropriate timeouts Balance reliability and latency
Log errors with context Enable debugging
Implement DLQ for critical data Prevent data loss
Monitor error rates Detect systemic issues