Shock Absorber Pattern¶
The shock absorber pattern uses Kafka as a buffer between systems with different throughput capacities. Kafka absorbs traffic spikes, allowing downstream systems to process at their own sustainable rate without being overwhelmed.
Key Benefit: Right-Sized Infrastructure¶
Backend systems—databases, mainframes, third-party APIs—do not need to be provisioned for peak capacity. They only need to handle the average throughput, with Kafka absorbing the difference during spikes.
| Component | Without Kafka | With Kafka |
|---|---|---|
| Database | Provisioned for peak load | Provisioned for average load |
| Application servers | Auto-scale to handle spikes | Fixed pool at steady capacity |
| Downstream APIs | Rate limiting / request rejection | Steady, predictable load |
This translates directly to:
- Lower infrastructure costs — Smaller database instances, fewer application servers
- Predictable performance — No degradation during traffic spikes
- Simplified capacity planning — Plan for average, not peak
- Reduced operational complexity — No auto-scaling policies to tune
Trade-offs and Suitability¶
The shock absorber pattern introduces latency between when data is produced and when it reaches the backend system. During traffic spikes, this delay can grow significantly as the buffer fills.
Spike Duration Matters¶
The pattern works best for short-lived spikes where the buffer can drain during quieter periods:
| Spike Pattern | Suitability | Reason |
|---|---|---|
| Flash sales (minutes to hours) | Excellent | Buffer drains overnight |
| Daily peaks (predictable hours) | Excellent | Off-peak hours allow catch-up |
| Sustained high load (days/weeks) | Poor | Buffer never drains, lag grows indefinitely |
| Gradual permanent increase | Poor | Requires capacity increase, not buffering |
Before adopting this pattern, assess whether traffic spikes are temporary or represent a sustained increase in load. If the latter, the solution is to scale the backend, not buffer indefinitely.
Suitable Use Cases¶
Systems where delayed updates are acceptable:
- Analytics and reporting pipelines
- Search index updates
- Data warehouse loading
- Notification delivery
- Audit log processing
- Cache warming
Not Suitable For¶
- Real-time transaction processing requiring immediate confirmation
- Systems where users expect instant visibility of changes
- Low-latency trading or bidding systems
- Sustained load increases (requires actual scaling)
The Analogy¶
Mechanical: Spring and Damper¶
A car's suspension system provides an apt analogy. When the wheel hits a speed bump (traffic spike), the spring compresses (Kafka buffers messages) and the damper controls the release rate (the consumer processes at a steady pace). The car body (backend system) experiences a smooth ride regardless of road conditions.
Signal Processing: Low-Pass Filter¶
In signal processing terms, Kafka acts as a low-pass filter. High-frequency spikes in the input signal are attenuated, while the underlying trend passes through smoothly to the output.
Compare the input signal (producer traffic) with the output signal (consumer traffic after Kafka): the high-frequency variations are filtered out, so the consumer sees a steady, predictable load while spikes are absorbed into the buffer (consumer lag).
Throughput Over Time¶
The following chart illustrates the smoothing effect in practice:
| Time | Producer Rate | Consumer Rate | Lag Trend | Backend Load |
|---|---|---|---|---|
| 00:00 | 1,000/s | 1,000/s | — | 1,000/s |
| 06:00 | 2,000/s | 2,000/s | — | 2,000/s |
| 12:00 | 12,000/s | 2,000/s | ↑↑ Peak | 2,000/s |
| 18:00 | 1,500/s | 2,000/s | ↓ Draining | 2,000/s |
| 24:00 | 1,000/s | 1,000/s | — Empty | 1,000/s |
Key insight: The area between producer and consumer rates represents buffered messages (consumer lag). This lag grows during spikes and drains during quiet periods. The backend database experiences a constant, manageable load while Kafka absorbs all variability.
The Problem¶
Systems rarely produce and consume data at identical rates. Common scenarios:
| Scenario | Producer Rate | Consumer Capacity | Problem |
|---|---|---|---|
| Flash sales | 100x normal | Fixed database IOPS | Database saturation |
| Log ingestion | Varies with traffic | Fixed Elasticsearch cluster | Index rejections |
| IoT telemetry | Sensor bursts | Limited analytics pipeline | Processing delays |
| Batch jobs | Bulk exports | Rate-limited APIs | API throttling |
| Event-driven | Cascading events | Legacy system limits | System crashes |
How Kafka Absorbs Shocks¶
Kafka's architecture naturally supports load leveling. Key properties:
- Durable buffer - Messages persist until consumed (or retention expires)
- Decoupled rates - Producers and consumers operate independently
- Horizontal scaling - Add partitions for higher throughput
- Configurable retention - Buffer size measured in time or bytes
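For illustration, a minimal sketch of provisioning a buffer topic with the Kafka AdminClient; the partition count, replication factor, and retention values are assumptions for the example, not recommendations:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Map;
import java.util.Properties;
import java.util.Set;

public class BufferTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Partitions give headroom for consumer parallelism; retention defines the buffer depth
            NewTopic bufferTopic = new NewTopic("events", 12, (short) 3)
                    .configs(Map.of(
                            "retention.ms", "172800000",   // 48 hours of time-based buffer (assumed value)
                            "retention.bytes", "-1"));     // no per-partition size cap
            admin.createTopics(Set.of(bufferTopic)).all().get();
        }
    }
}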
Architecture Patterns¶
Pattern 1: Protecting Legacy Systems¶
Legacy systems (mainframes, older databases) often cannot scale elastically. Kafka shields them from modern traffic patterns.
Implementation:
import com.google.common.util.concurrent.RateLimiter;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

import java.time.Duration;

// LegacyClient, Event and RateLimitException are application-specific types
public class RateLimitedConsumer {

    private final RateLimiter rateLimiter;
    private final LegacyClient legacyClient;
    private final Consumer<String, Event> consumer;
    private final long backoffMs;
    private volatile boolean running = true;

    public RateLimitedConsumer(int maxTps, LegacyClient legacyClient,
                               Consumer<String, Event> consumer, long backoffMs) {
        this.rateLimiter = RateLimiter.create(maxTps);
        this.legacyClient = legacyClient;
        this.consumer = consumer;
        this.backoffMs = backoffMs;
    }

    public void consume() throws InterruptedException {
        while (running) {
            ConsumerRecords<String, Event> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, Event> record : records) {
                // Block until the rate limiter grants a permit
                rateLimiter.acquire();
                try {
                    legacyClient.process(record.value());
                } catch (RateLimitException e) {
                    // Backend is struggling - pause consumption and back off
                    consumer.pause(consumer.assignment());
                    Thread.sleep(backoffMs);
                    consumer.resume(consumer.assignment());
                }
            }
            consumer.commitSync();
        }
    }
}
Pattern 2: Batch Aggregation¶
Collect events during high-traffic periods, process in efficient batches during low-traffic periods.
Configuration:
# Topic retention sized for batch window
log.retention.hours=48
log.retention.bytes=-1
# Consumer for batch processing
max.poll.records=10000
fetch.max.bytes=52428800
enable.auto.commit=false
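A minimal consumer sketch that pairs with this configuration; `BulkLoader` is a hypothetical bulk-write client for the backend. The idea is simply to turn large polls into single bulk writes, committing offsets only after the batch succeeds:

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class BatchAggregationConsumer {

    private final Consumer<String, Event> consumer;   // configured with max.poll.records=10000
    private final BulkLoader bulkLoader;               // hypothetical bulk-write client
    private volatile boolean running = true;

    public BatchAggregationConsumer(Consumer<String, Event> consumer, BulkLoader bulkLoader) {
        this.consumer = consumer;
        this.bulkLoader = bulkLoader;
    }

    public void run() {
        while (running) {
            // With enable.auto.commit=false, offsets advance only after a successful bulk write
            ConsumerRecords<String, Event> records = consumer.poll(Duration.ofSeconds(1));
            if (records.isEmpty()) {
                continue;
            }
            List<Event> batch = new ArrayList<>(records.count());
            for (ConsumerRecord<String, Event> record : records) {
                batch.add(record.value());
            }
            bulkLoader.write(batch);     // one bulk insert instead of thousands of single writes
            consumer.commitSync();
        }
    }
}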
Pattern 3: Multi-Speed Consumers¶
Different consumers process the same events at different rates based on their capabilities.
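A minimal sketch of the setup, with placeholder group names: each consumer group keeps its own offsets on the same topic, so a fast group stays current while a slow group lags behind and drains later, without either affecting the other's pace.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.List;
import java.util.Properties;

public class MultiSpeedConsumers {

    private static KafkaConsumer<String, String> consumerFor(String groupId, int maxPollRecords) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, maxPollRecords);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("events"));
        return consumer;
    }

    public static void main(String[] args) {
        // Same topic, independent groups: each tracks its own offsets and drains at its own rate
        KafkaConsumer<String, String> fastLane = consumerFor("cache-updater", 500);
        KafkaConsumer<String, String> slowLane = consumerFor("warehouse-loader", 10_000);
        // ... each consumer then polls from its own thread at its own pace
    }
}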
Consumer Lag as a Feature¶
In the shock absorber pattern, consumer lag is not a problem—it's the buffer working as intended.
Sizing the Buffer¶
Calculate required retention based on expected spike patterns:
Buffer Size = (Peak Rate - Consumer Rate) × Spike Duration
Example:
- Peak rate: 10,000 msg/s
- Consumer rate: 2,000 msg/s
- Spike duration: 2 hours
Buffer = (10,000 - 2,000) × 7,200 seconds = 57.6 million messages
With 1KB average message size:
Storage needed = 57.6M × 1 KB = ~58 GB across the topic (before replication)
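The same arithmetic as a small sketch, using the example values above:

public class BufferSizing {
    public static void main(String[] args) {
        long peakRatePerSec = 10_000;      // messages/s during the spike
        long consumerRatePerSec = 2_000;   // steady consumer throughput
        long spikeSeconds = 2 * 3_600;     // 2-hour spike
        long avgMessageBytes = 1_000;      // ~1 KB average message size

        long bufferedMessages = (peakRatePerSec - consumerRatePerSec) * spikeSeconds;  // 57,600,000
        long bufferedBytes = bufferedMessages * avgMessageBytes;                        // ~57.6 GB

        System.out.printf("Buffer: %,d messages, ~%.1f GB%n",
                bufferedMessages, bufferedBytes / 1e9);
    }
}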
Retention configuration (the log.* settings below are broker-wide defaults; per-topic overrides use retention.ms and retention.bytes):
# Time-based retention (for predictable spike patterns)
log.retention.hours=72
# Size-based retention (for storage constraints)
log.retention.bytes=107374182400 # 100 GB per partition
# Combined (whichever triggers first)
log.retention.hours=72
log.retention.bytes=107374182400
Lag Alerting Strategy¶
# Prometheus alerting rules for shock absorber pattern
groups:
  - name: kafka-shock-absorber
    rules:
      # Alert on lag growth rate, not absolute lag
      # (deriv() rather than rate(), because consumer lag is a gauge, not a counter)
      - alert: ConsumerLagGrowingTooFast
        expr: |
          deriv(kafka_consumer_group_lag[5m]) > 1000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Consumer lag growing rapidly"
          description: "Lag increasing by {{ $value }}/s - verify this is expected"

      # Alert if lag approaches retention limit
      - alert: ConsumerLagApproachingRetention
        expr: |
          kafka_consumer_group_lag / kafka_topic_partition_current_offset
            > 0.8
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "Consumer may lose messages"
          description: "Lag at 80% of available buffer - messages at risk"

      # Alert if lag not draining during off-peak
      # (hour() carries no labels, so on() is needed for the vector match)
      - alert: ConsumerLagNotDraining
        expr: |
          kafka_consumer_group_lag > 1000000
            and on() (hour() >= 2)
            and on() (hour() <= 6)
        for: 2h
        labels:
          severity: warning
        annotations:
          summary: "Lag not draining during off-peak"
          description: "Expected lag to decrease overnight"
Backpressure Strategies¶
When consumers cannot keep up even during normal periods, implement backpressure.
Strategy 1: Pause/Resume¶
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.time.Duration;

public abstract class BackpressureConsumer {

    private static final Logger log = LoggerFactory.getLogger(BackpressureConsumer.class);
    private static final long LAG_THRESHOLD = 1_000_000;
    private static final long RESUME_THRESHOLD = 100_000;

    private final Consumer<String, Event> consumer;
    protected final AdminClient adminClient;   // available to getCurrentLag() for offset lookups
    private volatile boolean running = true;
    private boolean paused = false;

    protected BackpressureConsumer(Consumer<String, Event> consumer, AdminClient adminClient) {
        this.consumer = consumer;
        this.adminClient = adminClient;
    }

    public void consumeWithBackpressure() {
        while (running) {
            long currentLag = getCurrentLag();

            if (!paused && currentLag > LAG_THRESHOLD) {
                // Too far behind - pause to let downstream recover
                consumer.pause(consumer.assignment());
                paused = true;
                notifyUpstream("SLOW_DOWN");
                log.warn("Paused consumption - lag {} exceeds threshold", currentLag);
            }

            if (paused && currentLag < RESUME_THRESHOLD) {
                // Caught up enough to resume
                consumer.resume(consumer.assignment());
                paused = false;
                notifyUpstream("READY");
                log.info("Resumed consumption - lag {} below threshold", currentLag);
            }

            if (!paused) {
                ConsumerRecords<String, Event> records = consumer.poll(Duration.ofMillis(100));
                processRecords(records);
            } else {
                // Keep polling while paused: paused partitions return no records,
                // but the poll keeps the consumer in the group (avoids max.poll.interval.ms eviction)
                consumer.poll(Duration.ofSeconds(1));
            }
        }
    }

    // Application-specific hooks: lag lookup (e.g. via AdminClient), upstream signalling, processing
    protected abstract long getCurrentLag();
    protected abstract void notifyUpstream(String signal);
    protected abstract void processRecords(ConsumerRecords<String, Event> records);
}
Strategy 2: Adaptive Batch Sizing¶
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;

import java.time.Duration;

public abstract class AdaptiveBatchConsumer {

    private static final int MIN_BATCH = 100;
    private static final int MAX_BATCH = 10_000;

    private final Consumer<String, Event> consumer;
    private volatile boolean running = true;
    private int currentBatchSize = 1_000;

    protected AdaptiveBatchConsumer(Consumer<String, Event> consumer) {
        this.consumer = consumer;
    }

    public void consumeAdaptively() {
        while (running) {
            ConsumerRecords<String, Event> records = consumer.poll(Duration.ofMillis(100));

            long startTime = System.currentTimeMillis();
            processBatch(records, currentBatchSize);
            long duration = System.currentTimeMillis() - startTime;

            // Adjust batch size based on processing time
            if (duration < 100 && currentBatchSize < MAX_BATCH) {
                // Processing fast - increase batch size
                currentBatchSize = Math.min(currentBatchSize * 2, MAX_BATCH);
            } else if (duration > 1000 && currentBatchSize > MIN_BATCH) {
                // Processing slow - decrease batch size
                currentBatchSize = Math.max(currentBatchSize / 2, MIN_BATCH);
            }

            consumer.commitSync();
        }
    }

    // Application-specific: write all polled records downstream in chunks of at most batchSize,
    // so the commitSync above never skips unprocessed records
    protected abstract void processBatch(ConsumerRecords<String, Event> records, int batchSize);
}
Strategy 3: Priority Lanes¶
Route high-priority messages to a fast lane that's always processed, while low-priority messages buffer.
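One way to build the lanes, sketched with assumed topic names (`events-priority` and `events-bulk`) and assumed accessors on the application's `Event` type: the producer routes each message by priority, and a dedicated consumer group keeps the priority topic drained while the bulk topic absorbs backlog.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PriorityLaneRouter {

    private static final String PRIORITY_TOPIC = "events-priority";  // always drained by a dedicated consumer group
    private static final String BULK_TOPIC = "events-bulk";          // buffers during spikes

    private final KafkaProducer<String, Event> producer;

    public PriorityLaneRouter(KafkaProducer<String, Event> producer) {
        this.producer = producer;
    }

    public void send(Event event) {
        // Route by priority so the fast lane is never stuck behind bulk traffic.
        // Event#isHighPriority() and Event#getKey() are assumed accessors on the application's event type.
        String topic = event.isHighPriority() ? PRIORITY_TOPIC : BULK_TOPIC;
        producer.send(new ProducerRecord<>(topic, event.getKey(), event));
    }
}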
Cost Optimization¶
The shock absorber pattern enables significant cost savings by smoothing resource usage.
Cost comparison between the two approaches, auto-scaling without Kafka versus steady consumers with Kafka:
| Approach | Compute Cost | Operational Complexity | Risk |
|---|---|---|---|
| Auto-scaling | Higher (peak capacity) | High (scaling policies) | Scale-up latency |
| Shock absorber | Lower (steady capacity) | Low (fixed pool) | Must size buffer |
Anti-Patterns¶
Undersized Buffer¶
If retention is smaller than the volume a spike can accumulate, the broker deletes the oldest segments before the consumer reaches them, and messages are silently lost. Size retention from the buffer formula above, with a safety margin.
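A rough sanity check, sketched with the numbers from the sizing example; the 4-hour retention here is an assumed misconfiguration:

public class RetentionCheck {
    public static void main(String[] args) {
        long bufferedMessages = 57_600_000L;     // accumulated during the 2-hour spike (example above)
        long consumerRatePerSec = 2_000;         // steady drain rate
        long retentionHoursConfigured = 4;       // assumed (undersized) time-based retention

        // Lower bound on the age of the oldest unconsumed message:
        // time to work through the backlog at the consumer's rate (new traffic extends it)
        long drainHours = bufferedMessages / consumerRatePerSec / 3_600;   // = 8 hours

        if (retentionHoursConfigured < drainHours) {
            System.out.println("Undersized buffer: messages will expire before the consumer reaches them");
        }
    }
}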
Ignoring Consumer Health¶
// WRONG: Assuming consumer keeps up
@KafkaListener(topics = "events")
public void consume(Event event) {
// No rate limiting
// No health checks
// No backpressure
database.insert(event); // What if DB is slow?
}
// RIGHT: Health-aware consumption
@KafkaListener(topics = "events")
public void consume(Event event, Acknowledgment ack) {
if (!healthChecker.isDownstreamHealthy()) {
// Don't ack - message will be redelivered
throw new RetryableException("Downstream unhealthy");
}
rateLimiter.acquire();
try {
database.insert(event);
ack.acknowledge();
} catch (Exception e) {
// Let Kafka retry
throw new RetryableException(e);
}
}
No Drain Strategy¶
The buffer only works if it empties. The consumer's sustained rate must exceed the average producer rate; otherwise lag grows without bound and the pattern degrades into an ever-deepening backlog. Drain time is roughly current lag divided by (consumer rate minus average producer rate), so monitor it and treat lag that never reaches zero during off-peak hours as a capacity problem rather than a buffering problem.
Summary¶
| Aspect | Recommendation |
|---|---|
| Buffer sizing | (Peak rate - consumer rate) × spike duration × safety margin |
| Retention | Time or size based on drain requirements |
| Consumer rate | Must exceed average producer rate |
| Lag alerting | Alert on growth rate, not absolute value |
| Backpressure | Pause/resume, adaptive batching, or priority lanes |
| Monitoring | Buffer utilization, drain time, rate delta |
The shock absorber pattern transforms unpredictable traffic into predictable processing load, enabling cost optimization, protecting legacy systems, and improving overall system resilience.
Related Documentation¶
- Microservices - Consumer group strategies
- Producer Development - Batching and throughput
- Consumer Development - Offset management
- Operations - Retention configuration