
What is Event Streaming?

Event streaming is a data architecture paradigm where data is captured, stored, and processed as continuous streams of events.


The Event-Driven Paradigm

Traditional application architectures are request-driven: a client sends a request, the server processes it, and returns a response. Data is stored in databases and queried on demand.

Event-driven architectures invert this model: systems emit events as things happen, and other systems react to those events. Events are stored in order and can be replayed, enabling new capabilities that request-response architectures cannot provide.


Events vs Commands

| Concept | Definition | Example |
| --- | --- | --- |
| Event | Immutable fact that something occurred | "OrderPlaced", "PaymentReceived", "UserLoggedIn" |
| Command | Request for something to happen | "PlaceOrder", "ProcessPayment", "SendEmail" |

Events are named in past tense because they represent facts that have already happened. Commands are named in imperative form because they request future action.

Key distinction: Events cannot be rejected—they describe what has happened. Commands can be accepted or rejected based on business rules.
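The distinction can be sketched in a few lines of Python (a toy example with hypothetical names, not any real framework's API): the command handler may reject the request, while the event it emits is an immutable record of what happened.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # events are immutable facts, so freeze the dataclass
class OrderPlaced:
    order_id: str
    amount: float

def handle_place_order(order_id: str, amount: float, stock: int) -> OrderPlaced:
    """Handle a PlaceOrder command; commands can be rejected by business rules."""
    if stock <= 0:
        raise ValueError("PlaceOrder rejected: out of stock")
    # Command accepted: record the immutable, past-tense fact.
    return OrderPlaced(order_id=order_id, amount=amount)
```

Note the naming: the command handler is imperative ("place order"), while the emitted event is past tense ("order placed") and cannot be un-happened, only compensated for by later events.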


The Commit Log Model

At its core, Kafka is a distributed commit log: an ordered, append-only sequence of records. This simple data structure provides powerful guarantees.


Commit Log Properties

| Property | Description |
| --- | --- |
| Append-only | New records are only added to the end; existing records are never modified |
| Immutable | Once written, records cannot be changed |
| Ordered | Records have a strict sequence (offset) |
| Durable | Records are persisted to disk and replicated |
| Replayable | Readers can start from any position and replay forward |
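These properties can be illustrated with a minimal in-memory log (purely a sketch; Kafka's real log is a partitioned, replicated, on-disk structure):

```python
class CommitLog:
    """Toy append-only log: records are appended, never modified or deleted."""

    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        """Append to the end and return the new record's offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Replay forward from any offset; reading never mutates the log."""
        return self._records[offset:offset + max_records]
```

Because `read` takes an offset, any number of readers can replay from any retained position without coordinating with each other or with writers.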

Why Append-Only?

The append-only constraint enables:

  1. Sequential I/O: Writing to the end of a file is orders of magnitude faster than random writes
  2. Easy replication: Followers simply append whatever the leader appends
  3. Consumer independence: Readers track their own position without modifying the log
  4. Point-in-time recovery: Start from any offset to replay history

Comparing Messaging Models

Traditional Message Queues

Traditional message queues (RabbitMQ, ActiveMQ, IBM MQ) implement a push-based model where the broker actively delivers messages to consumers.


Queue characteristics:

| Aspect | Behavior |
| --- | --- |
| Delivery | Push-based—broker sends to consumers |
| Lifetime | Message deleted after successful delivery |
| Consumers | Competing—each message goes to one consumer |
| Ordering | Usually FIFO across entire queue |
| Replay | Not possible—messages are deleted |

Publish-Subscribe

Traditional pub/sub systems deliver messages to all subscribers of a topic.


Pub/sub characteristics:

| Aspect | Behavior |
| --- | --- |
| Delivery | Push-based or pull-based depending on implementation |
| Lifetime | Often transient—no retention |
| Consumers | Independent—all subscribers receive everything |
| Ordering | Often not guaranteed |
| Replay | Limited or not supported |

Kafka's Log Model

Kafka combines the best of both models through consumer groups.


Kafka characteristics:

| Aspect | Behavior |
| --- | --- |
| Delivery | Pull-based—consumers fetch at their own pace |
| Lifetime | Retained based on time or size policy |
| Consumers | Consumer groups enable both queue and pub/sub semantics |
| Ordering | Guaranteed within partition |
| Replay | Full replay from any retained offset |
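The dual queue/pub-sub semantics come from partition assignment: within a group, each partition goes to exactly one member (queue behavior), while every group independently receives the full stream (pub/sub behavior). A round-robin sketch of the idea (Kafka's real assignors are pluggable—range, round-robin, sticky—and handle rebalancing):

```python
def assign_partitions(partitions: list, members: list) -> dict:
    """Round-robin: each partition is owned by exactly one group member,
    so members of one group compete for records like queue consumers."""
    assignment = {m: [] for m in members}
    for i, p in enumerate(partitions):
        assignment[members[i % len(members)]].append(p)
    return assignment
```

Running the same assignment for a second group produces an independent copy of the work, which is exactly the pub/sub half of the model.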

Pull vs Push Delivery

Kafka uses a pull-based model where consumers request data from brokers, rather than brokers pushing data to consumers.

Advantages of Pull

| Advantage | Explanation |
| --- | --- |
| Consumer-controlled rate | Consumers fetch only what they can process, preventing overload |
| Batching efficiency | Consumers can batch requests for efficient network utilization |
| Simple broker | Broker does not track consumer state or manage delivery retries |
| Replay support | Consumers control their position and can re-read data |
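The pull model reduces to a simple poll loop in which the consumer, not the broker, owns the offset and decides when and how much to fetch. A sketch over an in-memory list standing in for a partition (illustrative only, not the Kafka consumer API):

```python
class PullConsumer:
    """Toy pull-based consumer: tracks its own offset, fetches at its own pace."""

    def __init__(self, partition: list):
        self.partition = partition  # stand-in for one partition's records
        self.offset = 0             # position is consumer-side state

    def poll(self, max_records: int = 3) -> list:
        # The consumer chooses the batch size, so a slow consumer simply
        # polls less often instead of backing up the broker.
        batch = self.partition[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch
```

Rewinding is just setting `self.offset` back to an earlier value, which is why replay falls out of the pull model for free.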

Push Model Drawbacks

In push-based systems:

  • Broker must track each consumer's state
  • Slow consumers can back up the entire system
  • Broker must implement complex flow control
  • Replay requires special infrastructure

Event Streaming Benefits

Temporal Decoupling

Producers and consumers do not need to be online simultaneously. A consumer can process events hours or days after they were produced.


Replayability

Events can be replayed to:

  • Rebuild materialized views
  • Fix bugs in processing logic and reprocess
  • Bootstrap new consumers with historical data
  • Perform what-if analysis on historical events

Multiple Consumer Patterns

The same topic can serve different use cases simultaneously:

| Consumer Group | Use Case | Processing Pattern |
| --- | --- | --- |
| analytics | Real-time dashboards | Streaming aggregation |
| search-indexer | Search updates | Index on every event |
| data-warehouse | Batch analytics | Periodic bulk loads |
| audit-log | Compliance | Long-term archival |

Loose Coupling

Producers and consumers are decoupled:

  • Producers do not know who consumes their events
  • Consumers do not know who produces events
  • New consumers can be added without producer changes
  • Processing logic can be changed without affecting producers

Event Sourcing with Kafka

Event sourcing stores the sequence of events that led to current state, rather than just the current state itself.


Event Sourcing Benefits

| Benefit | Description |
| --- | --- |
| Complete audit trail | Every state change is recorded |
| Temporal queries | "What was the state at time T?" |
| Debugging | Replay events to understand what happened |
| Flexibility | Derive new views from historical events |

Kafka's log retention and compaction features make it suitable as an event store for event-sourced systems.
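The core of event sourcing is a fold over the event history: current state is derived, not stored. A minimal sketch with a hypothetical bank account (event names are illustrative):

```python
events = [
    ("Deposited", 100),
    ("Withdrawn", 30),
    ("Deposited", 50),
]

def replay(history: list) -> int:
    """Derive account state by folding over the full event history."""
    balance = 0
    for kind, amount in history:
        balance += amount if kind == "Deposited" else -amount
    return balance
```

Temporal queries fall out naturally: replaying only a prefix of the history (`replay(events[:2])`) answers "what was the balance before the last deposit?" without any extra bookkeeping.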


Stream Processing Concepts

Event streaming enables stream processing: continuous computation over unbounded data.

Bounded vs Unbounded Data

| Data Type | Characteristics | Processing |
| --- | --- | --- |
| Bounded | Fixed size, complete | Batch processing |
| Unbounded | Infinite, never complete | Stream processing |

Traditional batch processing operates on bounded datasets. Stream processing operates on unbounded streams, processing events as they arrive.

Windowing

Since streams are unbounded, aggregations require windows—bounded subsets of the stream.


| Window Type | Description | Use Case |
| --- | --- | --- |
| Tumbling | Fixed size, non-overlapping | Hourly aggregations |
| Hopping | Fixed size, overlapping | Moving averages |
| Sliding | Fixed size, event-triggered | Continuous monitoring |
| Session | Activity-based, variable size | User session analytics |
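A tumbling window is the simplest of these: each event's timestamp is rounded down to the start of its fixed, non-overlapping window. A sketch of a windowed count (illustrative Python, not the Kafka Streams windowing API):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_ms: int) -> dict:
    """Count (timestamp_ms, key) events per fixed, non-overlapping window.

    Each event belongs to exactly one window, identified by its start time."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = ts - (ts % window_ms)  # round down to window boundary
        counts[(window_start, key)] += 1
    return dict(counts)
```

Hopping windows would differ only in that one event contributes to several overlapping windows, and session windows in that boundaries are derived from gaps in activity rather than the clock.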