# Kafka Architecture
Apache Kafka is a distributed commit log designed for high-throughput, fault-tolerant, real-time data streaming.
## Architectural Overview
Kafka's architecture consists of brokers forming a cluster, with data organized into topics and partitions. Producers write to topic partitions, and consumers read from them.
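The sketch below illustrates this flow with the Java client: a producer appends a record to a partition of a topic, and a consumer in a group reads it back. The broker address `localhost:9092`, the `orders` topic, and the group name are placeholders for illustration, not part of any real deployment.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class QuickstartClients {
    public static void main(String[] args) {
        // Producer: writes a record to one partition of the "orders" topic.
        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            // Records with the same key land on the same partition, preserving per-key order.
            producer.send(new ProducerRecord<>("orders", "order-42", "created"));
        }

        // Consumer: joins a consumer group and reads records back from the topic.
        Properties c = new Properties();
        c.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        c.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processors");
        c.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        c.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        c.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
            consumer.subscribe(List.of("orders"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> r : records) {
                System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                        r.partition(), r.offset(), r.key(), r.value());
            }
        }
    }
}
```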
### Core Components
| Component | Description |
|---|---|
| Broker | Server that stores data and serves client requests |
| Controller | Broker responsible for cluster coordination (leader election, partition assignment) |
| Topic | Named category of records; logical grouping of related events |
| Partition | Ordered, immutable sequence of records within a topic |
| Replica | Copy of a partition for fault tolerance |
| Producer | Client that publishes records to topics |
| Consumer | Client that subscribes to topics and processes records |
| Consumer Group | Set of consumers that coordinate to consume a topic |
## Brokers

Brokers are the servers that form a Kafka cluster; the sketch after this list shows how to enumerate them with the admin API. Each broker:
- Stores partition data on disk
- Handles produce and fetch requests from clients
- Replicates data to other brokers
- Participates in cluster coordination
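Cluster membership is visible through the Java AdminClient. A minimal sketch that lists the brokers currently registered in the cluster, assuming a reachable bootstrap broker at the placeholder address `localhost:9092`:

```java
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;

public class ListBrokers {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // Every broker that is currently part of the cluster.
            for (Node broker : admin.describeCluster().nodes().get()) {
                System.out.printf("broker id=%d host=%s:%d%n",
                        broker.id(), broker.host(), broker.port());
            }
        }
    }
}
```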
### Broker Architecture

A broker's main subsystems map directly onto the configurations below: network threads accept client and replication connections, I/O threads handle requests and disk access, and partition data is persisted under the configured log directories.
### Key Broker Configurations

| Configuration | Default | Description |
|---|---|---|
| `broker.id` | -1 (auto) | Unique identifier for this broker |
| `log.dirs` | /tmp/kafka-logs | Directories for partition data |
| `num.network.threads` | 3 | Threads for network I/O |
| `num.io.threads` | 8 | Threads for disk I/O |
| `socket.send.buffer.bytes` | 102400 | Socket send buffer |
| `socket.receive.buffer.bytes` | 102400 | Socket receive buffer |
| `socket.request.max.bytes` | 104857600 | Maximum request size (100 MB) |
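These settings are defined in each broker's `server.properties`, but the effective values can also be read back over the admin API. A minimal sketch, assuming a broker with id `0` and the placeholder bootstrap address `localhost:9092`:

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

public class ShowBrokerConfigs {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // Broker configs are addressed by broker id (here "0"; adjust for your cluster).
            ConfigResource broker0 = new ConfigResource(ConfigResource.Type.BROKER, "0");
            Config config = admin.describeConfigs(List.of(broker0)).all().get().get(broker0);
            for (String name : List.of("log.dirs", "num.network.threads", "num.io.threads")) {
                System.out.println(name + " = " + config.get(name).value());
            }
        }
    }
}
```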
## Controller

The controller is a broker elected to manage cluster-wide operations. In ZooKeeper mode, a single broker is elected controller; in KRaft mode, a Raft quorum of controller nodes shares this role, with one controller active at a time.
### Controller Responsibilities
| Responsibility | Description |
|---|---|
| Leader election | Elect new partition leaders when leaders fail |
| Partition assignment | Assign partitions to brokers |
| Broker registration | Track broker membership in cluster |
| Topic management | Create and delete topics |
| Metadata propagation | Distribute cluster metadata to all brokers |
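The admin API also reports which node is currently acting as the active controller, which is useful when diagnosing coordination issues. A minimal sketch, again using a placeholder bootstrap address:

```java
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;

public class ShowController {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // The active controller as reported by the cluster metadata.
            Node controller = admin.describeCluster().controller().get();
            System.out.printf("active controller: id=%d host=%s:%d%n",
                    controller.id(), controller.host(), controller.port());
        }
    }
}
```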
### KRaft vs ZooKeeper
Kafka is transitioning from ZooKeeper to KRaft (Kafka Raft) for cluster coordination.
| Aspect | ZooKeeper Mode | KRaft Mode |
|---|---|---|
| External dependency | ZooKeeper cluster required | None |
| Metadata storage | Split (ZK + broker logs) | Unified (__cluster_metadata topic) |
| Failover time | Seconds to minutes | Milliseconds to seconds |
| Partition limit | ~200K practical limit | Millions of partitions |
| Operational complexity | Two systems to manage | Single system |
| Version | All versions | Kafka 3.3+ (production ready) |
## Partitions and Replication

### Partitions
A topic is divided into partitions—ordered, append-only logs. Partitions enable:
- Parallelism: Multiple consumers can read different partitions concurrently
- Ordering: Records within a partition maintain strict order
- Scalability: Partitions can be distributed across brokers
### Replication

When the replication factor is greater than one, each partition has replicas distributed across brokers. One replica is the leader; the others are followers.
#### Replication Concepts
| Concept | Description |
|---|---|
| Leader | Replica that handles all reads and writes for a partition |
| Follower | Replica that replicates from leader; can become leader if current leader fails |
| ISR (In-Sync Replicas) | Set of replicas that are fully caught up with leader |
| High Watermark | Offset up to which all ISR have replicated; consumers can only read up to HW |
| LEO (Log End Offset) | Latest offset in the leader's log |
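The per-partition leader, replica set, and ISR can all be inspected with the admin API. A minimal sketch, assuming a hypothetical `orders` topic and a placeholder bootstrap address:

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

public class ShowReplicaState {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            TopicDescription topic =
                    admin.describeTopics(List.of("orders")).allTopicNames().get().get("orders");
            // For each partition: which replica leads, the full replica set, and the ISR.
            for (TopicPartitionInfo partition : topic.partitions()) {
                System.out.printf("partition=%d leader=%d replicas=%s isr=%s%n",
                        partition.partition(),
                        partition.leader().id(),
                        partition.replicas(),
                        partition.isr());
            }
        }
    }
}
```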
#### Replication Factor
The replication factor determines how many copies of each partition exist:
| RF | Behavior | Use Case |
|---|---|---|
| 1 | No redundancy; data loss on broker failure | Development only |
| 2 | Survives 1 broker failure | Limited production use |
| 3 | Survives 2 broker failures; quorum possible | Production standard |
| 4+ | Higher durability; rarely needed | Critical data |
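Both the partition count and the replication factor are chosen when a topic is created. A sketch of creating a topic with 6 partitions and replication factor 3 via the admin API; the topic name and broker address are placeholders:

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // 6 partitions for consumer parallelism, replication factor 3 for fault tolerance.
            NewTopic orders = new NewTopic("orders", 6, (short) 3);
            admin.createTopics(List.of(orders)).all().get();
        }
    }
}
```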
## Storage Engine
Kafka stores data in log segments on disk. The storage design prioritizes sequential I/O for maximum throughput.
### Log Segment Files

| File | Purpose |
|---|---|
| `.log` | Message data (key, value, headers, metadata) |
| `.index` | Offset-to-position index for efficient seeking |
| `.timeindex` | Timestamp-to-offset index for time-based seeking |
| `.txnindex` | Transaction index (for transactional messages) |
| `.snapshot` | Producer state snapshots |
### Retention Policies

| Policy | Configuration | Behavior |
|---|---|---|
| Time-based | `retention.ms` | Delete segments older than the threshold |
| Size-based | `retention.bytes` | Delete oldest segments when the partition exceeds the size limit |
| Compaction | `cleanup.policy=compact` | Keep only the latest value per key |
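Retention settings are per-topic configurations and can be changed at runtime. The sketch below sets a 7-day time limit and a roughly 1 GiB size limit on a hypothetical `orders` topic via the admin API; the values and names are illustrative only.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class SetRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");
            Collection<AlterConfigOp> ops = List.of(
                    // Keep segments for 7 days (604,800,000 ms)...
                    new AlterConfigOp(new ConfigEntry("retention.ms", "604800000"),
                            AlterConfigOp.OpType.SET),
                    // ...or until the partition reaches ~1 GiB, whichever limit is hit first.
                    new AlterConfigOp(new ConfigEntry("retention.bytes", "1073741824"),
                            AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}
```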
## Performance Architecture
Kafka achieves high throughput through several design choices:
### Sequential I/O

Writes append to the end of the active log segment, and reads typically scan forward from an offset, so most disk access is sequential rather than random, a pattern both spinning disks and SSDs handle efficiently.
### Zero-Copy Transfers

Kafka uses the sendfile() system call to transfer data from the page cache directly to the network socket, bypassing user-space copies.
**TLS disables zero-copy.** When TLS encryption is enabled, zero-copy is not possible because data must be encrypted in user space. This can reduce throughput by 30-50% depending on CPU and workload.
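The underlying mechanism is exposed in Java as `FileChannel.transferTo`, which maps to `sendfile(2)` on Linux. The sketch below is a generic illustration of the technique, not Kafka's internal code; the file path, host, and port are placeholders, and it assumes something is listening on that port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopyDemo {
    // Send a file over a socket without copying it through user-space buffers.
    // On Linux, FileChannel.transferTo maps to sendfile(2), the same primitive
    // Kafka's plaintext path relies on.
    static void sendFile(Path path, SocketChannel socket) throws IOException {
        try (FileChannel file = FileChannel.open(path, StandardOpenOption.READ)) {
            long position = 0;
            long remaining = file.size();
            while (remaining > 0) {
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder peer and file path, purely for illustration.
        try (SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9999))) {
            sendFile(Path.of("/tmp/kafka-logs/orders-0/00000000000000000000.log"), socket);
        }
    }
}
```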
### Batching

Producers batch messages before sending, and consumers fetch in batches:

| Batching Point | Configuration | Benefit |
|---|---|---|
| Producer | `batch.size`, `linger.ms` | Amortize network overhead, enable compression |
| Broker | Internal batching | Efficient disk writes |
| Consumer | `fetch.min.bytes`, `fetch.max.wait.ms` | Reduce fetch requests |
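A sketch of how these settings are passed to the Java clients; the specific values are illustrative, not tuning recommendations.

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;

public class BatchingConfigs {
    // Producer-side batching: fill batches up to 64 KB or wait up to 10 ms,
    // and compress whole batches with lz4.
    static Properties producerBatching() {
        Properties p = new Properties();
        p.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);
        p.put(ProducerConfig.LINGER_MS_CONFIG, 10);
        p.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        return p;
    }

    // Consumer-side batching: a fetch returns once the broker has at least 64 KB
    // available, or after 500 ms, whichever comes first.
    static Properties consumerBatching() {
        Properties c = new Properties();
        c.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 65536);
        c.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);
        return c;
    }
}
```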
### OS Page Cache
Kafka relies on the OS page cache rather than managing its own cache:
| Benefit | Description |
|---|---|
| Automatic management | OS handles cache eviction |
| Warm restarts | Cache survives broker restarts |
| Memory efficiency | Avoids double-buffering in JVM |
| Read-ahead | OS prefetches sequential reads |
## Fault Tolerance
Kafka survives failures at multiple levels:
| Failure | Kafka Response |
|---|---|
| Single broker | Leader election; ISR continues serving |
| Multiple brokers | Service continues if enough replicas remain |
| Rack failure | Rack-aware placement ensures cross-rack replicas |
| Network partition | `min.insync.replicas` prevents split-brain writes |
| Disk failure | With multiple log directories (JBOD), the broker takes partitions on the failed disk offline and leadership moves to replicas on other brokers |
### Data Durability Configuration

| Setting | Behavior | Effect |
|---|---|---|
| `acks=all` | Producer waits for all in-sync replicas to acknowledge | Strongest durability |
| `min.insync.replicas=2` | Require at least 2 in-sync replicas to accept a write | Prevents single-replica writes |
| `unclean.leader.election.enable=false` | Only ISR members can become leader | Prevents data loss on failover |
## Related Documentation
- Topics and Partitions - Partitions, leaders, ISR, replication
- Transaction Coordinator - Exactly-once semantics, two-phase commit
- Brokers - Broker architecture and configuration
- KRaft - KRaft consensus mode
- Replication - ISR, leader election, durability
- Storage Engine - Log segments, indexes, compaction
- Performance Internals - Zero-copy, batching, tuning
- Fault Tolerance - Failure scenarios and recovery
- Topology - Rack awareness, network design