Kafka Topics¶
Topics are the fundamental unit of organization in Kafka. The topic architecture directly addresses the core problems Kafka was designed to solve: horizontal scalability through partitioning and high availability through replication.
Design Rationale¶
Traditional message queues impose fundamental constraints: a single queue creates a throughput ceiling, and broker failure causes data loss or unavailability. Kafka's topic architecture eliminates both limitations.
| Constraint | Traditional Queue | Kafka Topic |
|---|---|---|
| Throughput | Single broker bottleneck | Partitions distribute across brokers |
| Availability | Single point of failure | Replicas on multiple brokers |
| Scaling | Vertical only | Horizontal via partition count |
| Retention | Deleted after consumption | Configurable time/size-based retention |
Partitions¶
Partitions are the unit of parallelism in Kafka. A single log cannot scale beyond one broker's capacity—partitioning solves this by splitting a topic into independent segments that can be distributed across the cluster.
The partition count is specified when creating a topic and determines the topic's maximum parallelism. Each partition is an ordered, append-only sequence of records stored on a single broker. By distributing partitions across brokers:
- Write throughput scales horizontally: Producers write to different partitions in parallel
- Read throughput scales horizontally: Consumers in a group read from different partitions in parallel
- Storage scales horizontally: Each broker stores only a subset of the topic's data
- Failure isolation improves: A broker failure affects only its partitions, not the entire topic
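Because the partition count is fixed at creation time, it is normally set explicitly rather than left to the broker default. The sketch below shows topic creation with the Java AdminClient; the topic name, partition count, replication factor, and bootstrap address are illustrative assumptions, not recommendations.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Properties;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed bootstrap address for illustration.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Hypothetical topic: 6 partitions for parallelism, 3 replicas for fault tolerance.
            NewTopic orders = new NewTopic("orders", 6, (short) 3);
            admin.createTopics(List.of(orders)).all().get();
        }
    }
}
```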
Behavioral Guarantees¶
| Guarantee | Scope | Behavior |
|---|---|---|
| Ordering | Within partition | Records must be read in the order written |
| Ordering | Across partitions | Undefined—no ordering relationship exists between partitions |
| Immutability | Record | Records must not be modified after being written |
| Offset assignment | Partition | Offsets must be sequential integers starting from 0 |
| Offset uniqueness | Partition | Each offset must be assigned to exactly one record |
Undefined Behavior
The relative ordering of records across different partitions is undefined and must not be relied upon. A consumer reading from multiple partitions may observe records in any interleaving. Applications requiring total ordering must use a single partition.
Partition-Key Contract¶
When a record key is provided, the default partitioner guarantees:
partition = (murmur2(keyBytes) & 0x7FFFFFFF) % numPartitions
| Condition | Guarantee |
|---|---|
| Same key, same partition count | Records must be assigned to the same partition |
| Same key, different partition count | Partition assignment is undefined |
| Null key (Kafka 2.4+) | Sticky partitioning—records batch to one partition until batch.size or linger.ms triggers send |
| Null key (pre-2.4) | Round-robin distribution across available partitions |
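The keyed path of the default partitioner can be sketched directly from the formula above. The helper below is a simplified illustration only (it ignores the unkeyed sticky path and custom partitioners) and relies on the murmur2 implementation shipped with the Kafka clients library.

```java
import org.apache.kafka.common.utils.Utils;

public final class KeyPartitioning {
    // Simplified sketch of the keyed path of the default partitioner:
    // hash the key bytes with murmur2, clear the sign bit, then take the
    // result modulo the current partition count.
    public static int partitionForKey(byte[] keyBytes, int numPartitions) {
        return (Utils.murmur2(keyBytes) & 0x7FFFFFFF) % numPartitions;
    }
}
```

Because the result depends on numPartitions, the same key maps to a different partition as soon as the partition count changes, which is exactly the caveat described below.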
Partition Count Changes
Increasing partition count changes the key-to-partition mapping. Records with the same key may be assigned to different partitions before and after the change. Applications depending on key-based ordering must either:
- Never increase partition count
- Drain the topic before increasing partitions
- Accept ordering discontinuity during transition
Partition Count Selection¶
The partition count must be chosen at topic creation. It may be increased but must not be decreased without recreating the topic.
| Factor | Guidance |
|---|---|
| Consumer parallelism | Partition count should be ≥ expected consumer count |
| Throughput | Each partition adds ~10 MB/s write capacity (varies by hardware) |
| Ordering requirements | Fewer partitions = broader ordering scope |
| Broker memory | Each partition consumes ~1-2 MB of broker heap |
| Recovery time | More partitions increases leader election and rebalance time |
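Partition increases are applied through the Admin API. A minimal sketch is shown below; the topic name, target count, and bootstrap address are illustrative assumptions.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

import java.util.Map;
import java.util.Properties;

public class IncreasePartitionsExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address

        try (AdminClient admin = AdminClient.create(props)) {
            // Raise the total partition count of the hypothetical "orders" topic to 12.
            // Existing data is not moved; key-to-partition mappings change for new records.
            admin.createPartitions(Map.of("orders", NewPartitions.increaseTo(12)))
                 .all()
                 .get();
        }
    }
}
```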
Replication¶
Each partition may be replicated across multiple brokers. Replication provides fault tolerance—the cluster continues operating when brokers fail.
Replication Roles¶
| Role | Behavior |
|---|---|
| Leader | Must handle all produce and consume requests for the partition |
| Follower | Must replicate records from leader; may not serve client requests (prior to Kafka 2.4) |
| ISR member | Follower that has fully caught up with the leader within replica.lag.time.max.ms |
Changed in Kafka 2.4 (KIP-392): Followers may serve read requests when replica.selector.class is configured.
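Per-partition leader and ISR assignments can be inspected through the Admin API, which helps when reasoning about the durability and failure scenarios below. A minimal sketch, assuming a topic named orders, a local bootstrap address, and a 3.x Kafka clients dependency (allTopicNames is a newer accessor):

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

import java.util.List;
import java.util.Properties;

public class DescribeTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin.describeTopics(List.of("orders"))
                                         .allTopicNames().get()
                                         .get("orders");
            // Print the leader and current ISR for each partition.
            desc.partitions().forEach(p ->
                    System.out.printf("partition %d leader=%s isr=%s%n",
                            p.partition(), p.leader(), p.isr()));
        }
    }
}
```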
Replication Factor Constraints¶
| Constraint | Behavior |
|---|---|
| Minimum | Replication factor must be ≥ 1 |
| Maximum | Replication factor must be ≤ broker count |
| Modification | Replication factor may be changed via partition reassignment |
Durability Guarantees¶
Durability depends on the producer's acks setting and the topic's min.insync.replicas:
| acks | min.insync.replicas | Guarantee |
|---|---|---|
| 0 | Any | No durability; the record may be lost before any broker persists it |
| 1 | Any | Leader must persist before acknowledging; data lost if leader fails before replication |
| all | 1 | Leader waits for the current ISR, but the write is accepted even when the leader is the only in-sync replica; durability can degrade to acks=1 |
| all | 2 | At least 2 replicas (including the leader) must persist before acknowledging |
| all | N | At least N replicas must persist; produces fail if ISR size < N |
ISR Shrinkage
When ISR size falls below min.insync.replicas, producers with acks=all receive NotEnoughReplicasException. The topic remains readable but writes are blocked until ISR recovers.
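On the producer side, the strongest row in the table corresponds to acks=all combined with a topic-level min.insync.replicas of at least 2. A minimal sketch follows; the topic name, key, value, and bootstrap address are illustrative assumptions.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class DurableProducerExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Wait for the in-sync replicas (bounded below by min.insync.replicas) to persist.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Idempotence keeps retries from introducing duplicates.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // get() surfaces NotEnoughReplicasException if the ISR is below min.insync.replicas.
            producer.send(new ProducerRecord<>("orders", "order-42", "created")).get();
        }
    }
}
```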
Fault Tolerance¶
| Replication Factor | Tolerated Failures | Production Suitability |
|---|---|---|
| 1 | 0 | Development only—must not be used in production |
| 2 | 1 | Minimal production; single failure causes unavailability during leader election |
| 3 | 2 | Standard production; recommended minimum |
| 5 | 4 | Critical data requiring extended failure tolerance |
Retention¶
Kafka retains records based on time, size, or key-based compaction. Retention is configured per-topic and enforced per-partition.
Retention Policies¶
| Policy | cleanup.policy | Behavior |
|---|---|---|
| Delete | delete | Segments older than retention.ms or exceeding retention.bytes are deleted |
| Compact | compact | Only the latest record per key is retained; older duplicates are removed |
| Both | delete,compact | Compaction runs first; then time/size-based deletion applies |
Delete Policy Guarantees¶
| Configuration | Default | Guarantee |
|---|---|---|
| retention.ms | 604800000 (7d) | Records older than this value may be deleted |
| retention.bytes | -1 (unlimited) | Per-partition; oldest segments deleted when exceeded |
| segment.ms | 604800000 (7d) | Active segment rolls after this interval |
| segment.bytes | 1073741824 (1 GB) | Active segment rolls when size exceeded |
Retention Timing
Retention applies to closed segments only. The active segment is not eligible for deletion regardless of age or size until it rolls.
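Delete-policy settings can be adjusted per topic at runtime via incrementalAlterConfigs. A hedged sketch follows; the topic name, retention values, and bootstrap address are illustrative assumptions.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class RetentionConfigExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address

        ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");
        Collection<AlterConfigOp> ops = List.of(
                // 3 days of time-based retention.
                new AlterConfigOp(new ConfigEntry("retention.ms", "259200000"), AlterConfigOp.OpType.SET),
                // Roughly 10 GiB per partition of size-based retention.
                new AlterConfigOp(new ConfigEntry("retention.bytes", "10737418240"), AlterConfigOp.OpType.SET));

        try (AdminClient admin = AdminClient.create(props)) {
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}
```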
Log Compaction Guarantees¶
For topics with cleanup.policy=compact:
| Guarantee | Behavior |
|---|---|
| Latest value | The most recent record for each key must be retained |
| Ordering | Record order within a key must be preserved |
| Tombstone retention | Tombstones (null values) must be retained for at least delete.retention.ms |
| Head of log | Records in the active segment are not eligible for compaction |
Compaction Timing
Compaction is not immediate. Records may exist in duplicate until the log cleaner processes the segment. Applications must not assume instantaneous compaction.
| Configuration | Default | Purpose |
|---|---|---|
| min.cleanable.dirty.ratio | 0.5 | Minimum dirty/total ratio before compaction eligible |
| min.compaction.lag.ms | 0 | Minimum time before record eligible for compaction |
| max.compaction.lag.ms | ∞ | Maximum time before compaction is forced |
| delete.retention.ms | 86400000 (1d) | Tombstone retention period |
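Compacted topics are typically created with these settings up front, since compaction semantics affect how consumers interpret the log. A minimal sketch follows; the topic name and tuned values are illustrative assumptions.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CompactedTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address

        // Changelog-style topic that keeps only the latest value per key.
        NewTopic profiles = new NewTopic("user-profiles", 6, (short) 3)
                .configs(Map.of(
                        "cleanup.policy", "compact",
                        "min.cleanable.dirty.ratio", "0.1",  // compact sooner than the 0.5 default
                        "delete.retention.ms", "86400000")); // keep tombstones for one day

        try (AdminClient admin = AdminClient.create(props)) {
            admin.createTopics(List.of(profiles)).all().get();
        }
    }
}
```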
Topic Naming¶
Topic names must conform to the following constraints:
| Constraint | Rule |
|---|---|
| Length | Must be 1-249 characters |
| Characters | Must contain only [a-zA-Z0-9._-] |
| Reserved prefixes | Names starting with __ are reserved for internal topics |
| Uniqueness | Must be unique within the cluster |
Period and Underscore Collision
Kafka uses . and _ interchangeably in some metric names. Topics my.topic and my_topic may collide in metrics. A cluster should use one convention consistently.
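The naming constraints can be checked client-side before creation. The helper below is a hypothetical sketch that mirrors the table above rather than Kafka's internal validator.

```java
import java.util.regex.Pattern;

public final class TopicNames {
    // 1-249 characters drawn from the allowed character set in the table above.
    private static final Pattern LEGAL = Pattern.compile("[a-zA-Z0-9._-]{1,249}");

    // Hypothetical helper: returns true if the name satisfies the documented rules.
    // Kafka additionally rejects the literal names "." and "..".
    public static boolean isValid(String name) {
        return LEGAL.matcher(name).matches()
                && !name.equals(".")
                && !name.equals("..");
    }
}
```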
Internal Topics¶
Kafka creates and manages internal topics for cluster operations. These topics must not be modified directly.
| Topic | Purpose | Created By |
|---|---|---|
| __consumer_offsets | Consumer group offset storage | Kafka (automatic) |
| __transaction_state | Transaction coordinator state | Kafka (automatic) |
| _schemas | Schema storage | Schema Registry |
| connect-offsets | Connector offset tracking | Kafka Connect |
| connect-configs | Connector configurations | Kafka Connect |
| connect-status | Connector and task status | Kafka Connect |
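Whether broker-managed internal topics appear in a listing is controlled by ListTopicsOptions. A minimal sketch with an assumed local bootstrap address; note that _schemas and the connect-* topics are ordinary topics from the broker's point of view, so only the __-prefixed topics are affected by this flag.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListTopicsOptions;

import java.util.Properties;
import java.util.Set;

public class ListTopicsExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address

        try (AdminClient admin = AdminClient.create(props)) {
            // listInternal(true) includes __consumer_offsets and __transaction_state;
            // the default listing hides them.
            Set<String> names = admin.listTopics(new ListTopicsOptions().listInternal(true))
                                     .names()
                                     .get();
            names.forEach(System.out::println);
        }
    }
}
```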
Failure Semantics¶
Leader Failure¶
When a partition leader fails:
- Controller detects failure via ZooKeeper session timeout (ZK mode) or heartbeat timeout (KRaft mode)
- Controller selects new leader from ISR
- New leader must have all committed records
- Producers receive NotLeaderOrFollowerException and must refresh metadata
- In-flight requests with acks=all may time out; clients should retry
| Scenario | Outcome |
|---|---|
| ISR contains eligible replicas | New leader elected; brief unavailability during election |
| ISR empty, unclean.leader.election.enable=false | Partition unavailable until replica recovers |
| ISR empty, unclean.leader.election.enable=true | Data loss possible; out-of-sync replica becomes leader |
Unclean Leader Election
Setting unclean.leader.election.enable=true allows data loss. An out-of-sync replica may become leader, discarding records not yet replicated. This setting should remain false for production topics.
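The retry advice above maps to a handful of producer settings. The values below match recent client defaults and are spelled out only to make the failover behavior explicit, not as tuning recommendations.

```java
import org.apache.kafka.clients.producer.ProducerConfig;

import java.util.Properties;

public final class FailoverProducerSettings {
    // Failover-friendly producer settings; these match recent client defaults.
    public static Properties props() {
        Properties props = new Properties();
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);    // retry NotLeaderOrFollowerException transparently
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120_000);  // overall bound on send plus retries
        props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 100L);        // pause between attempts
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");    // retries do not duplicate records
        return props;
    }
}
```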
Broker Failure¶
| Failure Mode | Impact |
|---|---|
| Single broker, RF ≥ 2 | Affected partitions elect new leaders; cluster remains available |
| Multiple brokers, failures < RF | Cluster remains available if each partition has ≥ 1 ISR member |
| Failures ≥ RF for any partition | Affected partitions become unavailable |
Version Compatibility¶
| Feature | Minimum Version | Notes |
|---|---|---|
| Topic creation via Admin API | 0.10.1.0 | kafka-topics.sh available earlier |
| Log compaction | 0.8.1 | |
| Follower fetching (KIP-392) | 2.4.0 | Requires replica.selector.class configuration |
| Tiered storage (KIP-405) | 3.6.0 | Early access; production readiness varies |
| KRaft mode (no ZooKeeper) | 3.3.0 | Production-ready in 3.3+; recommended in 3.6+ |
Topic Configuration Reference¶
Core Settings¶
| Configuration | Default | Description |
|---|---|---|
| partitions | 1 | Number of partitions |
| replication.factor | - | Number of replicas (broker default: default.replication.factor) |
| min.insync.replicas | 1 | Minimum ISR for acks=all produces |
Retention Settings¶
| Configuration | Default | Description |
|---|---|---|
| retention.ms | 604800000 | Time-based retention (ms) |
| retention.bytes | -1 | Size-based retention per partition (-1 = unlimited) |
| cleanup.policy | delete | delete, compact, or delete,compact |
Segment Settings¶
| Configuration | Default | Description |
|---|---|---|
| segment.bytes | 1073741824 | Segment file size |
| segment.ms | 604800000 | Time before rolling segment |
| segment.index.bytes | 10485760 | Index file size |
Compaction Settings¶
| Configuration | Default | Description |
|---|---|---|
| min.cleanable.dirty.ratio | 0.5 | Dirty ratio threshold for compaction |
| min.compaction.lag.ms | 0 | Minimum age before compaction |
| max.compaction.lag.ms | - | Maximum age before forced compaction |
| delete.retention.ms | 86400000 | Tombstone retention |
Related Documentation¶
- Concepts Overview - Kafka fundamentals
- Producers - Writing to topics
- Consumers - Reading from topics
- Replication - Replication architecture
- Operations - Topic operations