Kafka Ecosystem

The Apache Kafka platform consists of multiple components that work together to provide a complete event streaming infrastructure.


Ecosystem Overview

Modern Kafka deployments typically include multiple components beyond the core broker cluster. Understanding the role of each component clarifies architectural decisions and deployment patterns.


Component | Purpose | Deployment
Kafka Brokers | Distributed commit log storage and serving | Cluster of 3+ nodes
Schema Registry | Schema management and compatibility enforcement | Separate service
Kafka Connect | Data integration framework with 200+ connectors | Distributed workers
Kafka Streams | Stream processing library | Embedded in applications

Kafka Brokers

Kafka brokers form the core of the platform: a distributed cluster that stores and serves event streams.

Broker Responsibilities

Responsibility | Description
Message storage | Persist messages to disk in log segments
Replication | Maintain copies across brokers for fault tolerance
Leader election | Coordinate partition leadership
Client protocol | Handle produce and fetch requests
Cluster coordination | Participate in cluster membership (via KRaft or ZooKeeper)
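
To make the client protocol concrete, here is a minimal producer sketch. It assumes a broker reachable at localhost:9092 and an existing orders topic (both placeholders): the leader broker for the target partition handles the produce request, appends the record to a log segment, and replicates it to followers before acknowledging.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ProduceExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");              // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("acks", "all");                                       // wait for the in-sync replicas

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The partition leader handles this produce request, appends the record
            // to a log segment, and replicates it to follower brokers.
            producer.send(new ProducerRecord<>("orders", "order-42", "{\"amount\": 19.99}"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            exception.printStackTrace();
                        } else {
                            System.out.printf("Stored at partition %d, offset %d%n",
                                    metadata.partition(), metadata.offset());
                        }
                    });
        }
    }
}
```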

KRaft vs ZooKeeper

Kafka is transitioning from ZooKeeper-based coordination to KRaft (Kafka Raft), a built-in consensus protocol.


Aspect | ZooKeeper Mode | KRaft Mode
External dependency | Requires ZooKeeper cluster | Self-contained
Metadata storage | Split between ZooKeeper and brokers | Unified in Kafka
Operational complexity | Two systems to manage | Single system
Recovery time | Slower (ZooKeeper sync required) | Faster failover
Scale limits | ~200K partitions | Millions of partitions
Version support | All versions | Production-ready from Kafka 3.3

KRaft mode is the recommended deployment model for new clusters (Kafka 3.3+).
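
As a rough sketch of what the unified metadata layer exposes, the Java Admin client gained a metadata-quorum view in Kafka 3.3. The example below assumes 3.3+ client libraries and a KRaft-mode cluster at a placeholder address; it prints the controller leader and the voters in the Raft quorum.

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.QuorumInfo;

import java.util.Properties;

public class QuorumCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder bootstrap address

        try (Admin admin = Admin.create(props)) {
            // Describe the KRaft controller quorum (available on Kafka 3.3+ clusters).
            QuorumInfo quorum = admin.describeMetadataQuorum().quorumInfo().get();
            System.out.println("Controller leader id: " + quorum.leaderId());
            quorum.voters().forEach(v ->
                    System.out.printf("Voter %d is at log end offset %d%n",
                            v.replicaId(), v.logEndOffset()));
        }
    }
}
```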


Schema Registry

Schema Registry provides centralized schema management for Kafka data, ensuring producers and consumers agree on data formats.

Why Schema Registry?

Without schema management:

  • Producers can change data format without warning
  • Consumers fail when they encounter unexpected formats
  • Schema evolution becomes dangerous
  • Documentation becomes the only contract (and quickly becomes stale)

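A minimal producer sketch shows how the registry fits into the client path. It assumes Confluent's Avro serializer (kafka-avro-serializer) is on the classpath and uses placeholder addresses for the broker and registry; the hypothetical Order schema is registered under the orders-value subject on first use, and each message carries only a small schema ID rather than the full schema.

```java
import io.confluent.kafka.serializers.KafkaAvroSerializer;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class AvroProduceExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // placeholder broker address
        props.put("schema.registry.url", "http://localhost:8081"); // placeholder registry address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", KafkaAvroSerializer.class.getName());

        // Hypothetical Order schema; the serializer registers it under the
        // "orders-value" subject and the registry checks later changes against it.
        Schema schema = SchemaBuilder.record("Order").fields()
                .requiredString("orderId")
                .requiredDouble("amount")
                .endRecord();

        GenericRecord order = new GenericData.Record(schema);
        order.put("orderId", "order-42");
        order.put("amount", 19.99);

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "order-42", order));
        }
    }
}
```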

Schema Registry Features

Feature | Description
Schema storage | Schemas stored in a Kafka topic (_schemas)
Compatibility checking | Validates new schemas against compatibility rules
Schema evolution | Supports backward, forward, and full compatibility
Multiple formats | Avro, Protobuf, JSON Schema
Subject management | Schemas organized by subject (typically topic-based)

Compatibility Modes

Mode | Rule | Safe Changes
BACKWARD | New schema can read old data | Add optional fields, remove fields
FORWARD | Old schema can read new data | Remove optional fields, add fields
FULL | Both backward and forward | Add/remove optional fields only
NONE | No checking | Any change (dangerous)
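
As an illustration of the BACKWARD rule, the sketch below evolves the hypothetical Order schema from the producer example above: v2 adds a field with a default value, so a consumer using v2 can still decode records written with v1.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class OrderSchemas {
    // v1: the schema originally registered under the "orders-value" subject.
    static final Schema V1 = SchemaBuilder.record("Order").fields()
            .requiredString("orderId")
            .requiredDouble("amount")
            .endRecord();

    // v2: adds "currency" with a default value. Under BACKWARD compatibility the
    // registry accepts this change, because v2 readers fill the missing field from
    // the default when decoding v1 data. Adding the field without a default would
    // be rejected.
    static final Schema V2 = SchemaBuilder.record("Order").fields()
            .requiredString("orderId")
            .requiredDouble("amount")
            .name("currency").type().stringType().stringDefault("USD")
            .endRecord();
}
```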

Schema Registry is critical for production deployments where multiple teams produce and consume data independently.

Schema Registry Guide


Kafka Connect

Kafka Connect is a framework for streaming data between Kafka and external systems. It is the primary integration mechanism for most Kafka deployments.

Why Kafka Connect?

Without Connect | With Connect
Write custom producer/consumer for each system | Use pre-built connectors
Implement offset tracking, error handling, retry logic | Framework handles operational concerns
Build and maintain integration infrastructure | Focus on configuration
Different code for each data source/sink | Consistent operational model

Connect Architecture

(Diagram: Connect workers running source and sink connector tasks between external systems and Kafka.)

Connector Types

Type | Direction | Example Use Cases
Source connectors | External → Kafka | Database change data capture (CDC), file ingestion, API polling
Sink connectors | Kafka → External | Data lake writes, search indexing, notifications
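
As a sketch of the operational model, the snippet below registers a source connector through the Connect REST API. It assumes a Connect worker listening on localhost:8083 and uses the FileStreamSource connector that ships with Kafka; the connector name, file path, and topic are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterConnector {
    public static void main(String[] args) throws Exception {
        // Connector configuration: no custom producer code, only configuration.
        String connectorJson = """
            {
              "name": "log-file-source",
              "config": {
                "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
                "tasks.max": "1",
                "file": "/var/log/app.log",
                "topic": "file-events"
              }
            }
            """;

        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(connectorJson))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Reconfiguring or removing the connector is likewise a REST call; no integration code has to be rebuilt or redeployed.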

Connector Ecosystem

Over 200 connectors are available:

Category | Examples
Event Sources | HTTP/REST, MQTT, File/Syslog, JMS/MQ
Cloud Storage Sinks | S3, GCS, Azure Blob, HDFS
Database Sinks | Cassandra, Elasticsearch, OpenSearch
Data Warehouse Sinks | Snowflake, BigQuery, Redshift

Kafka Connect is often the most valuable component of a Kafka deployment—it eliminates thousands of lines of integration code.

Kafka Connect Guide


Kafka Streams

Kafka Streams is a client library for building stream processing applications. Unlike Connect, Streams is embedded in applications rather than deployed as a separate cluster.

Streams Characteristics

Aspect | Description
Deployment | Library embedded in the application (JAR dependency)
Scaling | Add application instances to scale
State management | Local state stores with changelog topics for recovery
Processing semantics | Exactly-once processing supported
Fault tolerance | Automatic state recovery from changelog topics
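
A minimal topology sketch follows, assuming an orders topic keyed by customer ID with string-serialized values (both placeholders). It counts orders per key; the count state lives in a local store backed by a changelog topic, which is what makes recovery after failure possible.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class OrderCountApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-count-app");   // names the consumer group and state stores
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");

        // Stateful aggregation: counts are kept in a local state store and backed
        // by a changelog topic, so an instance can rebuild its state after failure.
        orders.groupByKey()
              .count()
              .toStream()
              .to("order-counts-per-customer", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Starting a second instance with the same application.id redistributes the input partitions, and their state stores, across the instances, which is how Streams scales.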

When to Use Streams

Use Case | Why Streams
Stateful transformations | Aggregations, joins, windowing
Application-embedded processing | No separate cluster to manage
Microservice event processing | Natural fit for event-driven services
Real-time enrichment | Stream-table joins for lookups

Streams vs Connect

Aspect | Kafka Streams | Kafka Connect
Purpose | Data processing and transformation | Data movement between systems
Deployment | Application library | Separate worker cluster
Custom logic | Full programming model | Configuration + Single Message Transforms (SMTs)
State | Built-in state stores | Stateless (connector-dependent)
Scaling | Add application instances | Add workers


Kafka Streams Guide


Component Selection Guide

Requirement | Recommended Component
Stream events from APIs to Kafka | Kafka Connect + HTTP/MQTT source
Move data from Kafka to a data lake | Kafka Connect + S3/GCS connector
Transform and enrich events | Kafka Streams
Custom application logic | Kafka Streams or a custom consumer
Schema enforcement | Schema Registry
Real-time dashboards | Kafka Streams + external visualization

Typical Production Architecture

(Diagram of a typical production architecture combining brokers, Schema Registry, Connect workers, and Streams applications.)


Version Compatibility

Component | Kafka 2.x | Kafka 3.x | Kafka 4.x
Kafka Brokers | ✅ | ✅ | ✅
Schema Registry | ✅ | ✅ | ✅
Kafka Connect | ✅ | ✅ | ✅
Kafka Streams | ✅ | ✅ | ✅
KRaft (production) | — | ✅ (3.3+) | ✅