Skip to content

Kafka on GCP

Production deployment guide for Apache Kafka on Google Cloud Platform.


Architecture Overview

uml diagram


Compute Engine Selection

Workload Machine Type vCPUs Memory Use Case
Development e2-standard-2 2 8 GB Testing
Small Production n2-standard-4 4 16 GB Low throughput
Medium Production n2-standard-8 8 32 GB Moderate throughput
Large Production n2-highmem-16 16 128 GB High throughput
High Performance c3-standard-8 8 32 GB CPU-intensive

Machine Type Guidelines

  • N2 series provides balanced performance for most workloads
  • N2-highmem should be used when large page cache is required
  • C3 series provides higher per-core performance
  • Local SSD may be attached for ephemeral high-performance storage

Storage Configuration

Persistent Disk Types

Disk Type IOPS (read) IOPS (write) Throughput Use Case
Hyperdisk Extreme 350,000 350,000 5,000 MB/s Highest performance
Hyperdisk Balanced 160,000 90,000 2,400 MB/s High performance
SSD Persistent Disk 100,000 70,000 1,200 MB/s Recommended
Balanced Persistent Disk 80,000 30,000 1,200 MB/s Cost-optimized

SSD Persistent Disk Configuration

# Create SSD persistent disk
gcloud compute disks create kafka-data-1 \
  --project=my-project \
  --zone=us-central1-a \
  --type=pd-ssd \
  --size=1000GB

# Attach disk to instance
gcloud compute instances attach-disk kafka-broker-1 \
  --disk=kafka-data-1 \
  --zone=us-central1-a

Storage Sizing Guidelines

Metric Recommendation
Disk Size 10x daily data volume
IOPS 3,000 + (500 × partitions per broker)
Throughput 2x expected peak MB/s

Filesystem Configuration

# Format with XFS
sudo mkfs.xfs -f /dev/sdb

# Mount options for Kafka
echo '/dev/sdb /kafka xfs noatime,nodiratime 0 2' | sudo tee -a /etc/fstab
sudo mount /kafka

# Create Kafka directories
sudo mkdir -p /kafka/data
sudo chown -R kafka:kafka /kafka

Networking

VPC Network Configuration

# Create VPC network
gcloud compute networks create kafka-network \
  --subnet-mode=custom

# Create subnets in each zone
gcloud compute networks subnets create kafka-subnet-a \
  --network=kafka-network \
  --region=us-central1 \
  --range=10.0.1.0/24

gcloud compute networks subnets create kafka-subnet-b \
  --network=kafka-network \
  --region=us-central1 \
  --range=10.0.2.0/24

gcloud compute networks subnets create kafka-subnet-c \
  --network=kafka-network \
  --region=us-central1 \
  --range=10.0.3.0/24

Firewall Rules

# Allow inter-broker communication
gcloud compute firewall-rules create kafka-internal \
  --network=kafka-network \
  --allow=tcp:9092-9094 \
  --source-ranges=10.0.0.0/16 \
  --target-tags=kafka-broker

# Allow client access
gcloud compute firewall-rules create kafka-clients \
  --network=kafka-network \
  --allow=tcp:9092 \
  --source-ranges=10.0.0.0/16 \
  --target-tags=kafka-broker

Required Ports

Port Protocol Purpose
9092 TCP Client connections (PLAINTEXT)
9093 TCP Client connections (SSL)
9094 TCP Client connections (SASL_SSL)
9093 TCP Inter-broker replication

Network Tier

Premium network tier should be used for production deployments to ensure lowest latency.

# Create instance with premium network tier
gcloud compute instances create kafka-broker-1 \
  --network-tier=PREMIUM \
  --network=kafka-network \
  --subnet=kafka-subnet-a

High Availability

Multi-Zone Deployment

Brokers must be distributed across multiple zones for fault tolerance.

# Create instances in different zones
gcloud compute instances create kafka-broker-1 \
  --zone=us-central1-a \
  --machine-type=n2-standard-8 \
  --network=kafka-network \
  --subnet=kafka-subnet-a \
  --tags=kafka-broker

gcloud compute instances create kafka-broker-2 \
  --zone=us-central1-b \
  --machine-type=n2-standard-8 \
  --network=kafka-network \
  --subnet=kafka-subnet-b \
  --tags=kafka-broker

gcloud compute instances create kafka-broker-3 \
  --zone=us-central1-c \
  --machine-type=n2-standard-8 \
  --network=kafka-network \
  --subnet=kafka-subnet-c \
  --tags=kafka-broker

Rack Awareness Configuration

# Broker 1 (Zone A)
broker.id=1
broker.rack=us-central1-a

# Broker 2 (Zone B)
broker.id=2
broker.rack=us-central1-b

# Broker 3 (Zone C)
broker.id=3
broker.rack=us-central1-c

Instance Groups

Managed Instance Group

# Create instance template
gcloud compute instance-templates create kafka-template \
  --machine-type=n2-standard-8 \
  --network=kafka-network \
  --subnet=kafka-subnet-a \
  --tags=kafka-broker \
  --metadata-from-file=startup-script=startup.sh \
  --disk=name=boot,boot=yes,size=100GB,type=pd-ssd \
  --disk=name=data,size=1000GB,type=pd-ssd

# Create regional managed instance group
gcloud compute instance-groups managed create kafka-mig \
  --template=kafka-template \
  --size=3 \
  --region=us-central1 \
  --zones=us-central1-a,us-central1-b,us-central1-c

Instance Group Limitations

Managed instance groups should not be used with auto-scaling for Kafka brokers. Kafka requires stable broker IDs and manual partition reassignment when scaling.


Terraform Example

# main.tf
provider "google" {
  project = "my-project"
  region  = "us-central1"
}

resource "google_compute_network" "kafka" {
  name                    = "kafka-network"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "kafka" {
  count         = 3
  name          = "kafka-subnet-${count.index}"
  ip_cidr_range = "10.0.${count.index + 1}.0/24"
  region        = "us-central1"
  network       = google_compute_network.kafka.id
}

resource "google_compute_firewall" "kafka_internal" {
  name    = "kafka-internal"
  network = google_compute_network.kafka.name

  allow {
    protocol = "tcp"
    ports    = ["9092-9094"]
  }

  source_ranges = ["10.0.0.0/16"]
  target_tags   = ["kafka-broker"]
}

locals {
  zones = ["us-central1-a", "us-central1-b", "us-central1-c"]
}

resource "google_compute_instance" "kafka" {
  count        = 3
  name         = "kafka-broker-${count.index + 1}"
  machine_type = "n2-standard-8"
  zone         = local.zones[count.index]

  tags = ["kafka-broker"]

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-lts"
      size  = 100
      type  = "pd-ssd"
    }
  }

  network_interface {
    subnetwork = google_compute_subnetwork.kafka[count.index].id
  }

  labels = {
    role = "kafka-broker"
  }

  metadata = {
    startup-script = file("startup.sh")
  }
}

resource "google_compute_disk" "kafka_data" {
  count = 3
  name  = "kafka-data-${count.index + 1}"
  type  = "pd-ssd"
  zone  = local.zones[count.index]
  size  = 1000
}

resource "google_compute_attached_disk" "kafka_data" {
  count    = 3
  disk     = google_compute_disk.kafka_data[count.index].id
  instance = google_compute_instance.kafka[count.index].id
}

Internal Load Balancer

For client connectivity within the VPC.

# Create health check
gcloud compute health-checks create tcp kafka-health-check \
  --port=9092

# Create backend service
gcloud compute backend-services create kafka-backend \
  --protocol=TCP \
  --health-checks=kafka-health-check \
  --region=us-central1

# Add instance group to backend
gcloud compute backend-services add-backend kafka-backend \
  --instance-group=kafka-mig \
  --region=us-central1

# Create forwarding rule
gcloud compute forwarding-rules create kafka-ilb \
  --load-balancing-scheme=INTERNAL \
  --network=kafka-network \
  --subnet=kafka-subnet-a \
  --ports=9092 \
  --backend-service=kafka-backend \
  --region=us-central1

Private Service Connect

For secure connectivity from other GCP projects.

# Create service attachment
gcloud compute service-attachments create kafka-attachment \
  --region=us-central1 \
  --producer-forwarding-rule=kafka-ilb \
  --connection-preference=ACCEPT_AUTOMATIC \
  --nat-subnets=kafka-psc-subnet