
Kafka on AWS

Production deployment guide for Apache Kafka on Amazon Web Services.


Architecture Overview

[Figure: UML architecture diagram]


EC2 Instance Selection

| Workload | Instance Type | vCPUs | Memory | Network | Use Case |
|---|---|---|---|---|---|
| Development | m6i.large | 2 | 8 GiB | Up to 12.5 Gbps | Testing |
| Small Production | m6i.xlarge | 4 | 16 GiB | Up to 12.5 Gbps | Low throughput |
| Medium Production | m6i.2xlarge | 8 | 32 GiB | Up to 12.5 Gbps | Moderate throughput |
| Large Production | r6i.4xlarge | 16 | 128 GiB | Up to 12.5 Gbps | High throughput |
| High Performance | i3en.2xlarge | 8 | 64 GiB | Up to 25 Gbps | I/O intensive |

Instance Recommendations

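  • General-purpose instances (m6i) should be used for most deployments
  • Storage-optimized instances (i3en, d3en) may be used for I/O-intensive workloads; their local instance-store disks (NVMe SSD on i3en, HDD on d3en) are ephemeral, so data on them is lost when the instance is stopped or terminated and must be rebuilt from replicas
  • Memory-optimized instances (r6i) should be used when a large page cache is required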
  • A minimum of 3 brokers should be deployed across 3 availability zones

Storage Configuration

EBS Volume Types

| Volume Type | IOPS | Throughput | Use Case |
|---|---|---|---|
| gp3 | 3,000-16,000 | 125-1,000 MB/s | Recommended default |
| io2 | Up to 64,000 | Up to 4,000 MB/s | High-performance |
| st1 | Up to 500 | 40 MB/s per TB baseline, up to 500 MB/s | Cold data, cost-optimized |
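st1 throughput scales with volume size, so a small st1 volume may not keep up with Kafka's sequential writes. The sketch below turns the st1 figures from the table into a quick estimate. The function name st1_baseline_mbps is hypothetical. Sizes are whole TB, although AWS quotes these limits in MiB/s per TiB.

```shell
#!/usr/bin/env bash
# Sketch: estimate st1 baseline throughput from volume size, using the
# table's figures (40 MB/s per TB baseline, 500 MB/s per-volume ceiling).
# Hypothetical helper; sizes are whole TB for simplicity.

st1_baseline_mbps() {
  local size_tb=$1
  local baseline=$(( size_tb * 40 ))   # 40 MB/s per TB of provisioned size
  if (( baseline > 500 )); then
    baseline=500                       # per-volume maximum
  fi
  echo "$baseline"
}

st1_baseline_mbps 2    # small volume: low baseline
st1_baseline_mbps 16   # large volume: capped at the ceiling
```

At 40 MB/s per TB, a volume must be 12.5 TB before its baseline reaches the 500 MB/s ceiling.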

gp3 Configuration

# Create gp3 volume with custom IOPS and throughput
aws ec2 create-volume \
  --availability-zone us-east-1a \
  --volume-type gp3 \
  --size 1000 \
  --iops 6000 \
  --throughput 500 \
  --encrypted \
  --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=kafka-data}]'
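gp3 IOPS and throughput are provisioned independently, but not every combination is allowed. In addition to the ranges in the EBS table, AWS limits gp3 volumes to 500 IOPS per GiB of volume size and to 0.25 MiB/s of throughput per provisioned IOPS. The hypothetical validate_gp3 function below checks a size/IOPS/throughput combination before it is passed to create-volume. Its hard-coded limits are assumptions that should be re-checked against current EBS documentation.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check gp3 parameters before calling `aws ec2 create-volume`.
# Hypothetical helper; limits are taken from the EBS table plus AWS's gp3
# ratio limits and should be verified against current documentation.

validate_gp3() {
  local size_gib=$1 iops=$2 throughput=$3
  if (( iops < 3000 || iops > 16000 )); then
    echo "invalid: IOPS must be 3000-16000"; return 1
  fi
  if (( throughput < 125 || throughput > 1000 )); then
    echo "invalid: throughput must be 125-1000 MiB/s"; return 1
  fi
  # IOPS above the 3,000 baseline are limited to 500 per GiB of volume size
  if (( iops > 3000 && iops > size_gib * 500 )); then
    echo "invalid: more than 500 IOPS per GiB"; return 1
  fi
  # Throughput is limited to 0.25 MiB/s per provisioned IOPS (IOPS / 4)
  if (( throughput * 4 > iops )); then
    echo "invalid: more than 0.25 MiB/s per IOPS"; return 1
  fi
  echo "ok"
}

validate_gp3 1000 6000 500            # the volume created above
validate_gp3 1000 3000 1000 || true   # 1,000 MiB/s needs at least 4,000 IOPS
```

The 1,000 GiB / 6,000 IOPS / 500 MiB/s volume from the create-volume command passes all four checks.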

Storage Sizing Guidelines

| Metric | Recommendation |
|---|---|
| Volume Size | 10x daily data volume |
| IOPS | 3,000 + (500 × partitions per broker) |
| Throughput | 2x expected peak MB/s |
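The three sizing rules are simple arithmetic. The sketch below applies them through a hypothetical kafka_storage_plan function, which also warns when the result exceeds the gp3 maximums listed in the EBS table. Its inputs are daily data volume in GB, partitions per broker, and expected peak MB/s.

```shell
#!/usr/bin/env bash
# Sketch: apply the sizing rules of thumb to one broker's data volume.
# Hypothetical helper. Inputs: daily data (GB), partitions per broker, peak (MB/s).

kafka_storage_plan() {
  local daily_gb=$1 partitions=$2 peak_mbps=$3
  local size_gb=$(( daily_gb * 10 ))          # 10x daily data volume
  local iops=$(( 3000 + 500 * partitions ))   # 3,000 + 500 per partition
  local throughput=$(( peak_mbps * 2 ))       # 2x expected peak
  echo "size_gb=${size_gb} iops=${iops} throughput_mbps=${throughput}"
  # Flag results beyond the gp3 maximums listed in the EBS table
  if (( iops > 16000 || throughput > 1000 )); then
    echo "warning: exceeds gp3 limits; consider io2 or fewer partitions per broker" >&2
  fi
}

kafka_storage_plan 100 6 250
```

Take 100 GB per day, 6 partitions per broker and a 250 MB/s peak. The rules then reproduce the 1,000 GB, 6,000 IOPS, 500 MB/s volume used in the gp3 example.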

Filesystem Configuration

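# Run as root. On Nitro-based instances EBS volumes appear as NVMe devices;
# confirm the device name with lsblk before formatting.

# Format with XFS
mkfs.xfs /dev/nvme1n1

# Mount by UUID, because NVMe device names can change between reboots;
# nofail lets the instance finish booting if the volume is missing
UUID=$(blkid -s UUID -o value /dev/nvme1n1)
mkdir -p /kafka
echo "UUID=${UUID} /kafka xfs noatime,nodiratime,nofail 0 2" >> /etc/fstab
mount /kafka

# Create Kafka directories
mkdir -p /kafka/data
chown -R kafka:kafka /kafka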

Networking

VPC Configuration

# Create the VPC and capture its ID
VPC_ID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 --query 'Vpc.VpcId' --output text)

# Create one subnet in each AZ
aws ec2 create-subnet --vpc-id "$VPC_ID" --cidr-block 10.0.1.0/24 --availability-zone us-east-1a
aws ec2 create-subnet --vpc-id "$VPC_ID" --cidr-block 10.0.2.0/24 --availability-zone us-east-1b
aws ec2 create-subnet --vpc-id "$VPC_ID" --cidr-block 10.0.3.0/24 --availability-zone us-east-1c

Security Groups

# Kafka broker security group (capture the new group ID)
SG_ID=$(aws ec2 create-security-group \
  --group-name kafka-brokers \
  --description "Kafka broker security group" \
  --vpc-id "$VPC_ID" \
  --query 'GroupId' --output text)

# Allow inter-broker communication on all listener ports between members of this group
aws ec2 authorize-security-group-ingress \
  --group-id "$SG_ID" \
  --protocol tcp \
  --port 9092-9094 \
  --source-group "$SG_ID"

# Allow client access from inside the VPC on the PLAINTEXT listener
# (also open 9093-9094 if clients connect over SSL or SASL_SSL)
aws ec2 authorize-security-group-ingress \
  --group-id "$SG_ID" \
  --protocol tcp \
  --port 9092 \
  --cidr 10.0.0.0/16

Required Ports

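| Port | Protocol | Purpose |
|---|---|---|
| 9092 | TCP | Client connections (PLAINTEXT) |
| 9093 | TCP | Client connections (SSL) |
| 9094 | TCP | Client connections (SASL_SSL) |

Inter-broker replication runs over one of these listeners rather than a dedicated port. The broker setting inter.broker.listener.name selects which one, for example the SSL listener on 9093.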

Enhanced Networking

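Enhanced networking uses the Elastic Network Adapter (ENA), which every instance type in the table above supports. It should be enabled for the best network performance. Verify it on each broker: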

# Verify enhanced networking
aws ec2 describe-instances --instance-ids i-xxx \
  --query "Reservations[].Instances[].EnaSupport"

High Availability

Multi-AZ Deployment

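Brokers must be distributed across multiple availability zones for fault tolerance. Setting broker.rack to each broker's AZ makes Kafka place the replicas of a partition in different AZs. With a replication factor of 3 across three AZs, losing one AZ therefore removes at most one replica of each partition.

# server.properties - Rack awareness
broker.rack=us-east-1a

# Replication defaults
default.replication.factor=3
min.insync.replicas=2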
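The fault-tolerance arithmetic behind these two settings works as follows. Producers that use acks=all keep writing after losing (replication factor minus min.insync.replicas) replicas, which here is 3 minus 2, or 1 replica. With one replica per AZ, that means surviving one AZ outage. min.insync.replicas is only enforced for producers using acks=all. The replication_tolerance function below is hypothetical. It assumes all replicas are in sync before the failures and that unclean leader election is disabled (the Kafka default).

```shell
#!/usr/bin/env bash
# Sketch: replica failures a partition can absorb under the settings above.
# Hypothetical helper. Assumes all replicas are in sync beforehand, one replica
# per AZ, and unclean leader election disabled.

replication_tolerance() {
  local rf=$1 min_isr=$2
  # Producers with acks=all need at least min.insync.replicas in-sync replicas
  echo "acks_all_writes_survive=$(( rf - min_isr ))"
  # Consumers can keep reading while at least one in-sync replica remains
  echo "reads_survive=$(( rf - 1 ))"
}

replication_tolerance 3 2
```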

Rack Awareness Configuration

# Broker 1 (AZ-a)
broker.id=1
broker.rack=us-east-1a

# Broker 2 (AZ-b)
broker.id=2
broker.rack=us-east-1b

# Broker 3 (AZ-c)
broker.id=3
broker.rack=us-east-1c
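Writing these blocks by hand for every broker is error-prone. The sketch below assumes brokers are numbered from 1 and assigned to the three AZs round-robin. The hypothetical broker_rack_config function prints the two properties for a given broker ID. On a running instance, the AZ could instead be read from the instance metadata service path placement/availability-zone.

```shell
#!/usr/bin/env bash
# Sketch: print rack-awareness properties for a broker, assigning broker IDs
# to AZs round-robin (1 -> us-east-1a, 2 -> us-east-1b, 3 -> us-east-1c, 4 -> us-east-1a).
# Hypothetical helper; the AZ list matches the examples above.

AZS=(us-east-1a us-east-1b us-east-1c)

broker_rack_config() {
  local id=$1
  local az=${AZS[$(( (id - 1) % ${#AZS[@]} ))]}
  printf 'broker.id=%s\nbroker.rack=%s\n' "$id" "$az"
}

# Output for broker 2, to be appended to its server.properties
broker_rack_config 2
```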

Auto Scaling

Launch Template

{
  "LaunchTemplateName": "kafka-broker",
  "LaunchTemplateData": {
    "ImageId": "ami-xxx",
    "InstanceType": "m6i.2xlarge",
    "KeyName": "kafka-key",
    "SecurityGroupIds": ["sg-xxx"],
    "BlockDeviceMappings": [
      {
        "DeviceName": "/dev/sda1",
        "Ebs": {
          "VolumeSize": 100,
          "VolumeType": "gp3"
        }
      },
      {
        "DeviceName": "/dev/sdf",
        "Ebs": {
          "VolumeSize": 1000,
          "VolumeType": "gp3",
          "Iops": 6000,
          "Throughput": 500,
          "Encrypted": true
        }
      }
    ],
    "UserData": "base64-encoded-startup-script"
  }
}
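The UserData value in a launch template must be base64-encoded text. The sketch below shows one way to produce that string. It uses a hypothetical encode_user_data helper, and its example startup script only repeats the directory setup from the filesystem section.

```shell
#!/usr/bin/env bash
# Sketch: turn a startup script into the single-line base64 string expected
# by the launch template's UserData field. Hypothetical helper and script body.

encode_user_data() {
  # Read the script on stdin; tr removes the line wrapping some base64 builds add
  base64 | tr -d '\n'
}

startup_script='#!/bin/bash
mkdir -p /kafka/data
chown -R kafka:kafka /kafka'

printf '%s\n' "$startup_script" | encode_user_data
echo
```

To embed the result, run something like jq --arg ud "$(encode_user_data < startup.sh)" '.LaunchTemplateData.UserData = $ud' template.json > with-userdata.json. Then pass the file to aws ec2 create-launch-template --cli-input-json file://with-userdata.json. The file names startup.sh, template.json and with-userdata.json are placeholders.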

Auto Scaling Limitations

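Kafka does not automatically rebalance partitions when brokers are added or removed. A new broker only receives replicas of partitions created after it joins. Replicas on a removed broker stay offline, leaving those partitions under-replicated. Partition reassignment, for example with kafka-reassign-partitions.sh, must therefore be performed manually after every scaling operation.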
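The sketch below covers that manual step and assumes the Kafka CLI tools are on PATH. It builds the topics-to-move file that kafka-reassign-partitions.sh reads in --generate mode, and it wraps the generate step. The function names, topic names, broker IDs and bootstrap address are all placeholders.

```shell
#!/usr/bin/env bash
# Sketch: prepare a partition reassignment after scaling. Hypothetical helpers.

# Build the --topics-to-move-json-file input. Kafka topic names only use
# [a-zA-Z0-9._-], so no JSON escaping is needed.
topics_to_move_json() {
  local first=1 t
  printf '{"version":1,"topics":['
  for t in "$@"; do
    (( first )) || printf ','
    printf '{"topic":"%s"}' "$t"
    first=0
  done
  printf ']}\n'
}

# Not called here: requires a running cluster and the Kafka CLI tools.
rebalance() {
  local bootstrap=$1 brokers=$2; shift 2
  topics_to_move_json "$@" > topics.json
  # 1. Propose a plan spreading the topics across the given broker IDs
  kafka-reassign-partitions.sh --bootstrap-server "$bootstrap" \
    --topics-to-move-json-file topics.json --broker-list "$brokers" --generate
  # 2. Save the proposed configuration from the output as plan.json, review it,
  #    then apply it (optionally with --throttle) and check progress:
  #    kafka-reassign-partitions.sh --bootstrap-server "$bootstrap" --reassignment-json-file plan.json --execute
  #    kafka-reassign-partitions.sh --bootstrap-server "$bootstrap" --reassignment-json-file plan.json --verify
}

topics_to_move_json orders payments
```

For example, after adding broker 4, rebalance broker1:9092 1,2,3,4 orders payments would propose a plan that spreads those two topics across all four brokers.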


Terraform Example

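# main.tf
provider "aws" {
  region = "us-east-1"
}

variable "vpc_id" {
  description = "ID of the VPC that hosts the brokers"
  type        = string
}

variable "vpc_cidr" {
  description = "CIDR block allowed to reach the Kafka listeners"
  type        = string
}

variable "subnet_ids" {
  description = "Broker subnets, one per availability zone"
  type        = list(string)
}

resource "aws_instance" "kafka_broker" {
  count         = 3
  ami           = "ami-xxx"
  instance_type = "m6i.2xlarge"
  subnet_id     = element(var.subnet_ids, count.index)

  vpc_security_group_ids = [aws_security_group.kafka.id]

  root_block_device {
    volume_type = "gp3"
    volume_size = 100
  }

  tags = {
    Name = "kafka-broker-${count.index + 1}"
    Role = "kafka-broker"
  }
}

resource "aws_ebs_volume" "kafka_data" {
  count             = 3
  # Create each data volume in the same AZ as the broker it is attached to
  availability_zone = aws_instance.kafka_broker[count.index].availability_zone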
  size              = 1000
  type              = "gp3"
  iops              = 6000
  throughput        = 500
  encrypted         = true

  tags = {
    Name = "kafka-data-${count.index + 1}"
  }
}

resource "aws_volume_attachment" "kafka_data" {
  count       = 3
  device_name = "/dev/sdf"
  volume_id   = aws_ebs_volume.kafka_data[count.index].id
  instance_id = aws_instance.kafka_broker[count.index].id
}

resource "aws_security_group" "kafka" {
  name        = "kafka-brokers"
  description = "Kafka broker security group"
  vpc_id      = var.vpc_id

  ingress {
    from_port   = 9092
    to_port     = 9094
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
  }

  ingress {
    from_port = 9092
    to_port   = 9094
    protocol  = "tcp"
    self      = true
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

Monitoring Integration

CloudWatch Metrics

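# Install the CloudWatch agent
sudo yum install -y amazon-cloudwatch-agent

# Configure JMX collection. The agent reads metrics from the broker's JMX port,
# so start Kafka with JMX enabled (for example, export JMX_PORT=9999; the port is an example).
# Metric names follow the agent's built-in kafka and jvm JMX targets rather than raw
# MBean names; confirm the supported names for your agent version.
sudo tee /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json > /dev/null << 'EOF'
{
  "metrics": {
    "metrics_collected": {
      "jmx": [
        {
          "endpoint": "localhost:9999",
          "jvm": {
            "measurement": [
              "jvm.memory.heap.used",
              "jvm.gc.collections.count"
            ]
          },
          "kafka": {
            "measurement": [
              "kafka.message.count",
              "kafka.network.io",
              "kafka.partition.under_replicated"
            ]
          }
        }
      ]
    }
  }
}
EOF

# Load the configuration and start the agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config -m ec2 -s \
  -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json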