Kafka Performance¶
Performance tuning guide for Apache Kafka clusters.
Performance Dimensions¶
| Dimension | Description | Trade-offs |
|---|---|---|
| Throughput | Messages/bytes per second | May increase latency |
| Latency | End-to-end message delay | May reduce throughput |
| Durability | Data persistence guarantee | Impacts throughput |
| Availability | System uptime | Requires more resources |
Throughput Optimization¶
Producer Throughput¶
# High throughput producer configuration
batch.size=131072 # 128KB batches
linger.ms=20 # Wait for batching
buffer.memory=67108864 # 64MB buffer
compression.type=lz4 # Fast compression
acks=1 # Leader ack only (trade durability)
max.in.flight.requests.per.connection=5 # >1 can reorder on retry without idempotence
Throughput Checklist:

- [ ] Enable compression (lz4 or zstd)
- [ ] Increase batch.size
- [ ] Set linger.ms > 0
- [ ] Use async sends
- [ ] Partition across multiple brokers
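Whether a send is triggered by batch.size or linger.ms depends on the incoming message rate. A minimal sketch of the arithmetic (the message sizes and rates are illustrative, not measured values):

```python
def batching_gate(batch_size_bytes, linger_ms, msg_bytes, msgs_per_sec):
    """Return which producer setting flushes the batch first."""
    if msgs_per_sec == 0:
        return "linger.ms"
    # Time to accumulate a full batch at the given produce rate.
    fill_ms = batch_size_bytes / (msg_bytes * msgs_per_sec) * 1000
    return "batch.size" if fill_ms <= linger_ms else "linger.ms"

# With the settings above (128KB batch, linger.ms=20) and 1KB messages:
print(batching_gate(131072, 20, 1024, 50000))  # high rate: batch fills first
print(batching_gate(131072, 20, 1024, 1000))   # low rate: linger expires first
```

At low produce rates, raising batch.size alone buys nothing: linger.ms expires long before the batch fills.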
Consumer Throughput¶
# High throughput consumer configuration
fetch.min.bytes=65536 # Fetch at least 64KB
fetch.max.wait.ms=500 # Wait up to 500ms
fetch.max.bytes=52428800 # 50MB per fetch
max.partition.fetch.bytes=10485760 # 10MB per partition
max.poll.records=1000 # Records per poll
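Raising fetch sizes also raises consumer memory pressure. A back-of-envelope bound, assuming one in-flight fetch per broker (a rough sketch, not an exact accounting of the consumer's buffering):

```python
def consumer_fetch_memory(partitions, max_partition_fetch_bytes,
                          fetch_max_bytes, brokers):
    """Rough worst-case bytes buffered by one consumer: no more than
    max.partition.fetch.bytes per assigned partition, and no more than
    fetch.max.bytes per in-flight fetch (one per broker)."""
    per_partition_cap = partitions * max_partition_fetch_bytes
    per_broker_cap = brokers * fetch_max_bytes
    return min(per_partition_cap, per_broker_cap)

# 30 assigned partitions, the settings above, 3 brokers:
print(consumer_fetch_memory(30, 10485760, 52428800, 3) / 1024**2, "MB")  # 150.0 MB
```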
Broker Throughput¶
# Broker configuration
num.network.threads=8 # Network I/O threads
num.io.threads=16 # Disk I/O threads
socket.send.buffer.bytes=1048576 # 1MB send buffer
socket.receive.buffer.bytes=1048576 # 1MB receive buffer
num.replica.fetchers=4 # Replication threads
Latency Optimization¶
Low Latency Producer¶
# Low latency producer configuration
batch.size=16384 # Small batches
linger.ms=0 # No batching delay
compression.type=none # No compression overhead
acks=1 # Leader ack only
max.block.ms=1000 # Fast failure
Low Latency Consumer¶
# Low latency consumer configuration
fetch.min.bytes=1 # Return immediately
fetch.max.wait.ms=0 # No wait
max.poll.records=100 # Small batches
Broker Latency¶
# Broker configuration
socket.request.max.bytes=10485760 # Cap request size at 10MB (default 100MB)
num.network.threads=16 # More network threads
Durability Configuration¶
Maximum Durability¶
# Producer
acks=all
enable.idempotence=true
retries=2147483647
max.in.flight.requests.per.connection=5
# Broker/Topic
min.insync.replicas=2
unclean.leader.election.enable=false
default.replication.factor=3
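With acks=all, writes succeed only while at least min.insync.replicas replicas are in sync, so the number of broker failures a topic can absorb without blocking writes is a simple subtraction (a sketch of the arithmetic):

```python
def write_fault_tolerance(replication_factor, min_insync_replicas):
    """Broker failures tolerated before acks=all producers start
    failing with NotEnoughReplicas errors."""
    return replication_factor - min_insync_replicas

print(write_fault_tolerance(3, 2))  # 1: one broker can fail, writes continue
print(write_fault_tolerance(3, 1))  # 2: but a lone surviving replica risks data loss
```

This is why RF=3 with min.insync.replicas=2 is the common durability baseline: it survives one failure while still requiring acknowledged data on two replicas.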
Balanced Durability¶
# Producer
acks=all
enable.idempotence=true
# Topic
min.insync.replicas=2
replication.factor=3
Compression Tuning¶
Compression Comparison¶
| Algorithm | CPU Usage | Compression Ratio | Speed |
|---|---|---|---|
| none | 0% | 1.0x | Fastest |
| snappy | Low | 1.5-2x | Fast |
| lz4 | Low | 2-3x | Fast |
| gzip | High | 3-5x | Slow |
| zstd | Medium | 3-4x | Fast |
Compression Selection¶
# Best for most use cases
compression.type=lz4
# Maximum compression
compression.type=zstd
# CPU constrained
compression.type=snappy
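The ratio-versus-CPU trade-off in the table can be felt with any codec. The sketch below uses Python's stdlib zlib (DEFLATE, the algorithm behind gzip) at its fastest and slowest levels as a stand-in, since lz4/zstd bindings are not in the standard library; the payload is synthetic:

```python
import time
import zlib

# Repetitive log-like payload, ~112KB (illustrative only).
payload = b"timestamp=1700000000 level=INFO msg=order-created id=42 " * 2000

for level in (1, 9):  # 1 = fastest, 9 = best ratio
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    print(f"level={level} ratio={len(payload)/len(compressed):.1f}x "
          f"time={elapsed*1000:.2f}ms")
```

The same shape holds for Kafka's codecs: gzip buys ratio with CPU, lz4/snappy buy speed with ratio, zstd sits in between.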
Partition Tuning¶
Partition Count¶
Formula:
Partitions = max(
throughput / per_partition_throughput,
consumer_instances
)
Guidelines:

- ~10 MB/s per partition typical
- More partitions = more parallelism
- Too many partitions = overhead
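The formula above as a small helper (the 10 MB/s per-partition default is the rule-of-thumb from the guidelines, not a measured figure for any particular cluster):

```python
import math

def partition_count(target_mb_per_sec, per_partition_mb_per_sec=10,
                    consumer_instances=1):
    """Partitions needed to hit a target throughput while keeping every
    consumer instance busy (each partition feeds at most one consumer
    in a group)."""
    by_throughput = math.ceil(target_mb_per_sec / per_partition_mb_per_sec)
    return max(by_throughput, consumer_instances)

print(partition_count(300, consumer_instances=12))  # 30: throughput dominates
print(partition_count(50, consumer_instances=12))   # 12: consumer count dominates
```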
Partition Distribution¶
# Check leader distribution
kafka-topics.sh --bootstrap-server kafka:9092 --describe | \
grep "Leader:" | awk '{print $4}' | sort | uniq -c
# Rebalance leaders
kafka-leader-election.sh --bootstrap-server kafka:9092 \
--election-type preferred \
--all-topic-partitions
OS Tuning¶
Linux Kernel Parameters¶
# /etc/sysctl.conf
# Network buffers
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.core.rmem_default=16777216
net.core.wmem_default=16777216
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 87380 16777216
# File descriptors
fs.file-max=1000000
# Virtual memory
vm.swappiness=1
vm.dirty_ratio=80
vm.dirty_background_ratio=5
# Page cache
vm.vfs_cache_pressure=50
File Descriptor Limits¶
# /etc/security/limits.conf
kafka soft nofile 128000
kafka hard nofile 128000
kafka soft nproc 128000
kafka hard nproc 128000
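The nofile limit above must cover open segment files plus client sockets. A very rough estimate, assuming about three file handles per log segment (.log, offset index, time index) — the per-segment figure is an approximation, not a guarantee:

```python
def broker_fd_estimate(partition_replicas, segments_per_partition,
                       client_connections):
    """Very rough open-file-descriptor estimate for one broker."""
    segment_files = partition_replicas * segments_per_partition * 3
    return segment_files + client_connections

# 2000 partition replicas, ~10 segments each, 5000 connections:
print(broker_fd_estimate(2000, 10, 5000))  # 65000 — within the 128000 limit
```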
Disk Configuration¶
# Mount options for Kafka data
# (nobarrier was removed from XFS in Linux 4.19; drop it on modern kernels)
/dev/sdb /kafka xfs noatime,nodiratime,nobarrier 0 2
# I/O scheduler (for SSDs)
echo none > /sys/block/sdb/queue/scheduler
# Read-ahead (for HDDs)
blockdev --setra 4096 /dev/sdb
JVM Tuning¶
Heap Configuration¶
# Recommended heap size: 6-8GB
export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"
GC Configuration¶
export KAFKA_JVM_PERFORMANCE_OPTS="-server \
-XX:+UseG1GC \
-XX:MaxGCPauseMillis=20 \
-XX:InitiatingHeapOccupancyPercent=35 \
-XX:G1HeapRegionSize=16M \
-XX:MinMetaspaceFreeRatio=50 \
-XX:MaxMetaspaceFreeRatio=80"
Benchmarking¶
Producer Benchmark¶
# Throughput test
kafka-producer-perf-test.sh \
--topic test-topic \
--num-records 10000000 \
--record-size 1024 \
--throughput -1 \
--producer-props \
bootstrap.servers=kafka:9092 \
batch.size=65536 \
linger.ms=10 \
compression.type=lz4
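The summary line printed by kafka-producer-perf-test.sh can be parsed to track results across runs. The sample line below is illustrative (the field order matches recent Kafka versions, but verify against your build's output):

```python
import re

# Illustrative summary line from kafka-producer-perf-test.sh:
sample = ("10000000 records sent, 412371.1 records/sec (402.71 MB/sec), "
          "12.3 ms avg latency, 210.0 ms max latency.")

match = re.search(
    r"([\d.]+) records/sec \(([\d.]+) MB/sec\), ([\d.]+) ms avg latency",
    sample)
records_sec, mb_sec, avg_ms = (float(g) for g in match.groups())
print(records_sec, mb_sec, avg_ms)
```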
Consumer Benchmark¶
# Throughput test
kafka-consumer-perf-test.sh \
--bootstrap-server kafka:9092 \
--topic test-topic \
--messages 10000000 \
--threads 4
End-to-End Latency¶
kafka-run-class.sh kafka.tools.EndToEndLatency \
kafka:9092 \
test-topic \
10000 \
all \
1024
Performance Metrics¶
Key Metrics¶
| Metric | Description | Target |
|---|---|---|
| MessagesInPerSec | Produce rate | Workload dependent |
| BytesInPerSec | Bytes produced | Workload dependent |
| BytesOutPerSec | Bytes consumed | Workload dependent |
| TotalTimeMs (P99) | Request latency | < 100ms |
| RequestQueueTimeMs | Queue time | < 10ms |
| UnderReplicatedPartitions | Replication health | 0 |
Identifying Bottlenecks¶
| Symptom | Likely Cause | Solution |
|---|---|---|
| High RequestQueueTimeMs | Request handler (I/O) threads saturated | Increase num.io.threads |
| High LocalTimeMs | Disk I/O slow | Faster disks, more I/O threads |
| High RemoteTimeMs | Replication lag | More replica fetchers |
| High ResponseQueueTimeMs | Network threads saturated | Increase num.network.threads |
Related Documentation¶
- Operations Overview - Operations guide
- Capacity Planning - Sizing guide
- Configuration - Configuration reference
- Monitoring - Metrics and alerting