Kafka Consumer Group Rebalancing
Consumer group rebalancing redistributes partition assignments among group members. This document covers the rebalance protocol, assignment strategies, and mechanisms to minimize rebalance impact.
Rebalancing Overview
Why Rebalancing Occurs
| Trigger | Description |
|---------|-------------|
| Consumer join | New consumer joins the group |
| Consumer leave | Consumer leaves gracefully |
| Consumer failure | Heartbeat timeout |
| Subscription change | Topic subscription modified |
| Partition change | Partitions added to a subscribed topic |
| Session timeout | Consumer missed the heartbeat deadline |
| Max poll exceeded | Processing time exceeded max.poll.interval.ms |
Rebalance Impact

| Impact | Description |
|--------|-------------|
| Processing pause | All consumers stop processing during an eager rebalance |
| Increased latency | Messages are delayed until the rebalance completes |
| Duplicate processing | At-least-once semantics may cause reprocessing |
| State loss | In-memory state may need rebuilding |
Rebalance Protocol
Eager Rebalancing (Traditional)

Every member revokes all of its partitions at the start of the rebalance, rejoins, and receives a fresh assignment. The whole group stops processing until the rebalance completes ("stop-the-world").

Cooperative Rebalancing (Incremental)

Only partitions that actually need to move to another member are revoked. Consumers keep processing their unaffected partitions, at the cost of sometimes needing more than one rebalance round to converge.
Protocol Comparison
| Aspect | Eager | Cooperative |
|--------|-------|-------------|
| Revocation | All partitions revoked | Only moved partitions revoked |
| Processing | Full stop during rebalance | Continues for stable partitions |
| Rebalance count | Single rebalance | May require multiple rounds |
| Complexity | Simple | More complex |
| Kafka version | All versions | 2.4+ |
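Because all members of a group must agree on a common protocol, moving a live group from an eager assignor to a cooperative one takes two rolling restarts. The sketch below follows the safe-upgrade path described for KIP-429; the exact strategy list depends on your current deployment:

```properties
# Bounce 1: list both strategies on every consumer; the group keeps
# using the eager protocol until all members support cooperative
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor,org.apache.kafka.clients.consumer.RangeAssignor

# Bounce 2: remove the eager strategy; the group switches to cooperative
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
```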
Rebalance Triggers
Heartbeat Mechanism

A background thread in each consumer sends a heartbeat to the group coordinator every heartbeat.interval.ms. If the coordinator sees no heartbeat within session.timeout.ms, it removes the consumer and starts a rebalance. Separately, the application must call poll() at least once every max.poll.interval.ms, or the consumer proactively leaves the group.
Timeout Configuration
| Configuration | Default | Description |
|---------------|---------|-------------|
| session.timeout.ms | 45000 | Time to detect consumer failure |
| heartbeat.interval.ms | 3000 | Heartbeat frequency |
| max.poll.interval.ms | 300000 | Max time between poll() calls |
Relationship: heartbeat.interval.ms should be no more than one third of session.timeout.ms, i.e. session.timeout.ms ≥ 3 × heartbeat.interval.ms (the defaults of 3s and 45s satisfy this comfortably).
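A concrete configuration consistent with that guideline (values are illustrative and match the defaults above):

```properties
# Failure detected within 45s; 15 heartbeats per session window
session.timeout.ms=45000
heartbeat.interval.ms=3000

# Allow up to 5 minutes of processing between poll() calls
max.poll.interval.ms=300000
```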
Timeout Scenarios
| Scenario | Trigger | Result |
|----------|---------|--------|
| Network partition | No heartbeat | Consumer removed after session.timeout.ms |
| Long processing | poll() delayed | Consumer removed after max.poll.interval.ms |
| Consumer crash | No heartbeat | Consumer removed after session.timeout.ms |
| Graceful shutdown | LeaveGroup request | Immediate rebalance |
Assignment Strategies
Range Assignor
Divides each topic's partitions into contiguous ranges and assigns one range per consumer. Because the split is computed per topic, the first consumers receive the extra partitions of every topic, which can skew load when many topics are subscribed.
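To make the arithmetic concrete, here is a minimal sketch (not Kafka's actual RangeAssignor; class and method names are invented) of how one topic's partitions split into ranges:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of range-style assignment for a single topic:
// each consumer gets a contiguous range, and the first
// (numPartitions % numConsumers) consumers receive one extra partition.
public class RangeAssignSketch {
    static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        List<List<Integer>> result = new ArrayList<>();
        int base = numPartitions / numConsumers;
        int extra = numPartitions % numConsumers;
        int next = 0;
        for (int c = 0; c < numConsumers; c++) {
            int count = base + (c < extra ? 1 : 0);
            List<Integer> range = new ArrayList<>();
            for (int i = 0; i < count; i++) range.add(next++);
            result.add(range);
        }
        return result;
    }

    public static void main(String[] args) {
        // 7 partitions over 3 consumers -> [0,1,2], [3,4], [5,6]
        System.out.println(assign(7, 3));
    }
}
```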

RoundRobin Assignor
Flattens the partitions of all subscribed topics into one list and distributes them one by one across consumers, giving the most even spread when all members share the same subscription.
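A minimal sketch of the idea (illustrative code, not the real RoundRobinAssignor): partition i goes to consumer i mod C:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical round-robin sketch: partitions (flattened across all
// subscribed topics into indices 0..N-1) are dealt out in turn.
public class RoundRobinSketch {
    static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        List<List<Integer>> result = new ArrayList<>();
        for (int c = 0; c < numConsumers; c++) result.add(new ArrayList<>());
        for (int p = 0; p < numPartitions; p++) {
            result.get(p % numConsumers).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        // 7 partitions over 3 consumers -> [0,3,6], [1,4], [2,5]
        System.out.println(assign(7, 3));
    }
}
```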

Sticky Assignor
Preserves existing assignments while balancing.
| Characteristic | Description |
|----------------|-------------|
| Stickiness | Minimizes partition movement across rebalances |
| Balance | Ensures even distribution |
| Protocol | Eager; the cooperative variant is CooperativeStickyAssignor |
```properties
# Configuration
partition.assignment.strategy=org.apache.kafka.clients.consumer.StickyAssignor
```
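The stickiness property can be illustrated with a toy reassignment (invented names, not the real StickyAssignor algorithm): when one consumer leaves, survivors keep what they had and only the orphaned partitions move to the least-loaded members:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "sticky" idea: existing assignments are
// preserved, and only the leaving consumer's partitions are handed to
// the currently least-loaded surviving consumers.
public class StickySketch {
    static Map<String, List<Integer>> reassign(
            Map<String, List<Integer>> current, String leaving) {
        Map<String, List<Integer>> next = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : current.entrySet()) {
            if (!e.getKey().equals(leaving)) {
                next.put(e.getKey(), new ArrayList<>(e.getValue()));
            }
        }
        for (int partition : current.get(leaving)) {
            String target = null;
            for (String consumer : next.keySet()) {
                if (target == null
                        || next.get(consumer).size() < next.get(target).size()) {
                    target = consumer;
                }
            }
            next.get(target).add(partition);
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> current = new LinkedHashMap<>();
        current.put("c1", new ArrayList<>(List.of(0, 1)));
        current.put("c2", new ArrayList<>(List.of(2, 3)));
        current.put("c3", new ArrayList<>(List.of(4, 5)));
        // c2 leaves: c1 and c3 keep their partitions and split 2 and 3
        System.out.println(reassign(current, "c2"));
    }
}
```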
CooperativeSticky Assignor
Combines sticky assignment with cooperative rebalancing.
```properties
# Recommended for Kafka 2.4+
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
```
Strategy Comparison
| Strategy | Balance | Stickiness | Cooperative | Use Case |
|----------|---------|------------|-------------|----------|
| Range | Good | ❌ | ❌ | Simple, few topics |
| RoundRobin | Best | ❌ | ❌ | Many topics |
| Sticky | Good | ✅ | ❌ | Stateful processing |
| CooperativeSticky | Good | ✅ | ✅ | Production default |
Static Membership
Overview
Static membership assigns a persistent identity to consumers, reducing unnecessary rebalances.

Configuration
```properties
# Consumer configuration
group.instance.id=consumer-instance-1
# 5 minutes; the session timeout can be longer with static membership
session.timeout.ms=300000
```
Static vs Dynamic Membership
| Aspect | Dynamic | Static |
|--------|---------|--------|
| Consumer restart | Triggers rebalance | No rebalance (within session timeout) |
| Session timeout | Short (45s typical) | Can be longer (5-30 min) |
| Member ID | Assigned by coordinator | Derived from instance ID |
| Deployment | Rolling restarts cause churn | Smooth rolling restarts |
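One common deployment pattern (illustrative, not prescribed by Kafka) is to derive group.instance.id from a stable name such as a Kubernetes StatefulSet pod name, so each replica keeps its identity across restarts. Here POD_NAME is an assumed environment variable and the group name is invented:

```java
import java.util.Properties;

public class StaticIdentity {
    // Build consumer properties with a stable instance id derived
    // from an externally provided name (e.g. a pod ordinal).
    static Properties consumerProps(String stableName) {
        Properties props = new Properties();
        props.put("group.id", "orders-consumers");
        // Persistent identity: restarts within the session timeout
        // rejoin without triggering a rebalance
        props.put("group.instance.id", "orders-consumer-" + stableName);
        // Longer timeout is safe with static membership
        props.put("session.timeout.ms", "300000");
        return props;
    }

    public static void main(String[] args) {
        String name = System.getenv().getOrDefault("POD_NAME", "0");
        System.out.println(consumerProps(name).getProperty("group.instance.id"));
    }
}
```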
Static Membership Benefits
| Benefit | Description |
|---------|-------------|
| Reduced rebalances | Transient failures don't trigger a rebalance |
| Faster recovery | Same assignment on rejoin |
| Rolling deployments | One consumer at a time without a full rebalance |
| Stable state | Kafka Streams state stores remain local |
Rebalance Optimization
Minimize Rebalance Frequency
```properties
# Increase session timeout (safe with static membership)
session.timeout.ms=300000

# Use cooperative rebalancing
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor

# Static membership
group.instance.id=my-consumer-instance-1
```
Minimize Rebalance Duration
```properties
# Faster failure detection via more frequent heartbeats
heartbeat.interval.ms=1000

# Lower rebalance timeout (the coordinator waits up to
# max.poll.interval.ms for members to rejoin); reduce only
# if processing between poll() calls is fast
max.poll.interval.ms=30000

# Pre-warm the consumer: initialize caches, connections, and state
# before subscribing, so the first poll() happens only when ready
```
Handle Long Processing
```java
// Pattern: pause partitions during long processing so poll() can be
// called to keep the consumer in the group without fetching more records
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
if (!records.isEmpty()) {
    // Pause fetching; poll() returns no records while paused
    consumer.pause(consumer.assignment());
    for (ConsumerRecord<String, String> record : records) {
        processRecord(record);        // long-running work
        consumer.poll(Duration.ZERO); // resets the max.poll.interval.ms timer
    }
    // Resume fetching once processing is done
    consumer.resume(consumer.assignment());
}
```
Rebalance Listener
ConsumerRebalanceListener
```java
consumer.subscribe(Arrays.asList("orders"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Called BEFORE partitions are reassigned:
        // commit offsets for the revoked partitions
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        for (TopicPartition partition : partitions) {
            offsets.put(partition, new OffsetAndMetadata(currentOffset(partition)));
        }
        consumer.commitSync(offsets);
        // Flush any buffered data
        flushState(partitions);
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Called AFTER the rebalance completes:
        // initialize state for newly assigned partitions
        for (TopicPartition partition : partitions) {
            initializeState(partition);
        }
    }

    @Override
    public void onPartitionsLost(Collection<TopicPartition> partitions) {
        // Called when partitions are lost without a clean revoke;
        // state may already be owned elsewhere
        log.warn("Partitions lost: {}", partitions);
    }
});
```
Cooperative Rebalance Listener
```java
// With cooperative rebalancing, the listener receives only the delta
@Override
public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
    // Only the partitions being moved, not the full assignment;
    // other partitions continue processing
    commitOffsetsForPartitions(partitions);
}

@Override
public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
    // Only newly assigned partitions; existing partitions unchanged
    initializeStateForPartitions(partitions);
}
```
Monitoring Rebalances
Key Metrics
| Metric | Description | Alert |
|--------|-------------|-------|
| rebalance-latency-avg | Average rebalance duration | > 30s |
| rebalance-latency-max | Maximum rebalance duration | > 60s |
| rebalance-total | Total rebalance count | Increasing rapidly |
| last-rebalance-seconds-ago | Time since last rebalance | - |
| failed-rebalance-total | Failed rebalances | > 0 |
Diagnosing Rebalance Issues
```bash
# Check consumer group state
kafka-consumer-groups.sh --bootstrap-server kafka:9092 \
  --describe --group my-group

# Monitor group membership changes
kafka-consumer-groups.sh --bootstrap-server kafka:9092 \
  --describe --group my-group --members

# Check for rebalance activity in application logs
grep -i "rebalance\|revoke\|assign" /var/log/kafka/consumer.log
```
Common Issues
| Issue | Symptom | Solution |
|-------|---------|----------|
| Frequent rebalances | High rebalance-total | Enable static membership |
| Long rebalances | High rebalance-latency | Reduce group size, use cooperative rebalancing |
| Consumer timeouts | Members leaving | Increase max.poll.interval.ms |
| Uneven assignment | Imbalanced lag | Check assignment strategy |
Scaling Consumers
Adding Consumers

Adding a consumer triggers a rebalance: with the eager protocol every partition is reassigned, while the cooperative protocol moves only the partitions handed to the new member.
Scaling Limits
| Constraint | Description |
|------------|-------------|
| Max useful consumers = partition count | Extra consumers sit idle |
| Adding partitions | Changes key-to-partition distribution |
| Consumer group size | Larger groups mean longer rebalances |
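The first constraint is easy to quantify; a trivial helper (illustrative) shows how many members of a group sit idle for a given partition count:

```java
public class ScalingSketch {
    // Active consumers are capped by the partition count; the rest idle.
    static int idleConsumers(int partitions, int consumers) {
        return Math.max(0, consumers - partitions);
    }

    public static void main(String[] args) {
        // 12 partitions, 16 consumers -> 4 consumers receive nothing
        System.out.println(idleConsumers(12, 16));
    }
}
```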
Scaling Best Practices
| Practice | Rationale |
|----------|-----------|
| Plan partition count for growth | Avoid adding partitions later |
| Use static membership | Smooth scaling operations |
| Scale gradually | One consumer at a time |
| Monitor during scale | Detect issues early |
Consumer Group States
State Machine

Empty → PreparingRebalance → CompletingRebalance → Stable; any rebalance trigger returns the group to PreparingRebalance, and a deleted or fully expired group moves to Dead.
State Descriptions
| State | Description |
|-------|-------------|
| Empty | No active members |
| PreparingRebalance | Waiting for members to join |
| CompletingRebalance | Waiting for SyncGroup |
| Stable | Normal operation |
| Dead | Group being deleted |
Version Compatibility
| Feature | Kafka Version |
|---------|---------------|
| Basic rebalancing | 0.9.0+ |
| Sticky assignor | 0.11.0+ |
| Static membership | 2.3.0+ |
| Cooperative rebalancing | 2.4.0+ |
| Incremental cooperative | 2.5.0+ |