Kafka Protocol Primitive Types¶
This document specifies the primitive data types used in the Apache Kafka binary wire protocol. All multi-byte values use big-endian (network) byte order unless otherwise specified. Implementations must encode and decode these types exactly as specified.
Integer Types¶
Fixed-Width Signed Integers¶
| Type | Size | Range | Encoding |
|---|---|---|---|
| INT8 | 1 byte | -128 to 127 | Two's complement, big-endian |
| INT16 | 2 bytes | -32,768 to 32,767 | Two's complement, big-endian |
| INT32 | 4 bytes | -2,147,483,648 to 2,147,483,647 | Two's complement, big-endian |
| INT64 | 8 bytes | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | Two's complement, big-endian |
Encoding Examples:
| Value | Type | Bytes (hex) |
|---|---|---|
| 0 | INT8 | 00 |
| -1 | INT8 | FF |
| 127 | INT8 | 7F |
| -128 | INT8 | 80 |
| 256 | INT16 | 01 00 |
| -1 | INT16 | FF FF |
| 16909060 | INT32 | 01 02 03 04 |
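The rows above can be reproduced with Python's standard `struct` module. This is a minimal sketch, not a reference implementation; the helper names `encode_int` and `decode_int` are hypothetical and chosen only for illustration.

```python
import struct

# Standard struct format codes: ">" selects big-endian, b/h/i/q are
# signed 1/2/4/8-byte integers. The dictionary and helper names are
# illustrative assumptions, not part of the protocol.
_FORMATS = {"INT8": ">b", "INT16": ">h", "INT32": ">i", "INT64": ">q"}


def encode_int(type_name: str, value: int) -> bytes:
    """Pack a signed integer as big-endian two's complement."""
    # struct.pack raises struct.error if the value is out of range.
    return struct.pack(_FORMATS[type_name], value)


def decode_int(type_name: str, data: bytes) -> int:
    """Unpack a signed big-endian integer; the size must match exactly."""
    (value,) = struct.unpack(_FORMATS[type_name], data)
    return value


print(encode_int("INT32", 16909060).hex(" "))  # 01 02 03 04
```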
Fixed-Width Unsigned Integers¶
| Type | Size | Range | Encoding |
|---|---|---|---|
| UINT16 | 2 bytes | 0 to 65,535 | Unsigned, big-endian |
| UINT32 | 4 bytes | 0 to 4,294,967,295 | Unsigned, big-endian |
Unsigned Integer Usage
Fixed-width unsigned integers are used sparingly in the protocol, for fields where negative values are invalid. Lengths in compact encodings use UNSIGNED_VARINT rather than these fixed-width types.
Variable-Length Integers¶
Variable-length integers provide efficient encoding of small values while supporting the full range of 32-bit or 64-bit integers.
VARINT (Signed 32-bit)¶
A variable-length encoded signed 32-bit integer using zig-zag encoding.
| Property | Value |
|---|---|
| Maximum encoded size | 5 bytes |
| Value range | -2,147,483,648 to 2,147,483,647 |
| Encoding | Zig-zag + variable-length |
Zig-zag Transformation:
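encoded = (value << 1) ^ (value >> 31)
Here >> is an arithmetic (sign-extending) right shift on a 32-bit value. The transformation maps signed values to unsigned values so that small magnitudes, positive or negative, produce small encodings: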
| Signed Value | Zig-zag Encoded |
|---|---|
| 0 | 0 |
| -1 | 1 |
| 1 | 2 |
| -2 | 3 |
| 2 | 4 |
| 2147483647 | 4294967294 |
| -2147483648 | 4294967295 |
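The mapping in the table can be checked with a short sketch. Python integers are unbounded, so the result is masked to 32 bits to mimic fixed-width arithmetic; the function names are hypothetical.

```python
def zigzag_encode32(value: int) -> int:
    """Map a signed 32-bit integer to an unsigned 32-bit integer."""
    if not -(1 << 31) <= value < (1 << 31):
        raise ValueError("value out of INT32 range")
    # Python's >> on a negative int is already an arithmetic shift;
    # the mask truncates the unbounded result to 32 bits.
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


def zigzag_decode32(encoded: int) -> int:
    """Inverse of zigzag_encode32."""
    # The low bit carries the sign; the remaining bits carry the magnitude.
    return (encoded >> 1) ^ -(encoded & 1)


print(zigzag_encode32(-2))          # 3
print(zigzag_decode32(4294967295))  # -2147483648
```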
Variable-Length Encoding:
After zig-zag transformation, the value is encoded using continuation bits:
- Each byte uses 7 bits for data and 1 bit (MSB) as continuation flag
- MSB = 1 indicates more bytes follow
- MSB = 0 indicates final byte
Byte layout: [C][D6][D5][D4][D3][D2][D1][D0]
C = Continuation bit (1 = more bytes, 0 = last byte)
D = Data bits (7 per byte; the least-significant 7-bit group is sent first)
Encoding Examples:
| Value | Zig-zag | Encoded Bytes (hex) |
|---|---|---|
| 0 | 0 | 00 |
| -1 | 1 | 01 |
| 1 | 2 | 02 |
| 63 | 126 | 7E |
| 64 | 128 | 80 01 |
| -65 | 129 | 81 01 |
| 8191 | 16382 | FE 7F |
| 8192 | 16384 | 80 80 01 |
Maximum Byte Length
Implementations must reject VARINT values encoded in more than 5 bytes as a protocol error. In a valid VARINT, the continuation bit is clear (0) on or before the 5th byte.
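Putting zig-zag, continuation bits, and the 5-byte limit together, a VARINT codec might look like the following sketch. The function names and the `(value, next_offset)` return convention are assumptions made for illustration.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Zig-zag a signed 32-bit integer, then emit 7-bit groups, low group first."""
    if not -(1 << 31) <= value < (1 << 31):
        raise ValueError("value out of INT32 range")
    n = ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # set the continuation bit
        n >>= 7
    out.append(n)  # final byte, continuation bit clear
    return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, next_offset); reject encodings longer than 5 bytes."""
    n = 0
    for i in range(5):
        if offset + i >= len(data):
            raise ValueError("truncated VARINT")
        byte = data[offset + i]
        n |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            if n > 0xFFFFFFFF:
                raise ValueError("VARINT overflows 32 bits")
            return (n >> 1) ^ -(n & 1), offset + i + 1
    raise ValueError("VARINT longer than 5 bytes")


print(encode_varint(-65).hex(" "))  # 81 01
```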
VARLONG (Signed 64-bit)¶
A variable-length encoded signed 64-bit integer using zig-zag encoding.
| Property | Value |
|---|---|
| Maximum encoded size | 10 bytes |
| Value range | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
| Encoding | Zig-zag + variable-length |
Zig-zag Transformation:
encoded = (value << 1) ^ (value >> 63)
Here >> is an arithmetic (sign-extending) right shift on a 64-bit value.
Maximum Byte Length
Implementations must reject VARLONG values encoded in more than 10 bytes as a protocol error.
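The 10-byte bound follows from the arithmetic: a zig-zagged 64-bit value has up to 64 significant bits, and 64 bits split into 7-bit groups need 10 groups. The sketch below, with the hypothetical name `encode_varlong`, shows the encoder side and confirms the worst case.

```python
def encode_varlong(value: int) -> bytes:
    """Zig-zag a signed 64-bit integer, then emit 7-bit groups, low group first."""
    if not -(1 << 63) <= value < (1 << 63):
        raise ValueError("value out of INT64 range")
    n = ((value << 1) ^ (value >> 63)) & 0xFFFFFFFFFFFFFFFF
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # continuation bit set
        n >>= 7
    out.append(n)
    return bytes(out)


# The most negative INT64 zig-zags to 2**64 - 1, the longest encoding.
print(len(encode_varlong(-(1 << 63))))  # 10
```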
UNSIGNED_VARINT (Unsigned 32-bit)¶
A variable-length encoded unsigned 32-bit integer without zig-zag transformation.
| Property | Value |
|---|---|
| Maximum encoded size | 5 bytes |
| Value range | 0 to 4,294,967,295 |
| Encoding | Variable-length (no zig-zag) |
Encoding Examples:
| Value | Encoded Bytes (hex) |
|---|---|
| 0 | 00 |
| 1 | 01 |
| 127 | 7F |
| 128 | 80 01 |
| 16383 | FF 7F |
| 16384 | 80 80 01 |
Primary Usage:
- Length fields in compact encodings (COMPACT_STRING, COMPACT_ARRAY, etc.)
- Tagged field metadata
- Flexible version headers
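Without the zig-zag step, the encoding reduces to splitting the value into 7-bit groups. A minimal sketch follows; the helper names and the `(value, next_offset)` return convention are illustrative assumptions.

```python
from typing import Tuple


def encode_uvarint(value: int) -> bytes:
    """Encode an unsigned 32-bit integer as 7-bit groups, low group first."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value out of unsigned 32-bit range")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_uvarint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, next_offset); at most 5 bytes are consumed."""
    n = 0
    for i in range(5):
        if offset + i >= len(data):
            raise ValueError("truncated UNSIGNED_VARINT")
        byte = data[offset + i]
        n |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            if n > 0xFFFFFFFF:
                raise ValueError("UNSIGNED_VARINT overflows 32 bits")
            return n, offset + i + 1
    raise ValueError("UNSIGNED_VARINT longer than 5 bytes")


print(encode_uvarint(16384).hex(" "))  # 80 80 01
```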
Floating Point¶
FLOAT64¶
A double-precision 64-bit IEEE 754 floating-point number.
| Property | Value |
|---|---|
| Size | 8 bytes |
| Encoding | IEEE 754 binary64, big-endian |
| Special values | NaN, +Infinity, -Infinity supported |
Byte Layout:
Bit 63: Sign (0 = positive, 1 = negative)
Bits 62-52: Exponent (11 bits, biased by 1023)
Bits 51-0: Mantissa (52 bits)
NaN Handling
Implementations should use the canonical NaN representation (7FF8000000000000) when serializing NaN values. When deserializing, any value with exponent bits all set and non-zero mantissa must be interpreted as NaN.
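The NaN rule can be sketched with Python's `struct` module, where `">d"` is the big-endian binary64 format. The names `CANONICAL_NAN`, `encode_float64`, and `decode_float64` are hypothetical.

```python
import math
import struct

# Canonical NaN bit pattern named in the note above.
CANONICAL_NAN = bytes.fromhex("7FF8000000000000")


def encode_float64(value: float) -> bytes:
    """Serialize as big-endian IEEE 754 binary64, normalizing every NaN."""
    if math.isnan(value):
        return CANONICAL_NAN
    return struct.pack(">d", value)


def decode_float64(data: bytes) -> float:
    """Deserialize 8 bytes; any NaN bit pattern decodes to a NaN."""
    (value,) = struct.unpack(">d", data)
    return value


print(encode_float64(1.0).hex())  # 3ff0000000000000
```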
UUID¶
A universally unique identifier as defined in RFC 4122.
| Property | Value |
|---|---|
| Size | 16 bytes |
| Encoding | Big-endian (most significant bits first) |
| Null representation | 16 zero bytes |
Byte Layout:
Bytes 0-7: Most significant 64 bits (time_low, time_mid, time_hi_and_version)
Bytes 8-15: Least significant 64 bits (clock_seq, node)
Null UUID:
A null UUID is represented as 16 consecutive zero bytes:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
UUID Version
The Kafka protocol does not mandate a specific UUID version. Implementations commonly use version 4 (random) UUIDs.
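Python's `uuid.UUID.bytes` already yields the 16 bytes in big-endian field order, so a codec can be very small. Mapping the all-zero value to `None` is an assumption of this sketch, as are the helper names; an implementation could equally expose a dedicated zero-UUID constant.

```python
import uuid
from typing import Optional

NULL_UUID = bytes(16)  # 16 consecutive zero bytes


def encode_uuid(value: Optional[uuid.UUID]) -> bytes:
    """Serialize a UUID as 16 big-endian bytes; None becomes the null UUID."""
    return NULL_UUID if value is None else value.bytes


def decode_uuid(data: bytes) -> Optional[uuid.UUID]:
    """Deserialize 16 bytes; the all-zero pattern is reported as None."""
    if len(data) != 16:
        raise ValueError("UUID must be exactly 16 bytes")
    return None if data == NULL_UUID else uuid.UUID(bytes=data)


u = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")
print(encode_uuid(u).hex())  # 00112233445566778899aabbccddeeff
```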
Boolean¶
A single-byte boolean value.
| Property | Value |
|---|---|
| Size | 1 byte |
| False | 0x00 |
| True | Any non-zero value |
Encoding Rules:
| Requirement | Level |
|---|---|
| Implementations should write 0x00 for false | should |
| Implementations should write 0x01 for true | should |
| Implementations must interpret 0x00 as false | must |
| Implementations must interpret any non-zero value as true | must |
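The asymmetry between the writer and reader rules (strict on output, lenient on input) is small enough to show in full. The function names here are hypothetical.

```python
def encode_boolean(value: bool) -> bytes:
    """Write 0x01 for true and 0x00 for false."""
    return b"\x01" if value else b"\x00"


def decode_boolean(data: bytes) -> bool:
    """Read one byte; 0x00 is false and any non-zero value is true."""
    if len(data) != 1:
        raise ValueError("BOOLEAN must be exactly 1 byte")
    return data[0] != 0


print(decode_boolean(b"\x7f"))  # True
```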
String Types¶
STRING (Non-Nullable)¶
A length-prefixed UTF-8 string that cannot be null.
| Property | Value |
|---|---|
| Length field | INT16 |
| Maximum length | 32,767 bytes |
| Null handling | Not nullable |
Wire Format:
STRING => length:INT16 data:BYTE[length]
Encoding Example:
| String | Length | Encoded Bytes (hex) |
|---|---|---|
| "" | 0 | 00 00 |
| "a" | 1 | 00 01 61 |
| "hello" | 5 | 00 05 68 65 6C 6C 6F |
Null Constraint
A length value of -1 must be rejected as invalid for non-nullable STRING. Implementations must not encode null values using this type.
NULLABLE_STRING¶
A length-prefixed UTF-8 string that may be null.
| Property | Value |
|---|---|
| Length field | INT16 |
| Maximum length | 32,767 bytes |
| Null indicator | -1 |
Wire Format:
NULLABLE_STRING => length:INT16 [data:BYTE[length]]
If length == -1: null (no data bytes)
If length >= 0: exactly 'length' data bytes follow
Encoding Example:
| String | Length | Encoded Bytes (hex) |
|---|---|---|
| null | -1 | FF FF |
| "" | 0 | 00 00 |
| "test" | 4 | 00 04 74 65 73 74 |
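A sketch of the NULLABLE_STRING rules, assuming a hypothetical `(value, next_offset)` decoder convention. A non-nullable STRING decoder would differ only in rejecting a length of -1.

```python
import struct
from typing import Optional, Tuple


def encode_nullable_string(value: Optional[str]) -> bytes:
    """INT16 length prefix followed by UTF-8 bytes; None is length -1."""
    if value is None:
        return struct.pack(">h", -1)
    data = value.encode("utf-8")
    if len(data) > 0x7FFF:
        raise ValueError("string exceeds 32,767 bytes")
    return struct.pack(">h", len(data)) + data


def decode_nullable_string(buf: bytes, offset: int = 0) -> Tuple[Optional[str], int]:
    """Return (string or None, next_offset)."""
    if offset + 2 > len(buf):
        raise ValueError("truncated length field")
    (length,) = struct.unpack_from(">h", buf, offset)
    offset += 2
    if length == -1:
        return None, offset
    # Lengths below -1 are invalid; lengths past the buffer are truncated.
    if length < -1 or offset + length > len(buf):
        raise ValueError("invalid or truncated string length")
    return buf[offset:offset + length].decode("utf-8"), offset + length


print(encode_nullable_string("test").hex(" "))  # 00 04 74 65 73 74
```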
COMPACT_STRING (Non-Nullable)¶
A variable-length prefixed UTF-8 string using compact encoding.
| Property | Value |
|---|---|
| Length field | UNSIGNED_VARINT |
| Length semantics | Actual length + 1 |
| Maximum length | 2³²-2 bytes |
| Null handling | Not nullable |
Wire Format:
COMPACT_STRING => length:UNSIGNED_VARINT data:BYTE[length-1]
The length field encodes actual_length + 1, so:
- Length value 1 = empty string (0 bytes)
- Length value 2 = 1-byte string
- Length value N = (N-1)-byte string
Encoding Example:
| String | Wire Length | Encoded Bytes (hex) |
|---|---|---|
| "" | 1 | 01 |
| "a" | 2 | 02 61 |
| "hello" | 6 | 06 68 65 6C 6C 6F |
Null Constraint
A length value of 0 must be rejected as invalid for non-nullable COMPACT_STRING.
COMPACT_NULLABLE_STRING¶
A variable-length prefixed UTF-8 string using compact encoding that may be null.
| Property | Value |
|---|---|
| Length field | UNSIGNED_VARINT |
| Null indicator | 0 |
| Maximum length | 2³²-2 bytes |
Wire Format:
COMPACT_NULLABLE_STRING => length:UNSIGNED_VARINT [data:BYTE[length-1]]
If length == 0: null (no data bytes)
If length > 0: exactly 'length-1' data bytes follow
Encoding Example:
| String | Wire Length | Encoded Bytes (hex) |
|---|---|---|
| null | 0 | 00 |
| "" | 1 | 01 |
| "test" | 5 | 05 74 65 73 74 |
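The "length + 1" convention is easy to get off by one, so a worked sketch may help. All function names are hypothetical, and the UNSIGNED_VARINT helpers are repeated here only to keep the example self-contained.

```python
from typing import Optional, Tuple


def encode_uvarint(value: int) -> bytes:
    """UNSIGNED_VARINT: 7-bit groups, low group first."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value out of unsigned 32-bit range")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_uvarint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    n = 0
    for i in range(5):
        if offset + i >= len(buf):
            raise ValueError("truncated UNSIGNED_VARINT")
        byte = buf[offset + i]
        n |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            if n > 0xFFFFFFFF:
                raise ValueError("UNSIGNED_VARINT overflows 32 bits")
            return n, offset + i + 1
    raise ValueError("UNSIGNED_VARINT longer than 5 bytes")


def encode_compact_nullable_string(value: Optional[str]) -> bytes:
    """Wire length is byte length + 1; 0 is reserved for null."""
    if value is None:
        return encode_uvarint(0)
    data = value.encode("utf-8")
    return encode_uvarint(len(data) + 1) + data


def decode_compact_nullable_string(buf: bytes, offset: int = 0) -> Tuple[Optional[str], int]:
    """Return (string or None, next_offset)."""
    length, offset = decode_uvarint(buf, offset)
    if length == 0:
        return None, offset
    end = offset + length - 1
    if end > len(buf):
        raise ValueError("truncated string data")
    return buf[offset:end].decode("utf-8"), end


print(encode_compact_nullable_string("test").hex(" "))  # 05 74 65 73 74
```

A non-nullable COMPACT_STRING decoder would follow the same path but raise an error when the wire length is 0.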
Bytes Types¶
BYTES (Non-Nullable)¶
A length-prefixed byte array that cannot be null.
| Property | Value |
|---|---|
| Length field | INT32 |
| Maximum length | 2,147,483,647 bytes |
| Null handling | Not nullable |
Wire Format:
BYTES => length:INT32 data:BYTE[length]
NULLABLE_BYTES¶
A length-prefixed byte array that may be null.
| Property | Value |
|---|---|
| Length field | INT32 |
| Maximum length | 2,147,483,647 bytes |
| Null indicator | -1 |
Wire Format:
NULLABLE_BYTES => length:INT32 [data:BYTE[length]]
If length == -1: null (no data bytes)
If length >= 0: exactly 'length' data bytes follow
COMPACT_BYTES (Non-Nullable)¶
A variable-length prefixed byte array using compact encoding.
| Property | Value |
|---|---|
| Length field | UNSIGNED_VARINT |
| Length semantics | Actual length + 1 |
| Maximum length | 2³²-2 bytes |
| Null handling | Not nullable |
Wire Format:
COMPACT_BYTES => length:UNSIGNED_VARINT data:BYTE[length-1]
COMPACT_NULLABLE_BYTES¶
A variable-length prefixed byte array using compact encoding that may be null.
| Property | Value |
|---|---|
| Length field | UNSIGNED_VARINT |
| Null indicator | 0 |
| Maximum length | 2³²-2 bytes |
Wire Format:
COMPACT_NULLABLE_BYTES => length:UNSIGNED_VARINT [data:BYTE[length-1]]
If length == 0: null (no data bytes)
If length > 0: exactly 'length-1' data bytes follow
Array Types¶
ARRAY¶
A count-prefixed array of elements. The array itself may be null.
| Property | Value |
|---|---|
| Count field | INT32 |
| Maximum elements | 2,147,483,647 |
| Null indicator | -1 |
Wire Format:
ARRAY<T> => count:INT32 [elements:T[count]]
If count == -1: null array (no elements)
If count >= 0: exactly 'count' elements follow
Empty vs Null
An empty array (count = 0) is semantically distinct from a null array (count = -1). Implementations must preserve this distinction.
COMPACT_ARRAY¶
A variable-length count-prefixed array using compact encoding.
| Property | Value |
|---|---|
| Count field | UNSIGNED_VARINT |
| Count semantics | Actual count + 1 |
| Null indicator | 0 |
Wire Format:
COMPACT_ARRAY<T> => count:UNSIGNED_VARINT [elements:T[count-1]]
If count == 0: null array (no elements)
If count > 0: exactly 'count-1' elements follow
Count Semantics:
| Wire Count | Meaning |
|---|---|
| 0 | Null array |
| 1 | Empty array (0 elements) |
| 2 | Array with 1 element |
| N | Array with N-1 elements |
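To contrast the two count conventions, the sketch below encodes the same logical arrays as ARRAY and as COMPACT_ARRAY. The function names and the `encode_item` callback parameter are hypothetical; elements are INT32 in the usage lines purely as an example.

```python
import struct
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def encode_uvarint(value: int) -> bytes:
    """UNSIGNED_VARINT: 7-bit groups, low group first."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value out of unsigned 32-bit range")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_array(items: Optional[List[T]], encode_item: Callable[[T], bytes]) -> bytes:
    """ARRAY: INT32 count, with -1 meaning null."""
    if items is None:
        return struct.pack(">i", -1)
    return struct.pack(">i", len(items)) + b"".join(encode_item(x) for x in items)


def encode_compact_array(items: Optional[List[T]], encode_item: Callable[[T], bytes]) -> bytes:
    """COMPACT_ARRAY: UNSIGNED_VARINT count + 1, with 0 meaning null."""
    if items is None:
        return encode_uvarint(0)
    return encode_uvarint(len(items) + 1) + b"".join(encode_item(x) for x in items)


def int32(x: int) -> bytes:
    return struct.pack(">i", x)


print(encode_array([7], int32).hex(" "))          # 00 00 00 01 00 00 00 07
print(encode_compact_array([7], int32).hex(" "))  # 02 00 00 00 07
```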
Records Types¶
RECORDS¶
Message batch data in Kafka record format.
| Property | Value |
|---|---|
| Length field | INT32 |
| Null indicator | -1 |
| Content | Zero or more RecordBatch structures |
Wire Format:
RECORDS => length:INT32 [data:BYTE[length]]
If length == -1: null (no record data)
If length >= 0: exactly 'length' bytes of record batch data
COMPACT_RECORDS¶
Message batch data using compact encoding.
| Property | Value |
|---|---|
| Length field | UNSIGNED_VARINT |
| Null indicator | 0 |
| Content | Zero or more RecordBatch structures |
Wire Format:
COMPACT_RECORDS => length:UNSIGNED_VARINT [data:BYTE[length-1]]
If length == 0: null (no record data)
If length > 0: exactly 'length-1' bytes of record batch data
See Protocol Records for the record batch format specification.
Tagged Fields¶
Introduced in Kafka 2.4 (KIP-482), tagged fields enable forward-compatible protocol evolution without incrementing API versions.
Structure¶
TaggedFields => num_fields:UNSIGNED_VARINT [field:TaggedField]*
TaggedField => tag:UNSIGNED_VARINT size:UNSIGNED_VARINT data:BYTE[size]
| Field | Type | Description |
|---|---|---|
| num_fields | UNSIGNED_VARINT | Number of tagged fields |
| tag | UNSIGNED_VARINT | Unique field identifier |
| size | UNSIGNED_VARINT | Size of field data in bytes |
| data | BYTE[size] | Raw field data |
Behavioral Requirements¶
| Requirement | Level | Description |
|---|---|---|
| Tag ordering | must | Tagged fields must be serialized in strictly ascending tag order |
| Unknown tags | must | Implementations must ignore (skip) unknown tagged fields |
| Duplicate tags | must | Implementations must reject duplicate tags as a protocol error |
| Tag value range | must | Tag values must be non-negative |
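A reader that enforces these requirements might look like the sketch below. It returns every field as raw bytes keyed by tag, leaving the caller to skip tags it does not recognize. Checking for strictly ascending tags also catches duplicates. The function names and the dictionary return type are assumptions of this example.

```python
from typing import Dict, Tuple


def decode_uvarint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """UNSIGNED_VARINT reader returning (value, next_offset)."""
    n = 0
    for i in range(5):
        if offset + i >= len(buf):
            raise ValueError("truncated UNSIGNED_VARINT")
        byte = buf[offset + i]
        n |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            if n > 0xFFFFFFFF:
                raise ValueError("UNSIGNED_VARINT overflows 32 bits")
            return n, offset + i + 1
    raise ValueError("UNSIGNED_VARINT longer than 5 bytes")


def decode_tagged_fields(buf: bytes, offset: int = 0) -> Tuple[Dict[int, bytes], int]:
    """Return ({tag: raw data}, next_offset), validating tag order."""
    num_fields, offset = decode_uvarint(buf, offset)
    fields: Dict[int, bytes] = {}
    prev_tag = -1
    for _ in range(num_fields):
        tag, offset = decode_uvarint(buf, offset)
        if tag <= prev_tag:
            raise ValueError("tags must be strictly ascending with no duplicates")
        size, offset = decode_uvarint(buf, offset)
        if offset + size > len(buf):
            raise ValueError("tagged field data truncated")
        fields[tag] = buf[offset:offset + size]
        offset += size
        prev_tag = tag
    return fields, offset


# Two fields: tag 0 with 1 byte of data, tag 3 with 2 bytes of data.
wire = bytes([0x02, 0x00, 0x01, 0xAA, 0x03, 0x02, 0xBB, 0xCC])
print(decode_tagged_fields(wire))  # ({0: b'\xaa', 3: b'\xbb\xcc'}, 8)
```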
Version Applicability¶
Tagged fields are only valid in "flexible" API versions:
| API Version Type | Request Tagged Fields | Response Tagged Fields |
|---|---|---|
| Non-flexible | ❌ | ❌ |
| Flexible | ✅ | ✅ |
Flexible Version Detection
Each API specifies which versions are "flexible" in its schema. Flexible versions use compact encodings (COMPACT_STRING, COMPACT_ARRAY, etc.) and include tagged field sections.
Type Summary¶
Primitive Types¶
| Type | Size (bytes) | Nullable | Compact Variant |
|---|---|---|---|
| INT8 | 1 | ❌ | - |
| INT16 | 2 | ❌ | - |
| INT32 | 4 | ❌ | - |
| INT64 | 8 | ❌ | - |
| UINT16 | 2 | ❌ | - |
| UINT32 | 4 | ❌ | - |
| VARINT | 1-5 | ❌ | - |
| VARLONG | 1-10 | ❌ | - |
| UNSIGNED_VARINT | 1-5 | ❌ | - |
| FLOAT64 | 8 | ❌ | - |
| UUID | 16 | ❌ | - |
| BOOLEAN | 1 | ❌ | - |
Complex Types¶
| Type | Length Field | Nullable | Compact Variant |
|---|---|---|---|
| STRING | INT16 | ❌ | COMPACT_STRING |
| NULLABLE_STRING | INT16 | ✅ | COMPACT_NULLABLE_STRING |
| BYTES | INT32 | ❌ | COMPACT_BYTES |
| NULLABLE_BYTES | INT32 | ✅ | COMPACT_NULLABLE_BYTES |
| ARRAY | INT32 | ✅ | COMPACT_ARRAY |
| RECORDS | INT32 | ✅ | COMPACT_RECORDS |
Implementation Notes¶
Byte Order¶
All multi-byte primitive types use big-endian (network) byte order. This applies to:
- Fixed-width integers (INT16, INT32, INT64, UINT16, UINT32)
- Floating point (FLOAT64)
- UUID
Variable-length integers (VARINT, VARLONG, UNSIGNED_VARINT) are the exception: their 7-bit data groups are written least-significant group first.
String Encoding¶
All string types use UTF-8 encoding. Implementations must:
- Encode strings as valid UTF-8
- Accept and preserve valid UTF-8 sequences
Handling of invalid UTF-8 on input is implementation-defined: an implementation may reject the input or substitute replacement characters.
Memory Considerations¶
| Concern | Recommendation |
|---|---|
| Maximum message size | Validate against configured limits before allocation |
| Array allocation | Check count bounds before allocating arrays |
| String length | Validate length does not exceed available bytes |
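One way to apply the array and string recommendations is to compare a decoded count or length against the bytes still available before allocating anything. The helper below is a sketch under that assumption; its name and the `min_element_size` parameter are hypothetical.

```python
def check_count(count: int, remaining_bytes: int, min_element_size: int = 1) -> int:
    """Reject a count that cannot fit in the bytes left in the buffer.

    min_element_size is the smallest possible encoded size of one element,
    for example 4 for INT32 elements or 1 for raw bytes.
    """
    if count < 0:
        raise ValueError("negative count")
    if min_element_size < 1:
        raise ValueError("min_element_size must be at least 1")
    if count > remaining_bytes // min_element_size:
        raise ValueError("count exceeds available bytes")
    return count


# A claimed 2,000,000,000-element INT32 array in a 64-byte buffer is
# rejected before any list is allocated.
try:
    check_count(2_000_000_000, remaining_bytes=64, min_element_size=4)
except ValueError as exc:
    print(exc)  # count exceeds available bytes
```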
Related Documentation¶
- Protocol Messages - Message framing and headers
- Protocol Records - Record batch format
- Protocol Errors - Error code reference
- Kafka Protocol - Protocol overview