Schema Evolution¶

Schema evolution allows schemas to change over time while maintaining compatibility with existing data and applications.

Evolution Concepts¶

Writer Schema vs Reader Schema¶

Schema Registry uses the concepts of writer schema (used when data was written) and reader schema (used when reading data):

Resolution Rules¶

When reader and writer schemas differ, serialization frameworks apply resolution rules:

Scenario	Resolution
Field in writer, not in reader	Field ignored
Field in reader with default, not in writer	Default value used
Field in reader without default, not in writer	Error
Type mismatch	Error (unless promotable)

Safe Schema Changes¶

Adding Optional Fields¶

Adding optional fields with defaults is safe for all compatibility modes:

Avro:

{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}

Protobuf:

message User {
  int64 id = 1;
  string name = 2;
  optional string email = 3;  // Added field
}

JSON Schema:

{
  "type": "object",
  "properties": {
    "id": {"type": "integer"},
    "name": {"type": "string"},
    "email": {"type": "string"}
  },
  "required": ["id", "name"]
}

Removing Optional Fields¶

Removing optional fields is safe when using FORWARD or FULL compatibility:

// Version 1
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "nickname", "type": ["null", "string"], "default": null}
  ]
}

// Version 2 (nickname removed)
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"}
  ]
}

Old consumers (v1) reading new data (v2) will use the default value for nickname.

Unsafe Schema Changes¶

Adding Required Fields¶

Adding a required field without a default breaks backward compatibility:

// Version 1
{
  "fields": [
    {"name": "id", "type": "long"}
  ]
}

// Version 2 - INCOMPATIBLE
{
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "created_at", "type": "long"}  // No default!
  ]
}

Old data lacks created_at, so new consumers cannot deserialize it.

Changing Field Types¶

Type changes are incompatible:

// Version 1
{"name": "user_id", "type": "long"}

// Version 2 - INCOMPATIBLE
{"name": "user_id", "type": "string"}

Renaming Fields¶

Direct field renames are incompatible without aliases:

// Version 1
{"name": "user_name", "type": "string"}

// Version 2 - INCOMPATIBLE (without alias)
{"name": "username", "type": "string"}

Evolution Patterns¶

Pattern 1: Add with Default, Then Remove Default¶

For fields that should eventually be required:

// Step 1: Add optional field with default
{"name": "email", "type": ["null", "string"], "default": null}

// Wait for all producers to populate email

// Step 2: Remove default (field still optional)
{"name": "email", "type": ["null", "string"]}

// Wait for all historical data without email to expire

// Step 3: Make required (only if all data has email)
{"name": "email", "type": "string"}

Pattern 2: Deprecation Period¶

For field removal:

Pattern 3: Using Aliases¶

Rename fields safely with Avro aliases:

{
  "type": "record",
  "name": "User",
  "fields": [
    {
      "name": "username",
      "type": "string",
      "aliases": ["user_name", "userName"]
    }
  ]
}

Old data with user_name will be read into username.

Pattern 4: Union Types for Flexibility¶

Use union types for fields that may have multiple formats:

{
  "name": "timestamp",
  "type": [
    "null",
    "long",
    {"type": "string", "logicalType": "timestamp-millis"}
  ],
  "default": null
}

Compatibility by Mode¶

BACKWARD Evolution¶

New schema must read old data:

Change	Allowed
Add optional field with default	✅
Add required field	❌
Remove field	✅
Widen type (int → long)	✅
Narrow type (long → int)	❌

Upgrade order: Consumers first, then producers.

FORWARD Evolution¶

Old schema must read new data:

Change	Allowed
Add field	✅
Remove optional field with default	✅
Remove required field	❌
Widen type (int → long)	❌
Narrow type (long → int)	✅

Upgrade order: Producers first, then consumers.

FULL Evolution¶

Both backward and forward compatible:

Change	Allowed
Add optional field with default	✅
Remove optional field with default	✅
Add required field	❌
Remove required field	❌
Change type	❌

Upgrade order: Any order.

Version Management¶

Schema Versions¶

Each schema registration creates a version:

# List versions
curl http://schema-registry:8081/subjects/users-value/versions
# [1, 2, 3]

# Get specific version
curl http://schema-registry:8081/subjects/users-value/versions/2

Schema IDs¶

Schema IDs are global and unique across all subjects:

Schema	Subject	Version	Global ID
User v1	users-value	1	1
Order v1	orders-value	1	2
User v2	users-value	2	3

Deleting Schemas¶

Soft delete removes schema from listing but preserves ID:

# Soft delete version
curl -X DELETE http://schema-registry:8081/subjects/users-value/versions/2

# Hard delete (after soft delete)
curl -X DELETE http://schema-registry:8081/subjects/users-value/versions/2?permanent=true

Schema Deletion

Deleting schemas can break consumers holding cached references. Only delete schemas when certain no data exists with that schema ID.

Testing Compatibility¶

Before Registration¶

# Test compatibility before registering
curl -X POST \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schema": "{...}"}' \
  http://schema-registry:8081/compatibility/subjects/users-value/versions/latest

# Response
{"is_compatible": true}

CI/CD Integration¶

#!/bin/bash
# compatibility-check.sh

SCHEMA=$(cat schema.avsc | jq -Rs '.')
RESULT=$(curl -s -X POST \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data "{\"schema\": $SCHEMA}" \
  "$SCHEMA_REGISTRY_URL/compatibility/subjects/$SUBJECT/versions/latest")

if echo "$RESULT" | jq -e '.is_compatible == true' > /dev/null; then
  echo "Schema is compatible"
  exit 0
else
  echo "Schema is INCOMPATIBLE"
  echo "$RESULT" | jq '.messages'
  exit 1
fi

Best Practices¶

Practice	Rationale
Start with BACKWARD	Most common upgrade pattern (consumers first)
Always provide defaults	Enables safe field addition
Use optional fields	Allows both addition and removal
Avoid type changes	Types should be permanent
Document field semantics	Prevent misuse across versions
Use semantic versioning	Track breaking vs non-breaking changes
Test in staging	Validate compatibility before production

Schema Registry Overview - Architecture and concepts
Compatibility - Compatibility modes
Why Schemas - Schema management benefits
Operations - Production operations