4.4 Schema Evolution
Backward compatibility and schema evolution examples.
Schema Evolution: Backward Compatibility
Overview
Schema evolution is inevitable in data systems. This lesson focuses on backward compatibility, ensuring older consumers can read messages produced with newer schema versions.
Avro Basics
Why Avro?
- Compact binary serialization: Much smaller than JSON or XML
- Schema Registry support: Centralized schema management
- Big data optimized: Designed for distributed systems
Setting Up Avro CLI
Download and verify Avro tools:
```bash
# Download the Avro tools jar
wget https://downloads.apache.org/avro/avro-1.11.3/java/avro-tools-1.11.3.jar
# Run it with no arguments to confirm it works (it prints the available subcommands)
java -jar avro-tools-1.11.3.jar
```
Avro Schema Example
Payment Schema (V1)
```json
{
  "type": "record",
  "name": "Payment",
  "fields": [
    {"name": "user_id", "type": "int"},
    {"name": "amount", "type": "float"},
    {"name": "currency", "type": "string"}
  ]
}
```
Serialization Process
Create Sample Data
```bash
echo '{"user_id":1,"amount":100.0,"currency":"USD"}' > payment.json
```
Serialize to Avro
```bash
java -jar avro-tools-1.11.3.jar fromjson \
  --schema-file payment.avsc \
  payment.json > payment.avro
```
Inspect Binary
```bash
hexdump -C payment.avro
```
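For a quick signature check: per the Avro specification, every object container file begins with the magic bytes "Obj" followed by 0x01, so you can verify the output without reading the full dump:

```bash
# Inspect just the 4-byte file signature
head -c 4 payment.avro | hexdump -C
# Expected: 4f 62 6a 01  |Obj.|
```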
Deserialization Process
Convert Avro back to JSON:
```bash
java -jar avro-tools-1.11.3.jar tojson \
  --reader-schema-file payment.avsc \
  payment.avro
```
Output:
```json
{"user_id":1,"amount":100.0,"currency":"USD"}
```
Avro with Kafka
How It Works
- Define Schema: create the schema and register it with Schema Registry
- Producer Sends: the producer includes the schema ID (not the full schema) in each message
- Message Format:
  - Magic byte (marks the payload as Schema Registry-framed Avro)
  - Schema ID (a reference into the registry)
  - Serialized data
- Broker Stores: the broker stores and forwards the message without ever inspecting the schema
- Consumer Receives: the consumer fetches the schema by its ID, then deserializes the data
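As a sketch of this flow end to end, using the console tools that ship with the Confluent Platform (the broker address, Schema Registry URL, and `payments` topic name are assumptions for illustration):

```bash
# Produce one Avro message; the serializer registers payment.avsc with
# Schema Registry and prepends the magic byte + schema ID to the payload
kafka-avro-console-producer \
  --bootstrap-server localhost:9092 \
  --topic payments \
  --property schema.registry.url=http://localhost:8081 \
  --property value.schema="$(cat payment.avsc)" \
  <<< '{"user_id":1,"amount":100.0,"currency":"USD"}'

# Consume it; the deserializer fetches the schema by ID and decodes the data
kafka-avro-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic payments \
  --property schema.registry.url=http://localhost:8081 \
  --from-beginning
```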
Benefits
- Lightweight messages: Schema not included in every message
- Centralized management: Schema Registry ensures consistency
- Version control: Automatic schema versioning
- Validation: Schema enforcement at write time
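You can see the lightweight framing directly. A rough sketch with kcat (a third-party Kafka CLI; assumes it is installed and the `payments` topic above exists):

```bash
# Dump the raw bytes of one message from the topic
kcat -b localhost:9092 -t payments -c 1 -e | hexdump -C | head -2
# First byte 00 is the magic byte; the next four bytes are the schema ID
# (big-endian); the Avro-encoded payload follows
```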
Backward Compatibility in Practice
Scenario
- Initial: Producer and consumer use Schema V1
- Update: Producer upgrades to Schema V2 (adds optional field)
- Compatibility: Consumer on V1 can still read V2 messages
Example Evolution
V1:
```json
{
  "type": "record",
  "name": "Payment",
  "fields": [
    {"name": "user_id", "type": "int"},
    {"name": "amount", "type": "float"}
  ]
}
```
V2 (Backward Compatible):
```json
{
  "type": "record",
  "name": "Payment",
  "fields": [
    {"name": "user_id", "type": "int"},
    {"name": "amount", "type": "float"},
    {"name": "currency", "type": ["null", "string"], "default": null}
  ]
}
```
V1 consumers can read V2 messages because the new field is optional with a default value.
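A hands-on way to verify this with the Avro CLI from earlier (the file names `payment_v1.avsc` and `payment_v2.avsc` are my own, holding the two schemas above):

```bash
# Avro's JSON encoding wraps union values, hence {"string": "USD"}
echo '{"user_id":1,"amount":100.0,"currency":{"string":"USD"}}' > payment_v2.json

# Serialize with the new (V2) writer schema
java -jar avro-tools-1.11.3.jar fromjson \
  --schema-file payment_v2.avsc \
  payment_v2.json > payment_v2.avro

# Deserialize with the old (V1) reader schema; the unknown currency field
# is skipped during schema resolution
java -jar avro-tools-1.11.3.jar tojson \
  --reader-schema-file payment_v1.avsc \
  payment_v2.avro
# Expected output: {"user_id":1,"amount":100.0}
```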
CLI vs Kafka Comparison
| Aspect | Avro CLI | Kafka with Schema Registry |
|--------|----------|----------------------------|
| Schema Storage | In file | Centralized registry |
| Message Size | Includes schema | Schema ID only |
| Validation | Manual | Automatic |
| Versioning | Manual | Automatic |
| Best For | Testing/Learning | Production |
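In the registry workflow, the compatibility mode is a per-subject setting enforced on every registration. A minimal sketch using Schema Registry's REST API, assuming the registry runs at http://localhost:8081 and the subject is named `payments-value`:

```bash
# Require backward compatibility for all future versions of this subject
curl -s -X PUT \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"compatibility": "BACKWARD"}' \
  http://localhost:8081/config/payments-value
```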
Best Practices
Schema Design
- Always use optional fields for new additions
- Provide sensible defaults
- Document changes thoroughly
- Test compatibility before deployment
Evolution Strategy
- Add fields as optional
- Never remove required fields
- Use Schema Registry's compatibility checker (see the curl sketch after this list)
- Maintain version history
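A sketch of that compatibility check against Schema Registry's REST API (same assumed URL and subject as above; jq is used to escape the schema file into a JSON string):

```bash
# Ask the registry whether payment_v2.avsc can safely replace the latest
# registered version; the response should be {"is_compatible":true}
curl -s -X POST \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data "{\"schema\": $(jq -Rs . < payment_v2.avsc)}" \
  http://localhost:8081/compatibility/subjects/payments-value/versions/latest
```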
Testing
- Test with Avro CLI first
- Verify compatibility in Schema Registry
- Test with actual consumers
- Monitor for deserialization errors
Summary
Avro provides:
- Efficient serialization
- Strong schema support
- Backward compatibility mechanisms
- Integration with Kafka ecosystem
Understanding Avro and schema evolution ensures smooth system upgrades without breaking consumers.