4.4 Schema Evolution
Backward compatibility and schema evolution examples.
Schema Evolution: Backward Compatibility
Overview
Schema evolution is inevitable in data systems. This lesson focuses on backward compatibility, ensuring older consumers can read messages produced with newer schema versions.
Avro Basics
Why Avro?
- Compact binary serialization: Much smaller than JSON or XML
- Schema Registry support: Centralized schema management
- Big data optimized: Designed for distributed systems
Setting Up Avro CLI
Download and verify Avro tools:
```bash
# Download the Avro tools jar
wget https://downloads.apache.org/avro/avro-1.11.3/java/avro-tools-1.11.3.jar
# Run it with no arguments to confirm it works (it prints the available subcommands)
java -jar avro-tools-1.11.3.jar
```
Avro Schema Example
Payment Schema (V1)
```json
{
  "type": "record",
  "name": "Payment",
  "fields": [
    {"name": "user_id", "type": "int"},
    {"name": "amount", "type": "float"},
    {"name": "currency", "type": "string"}
  ]
}
```
Serialization Process
Create Sample Data
```bash
echo '{"user_id":1,"amount":100.0,"currency":"USD"}' > payment.json
```
Serialize to Avro
```bash
java -jar avro-tools-1.11.3.jar fromjson \
  --schema-file payment.avsc \
  payment.json > payment.avro
```
Inspect Binary
```bash
hexdump -C payment.avro
```
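For a quick signature check: per the Avro specification, every object container file begins with the magic bytes "Obj" followed by 0x01, so you can verify the output without reading the full dump:

```bash
# Inspect just the 4-byte file signature
head -c 4 payment.avro | hexdump -C
# Expected: 4f 62 6a 01  |Obj.|
```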
Deserialization Process
Convert Avro back to JSON:
```bash
java -jar avro-tools-1.11.3.jar tojson \
  --reader-schema-file payment.avsc \
  payment.avro
```
Output:
```json
{"user_id":1,"amount":100.0,"currency":"USD"}
```
Avro with Kafka
How It Works
- Define Schema: create the schema and register it with Schema Registry
- Producer Sends: the producer includes the schema ID (not the full schema) in each message
- Message Format:
  - Magic byte (marks the payload as Schema Registry-framed Avro)
  - Schema ID (a reference into the registry)
  - Serialized data
- Broker Stores: the broker stores and forwards the message without ever inspecting the schema
- Consumer Receives: the consumer fetches the schema by its ID, then deserializes the data
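As a sketch of this flow end to end, using the console tools that ship with the Confluent Platform (the broker address, Schema Registry URL, and `payments` topic name are assumptions for illustration):

```bash
# Produce one Avro message; the serializer registers payment.avsc with
# Schema Registry and prepends the magic byte + schema ID to the payload
kafka-avro-console-producer \
  --bootstrap-server localhost:9092 \
  --topic payments \
  --property schema.registry.url=http://localhost:8081 \
  --property value.schema="$(cat payment.avsc)" \
  <<< '{"user_id":1,"amount":100.0,"currency":"USD"}'

# Consume it; the deserializer fetches the schema by ID and decodes the data
kafka-avro-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic payments \
  --property schema.registry.url=http://localhost:8081 \
  --from-beginning
```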
Benefits
- Lightweight messages: Schema not included in every message
- Centralized management: Schema Registry ensures consistency
- Version control: Automatic schema versioning
- Validation: Schema enforcement at write time
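You can see the lightweight framing directly. A rough sketch with kcat (a third-party Kafka CLI; assumes it is installed and the `payments` topic above exists):

```bash
# Dump the raw bytes of one message from the topic
kcat -b localhost:9092 -t payments -c 1 -e | hexdump -C | head -2
# First byte 00 is the magic byte; the next four bytes are the schema ID
# (big-endian); the Avro-encoded payload follows
```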
Backward Compatibility in Practice
Scenario
- Initial: Producer and consumer use Schema V1
- Update: Producer upgrades to Schema V2 (adds optional field)
- Compatibility: Consumer on V1 can still read V2 messages
Example Evolution
V1:
```json
{
  "type": "record",
  "name": "Payment",
  "fields": [
    {"name": "user_id", "type": "int"},
    {"name": "amount", "type": "float"}
  ]
}
```
V2 (Backward Compatible):
```json
{
  "type": "record",
  "name": "Payment",
  "fields": [
    {"name": "user_id", "type": "int"},
    {"name": "amount", "type": "float"},
    {"name": "currency", "type": ["null", "string"], "default": null}
  ]
}
```
V1 consumers can read V2 messages because the new field is optional with a default value.
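A hands-on way to verify this with the Avro CLI from earlier (the file names `payment_v1.avsc` and `payment_v2.avsc` are my own, holding the two schemas above):

```bash
# Avro's JSON encoding wraps union values, hence {"string": "USD"}
echo '{"user_id":1,"amount":100.0,"currency":{"string":"USD"}}' > payment_v2.json

# Serialize with the new (V2) writer schema
java -jar avro-tools-1.11.3.jar fromjson \
  --schema-file payment_v2.avsc \
  payment_v2.json > payment_v2.avro

# Deserialize with the old (V1) reader schema; the unknown currency field
# is skipped during schema resolution
java -jar avro-tools-1.11.3.jar tojson \
  --reader-schema-file payment_v1.avsc \
  payment_v2.avro
# Expected output: {"user_id":1,"amount":100.0}
```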
CLI vs Kafka Comparison
| Aspect | Avro CLI | Kafka with Schema Registry |
|--------|----------|----------------------------|
| Schema Storage | In file | Centralized registry |
| Message Size | Includes schema | Schema ID only |
| Validation | Manual | Automatic |
| Versioning | Manual | Automatic |
| Best For | Testing/Learning | Production |
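In the registry workflow, the compatibility mode is a per-subject setting enforced on every registration. A minimal sketch using Schema Registry's REST API, assuming the registry runs at http://localhost:8081 and the subject is named `payments-value`:

```bash
# Require backward compatibility for all future versions of this subject
curl -s -X PUT \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"compatibility": "BACKWARD"}' \
  http://localhost:8081/config/payments-value
```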
Best Practices
Schema Design
- Always use optional fields for new additions
- Provide sensible defaults
- Document changes thoroughly
- Test compatibility before deployment
Evolution Strategy
- Add fields as optional
- Never remove required fields
- Use Schema Registry's compatibility checker (see the curl sketch after this list)
- Maintain version history
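A sketch of that compatibility check against Schema Registry's REST API (same assumed URL and subject as above; jq is used to escape the schema file into a JSON string):

```bash
# Ask the registry whether payment_v2.avsc can safely replace the latest
# registered version; the response should be {"is_compatible":true}
curl -s -X POST \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data "{\"schema\": $(jq -Rs . < payment_v2.avsc)}" \
  http://localhost:8081/compatibility/subjects/payments-value/versions/latest
```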
Testing
- Test with Avro CLI first
- Verify compatibility in Schema Registry
- Test with actual consumers
- Monitor for deserialization errors
Summary
Avro provides:
- Efficient serialization
- Strong schema support
- Backward compatibility mechanisms
- Integration with Kafka ecosystem
Understanding Avro and schema evolution ensures smooth system upgrades without breaking consumers.