CentralMesh.io

Kafka Fundamentals for Beginners

4.4 Schema Evolution

Backward compatibility and schema evolution examples.


Schema Evolution: Backward Compatibility

Overview

Schema evolution is inevitable in data systems. This lesson focuses on backward compatibility, ensuring older consumers can read messages produced with newer schema versions.

Avro Basics

Why Avro?

  • Compact binary serialization: Much smaller than JSON or XML
  • Schema Registry support: Centralized schema management
  • Big data optimized: Designed for distributed systems

Setting Up Avro CLI

Download and verify Avro tools:

bash
wget https://downloads.apache.org/avro/avro-1.11.3/java/avro-tools-1.11.3.jar
java -jar avro-tools-1.11.3.jar

Avro Schema Example

Payment Schema (V1)

json
{
  "type": "record",
  "name": "Payment",
  "fields": [
    {"name": "user_id", "type": "int"},
    {"name": "amount", "type": "float"},
    {"name": "currency", "type": "string"}
  ]
}

Serialization Process

Create Sample Data

Save the schema above as payment.avsc, then create a sample record:

bash
echo '{"user_id":1,"amount":100.0,"currency":"USD"}' > payment.json

Serialize to Avro

bash
java -jar avro-tools-1.11.3.jar fromjson \
  --schema-file payment.avsc \
  payment.json > payment.avro

Inspect Binary

bash
hexdump -C payment.avro
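The hexdump shows more than the record itself: avro-tools writes an Avro container file, whose header embeds the full schema. The record body is only a few bytes. As an illustrative sketch in plain Python (no Avro library), here is how the Payment record is laid out in Avro's binary encoding:

```python
import struct

def zigzag_varint(n: int) -> bytes:
    """Encode a signed integer as Avro's zigzag varint (the int/long types)."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps -1, 1, -2, 2, ... to 1, 2, 3, 4, ...
    out = bytearray()
    while True:
        if z >> 7:
            out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
            z >>= 7
        else:
            out.append(z)
            return bytes(out)

def encode_payment(user_id: int, amount: float, currency: str) -> bytes:
    """An Avro record is just its fields concatenated in schema order,
    with no field names or tags in the payload."""
    return (
        zigzag_varint(user_id)                   # "user_id": int
        + struct.pack("<f", amount)              # "amount": float, IEEE-754 little-endian
        + zigzag_varint(len(currency.encode()))  # "currency": length prefix...
        + currency.encode("utf-8")               # ...followed by raw UTF-8 bytes
    )

print(encode_payment(1, 100.0, "USD").hex(" "))  # 02 00 00 c8 42 06 55 53 44
```

Nine bytes, versus roughly 45 for the equivalent JSON text, which is the "compact binary serialization" advantage in practice.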

Deserialization Process

Convert Avro back to JSON:

bash
java -jar avro-tools-1.11.3.jar tojson \
  --reader-schema-file payment.avsc \
  payment.avro

Output:

json
{"user_id":1,"amount":100.0,"currency":"USD"}

Avro with Kafka

How It Works

  1. Define Schema: Create the schema and register it with the Schema Registry
  2. Producer Sends: Includes the schema ID (not the full schema)
  3. Message Format:
     - Magic byte (indicates Schema Registry serialization)
     - Schema ID (reference to the registry)
     - Serialized data
  4. Broker Stores: Passes the message through without any schema interaction
  5. Consumer Receives: Fetches the schema by ID, then deserializes the data

Benefits

  • Lightweight messages: Schema not included in every message
  • Centralized management: Schema Registry ensures consistency
  • Version control: Automatic schema versioning
  • Validation: Schema enforcement at write time
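The three-part message format above can be sketched directly. This mirrors the wire format used by Confluent's Schema Registry serializers (one magic byte 0x00, a 4-byte big-endian schema ID, then the Avro body); the schema ID 42 below is an arbitrary placeholder, since in reality the registry assigns it:

```python
import struct

MAGIC_BYTE = 0  # marks a Schema-Registry-framed payload

def frame_message(schema_id: int, avro_payload: bytes) -> bytes:
    """Producer side: prepend magic byte + 4-byte big-endian schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload

def unframe_message(message: bytes) -> tuple[int, bytes]:
    """Consumer side: recover the schema ID (to look up in the registry)
    and the serialized record that follows the 5-byte header."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]

framed = frame_message(42, b"\x02\x00\x00\xc8\x42")  # header + Avro record bytes
schema_id, payload = unframe_message(framed)
print(schema_id, payload.hex())  # 42 020000c842
```

Note how the broker never needs any of this: it stores and forwards the framed bytes untouched, exactly as step 4 describes.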

Backward Compatibility in Practice

Scenario

  • Initial: Producer and consumer use Schema V1
  • Update: Producer upgrades to Schema V2 (adds optional field)
  • Compatibility: Consumer on V1 can still read V2 messages

Example Evolution

V1:

json
{
  "type": "record",
  "name": "Payment",
  "fields": [
    {"name": "user_id", "type": "int"},
    {"name": "amount", "type": "float"}
  ]
}

V2 (Backward Compatible):

json
{
  "type": "record",
  "name": "Payment",
  "fields": [
    {"name": "user_id", "type": "int"},
    {"name": "amount", "type": "float"},
    {"name": "currency", "type": ["null", "string"], "default": null}
  ]
}

V1 consumers can still read V2 messages: the new field is optional with a default, and Avro's schema resolution skips writer fields that the reader's schema does not declare.
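To see why this works at the byte level, here is a hand-rolled sketch of a V1 reader consuming V2 bytes (illustrative only; a real consumer applies Avro's schema-resolution rules using the writer's schema fetched from the registry). Because the added field sits after the V1 fields, the old reader decodes what it knows and ignores the rest:

```python
import struct

def read_zigzag(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one Avro zigzag varint, returning (value, next_position)."""
    shift = acc = 0
    while True:
        byte = buf[pos]
        pos += 1
        acc |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1), pos  # undo the zigzag mapping

def read_payment_v1(buf: bytes) -> dict:
    """A V1 reader decodes only the fields its schema declares; the trailing
    V2 'currency' union is left unread (Avro's resolution skips unknown
    writer fields wherever they appear, guided by the writer schema)."""
    user_id, pos = read_zigzag(buf, 0)
    amount = struct.unpack_from("<f", buf, pos)[0]
    return {"user_id": user_id, "amount": amount}

# Bytes a V2 producer would emit: user_id=1, amount=100.0, then currency
# as union ["null","string"]: branch index 1 ("string"), then "USD".
v2_record = b"\x02" + struct.pack("<f", 100.0) + b"\x02\x06USD"
print(read_payment_v1(v2_record))  # {'user_id': 1, 'amount': 100.0}
```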

CLI vs Kafka Comparison

| Aspect | Avro CLI | Kafka with Schema Registry |
|--------|----------|----------------------------|
| Schema Storage | In file | Centralized registry |
| Message Size | Includes schema | Schema ID only |
| Validation | Manual | Automatic |
| Versioning | Manual | Automatic |
| Best For | Testing/Learning | Production |

Best Practices

Schema Design

  • Always use optional fields for new additions
  • Provide sensible defaults
  • Document changes thoroughly
  • Test compatibility before deployment

Evolution Strategy

  • Add fields as optional
  • Never remove required fields
  • Use Schema Registry's compatibility checker
  • Maintain version history
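The first two rules lend themselves to a quick automated check before registering a new schema version. A minimal local sketch (it covers only added and removed fields, not type changes; Schema Registry's compatibility checker implements the full Avro resolution rules):

```python
def is_compatible_evolution(old: dict, new: dict) -> bool:
    """Check the two rules above against a pair of Avro record schemas:
    no old field may be removed, and every added field needs a default."""
    old_names = {f["name"] for f in old["fields"]}
    new_names = {f["name"] for f in new["fields"]}
    if old_names - new_names:
        return False  # a previously declared field was removed
    # every field that is new relative to the old schema must carry a default
    return all(f["name"] in old_names or "default" in f for f in new["fields"])

v1 = {"type": "record", "name": "Payment", "fields": [
    {"name": "user_id", "type": "int"},
    {"name": "amount", "type": "float"}]}
v2_ok = {**v1, "fields": v1["fields"] + [
    {"name": "currency", "type": ["null", "string"], "default": None}]}
v2_bad = {**v1, "fields": v1["fields"] + [
    {"name": "currency", "type": "string"}]}  # required field: breaks compatibility

print(is_compatible_evolution(v1, v2_ok), is_compatible_evolution(v1, v2_bad))  # True False
```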

Testing

  • Test with Avro CLI first
  • Verify compatibility in Schema Registry
  • Test with actual consumers
  • Monitor for deserialization errors

Summary

Avro provides:

  • Efficient serialization
  • Strong schema support
  • Backward compatibility mechanisms
  • Integration with Kafka ecosystem

Understanding Avro and schema evolution ensures smooth system upgrades without breaking consumers.