The most surprising thing about enforcing event schemas is that you’re not actually enforcing them everywhere – you’re enforcing them at the seams of your system, where data moves between producers and consumers.
Let’s say you have a microservice that tracks user signups. It produces a UserSignedUp event:
{
"userId": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
"timestamp": "2023-10-27T10:00:00Z",
"email": "test@example.com",
"plan": "free"
}
Another service, perhaps for sending welcome emails, consumes this event. Without a schema registry, the email service might have been written assuming a plan field exists, but what if the signup service later adds a signupSource field and removes plan? Your email service breaks.
A schema registry acts as the single source of truth for your event structures. Producers register their schemas, and consumers fetch them to ensure compatibility. This is typically done using a format like Avro, Protobuf, or JSON Schema. Let’s look at Avro, which is popular for its compact binary format and strong schema evolution capabilities.
Here’s a simplified Avro schema for our UserSignedUp event:
{
"type": "record",
"name": "UserSignedUp",
"namespace": "com.example.events",
"fields": [
{"name": "userId", "type": "string"},
{"name": "timestamp", "type": {"type": "long", "logicalType": "timestamp-millis"}},
{"name": "email", "type": "string"},
{"name": "plan", "type": "string"}
]
}
When the UserSignedUp event is produced, the producer library, aware of the schema, serializes the event into Avro’s binary format. Crucially, it also prepends a schema ID that identifies the exact schema used for serialization. The Kafka broker (or whatever message bus you’re using) treats the result as opaque bytes: it stores the payload with its embedded schema ID and never inspects either.
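As a rough sketch of what "includes a schema ID" means in practice: Confluent's serializers prefix every message with a magic byte of 0 and the schema ID as a 4-byte big-endian integer, followed by the Avro binary data. The helper name `frame_message` and the placeholder payload below are illustrative assumptions, not part of any client library.

```python
import struct

MAGIC_BYTE = 0  # marker byte used by the Confluent wire format


def frame_message(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an already-encoded Avro payload with the magic byte and schema ID."""
    # ">bI" = big-endian, one signed byte (magic) then one unsigned 4-byte int (ID)
    header = struct.pack(">bI", MAGIC_BYTE, schema_id)
    return header + avro_payload


# Placeholder bytes standing in for a real Avro-encoded record.
framed = frame_message(42, b"\x04hi")
print(framed.hex())
```

The header costs five bytes per message, which is why the full schema never needs to travel with the event.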
The consumer, when it receives the event, first reads the schema ID. It then queries the schema registry to retrieve the corresponding schema. Using this schema and the Avro specification, the consumer deserializes the binary payload into a usable object. If the producer later changes the schema (e.g., adds a signupSource field), it registers the new version. The schema registry, with its compatibility rules, will determine if this new schema is backward-compatible (consumers using the new schema can still read data produced with the old schema) or forward-compatible (consumers using the old schema can read data produced with the new schema).
Confluent Schema Registry, a widely adopted implementation, uses an HTTP API. To register a schema, you might POST to an endpoint like /subjects/UserSignedUp-value/versions with a JSON payload containing your schema definition.
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
--data '{"schema": "{\"type\": \"record\", \"name\": \"UserSignedUp\", \"namespace\": \"com.example.events\", \"fields\": [...] }"}' \
http://localhost:8081/subjects/UserSignedUp-value/versions
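The registry responds with a JSON object containing the schema’s globally unique ID. The version number assigned under the subject can be looked up separately, for example with a GET to /subjects/UserSignedUp-value/versions.
{"id": 42}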
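The same registration call can be sketched in Python with only the standard library. The detail that is easy to miss is that the schema is sent as a JSON-encoded string nested inside the JSON body, which is why the curl example needs so many backslashes. The function names here are illustrative assumptions, and `register` needs a running registry, so it is defined but not called.

```python
import json
import urllib.request

REGISTRY_URL = "http://localhost:8081"  # same local address as the curl example

USER_SIGNED_UP = {
    "type": "record",
    "name": "UserSignedUp",
    "namespace": "com.example.events",
    "fields": [
        {"name": "userId", "type": "string"},
        # Avro expects logicalType inside the type object.
        {"name": "timestamp",
         "type": {"type": "long", "logicalType": "timestamp-millis"}},
        {"name": "email", "type": "string"},
        {"name": "plan", "type": "string"},
    ],
}


def build_register_request(subject, schema):
    """Build the POST request that registers `schema` under `subject`."""
    # The inner json.dumps turns the schema into a string; the outer one
    # wraps that string in the request body.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{REGISTRY_URL}/subjects/{subject}/versions",
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )


def register(subject, schema):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_register_request(subject, schema)) as resp:
        return json.load(resp)


request = build_register_request("UserSignedUp-value", USER_SIGNED_UP)
print(request.full_url)
```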
Producers and consumers then use this ID to fetch schemas. Before registering a new version, a producer can also ask the registry whether a candidate schema is compatible with what is already registered. This goes through a dedicated compatibility endpoint rather than the registration endpoint:
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
--data '{"schema": "{\"type\": \"record\", \"name\": \"UserSignedUp\", \"namespace\": \"com.example.events\", \"fields\": [..., {\"name\": \"signupSource\", \"type\": \"string\", \"default\": \"unknown\"}] }"}' \
http://localhost:8081/compatibility/subjects/UserSignedUp-value/versions/latest
The registry answers with {"is_compatible": true} or {"is_compatible": false}. Which rules it applies depends on the compatibility level configured for the subject (or globally), which you set with a PUT to /config/UserSignedUp-value and a body like {"compatibility": "BACKWARD"}. Common settings include BACKWARD (the new schema can read old data; this is the registry’s default), FORWARD (the old schema can read new data), FULL (both ways), and NONE (no checks).
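To make the four levels concrete, here is a deliberately simplified checker for flat record schemas. It is a sketch of the idea, not the registry's actual algorithm: it ignores type promotion, unions, aliases, and nested records, and the names `can_read` and `is_compatible` are made up for illustration.

```python
def can_read(reader, writer):
    """True if `reader` can decode data written with `writer` (simplified)."""
    writer_fields = {f["name"]: f for f in writer["fields"]}
    for field in reader["fields"]:
        written = writer_fields.get(field["name"])
        if written is None:
            # The data lacks this field, so the reader needs a default.
            if "default" not in field:
                return False
        elif written["type"] != field["type"]:
            return False
    return True


def is_compatible(new, old, mode):
    """Check `new` against `old` under one compatibility level."""
    if mode == "NONE":
        return True
    backward = can_read(reader=new, writer=old)  # new schema reads old data
    forward = can_read(reader=old, writer=new)   # old schema reads new data
    if mode == "BACKWARD":
        return backward
    if mode == "FORWARD":
        return forward
    if mode == "FULL":
        return backward and forward
    raise ValueError(f"unknown compatibility mode: {mode}")


def record(*fields):
    return {"type": "record", "name": "UserSignedUp", "fields": list(fields)}


USER_ID = {"name": "userId", "type": "string"}
PLAN = {"name": "plan", "type": "string"}

OLD = record(USER_ID, PLAN)
WITH_DEFAULT = record(
    USER_ID, PLAN,
    {"name": "signupSource", "type": "string", "default": "unknown"})
WITHOUT_DEFAULT = record(
    USER_ID, PLAN, {"name": "signupSource", "type": "string"})
PLAN_REMOVED = record(USER_ID)

print(is_compatible(WITH_DEFAULT, OLD, "FULL"))
print(is_compatible(WITHOUT_DEFAULT, OLD, "BACKWARD"))
print(is_compatible(PLAN_REMOVED, OLD, "FORWARD"))
```

Note that removing plan passes a BACKWARD check but fails FORWARD, which is the welcome-email breakage from the opening example: an old consumer that still expects plan cannot read the new data.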
The most overlooked aspect of schema registries is how they manage schema evolution. When you add a new field, it’s not just about appending it. If you want backward compatibility (the most common requirement), the new field must have a default value defined in the schema. Without a default, consumers that have upgraded to the new schema cannot read older events written before the field existed: the reader has nothing to fill the missing field with, so deserialization fails. Consumers still on the old schema are unaffected, because Avro’s schema resolution simply skips fields the reader doesn’t know about. For Avro, this looks like adding "default": "some_default_value" to the field definition.
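The role of the default is easiest to see from the reader's side. The sketch below imitates Avro's schema resolution on already-decoded records represented as plain dicts: fields the reader expects but the data lacks are filled from the default, and fields the reader doesn't know are dropped. The `resolve` function is an illustrative assumption, not an Avro library call, and it handles only flat records.

```python
def resolve(record, reader_schema):
    """Project a decoded record onto the reader's schema (simplified)."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            # The data predates this field, so fall back to the default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    # Fields present in the record but absent from the schema are ignored.
    return resolved


OLD_SCHEMA = {
    "type": "record",
    "name": "UserSignedUp",
    "fields": [
        {"name": "userId", "type": "string"},
        {"name": "plan", "type": "string"},
    ],
}
NEW_SCHEMA = {
    "type": "record",
    "name": "UserSignedUp",
    "fields": [
        {"name": "userId", "type": "string"},
        {"name": "plan", "type": "string"},
        {"name": "signupSource", "type": "string", "default": "unknown"},
    ],
}

old_event = {"userId": "a1b2c3d4", "plan": "free"}
new_event = {"userId": "a1b2c3d4", "plan": "free", "signupSource": "referral"}

upgraded_view = resolve(old_event, NEW_SCHEMA)  # new reader, old data
legacy_view = resolve(new_event, OLD_SCHEMA)    # old reader, new data
print(upgraded_view)
print(legacy_view)
```

If signupSource had no default, the first call would raise, which is the failure a BACKWARD compatibility check is meant to catch before the schema is ever registered.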
This system allows you to confidently evolve your event schemas, ensuring that your producers and consumers can communicate even as your system changes.
The next challenge you’ll face is managing schema evolution across multiple event types and services, often leading to discussions about canonical schemas and event governance.