---
theme: seriph
layout: default
title: Escaping Version Skew
info: |
  ## Escaping Version Skew
  Compatibility, version skew, and what to do about it when rollouts are never instant.
class: demo-full-bleed
colorSchema: light
routerMode: hash
aspectRatio: 16/9
canvasWidth: 960
fonts:
  sans: IBM Plex Sans
  mono: IBM Plex Mono
  serif: Source Serif 4
mdc: true
drawings:
  persist: false
---
<NetworkHero :red-receive-every="10" :title-layout="true" :hidden-node-ids="['router', 'db']">
<div class="hero-title-copy">
<div class="hero-talk-title">
<span class="hero-title-line">Escaping Version Skew</span>
</div>
<div class="hero-talk-subtitle">Formalizing compatibility in a world of partial rollouts</div>
<div class="hero-talk-meta">Robbie Ostrow, Member of Technical Staff, OpenAI</div>
<div class="hero-talk-event">SRECon Americas 2026</div>
</div>
</NetworkHero>
---
<AudienceRolloutQuestion />
---
# A mixed fleet shared one cache
<IncidentSketch />
---
layout: center
---
<div class="incident-twist-slide">
<h1>Rollback increased errors</h1>
<p class="deck-quote mt-8">Old readers came back while bad cached data was still alive.</p>
</div>
---
layout: center
---
<div class="rollout-joke-setup">the secret to coordinating ordered rollouts at scale</div>
---
layout: center
---
<div class="emphasis-slide">
<div class="emphasis-word">give up</div>
</div>
---
layout: center
---
<div class="emphasis-slide">
<div class="emphasis-phrase emphasis-phrase-coral">don't rely on humans</div>
</div>
---
class: demo-full-bleed
---
<div class="simulator-slide-shell">
<SimulatorDeck
mode="transition"
start-state-id="s1"
:sequence="['s2', 's3']"
:step-delay-ms="1600"
:autoplay="false"
:emit-rate-per-sec="1.3"
:packet-speed-px-per-sec="78"
:initial-packet-count="4"
:initial-packet-spacing-px="220"
:minimum-packet-gap-px="220"
height="72vh"
:layout-scale="0.5"
:bare="true"
:show-state-chip="false"
/>
</div>
---
# Parseable is not enough
<div class="one-figure-slide pydantic-compat-example mt-8">
<p class="deck-quote">Transport compatibility can still admit states your logic cannot handle.</p>
<div class="deck-grid-2 mt-8">
<div class="law-card">
<h3>Grammar</h3>
<p>What can be decoded.</p>
</div>
<div class="law-card success">
<h3>Validation</h3>
<p>What your system is willing to accept.</p>
</div>
</div>
</div>
<div class="deck-callout mt-8">
<p class="deck-quote">If the logic depends on the rule, the rule belongs at the boundary.</p>
</div>
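
A minimal sketch of the grammar/validation split, using only the standard library. The `Discount` type, its fields, and the `decode` helper are hypothetical names chosen for illustration; the same rule could equally live in a pydantic `Field` constraint.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Discount:
    """Hypothetical payload: a discount expressed as a percentage."""
    mode: str
    value: float

    def __post_init__(self) -> None:
        # Validation: the rules the business logic silently depends on.
        if self.mode != "percent":
            raise ValueError(f"unsupported mode: {self.mode!r}")
        if not 0 <= self.value <= 100:
            raise ValueError(f"percent out of range: {self.value}")


def decode(raw: str) -> Discount:
    """Boundary function: grammar first, then validation."""
    data = json.loads(raw)  # grammar: can the bytes be decoded at all?
    return Discount(mode=data["mode"], value=data["value"])  # validation


ok = decode('{"mode": "percent", "value": 15}')

try:
    # Decodes fine, but the system should not be willing to accept it.
    decode('{"mode": "percent", "value": 250}')
    rejected = False
except ValueError:
    rejected = True
```

Because the range check runs at the boundary, code further in never has to ask whether a `Discount` it holds is sane.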
---
# Avoid optionalslop
<div class="deck-grid-2 optional-soup-layout mt-8">
<div class="deck-schema-box optionalslop-grotesque">
```proto
message UserProfile {
  optional string display_name = 1;
  optional string first_name = 2;
  optional string last_name = 3;
  optional string legacy_full_name = 4;
  optional string avatar_url = 5;
  optional string avatar_id = 6;
  optional string locale = 7;
  optional string timezone = 8;
  optional bool email_verified = 9;
  optional bool phone_verified = 10;
  optional string phone_number = 11;
  optional string backup_phone_number = 12;
  optional string city = 13;
  optional string region = 14;
  optional string country = 15;
  optional string legacy_metadata_json = 16;
}
```
</div>
<div class="fact-card boundary-card optionalslop-copy">
<div class="optionalslop-stamp">compatibility residue</div>
<div class="boundary-point">
<div class="boundary-point-title">One type gets weaker over time</div>
<div class="boundary-point-body">As old fields accumulate for compatibility, the shared proto stops expressing the real domain model and turns into "maybe this, maybe that".</div>
</div>
<div class="boundary-point">
<div class="boundary-point-title">Impossible states become routine</div>
<div class="boundary-point-body">Now business logic has to remember which subsets belong together, which are stale, and which combinations should never exist.</div>
</div>
</div>
</div>
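
To make "impossible states become routine" concrete, here is a small sketch built on four of the name fields from the proto above. `LooseProfile` and `StrictProfile` are hypothetical Python stand-ins, not generated code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LooseProfile:
    """Every field optional, mirroring a subset of the proto."""
    display_name: Optional[str] = None
    first_name: Optional[str] = None
    last_name: Optional[str] = None
    legacy_full_name: Optional[str] = None


@dataclass(frozen=True)
class StrictProfile:
    """Hypothetical strict shape: exactly one way to express a name."""
    display_name: str

    def __post_init__(self) -> None:
        if not self.display_name:
            raise ValueError("display_name must be non-empty")


# Four optional name fields alone allow 2**4 presence combinations
# that business logic has to special-case or forbid.
presence_combinations = 2 ** 4

# The loose type happily constructs a profile with no name at all...
nameless = LooseProfile()

# ...and one where two "sources of truth" disagree.
conflicting = LooseProfile(display_name="Ada L.", legacy_full_name="Grace Hopper")

try:
    StrictProfile(display_name="")
    strict_accepts_empty = True
except ValueError:
    strict_accepts_empty = False
```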
---
# Strict contracts are better for ~~humans~~ agents
<div class="deck-grid-3 mt-8 agent-contract-grid">
<div v-click class="law-card success">
<h3>Smaller legal state space</h3>
<p>Fewer ambiguous shapes for an agent to depend on.</p>
</div>
<div v-click class="law-card success">
<h3>Hidden assumptions become explicit</h3>
<p>Put the rule at the boundary so the agent does not have to recover it.</p>
</div>
<div v-click class="law-card success">
<h3>Crisper test oracle</h3>
<p>A strict contract lets an agent loop iterate quickly toward correctness.</p>
</div>
</div>
<div v-click class="deck-callout mt-8">
<p class="deck-quote">Agentic workflows get safer when the boundary is narrow enough to make bad states impossible, not just unlikely.</p>
</div>
---
# Stop sharing types.
<div class="deck-grid-2 mt-10 writer-reader-principle subsumption-containment-grid">
<div class="law-card success">
<h3>Writers should be as strict as possible</h3>
<p>Emit today's contract, not a mushy superset shaped by every historical rollout.</p>
</div>
<div class="law-card success">
<h3>Readers should accept the union of the last few writers</h3>
<p>Carry compatibility in the reader, where skew actually lands.</p>
</div>
</div>
<div class="deck-callout mt-10">
<p class="deck-quote">Stop sharing types between client and server.</p>
</div>
---
# A strict writer, a union reader
<div class="deck-grid-2 mt-8">
<div class="one-figure-slide pydantic-compat-example">
```python
from pydantic import BaseModel, Field

class UserProfileWriter(BaseModel):
    name: str = Field(min_length=1)
    age: int = Field(ge=0)
```
</div>
<div class="one-figure-slide pydantic-compat-example">
```python
type UserProfileReader = (
    UserProfileV1Reader
    | UserProfileV2Reader
    | UserProfileV3Reader
)

match payload:
    case UserProfileV3Reader(name=name, age=age):
        ...
    case UserProfileV2Reader(full_name=full_name):
        ...
```
</div>
</div>
<div class="deck-callout mt-8">
<p class="deck-quote">New writes stay clean. Compatibility is quarantined to explicit old-version branches.</p>
</div>
---
# Stamp every payload with a writer version.
<div class="deck-grid-2 stamp-process-intro mt-6">
<div class="law-card success">
<h3>Writers stamp the shape they emitted</h3>
</div>
<div class="law-card success">
<h3>Readers branch on the stamp, not on guesses about the payload's shape</h3>
</div>
</div>
<div class="tooling-checklist tooling-checklist-compact stamp-process-checklist mt-6">
<div v-click class="tooling-step"><strong>1</strong><span>Update the schema.</span></div>
<div v-click class="tooling-step"><strong>2</strong><span>Detect breaking changes.</span></div>
<div v-click class="tooling-step"><strong>3</strong><span>Keep the writer as strict as possible.</span></div>
<div v-click class="tooling-step"><strong>4</strong><span>Make readers a tagged union of the last few writers.</span></div>
<div v-click class="tooling-step"><strong>5</strong><span>Measure how often old writer branches still deserialize.</span></div>
<div v-click class="tooling-step"><strong>6</strong><span>Delete old branches once those metrics hit zero.</span></div>
</div>
---
class: demo-full-bleed
---
<div class="simulator-slide-shell">
<SimulatorDeck
mode="transition"
start-state-id="s6"
:sequence="['s7', 's8', 's9']"
:step-delay-ms="1600"
:autoplay="false"
:pause-at-end="true"
:emit-rate-per-sec="1.1"
:packet-speed-px-per-sec="78"
:initial-packet-count="3"
:initial-packet-spacing-px="220"
:minimum-packet-gap-px="220"
height="72vh"
:layout-scale="0.5"
:bare="true"
:show-state-chip="false"
/>
</div>
---
layout: center
---
<div class="emphasis-slide">
<div class="emphasis-phrase">Tooling!</div>
<div class="hero-talk-subtitle mt-4">Prove when possible. Fuzz when not.</div>
</div>
---
class: demo-full-bleed
---
<CheckerEmbed />
---
# A subsumption checker asks a set-containment question
<div class="deck-grid-2 mt-4 writer-reader-principle">
<div class="law-card success">
<h3>New writer safe for old reader</h3>
<p>L(new) ⊆ L(old)</p>
</div>
<div class="law-card success">
<h3>Old writer safe for new reader</h3>
<p>L(old) ⊆ L(new)</p>
</div>
</div>
<div class="deck-callout mt-2">
<p class="deck-quote">A schema change is compatible in a given direction exactly when every value the writer can emit is a value the reader accepts.</p>
</div>
<div class="assumption-footnote mt-3">
Serializer assumption: no extra emitted fields beyond the declared schema.
</div>
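
A toy illustration of the two containment checks, assuming a schema that is nothing more than a closed, non-empty integer interval. `IntRange` and `classify` are hypothetical names; a real checker has to handle far richer schemas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntRange:
    """Toy schema: integers in [lo, hi]. L(schema) is that set of values."""
    lo: int
    hi: int

    def subset_of(self, other: "IntRange") -> bool:
        # L(self) ⊆ L(other) for closed, non-empty integer intervals.
        return other.lo <= self.lo and self.hi <= other.hi


def classify(old: IntRange, new: IntRange) -> dict[str, bool]:
    return {
        # New writers emit L(new); old readers accept L(old).
        "new_writer_safe_for_old_reader": new.subset_of(old),
        # Old writers emit L(old); new readers accept L(new).
        "old_writer_safe_for_new_reader": old.subset_of(new),
    }


# Tightening an age field from [0, 200] to [0, 150].
result = classify(old=IntRange(0, 200), new=IntRange(0, 150))
```

Here the tightening is safe for old readers but not for new readers: an old writer can still emit 175, which the new schema rejects.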
---
# Two passes: prove, then search
<div class="deck-grid-2 mt-10 writer-reader-principle subsumption-containment-grid">
<div class="law-card success">
<h3>Static checker</h3>
<p>Fast, deterministic proofs for the common cases.</p>
</div>
<div class="law-card success">
<h3>Fuzzer</h3>
<p>Concrete counterexamples when the schema is too expressive for a complete proof.</p>
</div>
</div>
<div class="tooling-checklist tooling-checklist-compact mt-8">
<div class="tooling-step"><strong>1</strong><span>Try to prove set containment from the schemas alone.</span></div>
<div class="tooling-step"><strong>2</strong><span>If the proof is incomplete, search for a witness value.</span></div>
<div class="tooling-step"><strong>3</strong><span>Use the witness to make the breakage obvious to humans and agents.</span></div>
</div>
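
The three steps above might look roughly like this. It is a sketch under strong assumptions: an "analyzable" schema is an integer interval, an "opaque" schema is an arbitrary predicate, and the names `try_prove` and `search_witness` are invented for this example.

```python
import random
from typing import Callable, Optional, Union

Interval = tuple[int, int]         # analyzable schema: integers in [lo, hi]
Predicate = Callable[[int], bool]  # opaque schema: arbitrary validation code


def try_prove(
    writer: Union[Interval, Predicate], reader: Union[Interval, Predicate]
) -> Optional[bool]:
    """Pass 1: decide L(writer) ⊆ L(reader) when both sides are intervals."""
    if isinstance(writer, tuple) and isinstance(reader, tuple):
        return reader[0] <= writer[0] and writer[1] <= reader[1]
    return None  # too expressive for this checker: fall through to search


def search_witness(
    writer: Predicate, reader: Predicate, tries: int = 10_000, seed: int = 0
) -> Optional[int]:
    """Pass 2: look for a value the writer can emit but the reader rejects."""
    rng = random.Random(seed)
    for _ in range(tries):
        value = rng.randint(-50, 250)
        if writer(value) and not reader(value):
            return value
    return None  # no witness found, which is NOT a proof of compatibility


def old_rule(v: int) -> bool:
    return 0 <= v <= 200


def new_rule(v: int) -> bool:
    return 0 <= v <= 200 and v % 7 != 0


proved = try_prove((0, 150), (0, 200))    # decided statically
unknown = try_prove(old_rule, new_rule)   # predicates are opaque to pass 1
witness = search_witness(writer=old_rule, reader=new_rule)
```

The asymmetry is the point of the design: a proof is final, while a failed search only lowers suspicion.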
---
# A concrete witness makes breakage obvious
<div class="witness-slide-shell mt-5">
<div class="witness-schema-panel witness-schema-old">
<div class="witness-label">Old schema</div>
```json {all|4}
"if": { "properties": { "mode": { "const": "percent" } } },
"then": {
  "properties": {
    "value": { "maximum": 100 }
  }
}
```
</div>
<div class="witness-change-rail">
<div class="witness-arrow">→</div>
<div class="witness-change-copy">one keyword tightens</div>
</div>
<div class="witness-schema-panel witness-schema-new">
<div class="witness-label">New schema</div>
```json {all|4}
"if": { "properties": { "mode": { "const": "percent" } } },
"then": {
  "properties": {
    "value": { "exclusiveMaximum": 100 }
  }
}
```
</div>
</div>
<div class="witness-result mt-6">
<div class="witness-result-kicker">Witness</div>
<code>{"mode":"percent","value":100}</code>
<div class="witness-result-copy">Valid before. Rejected after.</div>
</div>
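
The witness can be checked by hand. Below is a tiny hand-written evaluator covering only the keywords on this slide (`if`/`then`, `properties`, `const`, `maximum`, `exclusiveMaximum`). It is an illustration, not a JSON Schema implementation, and it assumes the document is a JSON object.

```python
import json
from typing import Any


def _is_number(x: Any) -> bool:
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def _if_matches(doc: dict[str, Any]) -> bool:
    # "properties" only constrains keys that are present, so a document
    # without "mode" also satisfies the "if" subschema.
    return "mode" not in doc or doc["mode"] == "percent"


def valid_old(doc: dict[str, Any]) -> bool:
    value = doc.get("value")
    if _if_matches(doc) and _is_number(value):
        return value <= 100  # "maximum": 100
    return True


def valid_new(doc: dict[str, Any]) -> bool:
    value = doc.get("value")
    if _if_matches(doc) and _is_number(value):
        return value < 100  # "exclusiveMaximum": 100
    return True


witness = json.loads('{"mode":"percent","value":100}')
before = valid_old(witness)  # accepted by the old schema
after = valid_new(witness)   # rejected by the new schema
```

So an old writer can emit a value a new reader refuses, while the reverse direction stays safe: anything below 100 also satisfies `maximum: 100`.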
---
class: demo-full-bleed
---
<CheckerEmbed />
---
# Make compatibility checks live next to the type
<div class="one-figure-slide pydantic-compat-example mt-8">
```python
from pydantic import BaseModel, Field

@jsoncompat_check(direction="both", stable_id="user-profile")
class UserProfile(BaseModel):
    name: str = Field(min_length=1)
    age: int = Field(ge=0)
```
</div>
<div class="deck-callout mt-8">
<p class="deck-quote">The stable ID ties this model to its historical schema snapshots, and CI checks both rollout directions on every change.</p>
</div>
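
One way to picture what such a decorator could do is sketched below. This is not the tool's actual API: `compat_check_sketch`, `REGISTRY`, and `removed_fields` are hypothetical, and the "schema" is reduced to field names and type names on a plain dataclass.

```python
from dataclasses import dataclass, fields
from typing import Callable, TypeVar

T = TypeVar("T")

# stable_id -> current schema snapshot (field name -> type name).
REGISTRY: dict[str, dict[str, str]] = {}


def compat_check_sketch(stable_id: str) -> Callable[[type[T]], type[T]]:
    """Hypothetical decorator: records a snapshot for CI to diff against history."""

    def register(cls: type[T]) -> type[T]:
        REGISTRY[stable_id] = {
            f.name: getattr(f.type, "__name__", str(f.type)) for f in fields(cls)
        }
        return cls

    return register


@compat_check_sketch(stable_id="user-profile")
@dataclass
class UserProfile:
    name: str
    age: int


def removed_fields(previous: dict[str, str], current: dict[str, str]) -> set[str]:
    """One crude breakage signal: fields an old reader may still require."""
    return set(previous) - set(current)


# Pretend this snapshot was loaded from the model's recorded history.
previous_snapshot = {"name": "str", "age": "int", "email": "str"}
breaks = removed_fields(previous_snapshot, REGISTRY["user-profile"])
```

Renaming the class leaves the stable ID untouched, so the history lookup keeps working across refactors.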
---
# Adopt it in phases
<div class="tooling-checklist">
<div class="tooling-step"><strong>1</strong><span>Start by annotating storage-boundary types and checking both rollout directions in CI.</span></div>
<div class="tooling-step"><strong>2</strong><span>Add writer-version stamps and measure which old branches are still being read.</span></div>
<div class="tooling-step"><strong>3</strong><span>Split strict writer types from union reader types on the boundaries that matter most.</span></div>
</div>
<div class="deck-callout mt-4">
<p class="deck-quote">You do not need the whole end-state on day one to start catching real breakages.</p>
</div>
---
# When not to do this
<div class="deck-grid-2 mt-8">
<div class="law-card">
<h3>Probably not worth it</h3>
<p>Ephemeral internal RPCs with no durable state, no queues, and no meaningful rollback tail.</p>
</div>
<div class="law-card success">
<h3>Absolutely worth it</h3>
<p>Caches, queues, databases, durable workflows, mobile or external clients, and any boundary where state outlives the binary that wrote it.</p>
</div>
</div>
<div class="deck-callout mt-8">
<p class="deck-quote">Use the heavy machinery where old code and new state can meet. That is where version skew turns into incidents.</p>
</div>
---
# Constrain. Split. Gate. Observe.
<div class="deck-grid-2 mt-8 sre-playbook-grid">
<div class="law-card good">
<h3>Constrain</h3>
<p>Make strict schemas a cultural default: hidden assumptions should become contract rules, not tribal knowledge.</p>
</div>
<div class="law-card good">
<h3>Split</h3>
<p>Generate reader and writer types in your language of choice from the schema, and make historical unions cheap to maintain.</p>
</div>
<div class="law-card good">
<h3>Gate</h3>
<p>Run CI against the schema itself and against previous versions, detect breakages mechanically, and fail unsafe changes before merge.</p>
</div>
<div class="law-card good">
<h3>Observe</h3>
<p>Measure deserializations by payload version so you can see old tails, rollback risk, and when a branch is really gone.</p>
</div>
</div>
---
layout: center
---
<div class="thanks-slide">
<div class="thanks-title">Questions?</div>
<a class="thanks-link" href="https://jsoncompat.com">slides and tooling at jsoncompat.com</a>
</div>