jsoncompat 0.3.1

JSON Schema Compatibility Checker
Documentation
---
theme: seriph
layout: default
title: Escaping Version Skew
info: |
  ## Escaping Version Skew

  Compatibility, version skew, and what to do about it when rollouts are never instant.
class: demo-full-bleed
colorSchema: light
routerMode: hash
aspectRatio: 16/9
canvasWidth: 960
fonts:
  sans: IBM Plex Sans
  mono: IBM Plex Mono
  serif: Source Serif 4
mdc: true
drawings:
  persist: false
---

<NetworkHero :red-receive-every="10" :title-layout="true" :hidden-node-ids="['router', 'db']">
  <div class="hero-title-copy">
    <div class="hero-talk-title">
      <span class="hero-title-line">Escaping Version Skew</span>
    </div>
    <div class="hero-talk-subtitle">Formalizing compatibility in a world of partial rollouts</div>
    <div class="hero-talk-meta">Robbie Ostrow, Member of Technical Staff, OpenAI</div>
    <div class="hero-talk-event">SRECon Americas 2026</div>
  </div>
</NetworkHero>

<!--
- Structured data is constantly flowing between systems: edge, APIs, internal services, storage, and back again.
- In the steady state, this usually works well enough that it is easy to forget how much implicit coordination is involved.
- This talk is about defining better boundaries between systems so we can detect and prevent breaking changes automatically.
- The goal is strict contracts at abstraction boundaries without relying on humans to catch subtle compatibility bugs.
-->

---

<AudienceRolloutQuestion />

<!--
- A deploy starts and errors begin rising.
- The natural instinct is to halt the deploy and roll back.
- That is exactly what we did in this incident.
-->

---

# A mixed fleet shared one cache

<IncidentSketch />

<!--
- We had a load-bearing auth cache in Redis shared by a mixed fleet.
- Pods on the new version wrote a format that old pods could not parse.
- A request that hit a new pod and then later hit an old pod could fail on cache read.
- New pods could read both formats, but old pods could only read the old one.
- This caused up to a 15% error rate for ChatGPT for about 30 minutes, until the cache expired.
- A longer TTL would have made recovery much riskier.
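
A minimal sketch of this failure mode, under stated assumptions: the two cache formats (a bare string and a JSON object) and every function name here are hypothetical stand-ins, not the real auth cache code.

```python
import json

# Hypothetical formats: the old pod stores a bare user ID,
# the new pod stores a JSON object.
def write_old(user_id: str) -> str:
    return user_id

def write_new(user_id: str) -> str:
    return json.dumps({"user_id": user_id, "scopes": []})

def read_old(raw: str) -> str:
    # Old pods only understand the bare-string format.
    if raw.startswith("{"):
        raise ValueError("old reader cannot parse new format")
    return raw

def read_new(raw: str) -> str:
    # New pods accept both formats.
    if raw.startswith("{"):
        return json.loads(raw)["user_id"]
    return raw

# Entry written by a new pod in the middle of the rollout.
cache = {"session-1": write_new("u42")}

new_pod_result = read_new(cache["session-1"])
try:
    read_old(cache["session-1"])  # a later request lands on an old pod
    old_pod_failed = False
except ValueError:
    old_pod_failed = True
```

Rolling back in this sketch only adds more `read_old` callers while the new-format entry is still in the cache.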
-->

---
layout: center
---

<div class="incident-twist-slide">
  <h1>Rollback increased errors</h1>
  <p class="deck-quote mt-8">Old readers came back while bad cached data was still alive.</p>
</div>

<!--
- In this case, continuing the rollout would have been safer than rolling back.
- Once every pod was on the new version, all cache entries would have been readable again.
- We only understood that after the rollback had completed, so the safest recovery was to wait for expiry.
- This is not an argument against rollback; it is an example of how time makes systems harder to reason about.
- Humans, agents, and tests tend to reason about one version at one point in time, but real fleets are mixed-version systems.
- We need to reason about current, previous, and sometimes future versions at the same time.
- The systems we maintain most often break when they change, so we need a better theory of change.
-->

---
layout: center
---

<div class="rollout-joke-setup">the secret to coordinating ordered rollouts at scale</div>

<!--
- Breaking changes are often discussed as a client/server rollout ordering problem.
- The usual pattern is to expand what the server accepts, update the client, and then maybe re-constrain the server later.
- In practice, that last cleanup step is easy to forget, and dead compatibility code accumulates.
- This approach also depends on humans rolling services out in the right order.
-->

---
layout: center
---

<div class="emphasis-slide">
  <div class="emphasis-word">give up</div>
</div>

<!--
- Give up on manually coordinating ordered rollouts as a correctness strategy.
- Do not give up on correctness itself.
- If a change in trunk only works when services deploy in a specific order, it is too fragile.
-->

---
layout: center
---

<div class="emphasis-slide">
  <div class="emphasis-phrase emphasis-phrase-coral">don't rely on humans</div>
</div>

<!--
- Do not rely on humans or agents to catch breaking changes in an unconstrained system.
- Manual rollout ordering is hard to reason about, makes rollbacks unsafe, and can be impossible when dependencies are circular.
- We need better constraints at abstraction boundaries and better tooling to enforce them.
-->

---
class: demo-full-bleed
---

<div class="simulator-slide-shell">
  <SimulatorDeck
    mode="transition"
    start-state-id="s1"
    :sequence="['s2', 's3']"
    :step-delay-ms="1600"
    :autoplay="false"
    :emit-rate-per-sec="1.3"
    :packet-speed-px-per-sec="78"
    :initial-packet-count="4"
    :initial-packet-spacing-px="220"
    :minimum-packet-gap-px="220"
    height="72vh"
    :layout-scale="0.5"
    :bare="true"
    :show-state-chip="false"
  />
</div>

<!--
- Any durable or in-flight state adds a time dimension to schema changes: caches, queues, databases, and RPCs all matter.
- A static architecture diagram hides the fact that old and new versions overlap during rollout.
- In this example, we add `eye_color` and want to make it required for all future users.
- If the reader rolls out first, it starts rejecting payloads from the old writer.
- Even if readers and writers flipped at exactly the same time, in-flight requests could still fail.
- Errors only stop after both sides have been on the same version long enough for queues and RPCs to drain.
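
The `eye_color` example can be sketched as a small writer/reader matrix. This is an illustration only: the reader functions are hypothetical, and the old reader is assumed to ignore unknown fields.

```python
def read_v1(payload: dict) -> dict:
    # Old reader: only `name` is required; extra fields are ignored (assumption).
    if "name" not in payload:
        raise ValueError("missing name")
    return payload

def read_v2(payload: dict) -> dict:
    # New reader: `eye_color` is now required as well.
    for field in ("name", "eye_color"):
        if field not in payload:
            raise ValueError(f"missing {field}")
    return payload

def accepts(reader, payload: dict) -> bool:
    try:
        reader(payload)
        return True
    except ValueError:
        return False

old_write = {"name": "Ada"}
new_write = {"name": "Ada", "eye_color": "brown"}

# Every writer/reader pairing that can occur while the fleet is mixed.
matrix = {
    (writer, reader_name): accepts(reader, payload)
    for writer, payload in (("old", old_write), ("new", new_write))
    for reader_name, reader in (("old", read_v1), ("new", read_v2))
}
```

Only the `("old", "new")` cell fails, which is the pairing you get when the reader rolls out first or when old payloads are still in flight.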
-->

---

# Parseable is not enough

<div class="one-figure-slide pydantic-compat-example mt-8">
  <p class="deck-quote">Transport compatibility can still admit states your logic cannot handle.</p>
  <div class="deck-grid-2 mt-8">
    <div class="law-card">
      <h3>Grammar</h3>
      <p>What can be decoded.</p>
    </div>
    <div class="law-card success">
      <h3>Validation</h3>
      <p>What your system is willing to accept.</p>
    </div>
  </div>
</div>

<div class="deck-callout mt-8">
  <p class="deck-quote">If the logic depends on the rule, the rule belongs at the boundary.</p>
</div>

<!--
- Protobufs solve a lot of wire compatibility problems, but they do so by weakening the set of states the schema can represent.
- Optional-heavy schemas make compatibility easier by pushing constraints out of the contract and into application code.
- If an invariant matters to business logic, it should be enforced at the boundary.
- Application code should not have to handle every historical state forever.
- Schema contracts should encode richer rules than parseability alone, whether through JSON Schema or validation layers like Protovalidate.
- Stricter schemas make breaking changes easier to introduce, so we need tooling to manage that safely.
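
A small sketch of the grammar/validation split, using only the standard library. The rules in `validate_profile` are hypothetical examples of invariants that business logic might depend on.

```python
import json

def validate_profile(data: dict) -> dict:
    # Boundary rules (hypothetical): non-empty name, non-negative integer age.
    name = data.get("name")
    if not isinstance(name, str) or not name:
        raise ValueError("name must be a non-empty string")
    age = data.get("age")
    if not isinstance(age, int) or isinstance(age, bool) or age < 0:
        raise ValueError("age must be a non-negative integer")
    return data

raw = '{"name": "", "age": -5}'
decoded = json.loads(raw)  # grammar: this decodes without error

try:
    validate_profile(decoded)  # validation: rejected at the boundary
    rejected = False
except ValueError:
    rejected = True
```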
-->

---

# Avoid optionalslop

<div class="deck-grid-2 optional-soup-layout mt-8">
  <div class="deck-schema-box optionalslop-grotesque">

```proto
message UserProfile {
  optional string display_name = 1;
  optional string first_name = 2;
  optional string last_name = 3;
  optional string legacy_full_name = 4;
  optional string avatar_url = 5;
  optional string avatar_id = 6;
  optional string locale = 7;
  optional string timezone = 8;
  optional bool email_verified = 9;
  optional bool phone_verified = 10;
  optional string phone_number = 11;
  optional string backup_phone_number = 12;
  optional string city = 13;
  optional string region = 14;
  optional string country = 15;
  optional string legacy_metadata_json = 16;
}
```

  </div>
  <div class="fact-card boundary-card optionalslop-copy">
    <div class="optionalslop-stamp">compatibility residue</div>
    <div class="boundary-point">
      <div class="boundary-point-title">One type gets weaker over time</div>
      <div class="boundary-point-body">As old fields accumulate for compatibility, the shared proto stops expressing the real domain model and turns into "maybe this, maybe that".</div>
    </div>
    <div class="boundary-point">
      <div class="boundary-point-title">Impossible states become routine</div>
      <div class="boundary-point-body">Now business logic has to remember which subsets belong together, which are stale, and which combinations should never exist.</div>
    </div>
  </div>
</div>

<!--
- Current tooling encourages what I call optionalslop.
- Every migration, rollback path, and compatibility tail leaves residue in the schema.
- The result is a type that stays wire-compatible but gets worse at expressing which states are valid today.
- That pushes cleanup and correctness into business logic everywhere the type is read.
- If the contract allows impossible combinations, a future developer will eventually send one.
- This is not an argument against protobufs; it is an argument for pairing a wire format with stronger boundary rules.
- The boundary should be written in a schema definition language that can generate code without sneaking in extra assumptions.
-->

---

# Strict contracts are better for ~~humans~~ agents

<div class="deck-grid-3 mt-8 agent-contract-grid">
  <div v-click class="law-card success">
    <h3>Smaller legal state space</h3>
    <p>Fewer ambiguous shapes for an agent to depend on.</p>
  </div>
  <div v-click class="law-card success">
    <h3>Hidden assumptions become explicit</h3>
    <p>Put the rule at the boundary so the agent does not have to recover it.</p>
  </div>
  <div v-click class="law-card success">
    <h3>Crisper test oracle</h3>
    <p>A strict contract lets an agent loop iterate quickly toward correctness.</p>
  </div>
</div>

<div v-click class="deck-callout mt-8">
  <p class="deck-quote">Agentic workflows get safer when the boundary is narrow enough to make bad states impossible, not just unlikely.</p>
</div>

<!--
- Stronger models do not remove the need for strict contracts; they make strict contracts more important.
- Agents, like humans, can build systems too large to fully understand at once.
- As models get stronger, abstraction boundaries can get larger, but the systems behind them will also get larger.
- We still need contracts that fit inside the reasoning budget of the agent or human changing the system.
- Put as many constraints into the contract as possible, especially when business logic depends on them.
- Strict contracts reduce the legal state space, make hidden assumptions explicit, and create a sharper oracle for CI, review, and agentic loops.
- Do not make developers or agents re-derive implicit contracts from surrounding code and tribal knowledge.
- A mechanical guarantee is much better than a runbook for minimizing cognitive overhead.
-->

---

# Stop sharing types.

<div class="deck-grid-2 mt-10 writer-reader-principle subsumption-containment-grid">
  <div class="law-card success">
    <h3>Writers should be as strict as possible</h3>
    <p>Emit today's contract, not a mushy superset shaped by every historical rollout.</p>
  </div>
  <div class="law-card success">
    <h3>Readers should accept the union of the last few writers</h3>
    <p>Carry compatibility in the reader, where skew actually lands.</p>
  </div>
</div>

<div class="deck-callout mt-10">
  <p class="deck-quote">Stop sharing types between client and server.</p>
</div>

<!--
- The practical version of strict contracts starts with changing how we think about shared types.
- Stop sharing one type between client and server, or between serializer and deserializer.
- Replace shared types with a schema DSL that can generate separate writer and reader types.
- Writers should be as strict as possible and emit today's contract only.
- Readers should carry the compatibility burden by accepting the union of the last few writer versions.
- Shared types feel simpler, so separate reader and writer types only work if the tooling makes them the easy path.
-->

---

# A strict writer, a union reader

<div class="deck-grid-2 mt-8">
  <div class="one-figure-slide pydantic-compat-example">

```python
from pydantic import BaseModel, Field

class UserProfileWriter(BaseModel):
    name: str = Field(min_length=1)
    age: int = Field(ge=0)
```

  </div>
  <div class="one-figure-slide pydantic-compat-example">

```python
# Python 3.12+ type alias syntax
type UserProfileReader = (
    UserProfileV1Reader
    | UserProfileV2Reader
    | UserProfileV3Reader
)

match payload:
    case UserProfileV3Reader(name=name, age=age):
        ...
    case UserProfileV2Reader(full_name=full_name):
        ...
```

  </div>
</div>

<div class="deck-callout mt-8">
  <p class="deck-quote">New writes stay clean. Compatibility is quarantined to explicit old-version branches.</p>
</div>

<!--
- The writer type should represent today's truth only.
- If all new users have a name and an age, the writer should enforce exactly that.
- The reader is where we pay the compatibility cost, as an explicit union of old writer shapes that still need to be accepted.
- Code branches on the versioned shape instead of hiding compatibility guesses in one giant optional type.
- This quarantines historical behavior so new writes stay clean and old branches are obvious, local, and eventually deletable.
- No one wants to maintain that reader type by hand, so it needs to be generated.
-->

---

# Stamp every payload with a writer version.

<div class="deck-grid-2 stamp-process-intro mt-6">
  <div class="law-card success">
    <h3>Writers stamp the shape they emitted</h3>
  </div>
  <div class="law-card success">
    <h3>Readers branch on the stamp, not on custom logic</h3>
  </div>
</div>

<div class="tooling-checklist tooling-checklist-compact stamp-process-checklist mt-6">
  <div v-click class="tooling-step"><strong>1</strong><span>Update the schema.</span></div>
  <div v-click class="tooling-step"><strong>2</strong><span>Detect breaking changes.</span></div>
  <div v-click class="tooling-step"><strong>3</strong><span>Keep the writer as strict as possible.</span></div>
  <div v-click class="tooling-step"><strong>4</strong><span>Make readers a tagged union of the last few writers.</span></div>
  <div v-click class="tooling-step"><strong>5</strong><span>Measure how often old writer branches still deserialize.</span></div>
  <div v-click class="tooling-step"><strong>6</strong><span>Delete old branches once those metrics hit zero.</span></div>
</div>

<!--
- The source of truth should be a contract in a schema DSL: JSON Schema, proto plus Protovalidate, or something equivalent.
- On every schema change, use static analysis where possible and fuzzing otherwise to check whether the change is breaking under partial rollout, and in which direction.
- Compatibility is a property of the contract, not of whatever data happens to be flowing today.
- If a change is breaking in either direction, CI should require a new stamped type.
- Writers use only the newest strict type, while readers use a tagged union of the historical writer types they still need to accept.
- Generated code should ideally make it impossible to serialize from reader types or deserialize from writer types.
- This moves cross-version reasoning out of engineers' heads and into CI.
- Schema versions become explicit points in time, which makes old branches measurable and eventually deletable.
- The full workflow requires breaking-change detection, telemetry, code generation, historical schema lookup in CI, and an escape hatch for brand-new evolving schemas.
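
One way the stamping and measurement steps could look, as a sketch rather than generated code. The `writer_version` field name, the parser table, and the `Counter` standing in for a metrics client are all assumptions.

```python
import json
from collections import Counter

reads_by_version = Counter()  # stand-in for a real metrics client

def write_v3(name: str, age: int) -> str:
    # Writers emit only the newest strict shape and stamp it.
    return json.dumps({"writer_version": 3, "name": name, "age": age})

def parse_v2(body: dict) -> dict:
    return {"name": body["full_name"], "age": None}

def parse_v3(body: dict) -> dict:
    return {"name": body["name"], "age": body["age"]}

PARSERS = {2: parse_v2, 3: parse_v3}

def read(raw: str) -> dict:
    body = json.loads(raw)
    version = body.get("writer_version")
    parser = PARSERS.get(version)
    if parser is None:
        raise ValueError(f"unsupported writer_version: {version!r}")
    reads_by_version[version] += 1  # old-branch traffic becomes measurable
    return parser(body)

new_profile = read(write_v3("Ada", 36))
old_profile = read(json.dumps({"writer_version": 2, "full_name": "Ada Lovelace"}))

def no_reads_seen(version: int) -> bool:
    # In practice this would be checked over a time window, not a process lifetime.
    return reads_by_version[version] == 0
```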
-->

---
class: demo-full-bleed
---

<div class="simulator-slide-shell">
  <SimulatorDeck
    mode="transition"
    start-state-id="s6"
    :sequence="['s7', 's8', 's9']"
    :step-delay-ms="1600"
    :autoplay="false"
    :pause-at-end="true"
    :emit-rate-per-sec="1.1"
    :packet-speed-px-per-sec="78"
    :initial-packet-count="3"
    :initial-packet-spacing-px="220"
    :minimum-packet-gap-px="220"
    height="72vh"
    :layout-scale="0.5"
    :bare="true"
    :show-state-chip="false"
  />
</div>

<!--
- For non-breaking changes, CI should allow the change with no extra workflow.
- For a breaking change like changing a field type, CI should force a staged rollout.
- In this example, `interests` changes from a list of strings to an integer.
- CI first rejects updating the writer because the reader on main does not yet accept the new version.
- After stamping, readers accept both the old and new versions.
- Once metrics show all readers are rolled out, CI allows the new strict writer type to merge.
- Only after the old data tail is gone do we remove the old reader branch.
- Writer/reader terminology is clearer than forward/backward compatibility because many systems both deserialize input and serialize output.
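
The staged rollout can be sketched as two gate predicates. Both functions and their inputs are hypothetical; a real gate would read reader versions from main and rollout status from deployment metrics.

```python
def can_merge_writer(new_version: int, reader_accepts: set[int],
                     readers_rolled_out: bool) -> bool:
    # The new strict writer may merge only once deployed readers accept its stamp.
    return new_version in reader_accepts and readers_rolled_out

def can_remove_reader_branch(version: int, writer_version: int,
                             recent_reads: int) -> bool:
    # An old branch may go only after its data tail has drained.
    return version != writer_version and recent_reads == 0

# Step 1: the writer change is proposed before readers know about version 2.
step1 = can_merge_writer(2, reader_accepts={1}, readers_rolled_out=True)
# Step 2: readers on main accept {1, 2} but are still rolling out.
step2 = can_merge_writer(2, reader_accepts={1, 2}, readers_rolled_out=False)
# Step 3: readers are fully rolled out, so the writer may switch.
step3 = can_merge_writer(2, reader_accepts={1, 2}, readers_rolled_out=True)
# Step 4: version 1 is removable once nothing reads it anymore.
step4 = can_remove_reader_branch(1, writer_version=2, recent_reads=0)
```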
-->

---
layout: center
---

<div class="emphasis-slide">
  <div class="emphasis-phrase">Tooling!</div>
  <div class="hero-talk-subtitle mt-4">Prove when possible. Fuzz when not.</div>
</div>

<!--
- The hard part is detecting breaking changes in an expressive contract language.
- Protobuf compatibility rules are relatively simple.
- JSON Schema compatibility is much harder because the language is much more expressive.
- That expressiveness is useful, but it makes compatibility analysis substantially more difficult.
-->

---
class: demo-full-bleed
---

<CheckerEmbed />

<!--
- `jsoncompat` is a generic JSON Schema subsumption checker.
- It statically analyzes schema changes to detect whether they are breaking.
-->

---

# A subsumption checker asks set containment

<div class="deck-grid-2 mt-4 writer-reader-principle">
  <div class="law-card success">
    <h3>New writer safe for old reader</h3>
    <p>L(new) ⊆ L(old)</p>
  </div>
  <div class="law-card success">
    <h3>Old writer safe for new reader</h3>
    <p>L(old) ⊆ L(new)</p>
  </div>
</div>

<div class="deck-callout mt-2">
  <p class="deck-quote">A schema change is compatible in a given direction exactly when every value the writer's schema accepts is also accepted by the reader's schema.</p>
</div>

<div class="assumption-footnote mt-3">
  Serializer assumption: no extra emitted fields beyond the declared schema.
</div>

<!--
- A schema denotes a language of valid JSON values: `L(schema)`.
- A subsumption checker asks whether one schema's language is a subset of another's.
- If `L(new) ⊆ L(old)`, a new writer is safe for an old reader.
- If `L(old) ⊆ L(new)`, an old writer is safe for a new reader.
- When either relation fails, the checker should produce a witness value if possible.
- This is hard because the language usually cannot be enumerated directly.
- `jsoncompat` needs keyword-specific logic to prove containment across recursive schemas, sum types, regexes, and other constraints.
- The checker can only reason about invariants encoded in the schema itself.
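
For a tiny schema fragment, containment reduces to comparing bounds. The sketch below handles only string schemas with `minLength` and `maxLength`; it illustrates the idea and is not how `jsoncompat` is implemented.

```python
import math

def length_bounds(schema: dict) -> tuple[int, float]:
    # JSON Schema defaults: no lower bound means 0, no upper bound means unbounded.
    return schema.get("minLength", 0), schema.get("maxLength", math.inf)

def is_subset(a: dict, b: dict) -> bool:
    """True when every string accepted by `a` is also accepted by `b`."""
    a_min, a_max = length_bounds(a)
    b_min, b_max = length_bounds(b)
    if a_min > a_max:
        return True  # `a` accepts nothing, so containment holds trivially
    return b_min <= a_min and a_max <= b_max

old = {"type": "string"}
new = {"type": "string", "minLength": 1}

new_writer_safe_for_old_reader = is_subset(new, old)  # L(new) ⊆ L(old)
old_writer_safe_for_new_reader = is_subset(old, new)  # L(old) ⊆ L(new)
```

Here the second check fails, and the empty string is the witness: the old schema accepts it and the new one does not.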
-->

---

# Two passes: prove, then search

<div class="deck-grid-2 mt-10 writer-reader-principle subsumption-containment-grid">
  <div class="law-card success">
    <h3>Static checker</h3>
    <p>Fast, deterministic proofs for the common cases.</p>
  </div>
  <div class="law-card success">
    <h3>Fuzzer</h3>
    <p>Concrete counterexamples when the schema is too expressive for a complete proof.</p>
  </div>
</div>

<div class="tooling-checklist tooling-checklist-compact mt-8">
  <div class="tooling-step"><strong>1</strong><span>Try to prove set containment from the schemas alone.</span></div>
  <div class="tooling-step"><strong>2</strong><span>If the proof is incomplete, search for a witness value.</span></div>
  <div class="tooling-step"><strong>3</strong><span>Use the witness to make the breakage obvious to humans and agents.</span></div>
</div>

<!--
- The workflow has two passes: prove first, then search.
- For many ordinary schema changes, compatibility can be proven directly from the schemas.
- Static checks are fast, deterministic, and easy to run in CI.
- When a complete proof is not practical, fuzzing can search for a concrete witness value accepted on one side and rejected on the other.
- Proofs are ideal when available; examples are the fallback when the schema is too expressive.
- Witnesses are useful for CI, code review, and incident debugging.
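
The search pass can be sketched as a seeded random search over candidate values. The validators below are hand-written stand-ins for two hypothetical schemas (the new one adds a `not: {multipleOf: 5}` rule); a real fuzzer would generate values from the schemas themselves.

```python
import random

def find_witness(accepts_writer, accepts_reader, generate,
                 attempts: int = 10_000, seed: int = 0):
    """Search for a value the writer side accepts and the reader side rejects."""
    rng = random.Random(seed)
    for _ in range(attempts):
        candidate = generate(rng)
        if accepts_writer(candidate) and not accepts_reader(candidate):
            return candidate
    return None  # no witness found; this is not a proof of compatibility

def old_accepts(value) -> bool:
    return isinstance(value, int) and 0 <= value <= 100

def new_accepts(value) -> bool:
    # Hypothetical tightening: multiples of 5 are no longer allowed.
    return old_accepts(value) and value % 5 != 0

def random_int(rng: random.Random) -> int:
    return rng.randint(-10, 110)

# Old writer, new reader: look for a value that used to be valid and now is not.
witness = find_witness(old_accepts, new_accepts, random_int)
```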
-->

---

# A concrete witness makes breakage obvious

<div class="witness-slide-shell mt-5">
  <div class="witness-schema-panel witness-schema-old">
    <div class="witness-label">Old schema</div>

```json {all|4}
"if": { "properties": { "mode": { "const": "percent" } } },
"then": {
  "properties": {
    "value": { "maximum": 100 }
  }
}
```

  </div>
  <div class="witness-change-rail">
    <div class="witness-arrow">→</div>
    <div class="witness-change-copy">one keyword tightens</div>
  </div>
  <div class="witness-schema-panel witness-schema-new">
    <div class="witness-label">New schema</div>

```json {all|4}
"if": { "properties": { "mode": { "const": "percent" } } },
"then": {
  "properties": {
    "value": { "exclusiveMaximum": 100 }
  }
}
```

  </div>
</div>

<div class="witness-result mt-6">
  <div class="witness-result-kicker">Witness</div>
  <code>{"mode":"percent","value":100}</code>
  <div class="witness-result-copy">Valid before. Rejected after.</div>
</div>

<!--
- A concrete witness makes compatibility failures much easier to understand.
- This example tightens a conditional schema from `maximum: 100` to `exclusiveMaximum: 100`.
- The witness is `{"mode":"percent","value":100}`.
- That payload was valid before and rejected after.
- In a large schema, one concrete payload is often more useful than abstract compatibility prose.
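
The witness can be checked by hand with two small predicates that mirror the schema fragments on this slide. This is a sketch that assumes instances are objects and that `value` is numeric when present; it is not a general JSON Schema validator.

```python
def then_applies(instance: dict) -> bool:
    # `if`: {"properties": {"mode": {"const": "percent"}}}
    # `properties` only constrains keys that are present, so a missing
    # `mode` also satisfies the `if` subschema.
    return "mode" not in instance or instance["mode"] == "percent"

def old_accepts(instance: dict) -> bool:
    if then_applies(instance) and "value" in instance:
        return instance["value"] <= 100  # maximum: 100
    return True

def new_accepts(instance: dict) -> bool:
    if then_applies(instance) and "value" in instance:
        return instance["value"] < 100  # exclusiveMaximum: 100
    return True

witness = {"mode": "percent", "value": 100}
valid_before = old_accepts(witness)
valid_after = new_accepts(witness)
```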
-->

---
class: demo-full-bleed
---

<CheckerEmbed />

<!--
- The checker on `jsoncompat.com` is written in Rust and compiled to WebAssembly for browser use.
- A simple object schema with `name` and `age` can be checked against itself to confirm compatibility.
- Tightening `minLength` from the old schema to the new schema is a breaking change for readers, because old data can still contain shorter names.
- The static checker works for most schemas, so fuzzing is usually not needed.
- Some JSON Schema features, such as `not`, conditionals, cross-field constraints, and complex composition, are much harder to decide statically.
- In those cases, the workflow falls back to fuzzing for concrete counterexamples.
- The fuzzer is also useful as a test harness for improving the static checker itself.
- `jsoncompat` is MIT-licensed and has already been useful for catching breaking changes at storage boundaries.
-->

---

# Make compatibility checks live next to the type

<div class="one-figure-slide pydantic-compat-example mt-8">

```python
from pydantic import BaseModel, Field

@jsoncompat_check(direction="both", stable_id="user-profile")
class UserProfile(BaseModel):
    name: str = Field(min_length=1)
    age: int = Field(ge=0)
```

</div>

<div class="deck-callout mt-8">
  <p class="deck-quote">The stable ID ties this model to its historical schema snapshots, and CI checks both rollout directions on every change.</p>
</div>

<!--
- You do not need to adopt schema-first code generation all at once to get value from compatibility tooling.
- At OpenAI, many Pydantic models define data stored in databases, Redis, and other durable systems.
- Those models can be decorated with a `jsoncompat` check that snapshots the current schema and compares it against previous commits in CI.
- This is less powerful than separate generated reader and writer types, but it still catches subtle breaking changes.
- The compatibility policy lives next to the type definition, and the stable ID preserves identity across renames and refactors.
- With `direction="both"`, changes must be safe for old readers seeing new writes and for new readers seeing old writes.
- The auth cache type from the incident is now protected by this checker.
- Tests can enforce that important storage-boundary models always have compatibility checks.
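
A sketch of how such a test could work. The `compat_check` decorator, the registry, and the model list are hypothetical illustrations and are not the `jsoncompat` API; plain classes stand in for Pydantic models.

```python
REGISTRY: dict[str, type] = {}

def compat_check(stable_id: str, direction: str = "both"):
    """Hypothetical decorator: records a model under a stable ID for CI to snapshot."""
    def decorate(cls):
        if stable_id in REGISTRY:
            raise ValueError(f"duplicate stable_id: {stable_id}")
        cls.__compat__ = {"stable_id": stable_id, "direction": direction}
        REGISTRY[stable_id] = cls
        return cls
    return decorate

@compat_check(stable_id="user-profile")
class UserProfile:
    name: str
    age: int

class QueueJob:  # boundary model that is missing the decorator
    payload: str

STORAGE_BOUNDARY_MODELS = [UserProfile, QueueJob]

def unchecked_models(models) -> list[str]:
    # A test can fail the build when this list is non-empty.
    return [m.__name__ for m in models if "__compat__" not in vars(m)]

missing = unchecked_models(STORAGE_BOUNDARY_MODELS)
```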
-->

---

# Adopt it in phases

<div class="tooling-checklist">
  <div class="tooling-step"><strong>1</strong><span>Start by annotating storage-boundary types and checking both rollout directions in CI.</span></div>
  <div class="tooling-step"><strong>2</strong><span>Add writer-version stamps and measure which old branches are still being read.</span></div>
  <div class="tooling-step"><strong>3</strong><span>Split strict writer types from union reader types on the boundaries that matter most.</span></div>
</div>

<div class="deck-callout mt-4">
  <p class="deck-quote">You do not need the whole end-state on day one to start catching real breakages.</p>
</div>

<!--
- Adoption can happen in phases.
- Start by annotating types on real storage or queue boundaries and checking both rollout directions in CI.
- Next, stamp writer versions and measure the old read tail so rollback risk and cleanup timing are visible.
- Then split strict writer types from union reader types on the boundaries that matter most.
- You do not need the full end-state on day one to start moving risky boundaries from vibes to mechanical checks.
-->

---

# When not to do this

<div class="deck-grid-2 mt-8">
  <div class="law-card">
    <h3>Probably not worth it</h3>
    <p>Ephemeral internal RPCs with no durable state, no queues, and no meaningful rollback tail.</p>
  </div>
  <div class="law-card success">
    <h3>Absolutely worth it</h3>
    <p>Caches, queues, databases, durable workflows, mobile or external clients, and any boundary where state outlives a binary.</p>
  </div>
</div>

<div class="deck-callout mt-8">
  <p class="deck-quote">Use the heavy machinery where old code and new state can meet. That is where version skew turns into incidents.</p>
</div>

<!--
- Not every boundary needs this much machinery.
- Ephemeral internal RPCs with no durable state, queues, or rollback tail may be fine with ordinary API compatibility discipline.
- This is absolutely worth it for caches, queues, databases, durable workflows, mobile clients, external clients, and any boundary where state outlives a binary.
- The filter is simple: use the machinery where old code and new state can meet.
- That is where version skew turns into incidents.
-->

---

# Constrain. Split. Gate. Observe.

<div class="deck-grid-2 mt-8 sre-playbook-grid">
  <div class="law-card good">
    <h3>Constrain</h3>
    <p>Make strict schemas a cultural default: hidden assumptions should become contract rules, not tribal knowledge.</p>
  </div>
  <div class="law-card good">
    <h3>Split</h3>
    <p>Generate reader and writer types in your language of choice from the schema, and make historical unions cheap to maintain.</p>
  </div>
  <div class="law-card good">
    <h3>Gate</h3>
    <p>Run CI against the schema itself and against previous versions, detect breakages mechanically, and fail unsafe changes before merge.</p>
  </div>
  <div class="law-card good">
    <h3>Observe</h3>
    <p>Measure deserializations by payload version so you can see old tails, rollback risk, and when a branch is really gone.</p>
  </div>
</div>

<!--
- Constrain: make strict schemas the default so hidden assumptions become contract rules instead of tribal knowledge.
- Split: generate separate reader and writer types from the schema, and make historical reader unions cheap to maintain.
- Gate: run CI against the schema and its previous versions to detect breaking changes before merge.
- Observe: measure which old versions are still being read so rollback risk and cleanup timing are visible.
- The goal is not a one-off preflight for scary changes, but a system that makes the safe path the normal path.
-->

---
layout: center
---

<div class="thanks-slide">
  <div class="thanks-title">Questions?</div>
  <a class="thanks-link" href="https://jsoncompat.com">slides and tooling at jsoncompat.com</a>
</div>

<!--
- Slides and tooling are available at `jsoncompat.com`.
- Questions are welcome.
-->