mkmsbr 1.0.0

Clean-room library + CLI for Microsoft-compatible MBR and FAT32/NTFS boot records.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
# V1.0 design: `mkmsbr` — a clean-room boot-record library

## One-paragraph summary

A standalone MIT-licensed Rust library that produces Microsoft-compatible MBR
and FAT32/NTFS boot record byte sequences, replacing usbwin's runtime
dependency on ms-sys. The library is developed **eval-first**: a verification
harness using ms-sys as a comparison oracle and QEMU as a boot tester is
written before any boot code. The library is "done" for a given variant when
its eval passes — predictable shipping, no more debugging-the-PBR-blind
sessions.

## Why this exists

The v0.2 / v0.3 path shells out to ms-sys for the MBR and PBR bytes. That
works but has three real costs:

1. **External GPL-2 dependency** — usbwin is MIT, ms-sys is GPL-2. The
   process-boundary separation keeps the licenses compatible, but a
   self-contained MIT binary would be cleaner for redistribution.
2. **Installation friction** — users must `git clone gitlab.com/cmaiolino/ms-sys`
   then build it. Hampers "one-command install."
3. **No control** — bug in ms-sys (e.g. the sub-sector-write-on-rdisk
   issue from FIELD_FINDINGS §2) requires upstream patching or working
   around in our shell-out layer. With our own implementation, we fix it.

Why we didn't do this in v0.2: trying to write a clean-room PBR without an
eval framework first led to a 6-hour debugging spiral against unreliable
test infrastructure (see chat history May 18). The lesson — test framework
**before** boot code — is now baked into this spec.

## Verifiability hierarchy

Four nested layers, from tightest signal to loosest. Each layer is an
independent test that any candidate boot record can be run against. A
production-quality boot record passes all four.

### Layer 1 — Byte-equality vs ms-sys (tightest)

For each variant (e.g. `--fat32pe`, `--mbr7`), `mkmsbr` produces N bytes
and we assert them byte-equal to ms-sys's output for the same input.

```rust
#[test]
fn fat32pe_matches_mssys() {
    let our_bytes  = mkmsbr::fat32_pbr_bootmgr(/*bpb*/...);
    let mssys_bytes = oracle::ms_sys_fat32pe(/*bpb*/...);
    assert_eq!(our_bytes, mssys_bytes);
}
```

**Confidence:** if the bytes match, we know we're producing Microsoft-
equivalent code (because ms-sys IS Microsoft's code, extracted).

**Limitation:** byte equality is *sufficient* but not *necessary*. We
might validly produce different bytes that boot equivalently (different
register-allocation strategy in NASM, different jump patterns, etc.).
Treat this layer as the **strongest** signal but not the **only** one.

### Layer 2 — QEMU boot smoke test (per variant)

For each variant, build a synthetic disk image with `mkmsbr`'s output
applied, boot it under `qemu-system-i386`, verify a specific success
signal on serial.

The synthetic test environments:

- **FAT32 + NTLDR stub**: tiny partition with a fake `NTLDR` that prints
  `MKMSBR OK\r\n` to COM1 and halts. Validates `fat32_pbr_ntldr`.
- **FAT32 + bootmgr stub**: same, fake `BOOTMGR`. Validates
  `fat32_pbr_bootmgr` (the multi-sector one).
- **MBR + active partition + dummy PBR that just prints to serial**:
  validates `mbr_7`, `mbr_xp`.

Test infrastructure already half-built (`crates/usbwin-boot/tests/qemu_pbr.rs`
from v0.2 work). Generalize it into a per-variant harness.

### Layer 3 — Real-content boot test

Same as Layer 2 but the disk image contains a real-but-stripped Windows
install tree (e.g. a 50 MB subset of Win 7 with the bootmgr loader chain).
Boots far enough to reach the Windows boot menu / installer welcome.

Slower than Layer 2 (~10s per run vs ~3s) but catches issues where
`bootmgr` itself doesn't like our PBR (the multi-sector handoff edge
cases that the spec doesn't fully document).

### Layer 4 — Real-hardware smoke

Pre-release checklist run manually on a small set of target machines:

- Dell E6410 (Phoenix BIOS, USB-as-HDD-with-quirks)
- Generic 2010-2015 Intel desktop (modern AMI BIOS)
- A 2005-vintage P4 box (legacy Phoenix BIOS, may treat USB as floppy)

For each: write a USB, boot, confirm reaches "Setup is loading drivers..."
within 30 seconds. Document the results in `HARDWARE_TESTS.md`.

## Library scope (v1.0 target)

Public API:

```rust
// Master Boot Records (whole-disk, sector 0).
pub fn mbr_win7(disk: DiskGeometry, partitions: &[PartitionEntry]) -> [u8; 512];
pub fn mbr_xp(disk: DiskGeometry, partitions: &[PartitionEntry]) -> [u8; 512];

// FAT32 Partition Boot Records (partition-local, 1 sector for XP, ~16 for Win7+).
pub fn fat32_pbr_ntldr(bpb: Fat32Bpb) -> [u8; 512];
pub fn fat32_pbr_bootmgr(bpb: Fat32Bpb) -> PbrBytes;       // multi-sector

// NTFS Partition Boot Records.
pub fn ntfs_pbr_bootmgr(bpb: NtfsBpb) -> PbrBytes;

// Optional: byte-level splice helpers (preserve existing BPB on a freshly
// formatted device while overwriting boot code).
pub fn splice_fat32_pbr(existing: [u8; 512], boot_code: &PbrBytes) -> [u8; 8192];
```

Out of scope for v1.0:
- exFAT boot records
- Other-OS boot records (syslinux, GRUB stage 1)
- UEFI boot variants
- Partitioning utilities (that's usbwin's job)

## Component breakdown — order of implementation

Sequenced by complexity. Each component is implementable AND testable
against Layers 1-2 before moving to the next.

| # | Component             | Bytes  | Complexity | Eval status target |
|---|------------------------|--------|------------|---------------------|
| 1 | `mbr_xp`               | 512    | Low        | Layer 1 + Layer 2 |
| 2 | `mbr_win7`             | 512    | Low        | Layer 1 + Layer 2 |
| 3 | `fat32_pbr_ntldr`      | 512    | Medium     | Layer 1 + Layer 2 + Layer 3 |
| 4 | `fat32_pbr_bootmgr`    | ~6 KB (sectors 0, 1, 12) | High | Layer 2 + Layer 3 + Layer 4 |
| 5 | `ntfs_pbr_bootmgr`     | ~16 KB | High       | Layer 2 + Layer 3 |

Items 1-2 are quick wins; they get us off ms-sys for MBR work within a
week. Item 3 (XP PBR) is the next-most-tractable. Item 4 is the hard one
(multi-sector, multi-stage, the thing we hit a wall on previously); it
gets the most eval scrutiny. Item 5 only matters if usbwin grows NTFS
support, which is not currently planned.

## Project layout

A new sibling Rust crate, either inside the usbwin workspace or
standalone:

```
mkmsbr/
├── Cargo.toml                # MIT-licensed
├── README.md
├── src/
│   ├── lib.rs                # Public API (above)
│   ├── mbr.rs                # MBR variant assembly + byte layout
│   ├── pbr/
│   │   ├── fat32_ntldr.rs    # PBR variant
│   │   ├── fat32_bootmgr.rs  # Multi-sector PBR
│   │   └── ntfs_bootmgr.rs
│   └── geometry.rs           # DiskGeometry, PartitionEntry, Fat32Bpb, NtfsBpb
├── boot-asm/                 # NASM source files
│   ├── mbr_xp.asm
│   ├── mbr_win7.asm
│   ├── fat32_pbr_ntldr.asm
│   ├── fat32_pbr_bootmgr/    # Multi-file because multi-sector
│   │   ├── sector0.asm
│   │   ├── sector1.asm
│   │   └── sector12.asm
│   └── ntfs_pbr_bootmgr/
├── tests/
│   ├── oracle/               # Layer 1: ms-sys comparison
│   │   ├── mod.rs            # Wraps ms-sys, parses output
│   │   ├── mbr_oracle.rs
│   │   └── pbr_oracle.rs
│   ├── qemu/                 # Layer 2: synthetic boot tests
│   │   ├── mod.rs            # Harness for spinning up qemu-system-i386
│   │   ├── fake_ntldr.asm    # 30-byte stub that prints to serial
│   │   ├── fake_bootmgr.asm  # similar
│   │   ├── mbr_smoke.rs
│   │   └── pbr_smoke.rs
│   ├── real_content/         # Layer 3: tests against real Windows files
│   │   └── ...               # See "Real-content fixtures" below
│   └── fixtures/             # Small data files (BPBs, partition tables)
└── docs/
    └── PROVENANCE.md         # Clean-room protocol (inherited from usbwin)
```

## Eval-first workflow (how to actually develop a variant)

This is the methodology that fixes the "blind debugging" failure mode.

### Step 0 — Before writing any boot code: wire up the eval

For variant N (e.g. `fat32_pbr_ntldr`):

1. Build the oracle: a function `expected_bytes(input) -> Vec<u8>` that
   runs ms-sys on a synthetic disk and extracts the resulting boot record.
2. Build the smoke test: a function `boots_ok(bytes) -> bool` that splices
   `bytes` into a synthetic image and runs it under QEMU, returning
   whether the success marker appeared on serial.
3. Write a stub function `our_bytes(input) -> Vec<u8>` that returns
   `vec![0; 512]` (or whatever the wrong answer is).
4. Confirm: `expected_bytes(...) != our_bytes(...)` (Layer 1 fails) and
   `!boots_ok(our_bytes(...))` (Layer 2 fails).

**The evals fail at this point. That's the point.** You can't accidentally
think you've shipped something when the eval still fails.

### Step 1 — Implement until Layer 2 passes

Write NASM code. Build the bytes. Re-run the eval. Iterate.

Layer 2 (QEMU boot smoke) is the **primary** signal during development.
It's a binary pass/fail and the tightest correctness check that doesn't
require ms-sys.

### Step 2 — Add Layer 1 (byte-equality vs ms-sys)

Once Layer 2 passes, compare to ms-sys's bytes. Three outcomes:

- **Byte-identical**: ship.
- **Byte-different but boot-equivalent**: ship; document why.
- **Byte-different and one doesn't boot**: figure out which one is right.

### Step 3 — Layer 3 + Layer 4 before release

Real-content fixtures and hardware smoke before the variant is declared
"production." See per-variant target in the table above.

## Real-content fixtures (Layer 3)

We need test inputs that look like actual Microsoft install media but are
small enough to commit. Per-variant:

- **NTLDR variant**: A ~5 MB synthetic FAT32 with real-shaped `NTLDR` file
  (just enough to load and print a marker). Built from the actual Win XP
  files extracted from the user's ISO into `tests/fixtures/xp_minimal/`.
- **BOOTMGR variant**: ~10 MB synthetic FAT32 with real Win 7 `bootmgr`
  plus `Boot/BCD`. Generated from a Win 7 ISO at test-fixture-build time.

Fixtures are reproducible (a `tests/fixtures/build.sh` script generates
them from an ISO path the developer supplies via env var). The repo
doesn't check in the fixtures themselves (license, size); it checks in
the build script and the SHA-256 of the expected outputs so we know when
the fixture changed.

## Clean-room protocol (air-gapped from ms-sys source)

This is the strictest form of the protocol described in usbwin's
`docs/PROVENANCE.md` — what intellectual-property law calls a "Chinese
Wall" or "clean-room" reimplementation, the same shape Compaq used in
1984 to reimplement the IBM PC BIOS without copyright infringement.

### The air gap

Two distinct roles exist in `mkmsbr` development:

| Role            | What they see                                           | What they produce               |
|-----------------|---------------------------------------------------------|----------------------------------|
| **Spec readers** | FAT32 spec, BIOS docs, ms-sys's *output bytes* (as a black box) | mkmsbr source code (NASM, Rust) |
| **Oracle plumbing** | Whatever they need; usually nothing | The test harness that invokes ms-sys as subprocess and compares its output to mkmsbr's |

The same person can do both *as long as the Spec-reader role never touches
ms-sys's source code*. ms-sys's `.c` and `.h` files (especially `inc/*.h`
which contain the actual boot-record byte arrays as C arrays) are
**forbidden reading** for anyone writing mkmsbr.

If a contributor has read ms-sys source code, they're tainted for the
duration of their useful memory of it (months, conservatively). They can
work on the oracle/test harness but not on the boot-code source files.

### Allowed references for the Spec-reader role

- Microsoft FAT32 spec (FATGEN103.doc) — publicly published
- Microsoft NTFS spec (public docs)
- IBM/Phoenix BIOS Interface Reference (INT 13h, INT 10h, etc.)
- USB Mass Storage Class spec
- USB / OHCI / UHCI / EHCI controller specifications
- OSDev wiki *prose and pseudocode only* (never code blocks)
- Microsoft's own SDK headers describing on-disk structures (e.g.
  `winioctl.h`'s partition table layout)
- IDA-decompiled views of *generic* bootloaders if no Microsoft binary
  is involved (still risky; ask first)

### Disallowed for the Spec-reader role

- **ms-sys source files**`src/*.c`, `inc/*.h`, anything in the ms-sys
  repository besides the compiled binary
- syslinux, GRUB, GRUB4DOS, Linux kernel boot code, BSD bootloaders
- Microsoft Windows source leaks (even if not Microsoft-attributed)
- Any reverse-engineered disassembly of `bootmgr`, `ntldr`, `bootsect.exe`,
  or `sys.exe`
- Stack Overflow / forum posts that contain code blocks from the disallowed
  sources (prose-only reading is fine)

### How ms-sys appears in the codebase

**Only as a black-box subprocess in `tests/oracle/`.** The harness shape:

```rust
// tests/oracle/mod.rs
fn ms_sys_output(args: &[&str], target_device: &str) -> Result<Vec<u8>> {
    // Invoke `/usr/local/bin/ms-sys <args> <target_device>`
    // Then read back the bytes from <target_device>
    // Return the resulting boot record bytes
}
```

No `#include "ms-sys/inc/some_header.h"`, no `let bytes =
include_bytes!("../../ms-sys/inc/winnt5_fat32_bootcode.h")`, no
`Command::new("cat").arg("ms-sys/src/file_system.c")`. The library has
*no awareness* of how ms-sys produces its bytes; it only knows what they
are.

### Why this matters

ms-sys's boot-record byte arrays in `inc/*.h` are themselves derived from
Microsoft binaries. Their legal status was always murky — the ms-sys
maintainers shipped them under GPL-2 because that's what FSF's "make
everything free" philosophy says to do with code that's already in the
wild, not because they had a license from Microsoft to redistribute. A
clean-room reimplementation that never sees ms-sys's bytes (only their
output, which is observed behavior, not protected expression) sidesteps
this entire question.

mkmsbr's bytes will be derived solely from the FAT32 / NTFS / BIOS specs.
If they happen to be byte-identical to Microsoft's bytes (because the
space of "correct implementations of this small task" is small), that's
parallel invention, not copying.

## Verifiable evidence

Two independent verifiability claims; each needs concrete mechanisms.
Policy without machinery is just a wish.

### Verifiably correct: machine-checked, CI-enforced

For each variant, the eval layers produce binary pass/fail signals that
CI runs on every commit:

```yaml
# .github/workflows/verify.yml (sketch)
correctness:
  - layer1_oracle:      # Byte-equality vs ms-sys
      gates: [all-variants-equal] | [variant-equal-or-justified]
      run-on: every PR
  - layer2_qemu:        # Synthetic boot smoke
      gates: [all-stubs-print-marker]
      run-on: every PR
  - layer3_real:        # Real-content boot test
      gates: [reaches-installer-welcome]
      run-on: PR + nightly
  - layer4_hardware:    # Manual sign-off
      gates: [signed-off-by HARDWARE_TESTS.md commit]
      run-on: release-gate
```

Concrete correctness deliverables per variant:

1. **Coverage matrix** (`COVERAGE.md`) — variant × eval-layer × pass/fail.
   No variant ships if any required layer is RED.
2. **Determinism check** — `cargo build` produces byte-identical output
   across runs / clean checkouts / different developer machines.
   Verified via `tests/determinism.sh` that runs `cargo clean && cargo
   build --release && sha256sum target/release/...`. CI fails on diff.
3. **Reproducible from spec** — `docs/SPEC_TRACE.md` maps each non-trivial
   constant in our boot code to the spec page that justifies it
   (FAT32 BPB offset 0x0B == BytsPerSec → FATGEN103 §3.1). Catch
   "magic number copied from somewhere" early.
4. **Regression fixtures** — for each variant, a fixed input → fixed
   output mapping in `tests/golden/`. Changes require explicit fixture
   updates with justification in the commit message.

### Verifiably green-room: process + machinery

Air-gap is a policy that humans can break (or be subtly tainted by).
The mechanisms that catch the breakage:

#### 1. Contributor reading declaration (per-PR)

Every PR that touches `mkmsbr/src/` or `mkmsbr/boot-asm/` includes
a YAML block in the description:

```yaml
clean_room:
  role: spec-reader              # or "oracle-plumbing"
  references_consulted:
    - "FATGEN103.doc §3.1-3.3 (BPB layout)"
    - "Phoenix BIOS Interface Reference §INT 13h, fn 0x42"
    - "osdev.org/FAT (prose only; verified no code blocks read)"
  forbidden_unread:
    - ms-sys/src/         # I have not read these files
    - ms-sys/inc/         # ever
    - syslinux/           # not in last 12 months
    - any-Windows-source-leak
  attestation: |
    I am not aware of having read ms-sys source code, leaked Microsoft
    boot record source, or any GPL/BSD bootloader source within the
    last 12 months. The code in this PR was derived solely from the
    references listed above.
  signed: $CONTRIBUTOR_NAME
  date: 2026-XX-XX
```

This goes in the PR description (not the commit, so it's separable
from the code). It's a sworn-style attestation, not legally binding,
but it creates a paper trail.

#### 2. Reading log (`CONTRIBUTORS_READING.md`)

A repository-tracked file listing, per contributor, what reference
sources they've read AT ALL, with timestamps. Append-only. A
contributor's eligibility for the spec-reader role is determined by
checking this log:

```markdown
## joa@example.com

| Source                                | Read     | Status        |
|----------------------------------------|----------|----------------|
| FATGEN103.doc (FAT32 spec)             | 2026-05  | ✓ allowed     |
| Phoenix BIOS Interface Ref             | 2026-05  | ✓ allowed     |
| osdev.org/FAT prose                    | 2026-06  | ✓ allowed     |
| ms-sys/inc/*.h byte arrays             | 2018     | ❌ tainted    |
```

If "tainted" sources appear, the contributor cannot work on
`mkmsbr/src/` (boot code) — only on `mkmsbr/tests/oracle/` (where
seeing ms-sys output is the whole point). The tainting half-life is
conservatively 24 months from last-read; after that, on case-by-case
basis with project-lead sign-off.

#### 3. Forbidden-symbol grep (CI gate)

A simple CI check that fails the build if any of these patterns appear
anywhere in `mkmsbr/src/` or `mkmsbr/boot-asm/`:

```sh
# .github/workflows/clean_room_check.sh
FORBIDDEN_PATTERNS=(
    "ms-sys"             # literal name (shouldn't appear in source)
    "mssys"
    "ilko-y"             # the WaitBT author
    "syslinux"
    "ldlinux"
    "include.*ms-sys"
    "/* extracted from"  # common "I copied this" comment style
)
for pattern in "${FORBIDDEN_PATTERNS[@]}"; do
    if grep -r "$pattern" mkmsbr/src/ mkmsbr/boot-asm/; then
        echo "FORBIDDEN PATTERN FOUND: $pattern"
        exit 1
    fi
done
```

Trivial check, but catches the dumbest "I copied a block of bytes from
that one .h file" leaks.

#### 4. Statistical similarity check (CI gate)

For each variant where ms-sys produces equivalent bytes, compute the
"non-trivial similarity" between our bytes and ms-sys's:

- Both files produce 512 bytes.
- ~250 of those are the BPB area (which both must produce identically — it's
  filesystem state, not boot code).
- The remaining ~260 bytes are the boot code itself.

Trivial similarity (same opcodes for the same operations) is fine. Excessive
similarity beyond what FAT32 setup-code structure mandates is a flag.

Concrete measurement: for the 260-byte boot-code region, compute
Hamming distance between our output and ms-sys's. Plot the distribution
across all variants. Flag any variant where the Hamming distance is
suspiciously low (e.g. fewer than 10 differing bytes when the function
is non-trivial — that suggests copy or extremely-likely accidental
parallel invention, both worth a manual review).

This is a soft signal, not a hard gate. It triggers an "investigate
this" rather than "fail the build."

#### 5. Independent code review (per release)

Before each `mkmsbr` release tag, the boot-code files (`mkmsbr/src/`,
`mkmsbr/boot-asm/`) are reviewed by a contributor who has *not*
written any of the code, with explicit focus on: "does this look
clean-room, or does anything look copy-pasted from somewhere
familiar?" The reviewer also confirms the contributor reading log
matches the PR claims.

For a single-contributor project, this step is "contributor reviews
their own code with the explicit checklist, in writing." Not as
strong as two-person review but catches the "I just did this without
thinking" cases.

#### 6. Public legal review (before 1.0)

Before declaring a 1.0 release, a one-time review by a lawyer
familiar with clean-room reverse engineering (or at minimum, a
public RFC review with knowledgeable hobbyist community input from
e.g. msfn.org). Document the review outcomes in
`docs/LEGAL_REVIEW.md`.

### Both verifiability properties combined

A pull request is mergeable to `main` only when ALL of these are
green:

| Check                          | Verifies      | Automated? |
|--------------------------------|---------------|------------|
| Layer 1 oracle test            | Correctness   | ✓ CI       |
| Layer 2 QEMU smoke             | Correctness   | ✓ CI       |
| Layer 3 real-content (if req)  | Correctness   | ✓ CI       |
| Determinism check              | Correctness   | ✓ CI       |
| Coverage matrix updated        | Correctness   | ✓ CI       |
| Clean-room declaration in PR    | Cleanroom     | semi (lint) |
| Reading log updated             | Cleanroom     | semi (lint) |
| Forbidden-symbol grep clean    | Cleanroom     | ✓ CI       |
| Statistical similarity below threshold | Cleanroom | ✓ CI       |
| Independent review sign-off    | Both          | manual      |

Layer 4 (real hardware) is required at release-gate, not per-PR.

The combination is what makes the claim **verifiable**: if any
reviewer in the next 30 years wants to challenge whether mkmsbr is
genuinely clean-room, they can read the reading log, audit the PR
attestations, run the forbidden-symbol checks themselves, and inspect
the similarity-distribution data. Nothing depends on trusting the
authors' word.

## Form factor: library AND binary

`mkmsbr` ships as a single Cargo crate that produces both a Rust library
and a CLI binary. The same code, two consumption modes.

### Library (`mkmsbr` Rust crate)

The canonical API. usbwin links against it directly, gets Rust-typed
input (`Fat32Bpb`, `DiskGeometry`, etc.) and Rust-typed output
(`[u8; 512]`, `PbrBytes`). No subprocess overhead, no string parsing,
no shell escaping. usbwin's `pipeline/windows.rs` switches from
`Command::new(ms_sys).args(...)` to `mkmsbr::fat32_pbr_bootmgr(bpb)`.

```rust
// In usbwin's Cargo.toml:
mkmsbr = { path = "../mkmsbr" }   // or version = "1.0" when published

// In usbwin's pipeline/windows.rs:
let pbr_bytes = mkmsbr::fat32_pbr_bootmgr(bpb);
dev.write_at(0, &pbr_bytes[0])?;    // sector 0
dev.write_at(512, &pbr_bytes[1])?;  // sector 1
dev.write_at(12 * 512, &pbr_bytes[12])?;  // sector 12
```

### Binary (`mkmsbr` CLI)

A thin wrapper that exposes the library as a command-line tool — a
drop-in replacement for ms-sys for the variants we support. ~50 lines
of clap-based argument parsing around library calls.

```sh
# usbwin's --mbr7 equivalent
mkmsbr --mbr-win7 /dev/rdisk6

# usbwin's --fat32pe equivalent
mkmsbr --fat32-bootmgr /dev/rdisk6s1

# Or by variant explicitly
mkmsbr --variant fat32-bootmgr --output /dev/rdisk6s1
```

The CLI uses the SAME library functions internally. The binary form
exists because:

1. **Drop-in for existing recipes** — anyone using ms-sys in a shell
   script can switch by changing `ms-sys --fat32pe` to `mkmsbr
   --fat32-bootmgr`. Lowers adoption friction for the broader
   USB-tool ecosystem (WinSetupFromUSB-likes, retro-computing folks).
2. **Cross-language interop** — Python/Go/Bash consumers don't need a
   Rust toolchain.
3. **Reproducibility verification** — for the audit case ("show me
   mkmsbr produces the same bytes ms-sys does"), an auditor can run
   `mkmsbr` and `ms-sys` side by side without setting up a Rust dev
   environment.
4. **Oracle-of-our-own-binary tests** — the test harness can use the
   CLI binary as the black-box subprocess, exactly like it uses
   ms-sys. This catches regressions in the public API surface where
   internal-library unit tests might pass but the public CLI
   contract has drifted.

### Crate layout

Same `mkmsbr/` workspace member from the earlier layout section, with
`Cargo.toml` declaring both targets:

```toml
[package]
name = "mkmsbr"
version = "1.0.0"
edition = "2021"
license = "MIT"

[lib]
name = "mkmsbr"
path = "src/lib.rs"

[[bin]]
name = "mkmsbr"
path = "src/bin/mkmsbr.rs"

[dependencies]
clap = { version = "4", features = ["derive"] }   # binary only; cargo
                                                   # will skip when used
                                                   # as a library dep
```

The library is the public interface that's stable across versions; the
binary's CLI is allowed to evolve more freely (deprecations announced
in release notes). usbwin always tracks the library version, not the
binary version.

### Naming for symmetry with ms-sys

Where it's obvious, the CLI flag names mirror ms-sys's so the muscle
memory transfers:

| ms-sys flag      | mkmsbr flag          | Library function             |
|------------------|------------------------|------------------------------|
| `--mbr7`         | `--mbr-win7`           | `mbr_win7(...)`              |
| `--mbr`          | `--mbr-xp`             | `mbr_xp(...)`                |
| `--fat32pe`      | `--fat32-bootmgr`      | `fat32_pbr_bootmgr(...)`     |
| `--fat32nt`      | `--fat32-ntldr`        | `fat32_pbr_ntldr(...)`       |
| `--ntfs`         | `--ntfs-bootmgr`       | `ntfs_pbr_bootmgr(...)`      |

mkmsbr's flags are slightly more verbose (-pe vs -bootmgr) because the
ms-sys names are domain-jargon-y; new users shouldn't have to know that
"PE" means "Preinstall Environment" to write a Win 7 boot record. The
old names are accepted as aliases for muscle memory.

## Audience and packaging

`mkmsbr` is its **own project**, not a usbwin subcomponent. Separate
repo, separate releases, separate brew formula, independent
distribution.

### Why separate (audience analysis)

usbwin's audience is small and narrow: macOS-on-Apple-Silicon users
who want Windows install USBs without Rosetta/VMs. The community is
likely dozens to low-hundreds of people total, growing slowly as more
sysadmins, IT folk, and retro-tech enthusiasts hit the post-Rosetta
gap.

`mkmsbr`'s audience is much wider:

| Audience                                       | Approximate size | Notes |
|------------------------------------------------|------------------|-------|
| Linux/BSD users replacing ms-sys (cleaner replacement) | ~thousands       | The largest single bucket. ms-sys is in most Linux distros' repos but GPL-2 and ancient; mkmsbr is MIT, Rust, maintained. |
| Cross-platform USB-tool maintainers            | ~dozens of projects, transitively reaching thousands | Ventoy, WoeUSB, multibootusb, easy2boot, custom in-house. They want mkmsbr as a dep. |
| Retro-computing hobbyists                      | ~thousands       | MSFN, boot-land, Vogons forum people. Hand-rolling install media. |
| Forensic / data-recovery folks                 | ~hundreds        | Niche but high-skill; rebuilding damaged boot records is a real use case. |
| Embedded / firmware engineers                  | ~thousands       | Anyone shipping x86 boot media for kiosks, SBCs, signage, industrial. |
| CI / automation                                | ~hundreds        | Linux CI runners producing Windows install media as artifacts. |
| Educators / OS-dev curriculum                  | ~hundreds        | mkmsbr is one of few public reference implementations derived purely from specs; teaching material. |

usbwin **uses** mkmsbr. usbwin is a downstream consumer like any
other. The dependency direction is `usbwin → mkmsbr`, never the
reverse.

### What this implies operationally

1. **Separate repository** when the work starts in earnest. `mkmsbr`
   on GitHub as its own project; usbwin pins a Cargo dependency by
   git URL during dev and by published version after `mkmsbr` hits
   crates.io.

2. **Separate Homebrew formulas**, two install commands:

   ```sh
   brew install mkmsbr          # standalone — what the wider audience wants
   brew install usbwin           # depends on mkmsbr; transitively installs it
   ```

   Each formula maintained in its own tap. mkmsbr's would be a
   better candidate for eventual homebrew-core inclusion than usbwin's,
   precisely because the audience is wider and the tool is smaller and
   more general-purpose.

3. **Separate clean-room audits.** mkmsbr's reading log, PR
   attestations, similarity-distribution data are all tracked in
   mkmsbr's repo. usbwin's repo doesn't carry that burden — it links
   against mkmsbr and inherits the audit conclusion as a third party
   would. The clean-room story is structurally cleaner this way.

4. **Independent release cadence.** mkmsbr might hit 1.0 long before
   usbwin's clean-room-mkmsbr story is integrated. usbwin meanwhile
   continues shipping with ms-sys shell-out (v0.2/v0.3 path) and
   migrates to mkmsbr when it makes sense — possibly as v2.0.

5. **Different governance later.** If mkmsbr gains maintainers from
   the wider audience (Linux distros, USB-tool authors), it can have
   its own governance model. usbwin stays a smaller-team project. The
   API contract between them is a normal library API contract — same
   way Rust apps depend on `serde` without `serde` becoming part of
   each consuming project.

### Brew packaging sketch

`mkmsbr`'s formula (the one the wider audience wants):

```ruby
class Mkmsbr < Formula
  desc "Clean-room implementation of Microsoft boot records (MBR/PBR)"
  homepage "https://github.com/jmappleby/mkmsbr"
  url "https://github.com/jmappleby/mkmsbr/archive/v1.0.0.tar.gz"
  sha256 "..."
  license "MIT"

  depends_on "rust" => :build
  depends_on "nasm" => :build

  def install
    system "cargo", "install", *std_cargo_args
  end

  test do
    # Smoke: produce an XP MBR, check the boot signature is right
    assert_predicate (testpath/"mbr.bin"), :exist?
  end
end
```

`usbwin`'s formula (downstream):

```ruby
class Usbwin < Formula
  desc "Native arm64 macOS bootable-USB writer"
  homepage "https://github.com/jmappleby/usbwin"
  url "https://github.com/jmappleby/usbwin/archive/v2.0.0.tar.gz"
  sha256 "..."
  license "MIT"

  depends_on "mkmsbr"          # transitive: brings in mkmsbr
  depends_on "rust" => :build
  depends_on "nasm" => :build

  def install
    system "cargo", "install", *std_cargo_args
  end
end
```

Two brew commands, two install paths, clean dependency graph.

### Migration plan for usbwin

This is what happens to usbwin when mkmsbr is ready:

- **Today (v0.2 / v0.3):** usbwin shells out to ms-sys. Working.
- **v1.0 (usbwin):** Same. ms-sys-based, real-hardware-verified.
- **v2.0 (usbwin) — coincides with mkmsbr 1.0:** usbwin replaces
  the ms-sys shell-out with `mkmsbr::*` library calls. The two
  shell-out helpers (`ms_sys_mbr7`, `ms_sys_fat32pe`, etc.) become
  thin wrappers around `mkmsbr::mbr_win7()` and `mkmsbr::fat32_pbr_bootmgr()`.
  ms-sys becomes optional (kept around as a fallback / oracle).
- **v3.0 (usbwin):** ms-sys removed entirely. Single MIT binary.

Each step is independently shippable; the migration is incremental.

## License

`mkmsbr` is MIT-2.0. Independent of usbwin (could be used by other
tools — e.g. a Linux LiveUSB creator, a forensic image preparation tool,
a retro-computing utility). Single self-contained crate.

ms-sys's GPL-2 license doesn't transit into `mkmsbr` because:
- We don't link, include, or distribute ms-sys
- Test-time subprocess invocation is mere aggregation (per FSF)
- Output bytes are not copyrightable (data, not creative expression — and
  even if it were, our implementation derives them from the FAT32 spec,
  not from observing ms-sys output)

## Timeline estimate

Honest, not optimistic.

| Phase                        | Weeks | Cumulative |
|------------------------------|-------|-------------|
| Eval framework (Layers 1, 2)  | 1     | 1           |
| `mbr_xp` + `mbr_win7`        | 1     | 2           |
| `fat32_pbr_ntldr` to L2       | 2     | 4           |
| `fat32_pbr_ntldr` to L3       | 1     | 5           |
| `fat32_pbr_bootmgr` to L2     | 4     | 9           |
| `fat32_pbr_bootmgr` to L3     | 2     | 11          |
| Real-hardware verification    | 2     | 13          |
| Documentation + 1.0 release   | 1     | 14          |

About 3-4 months part-time. Predictable because of the eval-first
methodology — at each milestone we either pass the layer's test or
we don't, no ambiguity.

## What kicks off v1.0 work

This spec lands today. v1.0 work doesn't start until:

1. v0.2 (Win 7 mode via ms-sys) is **real-hardware verified** on the
   Dell E6410.
2. v0.3 (XP mode via ms-sys) is **real-hardware verified** on the same.
3. There's a concrete reason to invest 3 months — public release plan,
   external interest, etc.

Until then, ms-sys is the right answer.