# V1.0 design: `mkmsbr` — a clean-room boot-record library
## One-paragraph summary
A standalone MIT-licensed Rust library that produces Microsoft-compatible MBR
and FAT32/NTFS boot record byte sequences, replacing usbwin's runtime
dependency on ms-sys. The library is developed **eval-first**: a verification
harness using ms-sys as a comparison oracle and QEMU as a boot tester is
written before any boot code. The library is "done" for a given variant when
its eval passes — predictable shipping, no more debugging-the-PBR-blind
sessions.
## Why this exists
The v0.2 / v0.3 path shells out to ms-sys for the MBR and PBR bytes. That
works but has three real costs:
1. **External GPL-2 dependency** — usbwin is MIT, ms-sys is GPL-2. The
process-boundary separation keeps the licenses compatible, but a
self-contained MIT binary would be cleaner for redistribution.
2. **Installation friction** — users must `git clone gitlab.com/cmaiolino/ms-sys`
then build it. Hampers "one-command install."
3. **No control** — bug in ms-sys (e.g. the sub-sector-write-on-rdisk
issue from FIELD_FINDINGS §2) requires upstream patching or working
around in our shell-out layer. With our own implementation, we fix it.
Why we didn't do this in v0.2: trying to write a clean-room PBR without an
eval framework first led to a 6-hour debugging spiral against unreliable
test infrastructure (see chat history May 18). The lesson — test framework
**before** boot code — is now baked into this spec.
## Verifiability hierarchy
Four nested layers, from tightest signal to loosest. Each layer is an
independent test that any candidate boot record can be run against. A
production-quality boot record passes all four.
### Layer 1 — Byte-equality vs ms-sys (tightest)
For each variant (e.g. `--fat32pe`, `--mbr7`), `mkmsbr` produces N bytes
and we assert them byte-equal to ms-sys's output for the same input.
```rust
#[test]
fn fat32pe_matches_mssys() {
    // Both sides receive the same BPB input (elided here).
    let our_bytes = mkmsbr::fat32_pbr_bootmgr(/*bpb*/...);
    let mssys_bytes = oracle::ms_sys_fat32pe(/*bpb*/...);
    assert_eq!(our_bytes, mssys_bytes);
}
```
**Confidence:** if the bytes match, we know we're producing Microsoft-
equivalent code (because ms-sys IS Microsoft's code, extracted).
**Limitation:** byte equality is *sufficient* but not *necessary*. We
might validly produce different bytes that boot equivalently (different
register-allocation strategy in NASM, different jump patterns, etc.).
Treat this layer as the **strongest** signal but not the **only** one.
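
When Layer 1 fails, a bare `assert_eq!` on two multi-hundred-byte buffers is hard to read. A minimal sketch of a diagnostic helper follows; `first_mismatch` is a hypothetical name, not part of the planned API, and it only reports where the two outputs first diverge.

```rust
/// Returns the offset of the first differing byte. If one slice is a strict
/// prefix of the other, returns the length of the shorter one.
/// `None` means the two slices are byte-equal.
fn first_mismatch(ours: &[u8], oracle: &[u8]) -> Option<usize> {
    let common = ours.len().min(oracle.len());
    if let Some(i) = (0..common).find(|&i| ours[i] != oracle[i]) {
        return Some(i);
    }
    // Same content up to `common`; a length difference is still a mismatch.
    (ours.len() != oracle.len()).then_some(common)
}
```

A test could print the returned offset alongside the spec field or code region it falls in, which narrows the search before anyone opens a hex dump.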
### Layer 2 — QEMU boot smoke test (per variant)
For each variant, build a synthetic disk image with `mkmsbr`'s output
applied, boot it under `qemu-system-i386`, verify a specific success
signal on serial.
The synthetic test environments:
- **FAT32 + NTLDR stub**: tiny partition with a fake `NTLDR` that prints
`MKMSBR OK\r\n` to COM1 and halts. Validates `fat32_pbr_ntldr`.
- **FAT32 + bootmgr stub**: same, fake `BOOTMGR`. Validates
`fat32_pbr_bootmgr` (the multi-sector one).
- **MBR + active partition + dummy PBR that just prints to serial**:
validates `mbr_7`, `mbr_xp`.
Test infrastructure already half-built (`crates/usbwin-boot/tests/qemu_pbr.rs`
from v0.2 work). Generalize it into a per-variant harness.
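
The per-variant harness could be split into two small pieces: building the QEMU command line and scanning captured serial output for the marker. The sketch below assumes COM1 is routed to stdout via `-serial stdio`; `qemu_args` and `serial_has_marker` are hypothetical helper names, and the existing `qemu_pbr.rs` may be shaped differently.

```rust
/// The success marker the stub loaders print on COM1.
const MARKER: &[u8] = b"MKMSBR OK\r\n";

/// Arguments for a headless run that sends the guest's COM1 to stdout.
fn qemu_args(image_path: &str) -> Vec<String> {
    vec![
        "-drive".to_string(),
        format!("format=raw,file={image_path}"),
        "-display".to_string(),
        "none".to_string(),
        "-serial".to_string(),
        "stdio".to_string(),
        "-no-reboot".to_string(),
    ]
}

/// True if the captured serial bytes contain the success marker anywhere.
fn serial_has_marker(serial: &[u8]) -> bool {
    serial.windows(MARKER.len()).any(|w| w == MARKER)
}
```

The arguments would be passed to `std::process::Command::new("qemu-system-i386")`, with the harness killing the process after a timeout so a hung boot record fails the test instead of blocking it.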
### Layer 3 — Real-content boot test
Same as Layer 2 but the disk image contains a real-but-stripped Windows
install tree (e.g. a 50 MB subset of Win 7 with the bootmgr loader chain).
Boots far enough to reach the Windows boot menu / installer welcome.
Slower than Layer 2 (~10s per run vs ~3s) but catches issues where
`bootmgr` itself doesn't like our PBR (the multi-sector handoff edge
cases that the spec doesn't fully document).
### Layer 4 — Real-hardware smoke
Pre-release checklist run manually on a small set of target machines:
- Dell E6410 (Phoenix BIOS, USB-as-HDD-with-quirks)
- Generic 2010-2015 Intel desktop (modern AMI BIOS)
- A 2005-vintage P4 box (legacy Phoenix BIOS, may treat USB as floppy)
For each: write a USB, boot, confirm reaches "Setup is loading drivers..."
within 30 seconds. Document the results in `HARDWARE_TESTS.md`.
## Library scope (v1.0 target)
Public API:
```rust
// Master Boot Records (whole-disk, sector 0).
pub fn mbr_win7(disk: DiskGeometry, partitions: &[PartitionEntry]) -> [u8; 512];
pub fn mbr_xp(disk: DiskGeometry, partitions: &[PartitionEntry]) -> [u8; 512];
// FAT32 Partition Boot Records (partition-local, 1 sector for XP, ~16 for Win7+).
pub fn fat32_pbr_ntldr(bpb: Fat32Bpb) -> [u8; 512];
pub fn fat32_pbr_bootmgr(bpb: Fat32Bpb) -> PbrBytes; // multi-sector
// NTFS Partition Boot Records.
pub fn ntfs_pbr_bootmgr(bpb: NtfsBpb) -> PbrBytes;
// Optional: byte-level splice helpers (preserve existing BPB on a freshly
// formatted device while overwriting boot code).
pub fn splice_fat32_pbr(existing: [u8; 512], boot_code: &PbrBytes) -> [u8; 8192];
```
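
To illustrate what the splice helper has to do, here is a single-sector sketch. It is a simplification of `splice_fat32_pbr` (the real signature above takes `PbrBytes` and returns 8192 bytes), and it assumes the region to preserve is the FAT32 BPB plus extended BPB at offsets 0x0B through 0x59. `splice_sector0` is a hypothetical name.

```rust
const SECTOR: usize = 512;
/// FAT32 BPB plus extended BPB: offsets 0x0B..0x5A (assumed preserve range).
const BPB_START: usize = 0x0B;
const BPB_END: usize = 0x5A;

/// Takes everything from `boot_sector` except the BPB, which is copied
/// from the freshly formatted `existing` sector.
fn splice_sector0(existing: &[u8; SECTOR], boot_sector: &[u8; SECTOR]) -> [u8; SECTOR] {
    let mut out = *boot_sector;
    out[BPB_START..BPB_END].copy_from_slice(&existing[BPB_START..BPB_END]);
    out
}
```

Keeping the preserve range in named constants makes it easy to trace each bound back to the spec.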
Out of scope for v1.0:
- exFAT boot records
- Other-OS boot records (syslinux, GRUB stage 1)
- UEFI boot variants
- Partitioning utilities (that's usbwin's job)
## Component breakdown — order of implementation
Sequenced by complexity. Each component is implementable AND testable
against Layers 1-2 before moving to the next.
| # | Component | Size | Complexity | Target eval layers |
|---|-----------|------|------------|--------------------|
| 1 | `mbr_xp` | 512 B | Low | Layer 1 + Layer 2 |
| 2 | `mbr_win7` | 512 B | Low | Layer 1 + Layer 2 |
| 3 | `fat32_pbr_ntldr` | 512 B | Medium | Layer 1 + Layer 2 + Layer 3 |
| 4 | `fat32_pbr_bootmgr` | ~6 KB (sectors 0, 1, 12) | High | Layer 2 + Layer 3 + Layer 4 |
| 5 | `ntfs_pbr_bootmgr` | ~16 KB | High | Layer 2 + Layer 3 |
Items 1-2 are quick wins; they get us off ms-sys for MBR work within a
week. Item 3 (XP PBR) is the next-most-tractable. Item 4 is the hard one
(multi-sector, multi-stage, the thing we hit a wall on previously); it
gets the most eval scrutiny. Item 5 only matters if usbwin grows NTFS
support, which is not currently planned.
## Project layout
A new sibling Rust crate, either inside the usbwin workspace or
standalone:
```
mkmsbr/
├── Cargo.toml # MIT-licensed
├── README.md
├── src/
│ ├── lib.rs # Public API (above)
│ ├── mbr.rs # MBR variant assembly + byte layout
│ ├── pbr/
│ │ ├── fat32_ntldr.rs # PBR variant
│ │ ├── fat32_bootmgr.rs # Multi-sector PBR
│ │ └── ntfs_bootmgr.rs
│ └── geometry.rs # DiskGeometry, PartitionEntry, Fat32Bpb, NtfsBpb
├── boot-asm/ # NASM source files
│ ├── mbr_xp.asm
│ ├── mbr_win7.asm
│ ├── fat32_pbr_ntldr.asm
│ ├── fat32_pbr_bootmgr/ # Multi-file because multi-sector
│ │ ├── sector0.asm
│ │ ├── sector1.asm
│ │ └── sector12.asm
│ └── ntfs_pbr_bootmgr/
├── tests/
│ ├── oracle/ # Layer 1: ms-sys comparison
│ │ ├── mod.rs # Wraps ms-sys, parses output
│ │ ├── mbr_oracle.rs
│ │ └── pbr_oracle.rs
│ ├── qemu/ # Layer 2: synthetic boot tests
│ │ ├── mod.rs # Harness for spinning up qemu-system-i386
│ │ ├── fake_ntldr.asm # 30-byte stub that prints to serial
│ │ ├── fake_bootmgr.asm # similar
│ │ ├── mbr_smoke.rs
│ │ └── pbr_smoke.rs
│ ├── real_content/ # Layer 3: tests against real Windows files
│ │ └── ... # See "Real-content fixtures" below
│ └── fixtures/ # Small data files (BPBs, partition tables)
└── docs/
└── PROVENANCE.md # Clean-room protocol (inherited from usbwin)
```
## Eval-first workflow (how to actually develop a variant)
This is the methodology that fixes the "blind debugging" failure mode.
### Step 0 — Before writing any boot code: wire up the eval
For variant N (e.g. `fat32_pbr_ntldr`):
1. Build the oracle: a function `expected_bytes(input) -> Vec<u8>` that
runs ms-sys on a synthetic disk and extracts the resulting boot record.
2. Build the smoke test: a function `boots_ok(bytes) -> bool` that splices
`bytes` into a synthetic image and runs it under QEMU, returning
whether the success marker appeared on serial.
3. Write a stub function `our_bytes(input) -> Vec<u8>` that returns
`vec![0; 512]` (or whatever the wrong answer is).
4. Confirm: `expected_bytes(...) != our_bytes(...)` (Layer 1 fails) and
`!boots_ok(our_bytes(...))` (Layer 2 fails).
**The evals fail at this point. That's the point.** You can't accidentally
think you've shipped something when the eval still fails.
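
The "confirm it is red" check in item 4 can itself be written down, so the starting state is recorded rather than assumed. This is a sketch only: `eval_is_red` and `our_bytes_stub` are hypothetical names, and `boots_ok` is passed in as a closure standing in for the real QEMU smoke test.

```rust
/// True when both Layer 1 (byte equality) and Layer 2 (boot smoke) fail
/// for the candidate, which is the expected state before any boot code exists.
fn eval_is_red(expected: &[u8], candidate: &[u8], boots_ok: impl Fn(&[u8]) -> bool) -> bool {
    let layer1_fails = expected != candidate;
    let layer2_fails = !boots_ok(candidate);
    layer1_fails && layer2_fails
}

/// Placeholder implementation: deliberately wrong.
fn our_bytes_stub() -> Vec<u8> {
    vec![0; 512]
}
```

Once real boot code replaces the stub, the same two checks are simply inverted into the passing assertions.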
### Step 1 — Implement until Layer 2 passes
Write NASM code. Build the bytes. Re-run the eval. Iterate.
Layer 2 (QEMU boot smoke) is the **primary** signal during development.
It's a binary pass/fail and the tightest correctness check that doesn't
require ms-sys.
### Step 2 — Add Layer 1 (byte-equality vs ms-sys)
Once Layer 2 passes, compare to ms-sys's bytes. Three outcomes:
- **Byte-identical**: ship.
- **Byte-different but boot-equivalent**: ship; document why.
- **Byte-different and one doesn't boot**: figure out which one is right.
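
The three outcomes can be encoded as a small decision function so the test report states which case applies. The names below (`Layer1Outcome`, `classify`) are hypothetical, and the sketch treats any combination not listed above as needing investigation.

```rust
#[derive(Debug, PartialEq)]
enum Layer1Outcome {
    Ship,
    ShipWithJustification,
    Investigate,
}

/// `ours_boots` and `oracle_boots` are the Layer 2 results for each byte sequence.
fn classify(byte_identical: bool, ours_boots: bool, oracle_boots: bool) -> Layer1Outcome {
    match (byte_identical, ours_boots, oracle_boots) {
        (true, true, _) => Layer1Outcome::Ship,
        (false, true, true) => Layer1Outcome::ShipWithJustification,
        _ => Layer1Outcome::Investigate,
    }
}
```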
### Step 3 — Layer 3 + Layer 4 before release
Real-content fixtures and hardware smoke before the variant is declared
"production." See per-variant target in the table above.
## Real-content fixtures (Layer 3)
We need test inputs that look like actual Microsoft install media but are
small enough to commit. Per-variant:
- **NTLDR variant**: A ~5 MB synthetic FAT32 with real-shaped `NTLDR` file
(just enough to load and print a marker). Built from the actual Win XP
files extracted from the user's ISO into `tests/fixtures/xp_minimal/`.
- **BOOTMGR variant**: ~10 MB synthetic FAT32 with real Win 7 `bootmgr`
plus `Boot/BCD`. Generated from a Win 7 ISO at test-fixture-build time.
Fixtures are reproducible (a `tests/fixtures/build.sh` script generates
them from an ISO path the developer supplies via env var). The repo
doesn't check in the fixtures themselves (license, size); it checks in
the build script and the SHA-256 of the expected outputs so we know when
the fixture changed.
## Clean-room protocol (air-gapped from ms-sys source)
This is the strictest form of the protocol described in usbwin's
`docs/PROVENANCE.md`: what intellectual-property law calls a "Chinese
Wall" or "clean-room" reimplementation, the same shape Compaq used in
1982 to reimplement the IBM PC BIOS without copyright infringement.
### The air gap
Two distinct roles exist in `mkmsbr` development:
| Role | Reads | Writes |
|------|-------|--------|
| **Spec readers** | FAT32 spec, BIOS docs, ms-sys's *output bytes* (as a black box) | mkmsbr source code (NASM, Rust) |
| **Oracle plumbing** | Whatever they need; usually nothing | The test harness that invokes ms-sys as subprocess and compares its output to mkmsbr's |
The same person can do both *as long as the Spec-reader role never touches
ms-sys's source code*. ms-sys's `.c` and `.h` files (especially `inc/*.h`
which contain the actual boot-record byte arrays as C arrays) are
**forbidden reading** for anyone writing mkmsbr.
If a contributor has read ms-sys source code, they're tainted for the
duration of their useful memory of it (months, conservatively). They can
work on the oracle/test harness but not on the boot-code source files.
### Allowed references for the Spec-reader role
- Microsoft FAT32 spec (FATGEN103.doc) — publicly published
- Microsoft NTFS spec (public docs)
- IBM/Phoenix BIOS Interface Reference (INT 13h, INT 10h, etc.)
- USB Mass Storage Class spec
- USB / OHCI / UHCI / EHCI controller specifications
- OSDev wiki *prose and pseudocode only* (never code blocks)
- Microsoft's own SDK headers describing on-disk structures (e.g.
`winioctl.h`'s partition table layout)
- IDA-decompiled views of *generic* bootloaders if no Microsoft binary
is involved (still risky; ask first)
### Disallowed for the Spec-reader role
- **ms-sys source files** — `src/*.c`, `inc/*.h`, anything in the ms-sys
repository besides the compiled binary
- syslinux, GRUB, GRUB4DOS, Linux kernel boot code, BSD bootloaders
- Microsoft Windows source leaks (even if not Microsoft-attributed)
- Any reverse-engineered disassembly of `bootmgr`, `ntldr`, `bootsect.exe`,
or `sys.exe`
- Stack Overflow / forum posts that contain code blocks from the disallowed
sources (prose-only reading is fine)
### How ms-sys appears in the codebase
**Only as a black-box subprocess in `tests/oracle/`.** The harness shape:
```rust
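// tests/oracle/mod.rs
use std::{fs, io, process::Command};

fn ms_sys_output(args: &[&str], target_device: &str) -> io::Result<Vec<u8>> {
    // Invoke `/usr/local/bin/ms-sys <args> <target_device>` as a black box.
    let status = Command::new("/usr/local/bin/ms-sys").args(args).arg(target_device).status()?;
    if !status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, format!("ms-sys exited with {status}")));
    }
    // Read the bytes back from <target_device> (assumed to be a scratch image
    // file in tests); the caller slices out the boot-record sectors it needs.
    fs::read(target_device)
}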
```
No `#include "ms-sys/inc/some_header.h"`, no `let bytes =
include_bytes!("../../ms-sys/inc/winnt5_fat32_bootcode.h")`, no
`Command::new("cat").arg("ms-sys/src/file_system.c")`. The library has
*no awareness* of how ms-sys produces its bytes; it only knows what they
are.
### Why this matters
ms-sys's boot-record byte arrays in `inc/*.h` are themselves derived from
Microsoft binaries. Their legal status was always murky — the ms-sys
maintainers shipped them under GPL-2 because that's what FSF's "make
everything free" philosophy says to do with code that's already in the
wild, not because they had a license from Microsoft to redistribute. A
clean-room reimplementation that never sees ms-sys's source (only its
output, which is observed behavior, not protected expression) sidesteps
this entire question.
mkmsbr's bytes will be derived solely from the FAT32 / NTFS / BIOS specs.
If they happen to be byte-identical to Microsoft's bytes (because the
space of "correct implementations of this small task" is small), that's
parallel invention, not copying.
## Verifiable evidence
Two independent verifiability claims; each needs concrete mechanisms.
Policy without machinery is just a wish.
### Verifiably correct: machine-checked, CI-enforced
For each variant, the eval layers produce binary pass/fail signals that
CI runs on every commit:
```yaml
# .github/workflows/verify.yml (sketch)
correctness:
- layer1_oracle: # Byte-equality vs ms-sys
gates: [all-variants-equal] | [variant-equal-or-justified]
run-on: every PR
- layer2_qemu: # Synthetic boot smoke
gates: [all-stubs-print-marker]
run-on: every PR
- layer3_real: # Real-content boot test
gates: [reaches-installer-welcome]
run-on: PR + nightly
- layer4_hardware: # Manual sign-off
gates: [signed-off-by HARDWARE_TESTS.md commit]
run-on: release-gate
```
Concrete correctness deliverables per variant:
1. **Coverage matrix** (`COVERAGE.md`) — variant × eval-layer × pass/fail.
No variant ships if any required layer is RED.
2. **Determinism check** — `cargo build` produces byte-identical output
across runs / clean checkouts / different developer machines.
Verified via `tests/determinism.sh` that runs `cargo clean && cargo
build --release && sha256sum target/release/...`. CI fails on diff.
3. **Reproducible from spec** — `docs/SPEC_TRACE.md` maps each non-trivial
constant in our boot code to the spec page that justifies it
(FAT32 BPB offset 0x0B == BytsPerSec → FATGEN103 §3.1). Catch
"magic number copied from somewhere" early.
4. **Regression fixtures** — for each variant, a fixed input → fixed
output mapping in `tests/golden/`. Changes require explicit fixture
updates with justification in the commit message.
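
One way to keep `SPEC_TRACE.md` honest is to have the offsets live in a single table that both the code and the trace document refer to. The sketch below lists a few FAT32 BPB fields with their offsets and sizes as given in the FAT32 specification; the table name, the helper, and the choice of fields are illustrative assumptions.

```rust
/// (field name, byte offset, size in bytes) for a few FAT32 BPB fields.
const BPB_FIELDS: &[(&str, usize, usize)] = &[
    ("BPB_BytsPerSec", 0x0B, 2),
    ("BPB_SecPerClus", 0x0D, 1),
    ("BPB_RsvdSecCnt", 0x0E, 2),
    ("BPB_NumFATs", 0x10, 1),
    ("BPB_FATSz32", 0x24, 4),
    ("BPB_RootClus", 0x2C, 4),
];

/// Reads a little-endian field of 1, 2, or 4 bytes from a boot sector.
/// Returns None for an unknown field name or a sector that is too short.
fn read_field(sector: &[u8], name: &str) -> Option<u32> {
    let &(_, off, size) = BPB_FIELDS.iter().find(|(n, _, _)| *n == name)?;
    let bytes = sector.get(off..off + size)?;
    // Fold from the most significant (last) byte down to the first.
    Some(bytes.iter().rev().fold(0u32, |acc, &b| (acc << 8) | u32::from(b)))
}
```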
### Verifiably clean-room: process + machinery
Air-gap is a policy that humans can break (or be subtly tainted by).
The mechanisms that catch the breakage:
#### 1. Contributor reading declaration (per-PR)
Every PR that touches `mkmsbr/src/` or `mkmsbr/boot-asm/` includes
a YAML block in the description:
```yaml
clean_room:
role: spec-reader # or "oracle-plumbing"
references_consulted:
- "FATGEN103.doc §3.1-3.3 (BPB layout)"
- "Phoenix BIOS Interface Reference §INT 13h, fn 0x42"
- "osdev.org/FAT (prose only; verified no code blocks read)"
forbidden_unread:
- ms-sys/src/ # I have not read these files
- ms-sys/inc/ # ever
- syslinux/ # not in last 12 months
- any-Windows-source-leak
attestation: |
I am not aware of having read ms-sys source code, leaked Microsoft
boot record source, or any GPL/BSD bootloader source within the
last 12 months. The code in this PR was derived solely from the
references listed above.
signed: $CONTRIBUTOR_NAME
date: 2026-XX-XX
```
This goes in the PR description (not the commit, so it's separable
from the code). It's a sworn-style attestation, not legally binding,
but it creates a paper trail.
#### 2. Reading log (`CONTRIBUTORS_READING.md`)
A repository-tracked file listing, per contributor, what reference
sources they've read AT ALL, with timestamps. Append-only. A
contributor's eligibility for the spec-reader role is determined by
checking this log:
```markdown
## joa@example.com
| Source | Last read | Status |
|--------|-----------|--------|
| FATGEN103.doc (FAT32 spec) | 2026-05 | ✓ allowed |
| Phoenix BIOS Interface Ref | 2026-05 | ✓ allowed |
| osdev.org/FAT prose | 2026-06 | ✓ allowed |
| ms-sys/inc/*.h byte arrays | 2018 | ❌ tainted |
```
If "tainted" sources appear, the contributor cannot work on
`mkmsbr/src/` (boot code) — only on `mkmsbr/tests/oracle/` (where
seeing ms-sys output is the whole point). The tainting half-life is
conservatively 24 months from last-read; after that, on case-by-case
basis with project-lead sign-off.
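
The eligibility rule is simple enough to lint mechanically. The following sketch assumes log dates are reduced to (year, month) pairs and uses the 24-month window stated above; `Eligibility` and `eligibility` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Eligibility {
    /// No tainted source on record: may write boot code.
    SpecReader,
    /// Tainted within the window: oracle/test harness work only.
    OracleOnly,
    /// Window has elapsed: case-by-case, needs project-lead sign-off.
    NeedsSignOff,
}

/// Dates are (year, month). `last_tainted_read` is None if the log shows
/// no tainted source for this contributor.
fn eligibility(last_tainted_read: Option<(u32, u32)>, today: (u32, u32)) -> Eligibility {
    const TAINT_MONTHS: u32 = 24;
    match last_tainted_read {
        None => Eligibility::SpecReader,
        Some((year, month)) => {
            let elapsed = (today.0 * 12 + today.1).saturating_sub(year * 12 + month);
            if elapsed < TAINT_MONTHS {
                Eligibility::OracleOnly
            } else {
                Eligibility::NeedsSignOff
            }
        }
    }
}
```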
#### 3. Forbidden-symbol grep (CI gate)
A simple CI check that fails the build if any of these patterns appear
anywhere in `mkmsbr/src/` or `mkmsbr/boot-asm/`:
```sh
#!/usr/bin/env bash
# .github/workflows/clean_room_check.sh
# Patterns are basic regular expressions; literal metacharacters are escaped.
FORBIDDEN_PATTERNS=(
  "ms-sys"                # literal name (shouldn't appear in source)
  "mssys"
  "ilko-y"                # the WaitBT author
  "syslinux"
  "ldlinux"
  "include.*ms-sys"
  "/\* extracted from"    # common "I copied this" comment style
)
for pattern in "${FORBIDDEN_PATTERNS[@]}"; do
  # -e keeps a pattern from ever being parsed as an option.
  if grep -rn -e "$pattern" mkmsbr/src/ mkmsbr/boot-asm/; then
    echo "FORBIDDEN PATTERN FOUND: $pattern"
    exit 1
  fi
done
```
Trivial check, but catches the dumbest "I copied a block of bytes from
that one .h file" leaks.
#### 4. Statistical similarity check (CI gate)
For each variant where ms-sys produces equivalent bytes, compute the
"non-trivial similarity" between our bytes and ms-sys's:
- Both outputs are 512 bytes for a single-sector FAT32 PBR.
- The BPB sits within the first 90 bytes (it ends at offset 0x59). Both
must produce it identically, because it is filesystem state, not boot code.
- The last 2 bytes are the `0x55 0xAA` boot signature.
- The remaining 420 bytes (offsets 0x5A through 0x1FD) are the boot code itself.
Trivial similarity (same opcodes for the same operations) is fine. Excessive
similarity beyond what FAT32 setup-code structure mandates is a flag.
Concrete measurement: for the boot-code region, compute the Hamming
distance (the number of byte positions that differ) between our output
and ms-sys's. Plot the distribution across all variants. Flag any variant
where the distance is suspiciously low (e.g. fewer than 10 differing
bytes when the function is non-trivial). That suggests either copying or
an unusually close parallel invention, and both are worth a manual review.
This is a soft signal, not a hard gate. It triggers an "investigate
this" rather than "fail the build."
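
A sketch of the measurement, under the assumption that "Hamming distance" here means the count of differing byte positions (not differing bits). The region bounds and threshold are passed in by the caller; `differing_bytes` and `needs_review` are hypothetical names.

```rust
use std::ops::Range;

/// Number of byte positions inside `region` where the two records differ.
/// Returns None if either record is too short to cover the region.
fn differing_bytes(ours: &[u8], oracle: &[u8], region: Range<usize>) -> Option<usize> {
    let a = ours.get(region.clone())?;
    let b = oracle.get(region)?;
    Some(a.iter().zip(b).filter(|(x, y)| x != y).count())
}

/// Soft signal: a distance below the threshold asks for a manual look.
fn needs_review(distance: usize, threshold: usize) -> bool {
    distance < threshold
}
```

A distance of zero is the byte-identical case Layer 1 aims for, so the reviewer's question for low distances is about provenance rather than correctness.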
#### 5. Independent code review (per release)
Before each `mkmsbr` release tag, the boot-code files (`mkmsbr/src/`,
`mkmsbr/boot-asm/`) are reviewed by a contributor who has *not*
written any of the code, with explicit focus on: "does this look
clean-room, or does anything look copy-pasted from somewhere
familiar?" The reviewer also confirms the contributor reading log
matches the PR claims.
For a single-contributor project, this step is "contributor reviews
their own code with the explicit checklist, in writing." Not as
strong as two-person review but catches the "I just did this without
thinking" cases.
#### 6. Public legal review (before 1.0)
Before declaring a 1.0 release, a one-time review by a lawyer
familiar with clean-room reverse engineering (or at minimum, a
public RFC review with knowledgeable hobbyist community input from
e.g. msfn.org). Document the review outcomes in
`docs/LEGAL_REVIEW.md`.
### Both verifiability properties combined
A pull request is mergeable to `main` only when ALL of these are
green:
| Check | Property | Enforcement |
|-------|----------|-------------|
| Layer 1 oracle test | Correctness | ✓ CI |
| Layer 2 QEMU smoke | Correctness | ✓ CI |
| Layer 3 real-content (if req) | Correctness | ✓ CI |
| Determinism check | Correctness | ✓ CI |
| Coverage matrix updated | Correctness | ✓ CI |
| Clean-room declaration in PR | Cleanroom | semi (lint) |
| Reading log updated | Cleanroom | semi (lint) |
| Forbidden-symbol grep clean | Cleanroom | ✓ CI |
| Statistical similarity below threshold | Cleanroom | ✓ CI |
| Independent review sign-off | Both | manual |
Layer 4 (real hardware) is required at release-gate, not per-PR.
The combination is what makes the claim **verifiable**: if any
reviewer in the next 30 years wants to challenge whether mkmsbr is
genuinely clean-room, they can read the reading log, audit the PR
attestations, run the forbidden-symbol checks themselves, and inspect
the similarity-distribution data. Nothing depends on trusting the
authors' word.
## Form factor: library AND binary
`mkmsbr` ships as a single Cargo crate that produces both a Rust library
and a CLI binary. The same code, two consumption modes.
### Library (`mkmsbr` Rust crate)
The canonical API. usbwin links against it directly, gets Rust-typed
input (`Fat32Bpb`, `DiskGeometry`, etc.) and Rust-typed output
(`[u8; 512]`, `PbrBytes`). No subprocess overhead, no string parsing,
no shell escaping. usbwin's `pipeline/windows.rs` switches from
`Command::new(ms_sys).args(...)` to `mkmsbr::fat32_pbr_bootmgr(bpb)`.
```rust
// In usbwin's Cargo.toml (TOML, so the comment marker there is `#`):
//   mkmsbr = { path = "../mkmsbr" }   # or version = "1.0" when published
// In usbwin's pipeline/windows.rs:
let pbr_bytes = mkmsbr::fat32_pbr_bootmgr(bpb);
dev.write_at(0, &pbr_bytes[0])?; // sector 0
dev.write_at(512, &pbr_bytes[1])?; // sector 1
dev.write_at(12 * 512, &pbr_bytes[12])?; // sector 12
```
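
The sector-offset arithmetic in that snippet can be exercised without a real device by writing to any `Write + Seek` target. The sketch below assumes the multi-sector output can be viewed as (sector index, 512-byte payload) pairs; that view and the `write_sectors` name are assumptions, not the actual `PbrBytes` API.

```rust
use std::io::{self, Seek, SeekFrom, Write};

const SECTOR_SIZE: u64 = 512;

/// Writes each (sector index, payload) pair at byte offset index * 512.
fn write_sectors<W: Write + Seek>(dev: &mut W, sectors: &[(u64, [u8; 512])]) -> io::Result<()> {
    for (index, payload) in sectors {
        dev.seek(SeekFrom::Start(index * SECTOR_SIZE))?;
        dev.write_all(payload)?;
    }
    Ok(())
}
```

In tests the target can be a `std::io::Cursor<Vec<u8>>`, which keeps the Layer 2 image builder free of any raw-device access.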
### Binary (`mkmsbr` CLI)
A thin wrapper that exposes the library as a command-line tool — a
drop-in replacement for ms-sys for the variants we support. ~50 lines
of clap-based argument parsing around library calls.
```sh
# ms-sys's --mbr7 equivalent
mkmsbr --mbr-win7 /dev/rdisk6
# ms-sys's --fat32pe equivalent
mkmsbr --fat32-bootmgr /dev/rdisk6s1
# Or by variant explicitly
mkmsbr --variant fat32-bootmgr --output /dev/rdisk6s1
```
The CLI uses the SAME library functions internally. The binary form
exists because:
1. **Drop-in for existing recipes** — anyone using ms-sys in a shell
script can switch by changing `ms-sys --fat32pe` to `mkmsbr
--fat32-bootmgr`. Lowers adoption friction for the broader
USB-tool ecosystem (WinSetupFromUSB-likes, retro-computing folks).
2. **Cross-language interop** — Python/Go/Bash consumers don't need a
Rust toolchain.
3. **Reproducibility verification** — for the audit case ("show me
mkmsbr produces the same bytes ms-sys does"), an auditor can run
`mkmsbr` and `ms-sys` side by side without setting up a Rust dev
environment.
4. **Oracle-of-our-own-binary tests** — the test harness can use the
CLI binary as the black-box subprocess, exactly like it uses
ms-sys. This catches regressions in the public API surface where
internal-library unit tests might pass but the public CLI
contract has drifted.
### Crate layout
Same `mkmsbr/` workspace member from the earlier layout section, with
`Cargo.toml` declaring both targets:
```toml
[package]
name = "mkmsbr"
version = "1.0.0"
edition = "2021"
license = "MIT"
[lib]
name = "mkmsbr"
path = "src/lib.rs"
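[[bin]]
name = "mkmsbr"
path = "src/bin/mkmsbr.rs"
required-features = ["cli"]   # the binary is only built with the CLI feature
[features]
default = ["cli"]
cli = ["dep:clap"]
[dependencies]
# Optional: Cargo does not skip dependencies for library consumers on its own.
# Library-only users set `default-features = false` to leave clap out.
clap = { version = "4", features = ["derive"], optional = true }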
```
The library is the public interface that's stable across versions; the
binary's CLI is allowed to evolve more freely (deprecations announced
in release notes). usbwin always tracks the library version, not the
binary version.
### Naming for symmetry with ms-sys
Where it's obvious, the CLI flag names mirror ms-sys's so the muscle
memory transfers:
| ms-sys flag | mkmsbr CLI flag | Library function |
|-------------|-----------------|------------------|
| `--mbr7` | `--mbr-win7` | `mbr_win7(...)` |
| `--mbr` | `--mbr-xp` | `mbr_xp(...)` |
| `--fat32pe` | `--fat32-bootmgr` | `fat32_pbr_bootmgr(...)` |
| `--fat32nt` | `--fat32-ntldr` | `fat32_pbr_ntldr(...)` |
| `--ntfs` | `--ntfs-bootmgr` | `ntfs_pbr_bootmgr(...)` |
mkmsbr's flags are slightly more verbose (-pe vs -bootmgr) because the
ms-sys names are domain-jargon-y; new users shouldn't have to know that
"PE" means "Preinstallation Environment" to write a Win 7 boot record.
The old ms-sys names are accepted as aliases for muscle memory.
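
Alias handling can be a plain lookup done before argument parsing, independent of whichever parser the CLI ends up using. A minimal sketch, with `canonical_flag` as a hypothetical helper that normalizes either spelling to the mkmsbr one from the table above:

```rust
/// Maps an ms-sys flag (or the mkmsbr flag itself) to the canonical
/// mkmsbr flag. Returns None for anything not in the table.
fn canonical_flag(flag: &str) -> Option<&'static str> {
    match flag {
        "--mbr7" | "--mbr-win7" => Some("--mbr-win7"),
        "--mbr" | "--mbr-xp" => Some("--mbr-xp"),
        "--fat32pe" | "--fat32-bootmgr" => Some("--fat32-bootmgr"),
        "--fat32nt" | "--fat32-ntldr" => Some("--fat32-ntldr"),
        "--ntfs" | "--ntfs-bootmgr" => Some("--ntfs-bootmgr"),
        _ => None,
    }
}
```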
## Audience and packaging
`mkmsbr` is its **own project**, not a usbwin subcomponent. Separate
repo, separate releases, separate brew formula, independent
distribution.
### Why separate (audience analysis)
usbwin's audience is small and narrow: macOS-on-Apple-Silicon users
who want Windows install USBs without Rosetta/VMs. The community is
likely dozens to low-hundreds of people total, growing slowly as more
sysadmins, IT folk, and retro-tech enthusiasts hit the post-Rosetta
gap.
`mkmsbr`'s audience is much wider:
| Audience | Estimated size | Notes |
|----------|----------------|-------|
| Linux/BSD users replacing ms-sys (cleaner replacement) | ~thousands | The largest single bucket. ms-sys is in most Linux distros' repos but GPL-2 and ancient; mkmsbr is MIT, Rust, maintained. |
| Cross-platform USB-tool maintainers | ~dozens of projects, transitively reaching thousands | Ventoy, WoeUSB, multibootusb, easy2boot, custom in-house. They want mkmsbr as a dep. |
| Retro-computing hobbyists | ~thousands | MSFN, boot-land, Vogons forum people. Hand-rolling install media. |
| Forensic / data-recovery folks | ~hundreds | Niche but high-skill; rebuilding damaged boot records is a real use case. |
| Embedded / firmware engineers | ~thousands | Anyone shipping x86 boot media for kiosks, SBCs, signage, industrial. |
| CI / automation | ~hundreds | Linux CI runners producing Windows install media as artifacts. |
| Educators / OS-dev curriculum | ~hundreds | mkmsbr is one of few public reference implementations derived purely from specs; teaching material. |
usbwin **uses** mkmsbr. usbwin is a downstream consumer like any
other. The dependency direction is `usbwin → mkmsbr`, never the
reverse.
### What this implies operationally
1. **Separate repository** when the work starts in earnest. `mkmsbr`
on GitHub as its own project; usbwin pins a Cargo dependency by
git URL during dev and by published version after `mkmsbr` hits
crates.io.
2. **Separate Homebrew formulas**, two install commands:
```sh
brew install mkmsbr
brew install usbwin
```
Each formula maintained in its own tap. mkmsbr's would be a
better candidate for eventual homebrew-core inclusion than usbwin's,
precisely because the audience is wider and the tool is smaller and
more general-purpose.
3. **Separate clean-room audits.** mkmsbr's reading log, PR
attestations, similarity-distribution data are all tracked in
mkmsbr's repo. usbwin's repo doesn't carry that burden — it links
against mkmsbr and inherits the audit conclusion as a third party
would. The clean-room story is structurally cleaner this way.
4. **Independent release cadence.** mkmsbr might hit 1.0 long before
usbwin's clean-room-mkmsbr story is integrated. usbwin meanwhile
continues shipping with ms-sys shell-out (v0.2/v0.3 path) and
migrates to mkmsbr when it makes sense — possibly as v2.0.
5. **Different governance later.** If mkmsbr gains maintainers from
the wider audience (Linux distros, USB-tool authors), it can have
its own governance model. usbwin stays a smaller-team project. The
API contract between them is a normal library API contract — same
way Rust apps depend on `serde` without `serde` becoming part of
each consuming project.
### Brew packaging sketch
`mkmsbr`'s formula (the one the wider audience wants):
```ruby
class Mkmsbr < Formula
desc "Clean-room implementation of Microsoft boot records (MBR/PBR)"
homepage "https://github.com/jmappleby/mkmsbr"
url "https://github.com/jmappleby/mkmsbr/archive/v1.0.0.tar.gz"
sha256 "..."
license "MIT"
depends_on "rust" => :build
depends_on "nasm" => :build
def install
system "cargo", "install", *std_cargo_args
end
test do
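# Smoke: produce an XP MBR in a scratch file, then check the boot signature.
# Assumes the `--variant`/`--output` CLI form shown earlier accepts "mbr-xp".
system bin/"mkmsbr", "--variant", "mbr-xp", "--output", testpath/"mbr.bin"
assert_predicate testpath/"mbr.bin", :exist?
assert_equal "\x55\xAA".b, (testpath/"mbr.bin").binread(2, 510)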
end
end
```
`usbwin`'s formula (downstream):
```ruby
class Usbwin < Formula
desc "Native arm64 macOS bootable-USB writer"
homepage "https://github.com/jmappleby/usbwin"
url "https://github.com/jmappleby/usbwin/archive/v2.0.0.tar.gz"
sha256 "..."
license "MIT"
depends_on "mkmsbr" # transitive: brings in mkmsbr
depends_on "rust" => :build
depends_on "nasm" => :build
def install
system "cargo", "install", *std_cargo_args
end
end
```
Two brew commands, two install paths, clean dependency graph.
### Migration plan for usbwin
This is what happens to usbwin when mkmsbr is ready:
- **Today (v0.2 / v0.3):** usbwin shells out to ms-sys. Working.
- **v1.0 (usbwin):** Same. ms-sys-based, real-hardware-verified.
- **v2.0 (usbwin) — coincides with mkmsbr 1.0:** usbwin replaces
the ms-sys shell-out with `mkmsbr::*` library calls. The
shell-out helpers (`ms_sys_mbr7`, `ms_sys_fat32pe`, etc.) become
thin wrappers around `mkmsbr::mbr_win7()`, `mkmsbr::fat32_pbr_bootmgr()`, and so on.
ms-sys becomes optional (kept around as a fallback / oracle).
- **v3.0 (usbwin):** ms-sys removed entirely. Single MIT binary.
Each step is independently shippable; the migration is incremental.
## License
`mkmsbr` is MIT-licensed. Independent of usbwin (could be used by other
tools, e.g. a Linux LiveUSB creator, a forensic image preparation tool,
a retro-computing utility). Single self-contained crate.
ms-sys's GPL-2 license doesn't transit into `mkmsbr` because:
- We don't link, include, or distribute ms-sys
- Test-time subprocess invocation is mere aggregation (per FSF)
- Output bytes are not copyrightable (data, not creative expression), and
even if they were, our implementation derives them from the FAT32 spec,
not from observing ms-sys output
## Timeline estimate
Honest, not optimistic.
| Milestone | Weeks | Cumulative weeks |
|-----------|-------|------------------|
| Eval framework (Layers 1, 2) | 1 | 1 |
| `mbr_xp` + `mbr_win7` | 1 | 2 |
| `fat32_pbr_ntldr` to L2 | 2 | 4 |
| `fat32_pbr_ntldr` to L3 | 1 | 5 |
| `fat32_pbr_bootmgr` to L2 | 4 | 9 |
| `fat32_pbr_bootmgr` to L3 | 2 | 11 |
| Real-hardware verification | 2 | 13 |
| Documentation + 1.0 release | 1 | 14 |
About 3-4 months part-time. Predictable because of the eval-first
methodology — at each milestone we either pass the layer's test or
we don't, no ambiguity.
## What kicks off v1.0 work
This spec lands today. v1.0 work doesn't start until:
1. v0.2 (Win 7 mode via ms-sys) is **real-hardware verified** on the
Dell E6410.
2. v0.3 (XP mode via ms-sys) is **real-hardware verified** on the same.
3. There's a concrete reason to invest 3 months — public release plan,
external interest, etc.
Until then, ms-sys is the right answer.