# Vision
## What vyre-conform is
vyre-conform is a **proof system** for GPU compute correctness.
It is not a test suite. The difference is fundamental:
A test suite says "I ran 1000 random inputs through XOR on GPU and CPU and they
all matched." This provides probabilistic confidence. The confidence grows with
the number of tests but never reaches certainty. A bug that only manifests at
input `(0xDEADBEEF, 0xCAFEBABE)` escapes forever if that pair is never
generated.
A proof system says "XOR satisfies these algebraic laws: commutative
(`a^b = b^a`), associative (`(a^b)^c = a^(b^c)`), identity (`a^0 = a`),
self-inverse (`a^a = 0`). I verified all four laws exhaustively on the u8
domain and on random witnesses from the u32 domain, and I also checked point
parity against the CPU reference. Therefore XOR is correct under the checked
domains."
The first statement is only point sampling. The second combines structural law
checking with reference parity, which is the certification model used here.
Rust's borrow checker does not test memory safety. It proves it via type rules.
If the checker says "safe," it is safe — not probably safe, not likely safe,
mathematically safe. vyre-conform must achieve the same property for GPU compute
correctness: if the suite says "conforming," the backend is correct — not
probably correct, mathematically correct (within the verification domain).
## What vyre-conform must become
### The NVIDIA test
If NVIDIA wanted to add a CUDA backend to vyre tomorrow, the process must be:
1. An engineer reads the spec. Every operation, every edge case, every
type conversion is defined. Zero ambiguity.
2. The engineer implements `VyreBackend` for CUDA. One trait, a few methods.
3. The engineer runs `ConformanceSuite::run()`. The suite executes millions of
test cases across all registered operations.
4. The suite produces a formal certificate: "CUDA backend v1.0: Level L2
conformance. 32 primitive ops algebraically verified. 8 compound ops
verified compositionally. 4 engines passing all invariants."
5. Done. No communication with us needed.
If AMD wants to do the same independently, they can. Both backends produce
identical bytes for every input — not because they coordinated, but because
the spec is unambiguous and the conformance suite is the arbiter.
No human judgment. No approval process. No relationship required. Quality is
the gate.
### The Linux property
Linux's power comes from one structural property: the cost of not being on
Linux is higher than the cost of contributing to it. NVIDIA spends tens of
millions per year on Linux kernel engineers because if their GPUs don't work on
Linux, they can't sell to data centers. Linus Torvalds can publicly insult
NVIDIA because NVIDIA needs Linux more than Linux needs NVIDIA.
vyre-conform is what makes this possible for vyre. Without conformance:
backends are proprietary, incompatible, untestable. With conformance: backends
are interchangeable, verifiable, and any company can participate on equal terms.
The conformance suite is the enforcer. It turns "vyre is a spec" from a claim
into a fact. Without it, the spec is a document. With it, the spec is a
contract.
### Scale
The suite must handle 100M+ test cases. Not as an aspirational target — as a
structural requirement. At 32 primitive ops with exhaustive u8 testing
(65,536 binary pairs each), that's 2M tests for primitives alone. Add compound
ops, composition chains, engine programs, regression replay, and nondeterminism
detection: 100M is the floor, not the ceiling.
This means:
- **Streaming execution.** Generate one input, dispatch, compare, discard, next.
Never hold all inputs in memory.
- **Parallel dispatch.** Batch GPU dispatches for throughput. Generate K inputs,
dispatch all K, compare all K.
- **Sharded execution.** Partition by test ID. Shard 0 runs tests 0..N/K.
Shard 1 runs N/K..2N/K. Deterministic assignment. CI runs 16 shards in
parallel. No coordination between shards.
- **Deterministic resumption.** "Resume from test 47,329" always starts at the
same test. Seed-based generation guarantees this.
- **Regression persistence.** Every failing input saved to disk. Replayed every
run. The regression suite only grows.
## The two authorities
### CPU reference functions
The CPU reference function for each operation IS the specification for point
queries. "What does `xor(3, 5)` return?" Answer: ask the CPU reference. It
returns `6`. Every conforming backend must return `6`. No other authority.
The CPU reference is simple, obvious, and correct by inspection. It uses
standard Rust integer arithmetic. It fits on one screen. A reader can verify
it in seconds.
### Algebraic laws
The algebraic laws are the specification for structural identity. "What kind
of function is XOR?" Answer: it is a binary operation over u32 that is
commutative, associative, has identity 0, and is self-inverse. Those laws are
valuable, but they do not uniquely identify bitwise XOR without the CPU
reference.
Laws enable verification at a higher level than point testing. Instead of
testing billions of individual inputs, you verify a handful of structural
properties that constrain the function. If the properties are sufficient to
uniquely characterize the function, and the implementation satisfies all of
them, the implementation is correct.
### Why both
You need both because neither is sufficient alone:
- **Without the CPU reference:** Laws might not uniquely characterize the
function. `{commutative, associative, identity(0)}` is satisfied by both
ADD and XOR. Laws alone cannot distinguish them.
- **Without laws:** Point testing misses bugs on untested inputs. An
implementation could match the CPU on all generated inputs but diverge on
an input the generators never produce.
Together: the CPU reference anchors correctness for any individual input. The
laws constrain the function's structure across all inputs. A conforming
implementation matches the CPU on tested inputs AND satisfies all laws.
### When laws don't uniquely characterize
For some operations, the declared laws do not uniquely characterize the
function. For example, `clamp(a, lo, hi)` has complex behavior that cannot be
fully captured by simple algebraic laws.
In these cases: the CPU reference is the tiebreaker. The laws provide partial
structural verification (they catch some classes of bugs). The CPU reference
provides complete point verification (it catches all bugs on tested inputs).
The combination is strictly stronger than either alone.
The goal is to make law sets as constraining as possible. But completeness is
not always achievable, and the system is honest about when it is not.
## Conformance levels
| **L1** | Parity | GPU matches CPU on all generated inputs. Probabilistic. |
| **L2** | Algebraic | All declared laws verified (exhaustive u8 + witnessed u32). Mathematical within domain. |
| **L3** | Engine | All engine invariants verified across generated programs. |
| **L4** | Full | L2 + L3 + compositional proofs + performance baselines. Production-ready. |
Each level is strictly stronger. L1 is the minimum. L4 is what NVIDIA needs
to ship a CUDA backend to customers.
## The flywheel
```
Better spec → easier to add backends → more backends → more users
→ more contributed ops → richer ecosystem → harder to not use vyre
→ vendors MUST contribute → they fund development → better spec
```
vyre-conform is the mechanism that makes this flywheel turn. Without
conformance, there is no trust. Without trust, there is no adoption. Without
adoption, there is no ecosystem. The conformance suite is not a testing tool.
It is the foundation of the entire project's viability.