vyre-conform 0.1.0

# vyre-conform competitors

This document is for scope control. It says where existing tools are stronger,
where they are aimed at a different problem, and where vyre-conform must not
pretend to be more mature than it is.

vyre-conform is a conformance suite for vyre, a GPU compute substrate. Its core
claim is narrow: for each registered operation, a backend must match the
bit-exact CPU reference, satisfy declared algebraic laws, survive an enforcement
gauntlet, and produce replayable evidence. It is not a general formal methods
platform, a database certification suite, or a universal fuzzer.

## TLA+ and Apalache

What they do well:

- Model protocols, state machines, temporal properties, and concurrency at a
  higher level than implementation code.
- Find impossible-to-see design bugs before Rust, WGSL, or backend code exists.
- Apalache brings bounded model checking to TLA+ specs and can produce concrete
  counterexamples for many classes of state-space bugs.

Where they lose for vyre:

- They do not execute vyre kernels or compare vendor GPU outputs byte-for-byte.
- They do not know the concrete Rust `cpu_fn`, WGSL lowering, buffer layout, or
  shader validation path unless a maintainer re-encodes those semantics.
- They can prove a model and still miss a bad implementation, a wrong endian
  encoding, a bad adapter behavior, or a conformance harness bug.

What conform does differently:

- Treats the CPU reference as executable truth and compares backends against it.
- Runs law checks, parity checks, mutation probes, fuzz reproducers, and
  regression replay in the same repository that owns the operation registry.
- Emits contributor-facing failures with concrete op ids and fix paths.

What conform does not attempt:

- It does not replace temporal model checking.
- It does not prove liveness, fairness, distributed consensus, or unbounded
  state-machine properties.
- It does not claim a checked Rust/WGSL implementation is correct for inputs
  outside the declared reference, law, corpus, and generator surface.

## cargo-mutants

What it does well:

- Mutates Rust code and tells you whether the existing tests noticed.
- Finds weak assertions, dead branches, accidentally untested code paths, and
  overfit tests.
- Fits normal Rust projects without requiring a new specification language.

Where it loses for vyre:

- It does not understand GPU semantics, WGSL, backend parity, or algebraic laws.
- It can say a Rust test suite is weak, but it cannot decide whether
  `primitive.bitwise.rotl` obeys the registered semantic contract.
- It does not verify that a CPU reference is a trustworthy oracle rather than a
  plausible-looking implementation.

What conform does differently:

- Uses mutation testing as one layer, not the definition of conformance.
- Connects surviving mutants to op ids, law declarations, parity inputs,
  defender TOML entries, and reference-trust findings.
- Includes WGSL mutation probes so shader changes can be killed by CPU/backend
  parity, not only by Rust unit tests.

What conform does not attempt:

- It does not replace cargo-mutants for broad Rust crate mutation coverage.
- It does not claim every mutation operator is modeled; mutation classes must be
  added deliberately and kept relevant to vyre semantics.

## Coq, ProofPower, and Lean

What they do well:

- Prove theorems with machine-checked rigor.
- Support deep mathematical development, reusable libraries, and precise proof
  terms.
- Can establish facts far beyond what testing can sample.

Where they lose for vyre:

- They require a formalization effort that is larger than many operations.
- A theorem about an abstract function does not certify a vendor GPU runtime, a
  WGSL compiler, a buffer ABI, or a Rust reference implementation.
- They are not a friendly first contribution path for someone adding a TOML law,
  boundary witness, or one small operation.

What conform does differently:

- Accepts executable evidence: CPU references, algebraic law probes, parity
  sweeps, adversarial corpora, fuzz reproductions, and certificates.
- Keeps contributors in Rust, WGSL, TOML, and cargo commands.
- Uses laws where they buy leverage, then falls back to bit-exact references
  where theorem proving would be disproportionate.

What conform does not attempt:

- It does not provide proof terms accepted by Coq, Lean, or ProofPower.
- It does not prove arbitrary theorems about all possible programs.
- It does not turn sampling into mathematical certainty for large domains.

## LLVM verifier and bugpoint

What they do well:

- The LLVM verifier catches malformed IR and broken structural invariants.
- LLVM's regression culture is strong: small IR files, reduced reproducers, and
  exact expected behavior are normal.
- bugpoint-style reduction is valuable when a large failing input needs to be
  shrunk to the smallest useful case.

Where they lose for vyre:

- The LLVM verifier does not know vyre's GPU operation contracts, algebraic law
  registry, or CPU oracle hierarchy.
- A structurally valid IR can still be semantically wrong for a GPU backend.
- bugpoint reduces compiler failures; it does not define the semantic reference
  for `vyre` operations.

What conform does differently:

- Starts from operation semantics, not only IR well-formedness.
- Requires CPU reference agreement, backend parity, law checks, adversarial
  witnesses, and regression replay.
- Wants reduced reproducers, but only as evidence attached to a conformance
  failure.

What conform does not attempt:

- It is not a general LLVM IR verifier.
- It does not replace compiler reduction tools for non-vyre compiler bugs.
- It does not accept "valid IR" as a synonym for "correct computation."

## SQLite TH3

What it does well:

- SQLite TH3 is a serious proprietary conformance suite with an enormous
  test-to-code ratio. SQLite's public TH3 page describes a test suite roughly
  590 times the size of the SQLite library source.
- It maps tests to requirements and supports deep regression confidence for a
  small, high-value core.
- It is built around the discipline that every old bug becomes permanent test
  evidence.

Where it loses for vyre:

- It is focused on SQLite, not GPU kernels, WGSL, algebraic laws, or backend
  parity.
- Its scale is a benchmark, not something vyre-conform can honestly claim.
- vyre-conform is not close to a 590x ratio. In this repository, the target
  stated in the README has been 3x-10x, and the current practical ratio is much
  closer to a low single-digit multiple than to TH3.

What conform does differently:

- Uses CPU/GPU parity, law inference, WGSL mutation, fuzz reproducers, TOML
  rules, and operation certificates instead of SQL requirement coverage.
- Makes community extension possible through checked-in TOML rules and witness
  files, not only proprietary test infrastructure.

What conform does not attempt:

- It does not claim TH3-level scale or maturity.
- It does not certify database behavior, SQL semantics, transaction isolation,
  or filesystem crash recovery.
- It does not hide the gap: TH3 is the scale reference, not a peer.

Reference: https://www.sqlite.org/th3.html

## Kani, Prusti, and Creusot

What they do well:

- Bring formal verification closer to Rust code.
- Kani can bounded-model-check Rust with concrete counterexamples.
- Prusti and Creusot let developers state contracts and prove properties over
  Rust functions with verifier support.

Where they lose for vyre:

- They do not run GPU adapters or prove vendor backend parity.
- They do not automatically understand WGSL, naga validation, buffer ABIs, or
  vyre's operation registry.
- They require annotation and proof discipline that is too heavy for every
  community witness or TOML rule.

What conform does differently:

- Uses Rust verification-like discipline at the conformance boundary: total CPU
  references, explicit laws, generated probes, and no silent fallback.
- Verifies the cross-language path from Rust reference to GPU lowering output.
- Treats bounded proof tools as possible inputs, not as the whole suite.

What conform does not attempt:

- It does not prove Rust memory safety beyond what Rust and unsafe audits cover.
- It does not replace Kani, Prusti, or Creusot for function-level proofs.
- It does not verify arbitrary Rust crates.

## syzkaller, AFL, and libFuzzer

What they do well:

- Generate hostile inputs continuously and cheaply.
- Produce minimized reproducers that turn unknown crashes into stable tests.
- syzkaller is especially strong at stateful kernel API exploration; AFL and
  libFuzzer are workhorses for parser and input-format bugs.

Where they lose for vyre:

- Fuzzing finds crashes and differential failures; it does not define the
  semantic truth of an operation.
- Passing a fuzz campaign does not prove a law, a backend contract, or a
  bit-exact CPU/GPU match.
- Fuzzers are only as useful as their harnesses and oracles. A bad oracle makes
  a fast fuzzer confidently wrong.

What conform does differently:

- Uses fuzzing as one source of adversarial witnesses.
- Replays fuzz findings through named operation references, law checks, and
  parity harnesses.
- Keeps corpus entries append-only so old failures remain part of the gate.

What conform does not attempt:

- It does not replace long-running fuzz farms.
- It does not claim exhaustive coverage from fuzz input volume.
- It does not accept crash-free execution as evidence of semantic correctness.

## Jepsen

What it does well:

- Tests distributed systems under partitions, clock faults, process death, and
  workload histories.
- Finds bugs in systems that already passed unit tests, integration tests, and
  vendor certification.
- Produces compelling histories because it tests real deployments under hostile
  schedules.

Where it loses for vyre:

- vyre kernels are not distributed databases.
- Jepsen does not validate shader semantics, algebraic laws, operation ABIs, or
  bit-exact GPU output.
- It is intentionally system-level and workload-level, while vyre-conform is an
  operation and backend conformance gate.

What conform does differently:

- Tests deterministic computation contracts instead of distributed consistency
  models.
- Targets per-op and per-backend evidence that can run in normal cargo CI.
- Includes OOM injection and crash-style adversarial harnesses where those
  affect local computation correctness.

What conform does not attempt:

- It does not test linearizability, serializability, partitions, leader
  election, or cluster recovery.
- It does not replace Jepsen for any distributed system built on top of vyre.

## Bottom line

Use vyre-conform when the question is: "Does this vyre operation and backend
produce the exact reference bytes, satisfy the declared laws, and survive the
registered conformance gauntlet?"

Use the other tools when their question is the real question. A TLA+ model, a
Lean theorem, a Kani proof, a libFuzzer campaign, an LLVM reducer, and a Jepsen
history can all be stronger than conform for their own jobs. Conform's job is
the GPU operation contract. It should interoperate with those tools, not claim
to make them obsolete.