vyre-conform 0.1.0

# Certification Levels

Conformance is tracked in **three parallel tracks** because integer,
strict-float, and approximate operations have different proof strategies:

- **Integer track** (L1, L2, L3, L4) — bit-exact matching against CPU
  reference and exhaustive/witnessed algebraic law verification.
- **Float track** (L1f, L2f, L3f, L4f) — bit-exact IEEE 754 matching,
  float-specific algebraic law verification, and tensor/engine
  verification.
- **Approximate track** (L1a) — ULP-tolerance matching for operations
  explicitly declared as approximate.

A backend certificate lists which tracks and which levels it has
achieved. Tracks are independent: a backend can be L4 integer but only
L2f float (e.g. it handles scalar integer perfectly but has not yet
implemented all tensor engines). Another backend can be L4f float but
L2 integer if its focus is ML inference. A minimal backend can be L1
integer only. All combinations are valid, all are reported explicitly.

**Tracks are never mixed within a single certificate entry.** A
backend cannot claim "L3" and silently use approximate paths. If an op
is strict, its certificate entry uses the strict track. If an op is
approximate, its entry uses the approximate track. No ambiguity.

Conformance levels themselves are permanent. Each level names a specific
strength of evidence, and each higher level within a track is strictly
stronger than the levels below it.

## Integer Track

## L1: Parity

L1 proves that the backend matches the CPU reference on generated inputs. The
suite dispatches operation shaders, compares GPU output bytes to CPU output
bytes, and reports every mismatch as a `ParityFailure`.

L1 is the minimum for any use. A backend that cannot pass parity on generated
inputs is not a vyre backend.

## L2: Algebraic

L2 includes L1 and verifies every declared algebraic law. Required law checking
uses exhaustive verification on the `u8` domain and witnessed verification on
the `u32` domain.

L2 proves that operations match point truth for generated inputs and satisfy
their declared mathematical structure under the required verification domains.

## L3: Engine

L3 includes L1 parity and engine conformance. It does not include L2
algebraic law verification. Every registered engine must pass every declared
invariant: deterministic output, no lost output, no duplicate output, ordered
output, bounded resources, termination, atomic linearizability, and workgroup
invariance where declared.

L3 proves that composed engines maintain their global execution guarantees
across generated valid programs and regression inputs.

## L4: Full

L4 includes L2 and L3, plus compositional verification and performance
baselines. It proves primitive structure, engine invariants, cross-operation
composition behavior, deterministic replay, regression replay, and usable
throughput under the conformance configuration.

L4 is what production backends need. A backend shipped to customers should be
able to publish an L4 certificate for the suite and operation versions it
claims to support.

## Float Track

The float track verifies floating-point operations under vyre's strict
IEEE 754 semantics (see `vyre/docs/ir/types.md` → "Float Semantics" and
`vyre-conform/SPEC.md` → "Float semantics — strict IEEE 754").

### L1f: Float Parity

L1f proves that the backend produces bit-exact IEEE 754 results for
every declared float op on generated inputs. The suite uses a correctly-
rounded software reference (CR-Math, CORE-MATH, or equivalent) and
compares byte-for-byte.

L1f is the minimum for any backend that claims to support float
operations. Backends must disable FMA fusion, disable flush-to-zero,
and use correctly-rounded transcendentals. Backends that cannot achieve
this on specific hardware must either emit software implementations or
decline to support those ops.

### L2f: Float Algebraic

L2f includes L1f and verifies float-specific algebraic laws where they
hold under rounding:

- `FAdd`, `FMul` are commutative (NOT associative — this is verified as
  a NEGATIVE law: `(a+b)+c != a+(b+c)` in general).
- `FNeg`, `FAbs` are involutions (`FNeg(FNeg(a)) = a` for all non-NaN).
- `FAdd(a, +0.0) = a` for all non-NaN.
- `FMul(a, 1.0) = a` for all non-NaN.
- `FMul(a, 0.0) = ±0.0 or NaN` per IEEE 754.

Float algebra is weaker than integer algebra (fewer universal laws hold
exactly under rounding) but the laws that do hold are verified bit-
exactly against the reference.

### L3f: Tensor

L3f includes L1f and verifies that tensor-level operations (matmul,
reduce, elementwise, softmax, layer norm, attention kernels) match a
reference implementation bit-for-bit. Tensor engines are tested on
generated valid programs with random inputs; any divergence from the
reference is a finding.

### L4f: ML Conformance

L4f includes L2f and L3f plus composed engines: full attention blocks,
full layer norm, full softmax with masking, mixed-precision matmul with
declared accumulator types. Backends that pass L4f can run real neural
networks with bit-exact determinism against the reference.

## Approximate Track

### L1a: Approximate Parity

L1a applies to operations explicitly declared as approximate
(`FReduceApprox`, `FSinApprox`, `RcpApprox`, etc.). Each op carries a
declared ULP tolerance. The suite verifies that GPU output is within
the declared tolerance of the reference for every generated input.

Approximate ops are a separate class with their own verification.
They are NEVER mixed with strict operations in a single certificate
entry. A backend that supports both publishes two entries: one in the
strict float track and one in the approximate track, with different
performance characteristics and different correctness guarantees.

## Strength Ordering

The ordering is strict within each track:

```text
Integer:     L1 < L2 < L3 < L4
Float:       L1f < L2f < L3f < L4f
Approximate: L1a (single level for now)
```

Within the integer track:
- L4 includes L2 and L3
- L3 includes L1 but not L2
- L2 includes L1

Same within the float track (L4f includes L2f and L3f; L3f includes
L1f but not L2f; L2f includes L1f).

Across tracks: tracks are independent. A backend with L4 integer and
L1f float is fully conforming on integer and minimally conforming on
float.

Passing a higher level carries the claims of the required lower
components within the same track. Failing any required component means
the backend has not achieved that level in that track.