vyre 0.4.0

GPU compute intermediate representation with a standard operation library
Documentation
# The vyre Testing Book

This book is the complete and authoritative guide to testing vyre. It is
written to last: the architecture it describes is a permanent decision,
not a checkpoint in an evolving design. When vyre ships a conformance
certificate to a backend vendor, the certificate is valid because this
book describes the discipline that made it valid. When a contributor adds
a primitive operation ten years from now, the test they write belongs to
the categories this book defines, obeys the oracle rules this book
enforces, and survives the mutation gate this book mandates. If this book
changes, it is because vyre's testing architecture itself has changed —
which should happen roughly never.

## Preface

### Why this book exists

vyre's value to the world is not what it does today. vyre's value is that
a `Program` written against its specification produces byte-identical
results on every conformant backend, today and for every future version,
forever. That is the contract. Without that contract, vyre is one more
GPU compute framework in a graveyard of frameworks that could not
guarantee their own semantics. With that contract, vyre is infrastructure
— the kind of thing NVIDIA pays engineers to support because the cost of
not supporting it is higher than the cost of contributing.

The contract is only real if the test suite is strong enough to enforce
it. A specification without a conformance test is a wish. A conformance
test without adversarial rigor is a ritual. A test suite without
discipline is theater. vyre cannot afford any of these. Every
determinism violation, every backend disagreement, every lowering bug,
every missed validation rule is a failure of the contract. The test
suite exists to prevent those failures, and this book exists to teach you
how to write tests that actually prevent them.

The standard we measure ourselves against is not other GPU compute
frameworks. It is SQLite, which ships with 590 times more test code than
source code and catches crashes before they reach a user. It is the
Linux kernel, which runs kselftest, syzkaller, KASAN, KCSAN, and lockdep
on every commit and treats a single crash as a P0 bug. It is LLVM,
whose test infrastructure is the reason compiler backends can be written
by vendors without central coordination. These projects are the way
they are because their tests are first-class engineering artifacts. A
sloppy test in SQLite is a crisis, not a nit. A flaky test in Linux is
reverted, not documented. This book exists so that every test in vyre
carries that same weight.

### What this book is not

This book is not a tutorial on testing in general. It assumes you know
what a unit test is, how `#[test]` works in Rust, and what `cargo test`
does. If you are new to Rust testing, read the Rust Book's testing
chapter first.

This book is not vyre's reference manual. It does not explain what
`ir::Program` looks like or how `lower::wgsl` works internally — those
questions are answered by the IR and lowering docs. This book answers a
different question: given that those subsystems exist, how do we prove
they are correct, and how do we prove they stay correct forever.

This book is not a style guide for Rust code. vyre's conventions doc
handles that. When this book and the conventions doc disagree about code
style, the conventions doc wins. When they disagree about tests, this
book wins.

### Who this book is for

Every vyre contributor, human or agent. Every backend implementor. Every
reviewer. Every engineer who receives a bug report against vyre and has
to decide whether the bug is in vyre or in the user's program. Every
company evaluating vyre for production use who wants to know whether the
conformance guarantees are real.

If you are about to write your first test for vyre, this book teaches
you how to write one that meets the bar. If you have written thousands,
this book is the place you return when a test feels wrong but you cannot
say why. If you are reviewing a pull request, this book is the authority
you cite when you reject a test that passes but does not actually verify
anything.

### A note on tone

vyre's documentation is opinionated on purpose. We tell you what to do,
we explain why, and we tell you what not to do. We do not hedge. If a
rule in this book seems wrong in a specific case, you almost certainly
misunderstand the rule or the case. In the rare case where the rule is
actually wrong, the correct response is a pull request against this book
before you write the offending test. We would rather debate the rule
than quietly erode it.

The tone is inherited from the rest of vyre's docs, which are inherited
from projects we respect. The Linux kernel's coding style does not
apologize for existing. The LLVM coding standard does not begin with
"please consider." The SQLite documentation tells you what SQLite does
and what it does not do. This book is written in the same voice because
the subject matter deserves it.

## Table of contents

### Part I — Why testing vyre is hard

1. [Introduction]introduction.md — vyre's promise to its users, why that
   promise is hard to keep, and what a test suite has to prove for the
   promise to be real.
2. [A tour of what can go wrong]a-tour-of-what-can-go-wrong.md — the
   failure modes vyre's test suite must prevent. Miscompilation,
   nondeterminism, backend drift, composition bugs, validation gaps,
   float nondeterminism, regression. Each failure mode is a class of bug
   the suite is specifically designed to catch.
3. [The promises]the-promises.md — vyre's fifteen invariants, expressed
   as promises to users rather than as tabulated rules. These are the
   things the suite proves. Every test ultimately exists to keep one of
   these promises.

### Part II — The language of testing in vyre

4. [Vocabulary]vocabulary.md — what we mean by "test," "oracle,"
   "property," "mutation," "archetype," "determinism," and the other
   terms of art used throughout the book. Every term defined here has a
   single precise meaning across the entire vyre project.
5. [Oracles]oracles.md — the oracle hierarchy, from strongest to
   weakest. When to use each, why tests never derive their expected
   output from the code under test, and how oracle selection is
   mechanical rather than judgmental.
6. [Mutations and the adversarial mindset]mutations.md — what a
   mutation is, how mutation testing grades a test suite, and why "my
   test passes on correct code" is not the same as "my test catches
   wrong code." The chapter that teaches you to write tests that are
   worth writing.
7. [Archetypes — the shapes of bad inputs]archetypes.md — the
   catalog of input shapes that expose bugs. Identity pairs, overflow
   boundaries, bit-pattern alternations, resource bombs, diamond
   dataflows, off-by-one indices. You do not have to invent adversarial
   inputs from scratch. The archetype catalog is the inventory.

### Part III — The test suite architecture

8. [Architecture]architecture.md — the directory layout of
   `vyre/tests/`, what each category is for, and the rationale behind
   each boundary. The reference chapter you will open most often.
9. [Unit tests]categories/unit.md — fast isolated tests with no
   cross-module dependencies. Why we prefer inline `#[cfg(test)]`
   modules and when to reach for the `tests/unit/` directory instead.
10. [Integration tests]categories/integration.md — tests that exercise
    the complete `ir::Program` → validate → lower → dispatch path. The
    bulk of hand-written correctness coverage.
11. [Validation tests]categories/validation.md — the V001–V020 suite.
    One test per rule, independently triggerable, must-reject and
    must-accept pairs.
12. [Lowering tests]categories/lowering.md — covering every `Expr`,
    `Node`, `BinOp`, `UnOp`, `AtomicOp`, and `BufferAccess` variant.
    Includes the enum-exhaustiveness meta-test that refuses to compile
    when a new variant is added without coverage.
13. [Wire format tests]categories/wire_format.md — wire format ↔ IR conversion,
    round-trip identity, opcode coverage.
14. [Adversarial tests]categories/adversarial.md — hostile inputs,
    resource bombs, malformed IR, malformed wire-format bytes, OOM and fault
    injection. The category where "graceful rejection, no panic" is the
    whole assertion.
15. [Property tests]categories/property.md — proptest-based
    invariants, generator discipline, seed reproducibility, and how to
    write a generator that actually exercises the shape of
    `ir::Program` rather than its flat surface.
16. [Backend tests]categories/backend.md — cross-backend equivalence,
    determinism across runs, the skip rule when only one backend is
    registered.
17. [Regression tests]categories/regression.md — permanent
    reproducers for fixed bugs. The one category where a test, once
    committed, is never deleted.
18. [Benchmarks]categories/benchmarks.md — criterion-based performance
    gates and the regression policy. Benchmarks are not correctness
    tests and do not substitute for them.
19. [Support utilities]categories/support.md — what belongs in
    `tests/support/` and what does not. The rule that helpers exist to
    reduce boilerplate, not to obscure test intent.

### Part IV — A worked example: testing BinOp::Add

This part walks through the design and construction of the complete
test set for `BinOp::Add` end to end. It is the chapter you will copy
from when you add your first primitive op. Every decision is
explained; every test is justified.

20. [Starting with intent]worked-example/01-intent.md — what does
    `Add` mean, and what must be true about it?
21. [The first test]worked-example/02-first-test.md — writing
    `test_add_identity_zero_spec_table` line by line, explaining every
    decision.
22. [Building out the suite]worked-example/03-building-out.md — the
    rest of the canonical add test set, and why each test exists.
23. [Catching a deliberate bug]worked-example/04-catching-a-bug.md
    — injecting an off-by-one into the implementation and watching the
    suite fire.
24. [Running the mutation gate]worked-example/05-mutation-gate.md    iterating tests until the gate reports zero survivors, and what to
    do when a mutation refuses to die.

### Part V — Writing tests

25. [The decision tree]writing/decision-tree.md — the nine questions
    that place any new test in the correct category. When you are about
    to write a test and do not know where it goes, answer these in
    order.
26. [Templates]writing/templates.md — canonical skeletons for the
    most common test shapes. Copy, fill in, commit.
27. [Naming]writing/naming.md — the `test_<subject>_<property>`
    convention, why consistency is load-bearing, and the naming rules
    that make `cargo test <substring>` useful.
28. [Support utilities]writing/support-utilities.md — writing helpers
    that clarify rather than obscure. The tests that read as one
    statement with no indirection, even when the helper is necessary.
29. [Oracles in practice]writing/oracles-in-practice.md — picking the
    right oracle for the test you are writing, with worked examples
    across every category.

### Part VI — What not to write

30. [Anti-patterns]anti-patterns/README.md — the shapes that look
    like tests but are not. Each anti-pattern has its own chapter.
31. [The tautology test]anti-patterns/tautology.md
32. [The kitchen sink test]anti-patterns/kitchen-sink.md
33. [The "doesn't crash" test]anti-patterns/doesnt-crash.md
34. [The hidden-helper test]anti-patterns/hidden-helpers.md
35. [The seedless proptest]anti-patterns/seedless-proptest.md
36. [Test smells]anti-patterns/test-smells.md — the subtler warning
    signs that appear before a test becomes an outright anti-pattern.
    What to look for and what to do.

### Part VII — Discipline

37. [The review checklist]discipline/review-checklist.md — the
    eleven items a reviewer enforces on every PR that touches a test.
    No exceptions.
38. [The daily audit]discipline/daily-audit.md — read ten random
    committed tests every day, delete any that fail the checklist. The
    calibration loop that keeps the suite honest.
39. [Seed discipline]discipline/seed-discipline.md — proptest seed
    management, regression corpus handling, making CI failures
    reproducible a year later.
40. [The regression rule]discipline/regression-rule.md — once a file
    lands in `tests/regression/`, it never leaves. When such a file
    starts failing, the bug has returned. Fix the code, not the test.
41. [Flakiness]discipline/flakiness.md — the most corrosive failure
    mode of a test suite. How to detect a flake, how to eliminate it,
    and why "flake tolerance" is the beginning of the end.
42. [Suite performance]discipline/suite-performance.md — a slow
    suite does not get run. The rules for keeping the suite fast
    enough that running it before every commit is reflex.

### Part VIII — Advanced topics

43. [Property-based testing for GPU IR]advanced/property-generators.md
    — writing generators that produce valid `ir::Program` values with
    enough structure to exercise real bugs. Shrinking, coverage-guided
    generation, and the traps of naive random input.
44. [Differential fuzzing]advanced/differential-fuzzing.md — vyre
    against itself, vyre against its reference interpreter, vyre
    against upstream GPU compute implementations. The most powerful
    bug-finding technique for cross-backend semantics.
45. [Mutation testing at scale]advanced/mutation-at-scale.md    scoping `cargo-mutants`, interpreting surviving mutations as
    findings, the cache discipline that keeps mutation runs under ten
    seconds per op.
46. [Concurrency and ordering]advanced/concurrency-and-ordering.md    atomics, barriers, memory ordering, and the stress tests that
    catch ordering bugs the compiler and hardware cooperate to hide.
47. [Floating-point]advanced/floating-point.md — testing strict IEEE
    754 paths, ULP-bounded approximate tracks, and the rules that
    prevent float semantics from silently drifting across backends.
48. [Cross-backend equivalence in practice]advanced/cross-backend.md
    — the hard cases: platform-specific rounding, atomic ordering
    differences, workgroup size variation, and how the suite catches
    divergences before users do.

### Part IX — Running the suite

49. [Local workflow]running/local-workflow.md`cargo test`, `cargo
    test --ignored`, `cargo bench`, `cargo fuzz run`, and how to run
    just the tests relevant to your change.
50. [Continuous integration]running/continuous-integration.md — what
    CI runs on every commit, what runs on release, what runs nightly,
    and how failures are triaged.
51. [Debugging failures]running/debugging-failures.md — when a test
    fails in your PR: read the failure, find the minimal reproducer,
    decide whether the code or the test is wrong, fix.
52. [Debugging flakes]running/debugging-flakes.md — the harder
    triage. How to tell a flake from a real bug, how to make a flake
    reproduce, and why you do not "just re-run CI."

### Part X — Integration with vyre-conform

53. [The two-tier suite]vyre-conform/two-tier-suite.md — how vyre's
    hand-written suite and vyre-conform's generated suite fit together
    as a single coherent test corpus.
54. [When the generator supersedes you]vyre-conform/generator-supersession.md
    — the rule for migrating an op's hand-written tests to the
    generated suite, and the proof standard that must be met first.
55. [What the generator will never replace]vyre-conform/never-replaced.md
    — regression reproducers, category-specific adversarial tests,
    property invariants, benchmarks. The chapters that remain
    hand-written forever.
56. [Contribution flow]vyre-conform/contribution-flow.md — the path a
    test takes from proposal to committed artifact, whether the author
    is human, agent, or mixed.

### Part XI — Meta

57. [Testing as design]meta/testing-as-design.md — how writing tests
    first shapes the code you write next. Not a religious argument
    about TDD; a practical observation that the ops with the best
    tests are the ops that were designed alongside their tests.
58. [Testing the testers]meta/testing-the-testers.md — how we know
    our tests are good. Mutation score, coverage score, the audit
    rate, and the meta-tests that catch classes of test bugs.
59. [Post-mortem discipline]meta/post-mortem-discipline.md — when a
    bug reaches production, the test suite failed. Every post-mortem
    adds at least one mutation operator or archetype. The catalog
    grows with the project.
60. [The long game]meta/the-long-game.md — the closing chapter. Why
    this book exists and what we expect of it in five, ten, twenty
    years.

### Appendices

- [A. Glossary]appendices/A-glossary.md — every term used in this
  book, cross-referenced to vyre's main glossary.
- [B. Invariants catalog]appendices/B-invariants-catalog.md — the
  fifteen invariants in full, with test-family references.
- [C. Mutation operator reference]appendices/C-mutation-operators.md
  — the complete mutation catalog used by the mutation gate.
- [D. Archetype reference]appendices/D-archetypes.md — the complete
  archetype catalog.
- [E. Command reference]appendices/E-command-reference.md — every
  `cargo` command relevant to testing, with flags and when to use
  each.
- [F. Review checklist]appendices/F-review-checklist.md — the
  enforceable eleven-item checklist, printable.
- [G. Examples index]appendices/G-examples-index.md — every worked
  example in the book, cross-referenced by topic and category.
- [H. Change history]appendices/H-change-history.md — when each
  chapter was last revised and why. The architectural decisions
  recorded here are permanent; the revisions that matter are ones
  that refine explanation, not ones that change the rules.

## How to read this book

If you are about to write your first vyre test, read Part I, Part II,
the relevant chapter from Part III, and Part IV. That is roughly fifty
pages and will take an afternoon. You will emerge knowing exactly what
to do.

If you are experienced with vyre and you want to know where a specific
kind of test belongs, go directly to [the decision tree](writing/decision-tree.md).
It answers the question in under a minute.

If you are reviewing a pull request that touches tests, [the review
checklist](discipline/review-checklist.md) is the authority that
decides whether the PR merges.

If you are debugging a failing test in your own PR, start with
[debugging failures](running/debugging-failures.md). If the failure
seems to depend on timing or thread count, continue to [debugging
flakes](running/debugging-flakes.md).

If you are writing a new backend and want to know what conformance
means, Part VII and [cross-backend equivalence in practice](advanced/cross-backend.md)
are the chapters you need. Part X explains how your backend
interacts with vyre-conform.

## Reading this book as an agent

Agents contributing to vyre read this book for the same reason humans
do: to understand what a correct test looks like. The book is written
to be readable by agents without special accommodation. The anti-patterns
in Part VI are specifically the shapes agents produce most often when
asked for tests without context, which is why the book treats them with
the severity it does.

An agent reading this book should pay particular attention to
[oracles](oracles.md), [anti-patterns](anti-patterns/README.md), and
[the review checklist](discipline/review-checklist.md). Those chapters
encode the gates the agent's output will be judged against.

## Cross-references

Every chapter in this book links to the specific vyre sources and
invariants it depends on. Every invariant mentioned is defined in
[Appendix B](appendices/B-invariants-catalog.md) and cross-referenced
throughout. Every term is defined in [Appendix A](appendices/A-glossary.md)
or in vyre's main glossary. When this book talks about a specific
module, it uses `crate::path::to::module` notation, and you can navigate
directly.

If you find a broken cross-reference, file it as a bug against this
book.