vyre 0.4.0

GPU compute intermediate representation with a standard operation library
Documentation
# GPU DFA assembly pipeline

`vyre-std` ships the full pattern-set-to-GPU-DFA compilation pipeline
as five composable stages plus a composite entry point. Each stage is
a pure function; the final `PackedDfa` is POD bytes uploadable to a
`wgpu::Buffer`.

```text
regex source │                      Pattern::Literal      Pattern::Regex
                         │                     │
                         │  dfa_assemble escapes literal  │
                         └──────────────┬─────────────────┘
                                 combined regex
                            pattern::regex_to_nfa (Thompson)
                                       Nfa
                            pattern::nfa_to_dfa (subset construction,
                                                 dead-state pruning)
                                       Dfa
                            pattern::dfa_minimize (Hopcroft)
                                   minimal Dfa
                            pattern::dfa_pack (Dense or EquivClass)
                                   PackedDfa
```

## Stage 1 — `regex_to_nfa`

Thompson construction. Supports literal bytes, concatenation,
alternation `|`, Kleene star `*`, plus `+`, optional `?`, character
classes `[abc]` with ranges and negation, and escape via `\\`. Anchors
and backreferences are deliberately out of scope — those belong in a
regex frontend rather than in the compilation substrate.

Output is an `Nfa` with:

- `state_count: u32`
- `edges: Vec<NfaEdge>` where `byte: Option<u8>` (None = epsilon)
- `start: NfaStateId`
- `accept: Vec<bool>`

Errors surface as `PatternError::ParseError { offset, message }` with a
`Fix:`-prefixed explanation.

## Stage 2 — `nfa_to_dfa`

Subset construction. Walks epsilon closures, builds deterministic
transition rows keyed by byte, prunes dead states. Returns a `Dfa` with
a dense `transitions: Vec<u32>` laid out row-major at `state * 256 +
byte`. Out-of-table entries are `INVALID_STATE = u32::MAX`.

State-count ceiling is `MAX_DFA_STATES = 65_535`. Explosive closures
surface as `PatternError::StateOverflow`.

## Stage 3 — `dfa_minimize`

Hopcroft's partition refinement. Produces the canonical minimal DFA for
the input's accepted language. The output is idempotent:
`minimize(minimize(d)) == minimize(d)` bytewise after canonical state
renumbering. Conform declares `AlgebraicLaw::Idempotent` on this op.

## Stage 4 — `dfa_pack`

Transition-table compression with two format options.

**`DfaPackFormat::Dense`** — row-major `state × 256 × u32`. Fastest
scan, largest memory. Pick this when GPU memory is abundant relative to
state count.

**`DfaPackFormat::EquivClass`** — byte-equivalence classes collapse
redundant columns. Ships a 256-entry class table (padded to 4-byte
alignment) followed by `state × num_classes × u32`. Wins dramatically
when the effective alphabet is narrow (typical for regex patterns over
ASCII subsets).

Both formats round-trip through `dfa_unpack` back to a `Dfa` for
verification.

## Stage 5 — `dfa_assemble`

The composite entry point. Takes `&[Pattern<'_>]` (literal bytes or
regex source), escapes each literal, joins with `|`, runs the full
pipeline, and returns a `PackedDfa`. `AssembleOptions` picks the format
and whether to minimize (default: minimize).

## Layout invariants

- Literal bytes are escaped via a single-byte regex atom so they cannot
  accidentally be interpreted as metacharacters.
- Non-ASCII literal bytes use a character-class wrapper (`[<byte>]`) so
  the Thompson parser treats them as a single-byte atom.
- Empty pattern sets surface `PatternError::EmptyPatternSet`.
- Malformed regex surfaces `PatternError::ParseError` with the exact
  byte offset.

## Cache

`pattern::cache` wraps `dfa_assemble` with a content-addressed lookup:

```rust
use vyre_std::pattern::cache::load_or_compute;

let packed = load_or_compute(&patterns, options)?;
```

The key is `FNV-1a(CACHE_VERSION ⊕ options ⊕ patterns)`. Cache files
live under `${XDG_CACHE_HOME:-~/.cache}/vyre/dfa/`. `VYRE_NO_CACHE=1`
disables the cache entirely. `cache::clear()` and `cache::size()`
expose cache management to CLI callers.

## Aho-Corasick build

For pure literal-pattern sets where the user wants the classic
Aho-Corasick trie + failure links + output chain directly (not via
regex construction), `pattern::aho_corasick_build` exposes the CPU
reference and WGSL kernel. The output format is documented in the
module: `[state_count][pattern_count][goto_table][fail_links][out_head]
[out_link][pattern_lengths]`. The GOLDEN and KAT fixture vectors are
byte-diffed against this CPU reference on every test run.

## Consumer examples

- `std/examples/std_pattern_compile.rs` — three patterns → packed DFA →
  scan four inputs, runnable via
  `cargo run -p vyre-std --example std_pattern_compile`.
- `std/examples/std_arith_overflow.rs` — saturating vs wrapping at
  boundary cases.
- `std/benches/dfa_assemble_bench.rs` — measures pipeline throughput
  across pattern-set sizes and pack formats.