vyre 0.4.0

GPU compute intermediate representation with a standard operation library
Documentation
# Dataflow Engine

## Overview

The dataflow engine runs graph fixpoint computations on the GPU. Its primary
consumer is taint analysis: start from source nodes, propagate facts through a
control/data-flow graph, and report reached sink nodes.

## CSR Graph Format

Graphs use compressed sparse row storage:

```text
offsets: len node_count + 1
targets: len edge_count
node_data: len node_count
neighbors(node) = targets[offsets[node] .. offsets[node + 1])
```

`offsets` must be nondecreasing, `offsets[0] = 0`, and
`offsets[node_count] = edge_count`. Every target must be `< node_count`.

## Multi-Source BFS

The standard reachability engine runs one frontier per source. Visited state is a
bitmap with `words_per_source = ceil(node_count / 32)`.

```text
visited[source_idx][node / 32] bit node%32
```

A source starts with its source node visited. Traversal expands edges until the
frontier is empty or `max_depth` is reached.

## Level-Synchronous Iteration

The portable execution model is level-synchronous:

1. Process the current frontier.
2. Apply the transfer function to candidate edges.
3. Atomically mark newly reached nodes in the next frontier.
4. Detect whether any new node was added.
5. Stop when no additions occur or the configured depth bound is reached.

Global synchronization between levels requires separate dispatches or a backend
runtime loop. A single dispatch may process bounded depth only when the generated
program can prove workgroup-local synchronization is sufficient.

## Convergence Detection

Convergence is represented by an atomic changed flag or frontier count. Each
newly visited node atomically sets `changed = 1` or increments the next-frontier
count. When the host observes zero changes after a level, the fixpoint is
complete.

## Transfer Function

The transfer function decides whether a fact flows across an edge. It is
pluggable through an `Op` and receives at minimum:

```text
source_id, current_node, target_node, edge_or_node_metadata
```

It returns `0` to block propagation or `1` to allow propagation. More advanced
transfer functions may return a transformed fact word.

Community-provided TOML rules may define source labels, sink labels, sanitizer
patterns, and transfer-function parameters. TOML affects inputs to the engine; it
does not change IR semantics.

## Taint Analysis Contract

For taint analysis, `node_data` encodes node role:

```text
0 = normal
1 = source
2 = sink
3 = source and sink
```

The engine reports `(source, sink, depth)` for every sink reached from a source
within `max_depth`. Output order must be deterministic after readback sorting.

## Validation

Reject malformed CSR arrays, source IDs outside the graph, `source_count *
node_count` overflow, output capacity overflow, and resource sizes that exceed
backend limits.