//! Tier 2.5 mathematical primitives.
//!
//! Each module exposes one reusable GPU composition with a stable op id.
//! Callers import the narrow module they need so region-chain audits can see
//! which primitive owns the shared work.
/// 1D separable convolution (domain-neutral: blur, signal processing, audio).
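///
/// A host-side sketch of the inner product each output sample computes (the
/// helper below is illustrative, not this module's GPU entry point):
///
/// ```
/// // y[i] = sum_k x[i + k] * w[k], valid-mode (no padding).
/// fn conv1d_valid(x: &[f64], w: &[f64]) -> Vec<f64> {
///     (0..=x.len() - w.len())
///         .map(|i| w.iter().enumerate().map(|(k, wk)| x[i + k] * wk).sum())
///         .collect()
/// }
/// let smoothed = conv1d_valid(&[1.0, 2.0, 4.0, 8.0], &[0.5, 0.5]);
/// assert_eq!(smoothed, vec![1.5, 3.0, 6.0]);
/// ```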
/// Shared dot-product partial accumulator.
/// Value-set analysis interval arithmetic.
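///
/// A minimal host-side sketch of the two core interval operations (illustrative
/// helpers only, not this module's GPU entry points):
///
/// ```
/// // Interval add and multiply over closed ranges, the ops VSA iterates.
/// fn iadd(a: (i64, i64), b: (i64, i64)) -> (i64, i64) {
///     (a.0 + b.0, a.1 + b.1)
/// }
/// fn imul(a: (i64, i64), b: (i64, i64)) -> (i64, i64) {
///     let c = [a.0 * b.0, a.0 * b.1, a.1 * b.0, a.1 * b.1];
///     (*c.iter().min().unwrap(), *c.iter().max().unwrap())
/// }
/// assert_eq!(iadd((1, 3), (-2, 5)), (-1, 8));
/// assert_eq!(imul((-2, 3), (4, 5)), (-10, 15));
/// ```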
/// Classical RK4 next-state combiner for ODE integration. Same Program
/// serves user-dialect neural-ODE / physics-flow callers AND vyre-self
/// substrate (#9 homotopy_continuation path-tracking).
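///
/// A scalar sketch of the combiner (the GPU Program batches this over whole
/// state vectors; the function below is only for illustration):
///
/// ```
/// // One classical RK4 step for y' = f(t, y): combine four slope samples.
/// fn rk4_step(f: impl Fn(f64, f64) -> f64, t: f64, y: f64, h: f64) -> f64 {
///     let k1 = f(t, y);
///     let k2 = f(t + 0.5 * h, y + 0.5 * h * k1);
///     let k3 = f(t + 0.5 * h, y + 0.5 * h * k2);
///     let k4 = f(t + h, y + h * k3);
///     y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
/// }
/// // y' = y from y(0) = 1: one step should closely match e^0.1.
/// let y1 = rk4_step(|_, y| y, 0.0, 1.0, 0.1);
/// assert!((y1 - 0.1f64.exp()).abs() < 1e-6);
/// ```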
/// Subgroup prefix-sum scan used by compaction, histograms, and reductions.
/// Differential-privacy accountant — Gaussian-mechanism RDP step with
/// host-side `(ε, δ)` conversion. Same Program serves user DP-SGD
/// trainers AND vyre's own profiler-telemetry hardening.
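///
/// A host-side sketch of the unamplified base case (real accountants usually
/// track the subsampled-Gaussian RDP curve and optimize over the order α; the
/// helper names here are illustrative, not this module's API):
///
/// ```
/// // The Gaussian mechanism at Renyi order alpha contributes alpha / (2 sigma^2)
/// // per step; the host converts the accumulated RDP into an (eps, delta) pair.
/// fn rdp_step(alpha: f64, sigma: f64) -> f64 {
///     alpha / (2.0 * sigma * sigma)
/// }
/// fn rdp_to_dp(rdp_total: f64, alpha: f64, delta: f64) -> f64 {
///     rdp_total + (1.0 / delta).ln() / (alpha - 1.0)
/// }
/// let alpha = 32.0;
/// let rdp_total: f64 = (0..1_000).map(|_| rdp_step(alpha, 8.0)).sum();
/// let eps = rdp_to_dp(rdp_total, alpha, 1e-5);
/// assert!(eps.is_finite() && eps > 0.0);
/// ```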
/// Fractional-calculus kernel — Grünwald-Letnikov weight generator
/// that feeds the existing `conv1d` primitive. No new GPU dispatch;
/// the LEGO rule is satisfied by composition.
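///
/// The weight recurrence itself, as a host-side sketch (these weights become
/// the filter taps handed to `conv1d`):
///
/// ```
/// // Grünwald-Letnikov weights: w_0 = 1, w_k = w_{k-1} * (k - 1 - alpha) / k.
/// fn gl_weights(alpha: f64, n: usize) -> Vec<f64> {
///     let mut w = vec![1.0f64];
///     for k in 1..n {
///         let prev = w[k - 1];
///         w.push(prev * (k as f64 - 1.0 - alpha) / k as f64);
///     }
///     w
/// }
/// // alpha = 1 recovers the ordinary first-difference stencil.
/// assert_eq!(gl_weights(1.0, 4), vec![1.0, -1.0, 0.0, 0.0]);
/// ```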
/// Submodular greedy step — argmax-of-marginals primitive driving
/// (1 - 1/e)-approximation greedy maximization. Same Program serves
/// user active-learning / coreset / sensor-placement dialects AND
/// vyre-self compile-cache eviction as submodular coverage.
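///
/// The selection rule in miniature (host-side illustration of the parallel
/// argmax the primitive performs):
///
/// ```
/// // Pick the unselected element with the largest marginal gain f(S ∪ {e}) - f(S).
/// fn greedy_step(marginals: &[f64], selected: &[bool]) -> Option<usize> {
///     marginals
///         .iter()
///         .enumerate()
///         .filter(|&(i, _)| !selected[i])
///         .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
///         .map(|(i, _)| i)
/// }
/// let marginals = [0.3, 2.0, 1.5];
/// let selected = [false, true, false];
/// assert_eq!(greedy_step(&marginals, &selected), Some(2));
/// ```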
/// Conformal prediction — finite-sample distribution-free uncertainty
/// intervals. Same Program serves user calibrated-NN dialects AND
/// vyre-self dispatch cost-model intervals (#28).
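///
/// A host-side sketch of the split-conformal rule behind those intervals
/// (illustrative only):
///
/// ```
/// // Half-width = the ceil((n + 1)(1 - alpha))-th smallest calibration residual.
/// fn conformal_quantile(mut scores: Vec<f64>, alpha: f64) -> f64 {
///     scores.sort_by(|a, b| a.partial_cmp(b).unwrap());
///     let n = scores.len();
///     let rank = (((n as f64 + 1.0) * (1.0 - alpha)).ceil() as usize).min(n);
///     scores[rank - 1]
/// }
/// let q = conformal_quantile(vec![0.1, 0.4, 0.2, 0.3, 0.5], 0.2);
/// // Every prediction then gets the distribution-free interval y_hat ± q.
/// assert!((q - 0.5).abs() < 1e-12);
/// ```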
/// Sinkhorn-Knopp scaling step for entropic optimal transport.
/// Composes with `semiring_gemm` for the matvec halves of the
/// iteration. User: OT/Wasserstein loss, distribution alignment;
/// vyre-self: dispatch-graph clustering via Sinkhorn-OT distance.
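///
/// One scaling step written out on a 2x2 toy kernel (illustrative only; the
/// real primitive delegates the `K v` / `K^T u` halves to `semiring_gemm`):
///
/// ```
/// // u <- a ./ (K v), v <- b ./ (K^T u), for the Gibbs kernel K = exp(-C / lambda).
/// fn matvec(m: &[[f64; 2]; 2], x: [f64; 2]) -> [f64; 2] {
///     [m[0][0] * x[0] + m[0][1] * x[1], m[1][0] * x[0] + m[1][1] * x[1]]
/// }
/// let k = [[0.8, 0.2], [0.3, 0.7]];        // kernel, already exponentiated
/// let kt = [[0.8, 0.3], [0.2, 0.7]];       // its transpose
/// let (a, b) = ([0.5, 0.5], [0.5, 0.5]);   // target marginals
/// let (mut u, mut v) = ([1.0, 1.0], [1.0, 1.0]);
/// for _ in 0..50 {
///     let kv = matvec(&k, v);
///     u = [a[0] / kv[0], a[1] / kv[1]];
///     let ku = matvec(&kt, u);
///     v = [b[0] / ku[0], b[1] / ku[1]];
/// }
/// // Row marginals of diag(u) * K * diag(v) converge to `a`.
/// let row0 = u[0] * (k[0][0] * v[0] + k[0][1] * v[1]);
/// assert!((row0 - a[0]).abs() < 1e-6);
/// ```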
/// Full iterative Sinkhorn balance primitive.
/// Differentiable algorithm primitives — softmax + temperature-scaled
/// argmax. Same Programs serve user attention/structured-prediction
/// dialects AND vyre-self differentiable autotuner (#27).
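///
/// The temperature knob in isolation (host-side sketch):
///
/// ```
/// // Numerically stable softmax; as temperature -> 0 the output approaches a
/// // one-hot argmax, and as it grows the output flattens toward uniform.
/// fn softmax(logits: &[f64], temperature: f64) -> Vec<f64> {
///     let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
///     let exps: Vec<f64> = logits.iter().map(|&x| ((x - max) / temperature).exp()).collect();
///     let sum: f64 = exps.iter().sum();
///     exps.iter().map(|e| e / sum).collect()
/// }
/// let sharp = softmax(&[1.0, 3.0, 2.0], 0.1);
/// let flat = softmax(&[1.0, 3.0, 2.0], 10.0);
/// assert!(sharp[1] > 0.99 && flat[1] < 0.4);
/// ```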
/// Score-based generative one-step denoise combiner. User: diffusion /
/// flow-matching / SDE simulation.
/// KFAC block-diagonal inverse for natural gradient.
/// Newton-Schulz inverse-square-root step (Shampoo / KFAC core kernel).
/// Matrix preconditioner without SVD. User: Shampoo / KFAC / Sophia
/// optimizers, general matrix-function family.
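///
/// The scalar shape of the coupled iteration (the primitive applies the same
/// update with matrix multiplies; this is a sketch only):
///
/// ```
/// // t = (3 - z*y)/2; y <- y*t; z <- t*z. With y0 = a, z0 = 1 and `a`
/// // normalized so |1 - a| < 1, z converges to a^(-1/2) using only
/// // multiply-adds, with no SVD or eigendecomposition.
/// fn newton_schulz_inv_sqrt(a: f64, iters: usize) -> f64 {
///     let (mut y, mut z) = (a, 1.0);
///     for _ in 0..iters {
///         let t = (3.0 - z * y) / 2.0;
///         y *= t;
///         z *= t;
///     }
///     z
/// }
/// let inv_sqrt = newton_schulz_inv_sqrt(0.5, 10);
/// assert!((inv_sqrt - 1.0 / 0.5f64.sqrt()).abs() < 1e-9);
/// ```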
/// Natural-gradient block-apply — multiply gradient by precomputed
/// `M^{-1/2}` block. Composes with #16 preconditioner pipeline.
/// Iterative hard thresholding for sparse signal recovery (#48).
/// User: compressed-sensing decoders, NN pruning, dictionary learning.
/// Self: vyre's sparse-buffer compaction (when output is mostly zero).
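///
/// The projection half of the iteration, as a host-side sketch (the gradient
/// half is an ordinary matvec):
///
/// ```
/// // Hard-thresholding operator H_s: keep the s largest-magnitude entries.
/// // IHT iterates x <- H_s( x + mu * A^T (y - A x) ).
/// fn hard_threshold(x: &mut [f64], s: usize) {
///     let mut idx: Vec<usize> = (0..x.len()).collect();
///     idx.sort_by(|&i, &j| x[j].abs().partial_cmp(&x[i].abs()).unwrap());
///     for &i in &idx[s..] {
///         x[i] = 0.0;
///     }
/// }
/// let mut x = [0.1, -2.0, 0.3, 1.5];
/// hard_threshold(&mut x, 2);
/// assert_eq!(x, [0.0, -2.0, 0.0, 1.5]);
/// ```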
/// DP-SGD per-sample gradient clip (#42). User: DP-SGD trainers,
/// gradient-norm-clipped optimizers.
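///
/// The clipping rule itself, host-side (noise addition is a separate
/// Gaussian-mechanism step; this primitive only rescales):
///
/// ```
/// // Scale each per-sample gradient by min(1, c / ||g||_2).
/// fn clip_in_place(grad: &mut [f64], c: f64) {
///     let norm = grad.iter().map(|g| g * g).sum::<f64>().sqrt();
///     let scale = (c / norm).min(1.0);
///     for g in grad.iter_mut() {
///         *g *= scale;
///     }
/// }
/// let mut g = [3.0, 4.0];       // L2 norm 5
/// clip_in_place(&mut g, 2.5);   // clip to norm 2.5, i.e. scale by 0.5
/// assert_eq!(g, [1.5, 2.0]);
/// ```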
/// Mori-Zwanzig Markovian projection step — closed-form
/// coarse-graining of dynamical systems (#58). User: scientific ML
/// emulators. Self: vyre's coarse view of own dispatch graph.
/// Information-geometry primitives — Bhattacharyya / Fisher-Rao /
/// Amari α-connection (#57). User: distribution-aware loss design,
/// MoE routing.
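///
/// A small sketch of the Bhattacharyya piece on discrete distributions (the
/// simplest of the three quantities named above):
///
/// ```
/// // Bhattacharyya coefficient BC = sum_i sqrt(p_i * q_i); distance = -ln BC.
/// fn bhattacharyya_distance(p: &[f64], q: &[f64]) -> f64 {
///     let bc: f64 = p.iter().zip(q).map(|(pi, qi)| (pi * qi).sqrt()).sum();
///     -bc.ln()
/// }
/// let same = bhattacharyya_distance(&[0.5, 0.5], &[0.5, 0.5]);
/// let far = bhattacharyya_distance(&[0.9, 0.1], &[0.1, 0.9]);
/// assert!(same.abs() < 1e-12 && far > same);
/// ```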
/// Fast Multipole Method primitives — P2M / M2L / L2P (#51). User:
/// n-body simulations, kernel methods at scale, Poisson solvers.
/// Self: hierarchical compression of all-pairs dispatch dependency
/// analysis (#19 polyhedral fusion).
/// Algebraic Multigrid V-cycle Jacobi smoother step (#50). User:
/// Poisson / Laplace / diffusion solvers. Self: dispatch-graph
/// hierarchy levels match V-cycle levels.
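///
/// The smoother in isolation on a 1D Poisson stencil (host-side sketch; the
/// V-cycle wraps this around restriction and prolongation between levels):
///
/// ```
/// // Weighted Jacobi sweep: x <- x + w * D^-1 * (b - A x).
/// fn jacobi_sweep(a: &[Vec<f64>], b: &[f64], x: &[f64], w: f64) -> Vec<f64> {
///     (0..x.len())
///         .map(|i| {
///             let ax_i: f64 = a[i].iter().zip(x).map(|(aij, xj)| aij * xj).sum();
///             x[i] + w * (b[i] - ax_i) / a[i][i]
///         })
///         .collect()
/// }
/// let a = vec![vec![2.0, -1.0, 0.0], vec![-1.0, 2.0, -1.0], vec![0.0, -1.0, 2.0]];
/// let b = [1.0, 0.0, 1.0];            // exact solution is [1, 1, 1]
/// let mut x = vec![0.0; 3];
/// for _ in 0..30 {
///     x = jacobi_sweep(&a, &b, &x, 2.0 / 3.0);
/// }
/// assert!((x[1] - 1.0).abs() < 1e-2);
/// ```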
/// Algebraic Multigrid V-cycle (#P-PRIM-3). User:
/// Poisson / Laplace / diffusion solvers. Self: dispatch-graph
/// hierarchy levels match V-cycle levels.
/// Sheaf Laplacian eigenvalue (#P-PRIM-9). User:
/// spectral clustering, heterophilic GNN. Self: spectral gap of
/// dispatch-graph sheaf Laplacian.
/// Full Edmonds augmenting-path matroid intersection (#P-PRIM-10).
/// User: combinatorial scheduling, bipartite matching. Self:
/// megakernel scheduler fusion-grouping.
/// Tensor-train decomposition via SVD-truncation per mode (#P-PRIM-12).
/// User: NN compression, long-context attention. Self: compress
/// the dispatch-graph cost tensor via TT decomposition.
/// Tensor-train one-step contraction (#6). User: NN compression,
/// long-context attention, scientific-tensor compression. Self:
/// vyre's chain-shaped Region tree as a TT — optimal contraction
/// order = optimal fusion order.
/// Randomized SVD random-projection step (#3). User: low-rank
/// attention, NN compression, PCA at scale. Self: dispatch dependency
/// matrix compression for #19 polyhedral fusion at workspace scale.
/// Sum-of-squares (Positivstellensatz) Gram-matrix construction (#14).
/// User: formal verification (Lyapunov), polynomial optimization,
/// SOS-based buffer-safety certificates.
/// Quantum singular-value transform (classical) block-encoding +
/// Chebyshev apply (#34). User: matrix-function family without
/// eigendecomposition. Self: Wasserstein-over-dispatch fusion.
/// Pairwise tensor-network contraction (#35). User: PEPS / MPS quantum
/// chemistry, compressed NN weights. Self: Region tree contraction.
/// RMT-based Marchenko-Pastur edge clip (#17). User: implicit
/// regularization, training-dynamics-aware optimizers. Self: spectrum
/// projection for #23 dispatch-graph spectral schedule.
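///
/// The bulk edges and one common clipping variant, host-side (sketch only;
/// exactly what to do with in-bulk eigenvalues is a policy choice):
///
/// ```
/// // Marchenko-Pastur edges for an n x m matrix with noise variance sigma^2:
/// //   lambda_{+/-} = sigma^2 * (1 +/- sqrt(n / m))^2
/// fn mp_edges(sigma2: f64, n: f64, m: f64) -> (f64, f64) {
///     let r = (n / m).sqrt();
///     (sigma2 * (1.0 - r) * (1.0 - r), sigma2 * (1.0 + r) * (1.0 + r))
/// }
/// let (_lo, hi) = mp_edges(1.0, 250.0, 1000.0);   // (0.25, 2.25)
/// // Eigenvalues inside the bulk are noise; one variant flattens them to the edge.
/// let clipped: Vec<f64> = [0.4, 1.1, 3.0]
///     .iter()
///     .map(|&ev| if ev <= hi { hi } else { ev })
///     .collect();
/// assert_eq!(clipped, vec![hi, hi, 3.0]);
/// ```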
/// p-adic Hensel-lift step (#54, research scaffold). Stable
/// arithmetic for ill-conditioned problems.
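///
/// One concrete instance of the step, lifting a modular inverse to higher
/// p-adic precision (host-side sketch with small i64 values):
///
/// ```
/// // Newton/Hensel doubling: if x ≡ a^-1 (mod p^k), then x(2 - a x) ≡ a^-1 (mod p^2k).
/// fn lift_inverse(a: i64, x: i64, modulus: i64) -> i64 {
///     (x * (2 - a * x)).rem_euclid(modulus)
/// }
/// let a = 3;
/// let mut x = 1;                      // 3^-1 ≡ 1 (mod 2)
/// for modulus in [4, 16, 256] {       // precision doubles with each lift
///     x = lift_inverse(a, x, modulus);
///     assert_eq!((a * x).rem_euclid(modulus), 1);
/// }
/// assert_eq!(x, 171);                 // 3 * 171 = 513 ≡ 1 (mod 256)
/// ```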
/// Multi-limb big-integer ripple-carry addition primitive (#P-PRIM-BIGINT).
/// Foundational building block for RSA / ECDSA / X25519 / lattice crypto.
/// Emits `(sum_partial, carry_partial)` per-limb for a downstream
/// parallel-prefix carry-fix wave. Same Program serves user crypto-dialect
/// callers AND vyre-self bigint-cost-model arithmetic.
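///
/// The per-limb contract in miniature (host-side sketch; on the GPU each lane
/// handles one limb and the carry fix is a later prefix-scan wave):
///
/// ```
/// // Each limb emits (sum mod 2^64, carry-out) independently of its neighbours.
/// fn limb_add_partial(a: &[u64], b: &[u64]) -> Vec<(u64, u64)> {
///     a.iter()
///         .zip(b)
///         .map(|(&x, &y)| {
///             let (sum, overflow) = x.overflowing_add(y);
///             (sum, overflow as u64)
///         })
///         .collect()
/// }
/// let partial = limb_add_partial(&[u64::MAX, 1], &[1, 2]);
/// // Limb 0 wrapped; the carry-fix pass will later bump limb 1 to 4.
/// assert_eq!(partial, vec![(0, 1), (3, 0)]);
/// ```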
/// Generic-semiring matrix multiply — spine of the LEGO substrate.
/// Same Program serves user dialects (security reachability, dataflow,
/// CKY parsing, Viterbi, GF(2)) AND vyre-self consumers (#19 fusion-graph
/// analysis, #22 megakernel scheduler critical path, #26 region-graph
/// dataflow fixpoint, #39 Scallop-join provenance semiring).
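///
/// What "generic semiring" means in practice, shown with the tropical (min, +)
/// instance (host-side sketch of the idea, not this module's signature):
///
/// ```
/// // One GEMM loop, parameterized by the semiring's add, mul, and additive identity.
/// fn semiring_gemm<T: Copy>(
///     a: &[Vec<T>],
///     b: &[Vec<T>],
///     zero: T,
///     add: impl Fn(T, T) -> T,
///     mul: impl Fn(T, T) -> T,
/// ) -> Vec<Vec<T>> {
///     let (n, k, m) = (a.len(), b.len(), b[0].len());
///     (0..n)
///         .map(|i| {
///             (0..m)
///                 .map(|j| (0..k).fold(zero, |acc, p| add(acc, mul(a[i][p], b[p][j]))))
///                 .collect()
///         })
///         .collect()
/// }
/// let inf = f64::INFINITY;
/// let adj = vec![vec![0.0, 2.0, inf], vec![inf, 0.0, 3.0], vec![inf, inf, 0.0]];
/// // (min, +): squaring the adjacency matrix gives cheapest two-hop paths,
/// // which is why iterating this product to a fixpoint yields reachability.
/// let two_hop = semiring_gemm(&adj, &adj, inf, f64::min, |x, y| x + y);
/// assert_eq!(two_hop[0][2], 5.0);   // 0 -> 1 -> 2
/// ```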
/// Bellman-Ford shortest-path primitive over an edge list. Composes with
/// `persistent_fixpoint`. Self-consumer: tensor-network contraction order.
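///
/// The relaxation pass in isolation (host-side sketch; the fixpoint driver
/// repeats it until no distance changes, at most |V| - 1 times):
///
/// ```
/// // Relax every edge once; report whether any distance improved.
/// fn relax(edges: &[(usize, usize, f64)], dist: &mut [f64]) -> bool {
///     let mut changed = false;
///     for &(u, v, w) in edges {
///         if dist[u] + w < dist[v] {
///             dist[v] = dist[u] + w;
///             changed = true;
///         }
///     }
///     changed
/// }
/// let edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0)];
/// let mut dist = [0.0, f64::INFINITY, f64::INFINITY];
/// while relax(&edges, &mut dist) {}
/// assert_eq!(dist, [0.0, 3.0, 1.0]);
/// ```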
/// Scallop-style probabilistic Datalog join (#39). Composes
/// `semiring_gemm` under `Lineage` with `persistent_fixpoint` —
/// one round of relational join per fixpoint step, run to convergence
/// inside ONE GPU dispatch. User dialect: probabilistic Datalog.
/// Self-consumer: rule-provenance tracking
/// (`vyre-libs::self_substrate::scallop_provenance`).
/// Prefix-scan backed stream compaction over live-lane flags.
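///
/// The scan-then-scatter shape of compaction (host-side sketch; the scan half
/// is what the subgroup prefix-sum primitive provides):
///
/// ```
/// // An exclusive prefix sum over live flags gives each survivor its output
/// // slot; a scatter then packs the survivors densely.
/// fn compact<T: Copy + Default>(data: &[T], live: &[bool]) -> Vec<T> {
///     let mut slots = Vec::with_capacity(data.len());
///     let mut running = 0usize;
///     for &flag in live {
///         slots.push(running);
///         running += flag as usize;
///     }
///     let mut out = vec![T::default(); running];
///     for i in 0..data.len() {
///         if live[i] {
///             out[slots[i]] = data[i];
///         }
///     }
///     out
/// }
/// let packed = compact(&[10, 0, 30, 0, 50], &[true, false, true, false, true]);
/// assert_eq!(packed, vec![10, 30, 50]);
/// ```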
/// SCC-local matrix fixpoint primitive for recursive graph components.