Module jet_algebra

Expand description

The single shared Faà di Bruno / Leibniz combinatorial kernel (#1151).

Two jet representations live in this crate and historically each carried its own hand-written copy of the same calculus:

crate::jet_tower::Tower4 — full dense derivative tensors (v, g, h, t3, t4) in K primary variables, with the Leibniz product and Faà di Bruno composition written out term-by-term per derivative order.
crate::jet_partitions::MultiDirJet — bitmask-coefficient jet over distinct seeded directions, with the same two rules written as general submask / set-partition loops.

The data layouts are legitimately different (a complete small-K tower vs. a handful of directions of a large-K expression) and stay separate. What is identical is the combinatorics: for a group of differentiation slots, the Leibniz rule sums over subsets of those slots and the Faà di Bruno rule sums over their set-partitions. This module owns that combinatorics once, as a layout-agnostic [JetAlgebra] trait plus walkers parameterised by closures that read each representation’s own derivative for a slot-group. Both Tower4 and MultiDirJet route their mul / compose_unary through these walkers, so a fix to the rule is a fix to both — and a bit-exact equivalence test (see tests) proves the two layouts agree.

A “slot-group” is a list of positions 0..m (the differentiation arguments of one output coefficient). Each representation maps a group to a derivative:

For a tensor index tuple (i, j, k, l), the positions 0..m carry axis labels [i, j, k, l]; a sub-group of positions selects the corresponding lower-order tensor entry (e.g. positions {0, 2} → h[i][k]).
For a bitmask coefficient mask, the set bits are the slots; a sub-group of bits is itself a sub-mask read straight out of coeffs.

§Performance: the combinatorics is precomputed, not re-walked (#1151 perf)

The subset enumeration (Leibniz) and the set-partition enumeration (Faà di Bruno) over m slots are fixed combinatorial objects: they depend only on the slot count m, never on the actual derivative values. The hot per-row jet path nevertheless calls these walkers millions of times, and the original kernel rebuilt that structure from scratch on every call — the Faà di Bruno walk via a recursive &mut dyn FnMut “assign each element to a block” enumeration whose leaf and every block read went through a vtable that never inlined, plus a freshly cleared [SlotBuf] per block; the Leibniz walk via a per-bit branch building two [SlotBuf]s for every one of the 2^m subsets.

Both structures are now built ONCE per slot-count m (lazily, into a process-wide cache keyed by m ∈ 0..=8) and stored as flat, packed bitmask tables. The walkers iterate those tables with straight-line loops — no recursion, no &mut dyn FnMut dispatch, no per-call structure rebuild — so the only work left on the hot path is the actual arithmetic (the closure reads of derivatives and their products/sums). The emission order of the cached tables is, by construction, the EXACT order the former recursive / branch walkers produced (the table builder runs that same enumeration once), so every product is left-associated identically and every channel’s sum accumulates in the same order: the result is to_bits-identical to the former walkers, only with the combinatorial bookkeeping amortised away.

Functions§

faa_di_bruno: Walk the multivariate Faà di Bruno rule for an output of m slots.