Expand description
The single shared Faà di Bruno / Leibniz combinatorial kernel (#1151).
Two jet representations live in this crate and historically each carried its own hand-written copy of the same calculus:
crate::jet_tower::Tower4— full dense derivative tensors (v,g,h,t3,t4) inKprimary variables, with the Leibniz product and Faà di Bruno composition written out term-by-term per derivative order.crate::jet_partitions::MultiDirJet— bitmask-coefficient jet over distinct seeded directions, with the same two rules written as general submask / set-partition loops.
The data layouts are legitimately different (a complete small-K tower
vs. a handful of directions of a large-K expression) and stay separate.
What is identical is the combinatorics: for a group of differentiation
slots, the Leibniz rule sums over subsets of those slots and the Faà di
Bruno rule sums over their set-partitions. This module owns that
combinatorics once, as a layout-agnostic [JetAlgebra] trait plus walkers
parameterised by closures that read each representation’s own derivative
for a slot-group. Both
Tower4 and MultiDirJet route their mul / compose_unary through
these walkers, so a fix to the rule is a fix to both — and a bit-exact
equivalence test (see tests) proves the two layouts agree.
A “slot-group” is a list of positions 0..m (the differentiation
arguments of one output coefficient). Each representation maps a group to
a derivative:
- For a tensor index tuple
(i, j, k, l), the positions0..mcarry axis labels[i, j, k, l]; a sub-group of positions selects the corresponding lower-order tensor entry (e.g. positions{0, 2}→h[i][k]). - For a bitmask coefficient
mask, the set bits are the slots; a sub-group of bits is itself a sub-mask read straight out ofcoeffs.
§Performance: the combinatorics is precomputed, not re-walked (#1151 perf)
The subset enumeration (Leibniz) and the set-partition enumeration (Faà di
Bruno) over m slots are fixed combinatorial objects: they depend only
on the slot count m, never on the actual derivative values. The hot per-row
jet path nevertheless calls these walkers millions of times, and the original
kernel rebuilt that structure from scratch on every call — the Faà di Bruno
walk via a recursive &mut dyn FnMut “assign each element to a block”
enumeration whose leaf and every block read went through a vtable that never
inlined, plus a freshly cleared [SlotBuf] per block; the Leibniz walk via a
per-bit branch building two [SlotBuf]s for every one of the 2^m subsets.
Both structures are now built ONCE per slot-count m (lazily, into a
process-wide cache keyed by m ∈ 0..=8) and stored as flat, packed bitmask
tables. The walkers iterate those tables with straight-line loops — no
recursion, no &mut dyn FnMut dispatch, no per-call structure rebuild — so
the only work left on the hot path is the actual arithmetic (the closure
reads of derivatives and their products/sums). The emission order of the
cached tables is, by construction, the EXACT order the former recursive /
branch walkers produced (the table builder runs that same enumeration once),
so every product is left-associated identically and every channel’s sum
accumulates in the same order: the result is to_bits-identical to the
former walkers, only with the combinatorial bookkeeping amortised away.
Functions§
- faa_
di_ bruno - Walk the multivariate Faà di Bruno rule for an output of
mslots.