Shared candle primitives for Diffusion Transformer (DiT) backends.
Wave 0 of the audio capability completion push extracted these
reusable building blocks from the Stable Audio Open Small backend
in `blazen-audio-music` so that other audio and 3D ports (F5-TTS,
AudioLDM, future video DiTs) can share them without copying code.
What lives here
- [`dit`]: multi-head attention (self and cross) and the SwiGLU feed-forward block.
- [`adaln`]: AdaLN-Zero modulation (the six-chunk `(scale_*, shift_*, gate_*)` projection used to drive a DiT block from a global conditioning vector) and the broadcasting `modulate` helper.
- [`rope`]: Rotary Position Embedding helpers (`precompute_rope_freqs` and the partial-rotary `apply_rope`) plus a `FourierFeatures` block for embedding continuous scalars.
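To make the `adaln` and `rope` items above concrete, here is a minimal sketch in plain Rust over `Vec<f32>`, not the crate's candle-based code. The function names `modulate_sketch`, `rope_angles`, and `apply_partial_rope` are hypothetical, the `x * (1 + scale) + shift` form is the usual AdaLN convention assumed here, and the half-split channel pairing in the rotary step is one common convention rather than a confirmed detail of `apply_rope`.

```rust
/// AdaLN-style modulation (assumed form): x * (1 + scale) + shift, with one
/// scale/shift value per channel broadcast over every token (row) of `x`.
fn modulate_sketch(x: &[Vec<f32>], scale: &[f32], shift: &[f32]) -> Vec<Vec<f32>> {
    x.iter()
        .map(|row| {
            row.iter()
                .zip(scale.iter().zip(shift))
                .map(|(&v, (&s, &b))| v * (1.0 + s) + b)
                .collect()
        })
        .collect()
}

/// Precompute rotation angles: angles[pos][i] = pos * base^(-2i / rot_dim).
/// `rot_dim` is the number of leading channels that will be rotated.
fn rope_angles(seq_len: usize, rot_dim: usize, base: f32) -> Vec<Vec<f32>> {
    assert!(rot_dim % 2 == 0, "rotary dimension must be even");
    (0..seq_len)
        .map(|pos| {
            (0..rot_dim / 2)
                .map(|i| {
                    let exponent = -2.0 * (i as f32) / (rot_dim as f32);
                    (pos as f32) * base.powf(exponent)
                })
                .collect()
        })
        .collect()
}

/// Partial rotary embedding: rotate the first `2 * angles[pos].len()` channels
/// of each row and pass the remaining channels through unchanged.
/// Assumes half-split pairing: channel i is paired with channel i + half.
/// Each row must have at least `2 * half` channels.
fn apply_partial_rope(x: &[Vec<f32>], angles: &[Vec<f32>]) -> Vec<Vec<f32>> {
    x.iter()
        .zip(angles)
        .map(|(row, theta)| {
            let half = theta.len();
            let mut out = row.clone();
            for (i, t) in theta.iter().enumerate() {
                let (sin, cos) = t.sin_cos();
                let (a, b) = (row[i], row[i + half]);
                out[i] = a * cos - b * sin;
                out[i + half] = a * sin + b * cos;
            }
            out
        })
        .collect()
}
```

With `rope_angles(seq_len, 2, 10000.0)` and three-channel rows, only the first two channels rotate and the third is left as-is, which is the "partial" part of partial rotary. Each rotation preserves the norm of the rotated pair, so position is encoded without rescaling the features.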
What lives in the consuming backend
- Backend-specific configs (e.g. `DiTConfig` for Stable Audio), conditioner wiring, numeric / timestep embedding heads, output projections sized to specific latent shapes, and the actual DiT block recipe (e.g. self-attn → cross-attn → SwiGLU FFN with `sigmoid(1 - gate)` blending for Stable Audio).
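As an illustration of the kind of block recipe a backend owns, the sketch below shows one modulated, gated residual step in plain Rust. It is an assumption-laden reading of the `sigmoid(1 - gate)` blending mentioned above, not the Stable Audio backend's actual code: the name `gated_residual` is hypothetical, the pre-normalization a real block would apply is omitted, and a single token vector stands in for a full tensor.

```rust
/// Logistic sigmoid.
fn sigmoid(v: f32) -> f32 {
    1.0 / (1.0 + (-v).exp())
}

/// One gated residual step (hypothetical sketch):
///   out = x + branch(x * (1 + scale) + shift) * sigmoid(1 - gate)
/// `branch` stands in for a sub-layer such as self-attention, cross-attention,
/// or the SwiGLU feed-forward block. Normalization is omitted for brevity.
fn gated_residual<F>(
    x: &[f32],
    scale: &[f32],
    shift: &[f32],
    gate: &[f32],
    branch: F,
) -> Vec<f32>
where
    F: Fn(&[f32]) -> Vec<f32>,
{
    // Modulate the input with the conditioning-derived scale and shift.
    let modulated: Vec<f32> = x
        .iter()
        .zip(scale.iter().zip(shift))
        .map(|(&v, (&s, &b))| v * (1.0 + s) + b)
        .collect();
    // Run the sub-layer, then blend its output back into the residual stream.
    let y = branch(&modulated);
    x.iter()
        .zip(y.iter().zip(gate))
        .map(|(&r, (&v, &g))| r + v * sigmoid(1.0 - g))
        .collect()
}
```

A backend following the recipe above would chain three such steps (self-attention, cross-attention, feed-forward), each with its own `(scale, shift, gate)` chunk from the AdaLN projection. That is where the six-chunk projection's values get consumed, two chunks for modulation and one for gating per step in this sketch.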
These primitives are intentionally feature-free (they sit behind no Cargo feature flags of their own); consuming crates gate their own backends as needed.