Shared candle primitives for Diffusion Transformer (DiT) backends.
Wave 0 of the audio capability completion push extracted these
reusable building blocks out of the Stable Audio Open Small backend
in blazen-audio-music so other audio + 3D ports (F5-TTS,
AudioLDM, future video DiTs) can share them without copying code.
§What lives here
- dit: multi-head attention (self + cross) and the SwiGLU feed-forward block.
- adaln: AdaLN-Zero modulation (the six-chunk (scale_*, shift_*, gate_*) projection used to drive a DiT block from a global conditioning vector) and the broadcasting modulate helper.
- rope: Rotary Position Embedding helpers (precompute_rope_freqs and the partial-rotary apply_rope) plus a FourierFeatures block for embedding continuous scalars.
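To make the adaln and rope entries concrete, here is a minimal std-only sketch of the underlying arithmetic. It is not this crate's API: the real helpers operate on candle tensors, and the function names, signatures, the x * (1 + scale) + shift convention, and the interleaved pairing of rotated channels are all assumptions chosen for illustration.

```rust
/// AdaLN-style modulation of one feature vector: x * (1 + scale) + shift.
/// Assumption: this is the common DiT convention; the crate's own
/// `modulate` may differ in argument order or broadcasting details.
fn modulate_sketch(x: &[f32], shift: &[f32], scale: &[f32]) -> Vec<f32> {
    assert!(x.len() == shift.len() && x.len() == scale.len());
    x.iter()
        .zip(shift.iter().zip(scale.iter()))
        .map(|(&xi, (&sh, &sc))| xi * (1.0 + sc) + sh)
        .collect()
}

/// Inverse frequencies for rotary embeddings: theta^(-2i / rot_dim)
/// for i in 0..rot_dim / 2. Hypothetical stand-in for a
/// `precompute_rope_freqs`-style helper.
fn rope_inv_freqs_sketch(rot_dim: usize, theta: f32) -> Vec<f32> {
    (0..rot_dim / 2)
        .map(|i| theta.powf(-(2.0 * i as f32) / rot_dim as f32))
        .collect()
}

/// Partial rotary embedding: rotate only the first `rot_dim` channels
/// (as interleaved pairs, an assumption) and pass the rest through.
fn apply_partial_rope_sketch(x: &[f32], pos: usize, rot_dim: usize, theta: f32) -> Vec<f32> {
    assert!(rot_dim % 2 == 0 && rot_dim <= x.len());
    let freqs = rope_inv_freqs_sketch(rot_dim, theta);
    let mut out = x.to_vec();
    for (i, f) in freqs.iter().enumerate() {
        let (sin, cos) = (pos as f32 * f).sin_cos();
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        // Standard 2D rotation of the pair (a, b) by the position angle.
        out[2 * i] = a * cos - b * sin;
        out[2 * i + 1] = a * sin + b * cos;
    }
    out
}
```

Two properties are worth noting in this sketch: at position 0 the rotation is the identity, and channels beyond rot_dim are never touched, which is what "partial-rotary" refers to.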
§What lives in the consuming backend
- Backend-specific configs (e.g. DiTConfig for Stable Audio), conditioner wiring, numeric / timestep embedding heads, output projections sized to specific latent shapes, and the actual DiT block recipe (e.g. self-attn → cross-attn → SwiGLU FFN with sigmoid(1 - gate) blending for Stable Audio).
These primitives are intentionally not placed behind any Cargo feature flags; consuming crates gate their own backends as needed.