blazen-audio-core 0.6.75

Shared candle primitives (DiT attention, SwiGLU, AdaLN-Zero, RoPE, Fourier features) reused across Blazen audio + 3D diffusion backends

Shared candle primitives for Diffusion Transformer (DiT) backends.

Wave 0 of the audio capability completion push extracted these reusable building blocks from the Stable Audio Open Small backend in blazen-audio-music, so that other audio and 3D ports (F5-TTS, AudioLDM, future video DiTs) can share them without copying code.

What lives here

  • [dit] — multi-head attention (self + cross) and the SwiGLU feed-forward block.
  • [adaln] — AdaLN-Zero modulation (the six-chunk (scale_*, shift_*, gate_*) projection used to drive a DiT block from a global conditioning vector) and the broadcasting modulate helper.
  • [rope] — Rotary Position Embedding helpers (precompute_rope_freqs and the partial-rotary apply_rope) plus a FourierFeatures block for embedding continuous scalars.
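
To make the "partial-rotary" idea concrete, here is a minimal std-only sketch that works on plain Vec<f32> rows rather than candle tensors. It does not reproduce this crate's API: the function names rope_angles and apply_partial_rope, their signatures, the base theta, and the half-split channel pairing (channel i rotated with channel i + rot_dim / 2) are all assumptions made for illustration. The actual precompute_rope_freqs and apply_rope may differ in layout and pairing convention.

```rust
/// Hypothetical helper: angle table of shape seq_len x (rot_dim / 2),
/// where angle[pos][i] = pos * theta^(-2i / rot_dim).
fn rope_angles(seq_len: usize, rot_dim: usize, theta: f32) -> Vec<Vec<f32>> {
    assert!(rot_dim % 2 == 0, "rotary dimension must be even");
    let half = rot_dim / 2;
    (0..seq_len)
        .map(|pos| {
            (0..half)
                .map(|i| pos as f32 * theta.powf(-(2.0 * i as f32) / rot_dim as f32))
                .collect()
        })
        .collect()
}

/// Hypothetical helper: rotate only the first `rot_dim` channels of each row
/// (rot_dim = 2 * angles[pos].len()); the remaining channels pass through
/// unchanged, which is what makes the rotation "partial".
/// Pairing convention is assumed: channel i is rotated with channel i + half.
fn apply_partial_rope(x: &mut [Vec<f32>], angles: &[Vec<f32>]) {
    assert_eq!(x.len(), angles.len(), "one angle row per position");
    for (row, ang) in x.iter_mut().zip(angles) {
        let half = ang.len();
        assert!(row.len() >= 2 * half, "row narrower than rotary dimension");
        for i in 0..half {
            let (sin, cos) = ang[i].sin_cos();
            let (a, b) = (row[i], row[i + half]);
            // Standard 2-D rotation of the pair (a, b) by the position angle.
            row[i] = a * cos - b * sin;
            row[i + half] = a * sin + b * cos;
        }
    }
}
```

Because each pair is rotated rather than scaled, the norm of the rotated channels is preserved, and position 0 (angle zero) is left untouched. Rotating only a prefix of the head dimension leaves the remaining channels position-independent.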

What lives in the consuming backend

  • Backend-specific configs (e.g. DiTConfig for Stable Audio), conditioner wiring, numeric / timestep embedding heads, output projections sized to specific latent shapes, and the actual DiT block recipe (e.g. self-attn → cross-attn → SwiGLU FFN with sigmoid(1 - gate) blending for Stable Audio).
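
As a rough illustration of how a consuming backend might combine the six-chunk projection, the modulate helper, and sigmoid(1 - gate) blending, the following std-only sketch operates on a single token vector. Everything here is hypothetical: the function names, the chunk order (scale, shift, gate for self-attention, then for the feed-forward), the omission of the pre-normalization step, and the choice of which sublayers are modulated are assumptions, not the Stable Audio backend's actual recipe.

```rust
/// Hypothetical: split a conditioning projection of length 6 * dim into six
/// chunks. Assumed order: scale_self, shift_self, gate_self,
/// scale_ff, shift_ff, gate_ff.
fn chunk6(proj: &[f32], dim: usize) -> [&[f32]; 6] {
    assert_eq!(proj.len(), 6 * dim, "projection must hold six chunks");
    std::array::from_fn(|i| &proj[i * dim..(i + 1) * dim])
}

/// AdaLN-style modulation: x * (1 + scale) + shift, elementwise.
fn modulate(x: &[f32], scale: &[f32], shift: &[f32]) -> Vec<f32> {
    x.iter()
        .zip(scale)
        .zip(shift)
        .map(|((&x, &s), &b)| x * (1.0 + s) + b)
        .collect()
}

fn sigmoid(v: f32) -> f32 {
    1.0 / (1.0 + (-v).exp())
}

/// Blend a sublayer output into the residual stream with sigmoid(1 - gate).
fn gated_residual(residual: &[f32], branch: &[f32], gate: &[f32]) -> Vec<f32> {
    residual
        .iter()
        .zip(branch)
        .zip(gate)
        .map(|((&r, &y), &g)| r + y * sigmoid(1.0 - g))
        .collect()
}

/// One modulated sublayer: modulate, run the branch (attention or FFN,
/// passed in as a closure), then blend back into the residual.
/// Normalization before modulation is deliberately left out of this sketch.
fn modulated_sublayer(
    x: &[f32],
    scale: &[f32],
    shift: &[f32],
    gate: &[f32],
    branch: impl Fn(&[f32]) -> Vec<f32>,
) -> Vec<f32> {
    let h = modulate(x, scale, shift);
    let y = branch(&h);
    gated_residual(x, &y, gate)
}
```

A backend would call modulated_sublayer once per modulated sublayer, feeding it the matching three chunks from chunk6. Keeping the branch as a closure mirrors the split described above: the shared crate supplies the primitives, while the consuming backend decides the order and wiring of the sublayers.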

This crate intentionally defines no Cargo features of its own; consuming crates gate their own backends as needed.