Crate blazen_audio_core

Shared candle primitives for Diffusion Transformer (DiT) backends.

Wave 0 of the audio capability completion push extracted these reusable building blocks from the Stable Audio Open Small backend in blazen-audio-music, so that other audio and 3D ports (F5-TTS, AudioLDM, future video DiTs) can share them without copying code.

§What lives here

  • dit — multi-head attention (self + cross) and the SwiGLU feed-forward block.
  • adaln — AdaLN-Zero modulation: the six-chunk (scale_*, shift_*, gate_*) projection that drives a DiT block from a global conditioning vector, plus the broadcasting modulate helper.
  • rope — Rotary Position Embedding helpers (precompute_rope_freqs and the partial-rotary apply_rope) plus a FourierFeatures block for embedding continuous scalars.
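
To make the AdaLN-Zero description above concrete, here is a minimal sketch in plain Rust over f32 slices rather than candle tensors. It is not this crate's API: the names chunk6 and modulate (as written here), their signatures, and the chunk order are assumptions for illustration. The modulation formula x * (1 + scale) + shift is the one commonly used in DiT implementations and is assumed, not confirmed, to match the adaln module.

```rust
/// Split a projected conditioning vector of length 6 * dim into six
/// equal chunks. Hypothetical helper: which chunk is scale, shift, or
/// gate is up to the caller and is not taken from the crate.
fn chunk6(proj: &[f32], dim: usize) -> Option<[&[f32]; 6]> {
    if dim == 0 || proj.len() != 6 * dim {
        return None;
    }
    Some(std::array::from_fn(|i| &proj[i * dim..(i + 1) * dim]))
}

/// Broadcasting modulation. `x` is a row-major [tokens, dim] buffer;
/// `scale` and `shift` have length `dim` and are applied to every token.
/// Assumed formula: x * (1 + scale) + shift.
fn modulate(x: &[f32], scale: &[f32], shift: &[f32]) -> Vec<f32> {
    let dim = scale.len();
    assert_eq!(shift.len(), dim, "scale and shift must have equal length");
    assert!(dim > 0 && x.len() % dim == 0, "x must be [tokens, dim]");

    let mut out = Vec::with_capacity(x.len());
    for row in x.chunks(dim) {
        for ((&v, &s), &b) in row.iter().zip(scale).zip(shift) {
            out.push(v * (1.0 + s) + b);
        }
    }
    out
}
```

With scale and shift both zero, modulate returns its input unchanged, which is the property the "Zero" initialisation relies on. How the gate chunks are applied is left out on purpose, since that belongs to the consuming backend's block recipe.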

§What lives in the consuming backend

  • Backend-specific configs (e.g. DiTConfig for Stable Audio), conditioner wiring, numeric / timestep embedding heads, output projections sized to specific latent shapes, and the actual DiT block recipe (e.g. self-attn → cross-attn → SwiGLU FFN with sigmoid(1 - gate) blending for Stable Audio).

These primitives are intentionally feature-free (the crate defines no Cargo feature flags of its own); consuming crates gate their own backends as needed.

Modules§

adaln
AdaLN-Zero modulation primitives.
dit
Generic DiT primitives: multi-head attention (self + cross) and a SwiGLU feed-forward block.
rope
Rotary Position Embedding (RoPE) primitives + Fourier-feature embedding for continuous scalars.
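
As an illustration of what the rope module's helpers compute, the following sketch implements partial rotary embedding over plain f32 slices. The function names rope_angles and apply_partial_rope, their signatures, and the interleaved pairing of dimensions (2i, 2i + 1) are assumptions made for this example; the crate's precompute_rope_freqs and apply_rope operate on candle tensors and may use a different layout or pairing convention.

```rust
/// Angle table: angles[pos][i] = pos * base^(-2i / rot_dim)
/// for i in 0..rot_dim / 2. Hypothetical stand-in for a
/// frequency-precompute step.
fn rope_angles(seq_len: usize, rot_dim: usize, base: f32) -> Vec<Vec<f32>> {
    assert!(rot_dim % 2 == 0, "rotary dimension must be even");
    let half = rot_dim / 2;
    let inv_freq: Vec<f32> = (0..half)
        .map(|i| base.powf(-(2.0 * (i as f32)) / (rot_dim as f32)))
        .collect();
    (0..seq_len)
        .map(|pos| inv_freq.iter().map(|f| (pos as f32) * f).collect())
        .collect()
}

/// Rotate the first 2 * angles.len() entries of one head vector and
/// pass the remaining entries through unchanged (partial rotary).
/// Pairs are taken as (x[2i], x[2i + 1]); this is an assumption.
fn apply_partial_rope(x: &[f32], angles: &[f32]) -> Vec<f32> {
    let rot_dim = 2 * angles.len();
    assert!(rot_dim <= x.len(), "rotary dimension exceeds vector length");

    let mut out = x.to_vec();
    for (i, theta) in angles.iter().enumerate() {
        let (sin, cos) = theta.sin_cos();
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        out[2 * i] = a * cos - b * sin;
        out[2 * i + 1] = a * sin + b * cos;
    }
    out
}
```

A caller would build the table once per sequence length and then apply row `pos` of it to the query and key vectors at that position. Because each pair is rotated rather than scaled, the vector norm is preserved, and position 0 (all angles zero) leaves the input unchanged.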