blazen-audio-codec
Neural-audio-codec backends for Blazen. Codecs translate raw PCM into discrete codebook tokens (and back) so generative models can operate in a low-rate, GPU-friendly token space instead of 24-48 kHz waveform samples.
This crate plays the same role for codecs that
blazen-audio-tts plays for TTS and
blazen-audio-music plays for music:
a single capability-typed [CodecBackend] trait plus a
monomorphizable [CodecBackendHandle<B>] and an erased
[DynCodecProvider] for binding layers.
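
As a rough illustration of that three-part split, the sketch below pairs a stripped-down stand-in trait with a generic handle and a type-erased provider. Only the three type names come from this README. The placeholder methods (`name`, `num_codebooks`, `describe`, `from_handle`), the struct layouts, and the toy backends are assumptions made for the example, and the real definitions may look quite different.

```rust
/// Stand-in for the crate's codec trait. The two placeholder methods are
/// assumptions for this sketch, not the real trait surface.
pub trait CodecBackend {
    fn name(&self) -> &'static str;
    fn num_codebooks(&self) -> usize;
}

/// Generic handle: `B` is known at compile time, so every call through the
/// handle is monomorphized and statically dispatched.
pub struct CodecBackendHandle<B: CodecBackend> {
    backend: B,
}

impl<B: CodecBackend> CodecBackendHandle<B> {
    pub fn new(backend: B) -> Self {
        Self { backend }
    }

    pub fn describe(&self) -> String {
        format!("{} ({} codebooks)", self.backend.name(), self.backend.num_codebooks())
    }
}

/// Type-erased provider: the concrete backend sits behind a trait object,
/// which suits binding layers that cannot express Rust generics.
pub struct DynCodecProvider {
    inner: Box<dyn CodecBackend>,
}

impl DynCodecProvider {
    /// Hypothetical conversion from a typed handle into the erased form.
    pub fn from_handle<B: CodecBackend + 'static>(handle: CodecBackendHandle<B>) -> Self {
        Self { inner: Box::new(handle.backend) }
    }

    pub fn describe(&self) -> String {
        format!("{} ({} codebooks)", self.inner.name(), self.inner.num_codebooks())
    }
}

/// Hypothetical toy backends used only to exercise the sketch.
pub struct ToyEncodec;
pub struct ToySnac;

impl CodecBackend for ToyEncodec {
    fn name(&self) -> &'static str { "toy-encodec" }
    fn num_codebooks(&self) -> usize { 4 }
}

impl CodecBackend for ToySnac {
    fn name(&self) -> &'static str { "toy-snac" }
    fn num_codebooks(&self) -> usize { 3 }
}
```

The point of the split is that Rust callers can keep the concrete type and pay no dynamic-dispatch cost, while a binding layer can hold a heterogeneous `Vec<DynCodecProvider>` without naming any backend type.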
Backends
| Backend | Feature flag | Status |
|---|---|---|
| [backends::encodec] | encodec | Functional: Meta's EnCodec 24 kHz / 4-codebook neural codec via candle-transformers. |
| [backends::dac] | dac | Functional decode: Descript Audio Codec (descript/dac_44khz, 9 codebooks at 8 kbps) via candle-transformers. Encode short-circuits until candle exposes a public RVQ encode path. |
| [backends::snac] | snac | Functional: Multi-Scale Neural Audio Codec (hubertsiuzdak/snac_24khz, 3 multi-scale codebooks @ vq_strides [4, 2, 1], 24 kHz, ~3 kbps) via candle-transformers. Both encode and decode are wired. |
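
To make the "multi-scale" wording in the SNAC row concrete, here is a small sketch of how per-codebook token counts could follow from `vq_strides`. It assumes each stride is a temporal downsampling factor applied before that codebook's quantizer, so the coarsest codebook emits the fewest tokens. That reading, the ceiling rounding, and the helper name `tokens_per_codebook` are assumptions for illustration and are not taken from the Blazen backend.

```rust
/// Tokens emitted by each codebook for `latent_frames` encoder frames,
/// assuming every stride is a temporal downsampling factor.
/// Ceiling division is used so a partial window still yields one token;
/// the rounding rule is an assumption of this sketch.
pub fn tokens_per_codebook(latent_frames: usize, vq_strides: &[usize]) -> Vec<usize> {
    vq_strides
        .iter()
        .map(|&stride| {
            assert!(stride > 0, "strides must be positive");
            (latent_frames + stride - 1) / stride
        })
        .collect()
}

/// Total token count across all codebooks for one clip.
pub fn total_tokens(latent_frames: usize, vq_strides: &[usize]) -> usize {
    tokens_per_codebook(latent_frames, vq_strides).iter().sum()
}
```

Under these assumptions, 48 latent frames with strides `[4, 2, 1]` give codebooks of 12, 24, and 48 tokens, so consumers of such a codec have to handle token streams of unequal length rather than a rectangular codebook-by-frame grid.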
Why a dedicated codec trait?
Codecs are pure functions over PCM and tokens — no prompts, no
sampling temperature, no voices. Trying to thread them through the
generative AudioBackend surface would
force every codec to invent prompt semantics it doesn't have, so the
[CodecBackend] trait adds only encode_pcm / decode_tokens
and inherits the lifecycle methods (load / unload /
is_loaded) from [AudioBackend].
See PR_AUDIO_PLAN.md §3 + §5 W5 for the full restructure rationale.
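
The supertrait relationship described in this section can be sketched as follows. The method names `load`, `unload`, `is_loaded`, `encode_pcm`, and `decode_tokens` come from the text above. Their signatures, the `String` error type, the synchronous style, and the `ToyCodec` type are assumptions for the example. The toy codec is a plain 8-bit uniform quantizer with one codebook, not a neural codec.

```rust
/// Assumed shape of the lifecycle supertrait; real signatures may differ.
pub trait AudioBackend {
    fn load(&mut self) -> Result<(), String>;
    fn unload(&mut self);
    fn is_loaded(&self) -> bool;
}

/// The codec trait adds only the two pure PCM <-> token functions.
pub trait CodecBackend: AudioBackend {
    /// PCM samples in, one token stream per codebook out.
    fn encode_pcm(&self, pcm: &[f32]) -> Result<Vec<Vec<u32>>, String>;
    /// One token stream per codebook in, PCM samples out.
    fn decode_tokens(&self, tokens: &[Vec<u32>]) -> Result<Vec<f32>, String>;
}

/// Hypothetical stand-in codec: maps [-1.0, 1.0] onto codes 0..=255.
pub struct ToyCodec {
    loaded: bool,
}

impl ToyCodec {
    pub fn new() -> Self {
        Self { loaded: false }
    }
}

impl AudioBackend for ToyCodec {
    fn load(&mut self) -> Result<(), String> {
        self.loaded = true;
        Ok(())
    }

    fn unload(&mut self) {
        self.loaded = false;
    }

    fn is_loaded(&self) -> bool {
        self.loaded
    }
}

impl CodecBackend for ToyCodec {
    fn encode_pcm(&self, pcm: &[f32]) -> Result<Vec<Vec<u32>>, String> {
        if !self.loaded {
            return Err("codec not loaded".to_string());
        }
        let codes = pcm
            .iter()
            .map(|&x| ((x.clamp(-1.0, 1.0) + 1.0) * 127.5).round() as u32)
            .collect();
        Ok(vec![codes]) // a single codebook
    }

    fn decode_tokens(&self, tokens: &[Vec<u32>]) -> Result<Vec<f32>, String> {
        if !self.loaded {
            return Err("codec not loaded".to_string());
        }
        let codes = tokens
            .first()
            .ok_or_else(|| "expected one codebook".to_string())?;
        Ok(codes.iter().map(|&c| c as f32 / 127.5 - 1.0).collect())
    }
}
```

Note that neither codec method takes a prompt, voice, or sampling parameter: the inputs are samples or tokens and nothing else, which is the property the dedicated trait is meant to preserve.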