blazen-audio-codec 0.6.70

Neural audio-codec backends for Blazen — encode PCM to discrete codebook tokens and back for generative audio models
blazen-audio-codec

Neural-audio-codec backends for Blazen. Codecs translate raw PCM into discrete codebook tokens (and back) so generative models can operate in a low-rate, GPU-friendly token space instead of 24-48 kHz waveform samples.

This crate plays the same role for codecs that blazen-audio-tts plays for TTS and blazen-audio-music plays for music: a single capability-typed [CodecBackend] trait plus a monomorphizable [CodecBackendHandle<B>] and an erased [DynCodecProvider] for binding layers.
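The split between a monomorphized handle and an erased provider can be sketched as follows. This is a minimal illustration, not the crate's actual API: the trait methods, the `ToyCodec` backend, and the constructor and method signatures on `CodecBackendHandle` and `DynCodecProvider` are all assumptions made for the example.

```rust
// Hypothetical stand-in for the crate's codec trait; real signatures may differ.
trait CodecBackend {
    fn name(&self) -> &'static str;
    fn encode_pcm(&self, pcm: &[f32]) -> Vec<u32>;
}

// Toy backend used only for illustration: 8-bit uniform quantization.
struct ToyCodec;

impl CodecBackend for ToyCodec {
    fn name(&self) -> &'static str {
        "toy"
    }

    fn encode_pcm(&self, pcm: &[f32]) -> Vec<u32> {
        // Map each sample in [-1.0, 1.0] to one of 256 levels.
        pcm.iter()
            .map(|s| ((s.clamp(-1.0, 1.0) + 1.0) * 127.5).round() as u32)
            .collect()
    }
}

// Generic handle: the compiler emits one specialized copy per backend type,
// so calls are statically dispatched.
struct CodecBackendHandle<B: CodecBackend> {
    backend: B,
}

impl<B: CodecBackend> CodecBackendHandle<B> {
    fn new(backend: B) -> Self {
        Self { backend }
    }

    fn encode(&self, pcm: &[f32]) -> Vec<u32> {
        self.backend.encode_pcm(pcm)
    }
}

// Erased provider: one concrete type that a binding layer can hold
// without knowing which backend is inside.
struct DynCodecProvider {
    inner: Box<dyn CodecBackend>,
}

impl DynCodecProvider {
    fn new<B: CodecBackend + 'static>(backend: B) -> Self {
        Self { inner: Box::new(backend) }
    }

    fn name(&self) -> &'static str {
        self.inner.name()
    }

    fn encode(&self, pcm: &[f32]) -> Vec<u32> {
        self.inner.encode_pcm(pcm)
    }
}

fn main() {
    let pcm = [-1.0_f32, 0.0, 1.0];
    let typed = CodecBackendHandle::new(ToyCodec);
    let erased = DynCodecProvider::new(ToyCodec);
    // Both paths reach the same backend code and produce the same tokens.
    assert_eq!(typed.encode(&pcm), erased.encode(&pcm));
    println!("{} -> {:?}", erased.name(), erased.encode(&pcm));
}
```

The generic handle suits Rust callers that know the backend at compile time. The erased form trades a virtual call for a single type that is easier to expose across a language binding.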

Backends

| Backend | Feature flag | Status |
|---|---|---|
| [backends::encodec] | encodec | Functional: Meta's EnCodec 24 kHz / 4-codebook neural codec via candle-transformers. |
| [backends::dac] | dac | Functional decode: Descript Audio Codec (descript/dac_44khz, 9 codebooks at 8 kbps) via candle-transformers. Encode short-circuits until candle exposes a public RVQ encode path. |
| [backends::snac] | snac | Functional: Multi-Scale Neural Audio Codec (hubertsiuzdak/snac_24khz, 3 multi-scale codebooks @ vq_strides [4, 2, 1], 24 kHz, ~3 kbps) via candle-transformers. Both encode and decode are wired. |
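To make the multi-scale layout concrete, here is a small worked example of how many tokens each codebook would hold for strides [4, 2, 1]. It assumes that a stride of s means the codebook emits one token per s finest-level frames, and that the frame count is divisible by every stride. The helper `tokens_per_codebook` is hypothetical and not part of the crate.

```rust
/// Tokens per codebook for `frames` finest-level frames.
/// Assumption for this sketch: a codebook with stride `s` emits one token
/// per `s` frames. Returns `None` if a stride is zero or does not divide
/// the frame count evenly.
fn tokens_per_codebook(frames: usize, strides: &[usize]) -> Option<Vec<usize>> {
    strides
        .iter()
        .map(|&s| {
            if s == 0 || frames % s != 0 {
                None
            } else {
                Some(frames / s)
            }
        })
        .collect()
}

fn main() {
    let counts = tokens_per_codebook(48, &[4, 2, 1])
        .expect("48 is divisible by every stride");
    let total: usize = counts.iter().sum();
    println!("per codebook: {:?}, total: {}", counts, total);
}
```

Under these assumptions the coarsest codebook carries a quarter as many tokens as the finest one, which is why multi-scale codes are usually stored as ragged per-codebook sequences rather than one rectangular matrix.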

Why a dedicated codec trait?

Codecs are pure functions over PCM and tokens: they have no prompts, no sampling temperature, and no voices. Threading them through the generative AudioBackend surface would force every codec to invent prompt semantics it doesn't have. The [CodecBackend] trait therefore adds only encode_pcm / decode_tokens and inherits the lifecycle methods (load / unload / is_loaded) from [AudioBackend].
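The shape described above, a codec trait layered on shared lifecycle methods, might look roughly like this. The method names come from this README, but the signatures, the `CodecError` type, and the `ToyCodec` backend are assumptions for illustration. The real traits may be async or use different argument and error types.

```rust
// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
enum CodecError {
    NotLoaded,
}

// Lifecycle surface shared by all audio backends (signatures assumed).
trait AudioBackend {
    fn load(&mut self) -> Result<(), CodecError>;
    fn unload(&mut self);
    fn is_loaded(&self) -> bool;
}

// The codec trait adds only the two conversion methods on top.
trait CodecBackend: AudioBackend {
    fn encode_pcm(&self, pcm: &[f32]) -> Result<Vec<u32>, CodecError>;
    fn decode_tokens(&self, tokens: &[u32]) -> Result<Vec<f32>, CodecError>;
}

// Toy backend: 8-bit uniform quantization instead of a neural model.
struct ToyCodec {
    loaded: bool,
}

impl AudioBackend for ToyCodec {
    fn load(&mut self) -> Result<(), CodecError> {
        self.loaded = true;
        Ok(())
    }

    fn unload(&mut self) {
        self.loaded = false;
    }

    fn is_loaded(&self) -> bool {
        self.loaded
    }
}

impl CodecBackend for ToyCodec {
    fn encode_pcm(&self, pcm: &[f32]) -> Result<Vec<u32>, CodecError> {
        if !self.loaded {
            return Err(CodecError::NotLoaded);
        }
        // Map each sample in [-1.0, 1.0] to a token in 0..=255.
        Ok(pcm
            .iter()
            .map(|s| ((s.clamp(-1.0, 1.0) + 1.0) * 127.5).round() as u32)
            .collect())
    }

    fn decode_tokens(&self, tokens: &[u32]) -> Result<Vec<f32>, CodecError> {
        if !self.loaded {
            return Err(CodecError::NotLoaded);
        }
        // Invert the mapping used by encode_pcm.
        Ok(tokens.iter().map(|&t| t as f32 / 127.5 - 1.0).collect())
    }
}

fn main() {
    let mut codec = ToyCodec { loaded: false };
    // Encoding before load fails through the shared lifecycle state.
    assert_eq!(codec.encode_pcm(&[0.0]), Err(CodecError::NotLoaded));

    codec.load().unwrap();
    let tokens = codec.encode_pcm(&[-1.0, 0.25, 1.0]).unwrap();
    let pcm = codec.decode_tokens(&tokens).unwrap();
    println!("tokens: {:?}, decoded: {:?}", tokens, pcm);

    codec.unload();
    assert!(!codec.is_loaded());
}
```

Because the codec methods take only PCM or tokens, nothing in the trait needs to know about prompts or sampling parameters.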

See PR_AUDIO_PLAN.md §3 and §5 W5 for the full restructure rationale.