kithara-decode
Audio decoding library with explicit, typed backend selection. DecoderFactory creates synchronous Decoder instances that convert compressed audio (MP3, AAC, FLAC, WAV, ALAC, …) into pool-backed PCM (Vec<f32>). No threading, no channels — just decoding.
The public surface centres on one trait — Decoder. Concrete backends (Symphonia / Apple / Android) implement it directly. Internally, container parsing and frame decoding are split: the Demuxer trait owns container framing, the FrameCodec trait owns codec decoding, and ComposedDecoder<D, C> (internal) pairs them so backends can be mixed and matched. The factory hides this detail — callers only ever see Box<dyn Decoder>.
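The split described above can be sketched in a few lines. This is a minimal illustration, not the crate's code: the method names (next_packet, decode, next_chunk), the Packet type, and the toy VecDemuxer / U8Codec are all hypothetical, and the real Decoder trait returns typed outcomes rather than Option.

```rust
use std::collections::VecDeque;

/// Hypothetical packet type: one compressed frame pulled out of a container.
pub struct Packet {
    pub data: Vec<u8>,
}

/// Assumed shape of the container-side trait (the real Demuxer may differ).
pub trait Demuxer {
    /// Returns the next compressed packet, or None at end of stream.
    fn next_packet(&mut self) -> Option<Packet>;
}

/// Assumed shape of the codec-side trait (the real FrameCodec may differ).
pub trait FrameCodec {
    /// Decodes one packet into interleaved f32 PCM.
    fn decode(&mut self, packet: &Packet) -> Vec<f32>;
}

/// Assumed shape of the public trait.
pub trait Decoder {
    fn next_chunk(&mut self) -> Option<Vec<f32>>;
}

/// Pairs any demuxer with any codec, in the spirit of ComposedDecoder<D, C>.
pub struct ComposedDecoder<D: Demuxer, C: FrameCodec> {
    demuxer: D,
    codec: C,
}

impl<D: Demuxer, C: FrameCodec> Decoder for ComposedDecoder<D, C> {
    fn next_chunk(&mut self) -> Option<Vec<f32>> {
        let packet = self.demuxer.next_packet()?;
        Some(self.codec.decode(&packet))
    }
}

/// Toy demuxer: hands out pre-split packets in order.
pub struct VecDemuxer {
    packets: VecDeque<Packet>,
}

impl Demuxer for VecDemuxer {
    fn next_packet(&mut self) -> Option<Packet> {
        self.packets.pop_front()
    }
}

/// Toy codec: treats every byte as one unsigned 8-bit PCM sample.
pub struct U8Codec;

impl FrameCodec for U8Codec {
    fn decode(&mut self, packet: &Packet) -> Vec<f32> {
        packet
            .data
            .iter()
            .map(|&b| (f32::from(b) - 128.0) / 128.0)
            .collect()
    }
}

/// Factory-style helper: the caller only ever sees Box<dyn Decoder>.
pub fn boxed_toy_decoder(packets: Vec<Vec<u8>>) -> Box<dyn Decoder> {
    let packets = packets.into_iter().map(|data| Packet { data }).collect();
    Box::new(ComposedDecoder {
        demuxer: VecDemuxer { packets },
        codec: U8Codec,
    })
}
```

Because the pairing is generic, swapping the demuxer or the codec does not change what the caller holds.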
Usage
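use std::io::Cursor;
use kithara_decode::{DecoderConfig, DecoderFactory};
let reader = Cursor::new(bytes); // bytes: encoded audio held in memory
let config = DecoderConfig::default(); // assumes the default configuration
let mut decoder = DecoderFactory::create_with_probe(reader, config)?; // argument list assumed
let spec = decoder.spec(); // sample_rate, channels
loop { /* pull PCM chunks from decoder until EOF */ }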
For HLS / cross-codec recreate paths, prefer DecoderFactory::create_from_media_info(reader, &media_info, config) — it skips probing and uses the carried MediaInfo to pick the backend.
Backends
Initialization Paths
- Direct reader creation (container specified): Creates the format reader directly without probing. Used for HLS fMP4 where the format is known but the byte length is unknown. Seek is disabled during init to prevent IsoMp4Reader from seeking to the end.
- Probe (container not specified): Uses Symphonia's auto-detection. Supports probe_no_seek for ABR variant switches where the reported byte length may not match.
Decoder recreate strategy
Seek-time and variant-switch rebuilds go through create_from_media_info,
which skips probing and selects the backend from the carried MediaInfo.
No fallback: if the metadata-driven path fails, the error propagates
verbatim. Probing mid-segment bytes at a mismatched offset can silently
match an unrelated codec (e.g. an MP3 frame sync inside raw AAC-in-fMP4
bytes) and drive the rest of the pipeline off a codec the decoder never
actually decoded — so the recreate path never probes.
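The false-match risk is easy to reproduce. An MPEG audio frame sync is just eleven set bits, so arbitrary payload bytes can contain one by chance. The sketch below contrasts a naive byte probe with a metadata-driven choice; Codec, naive_probe, and codec_for_recreate are hypothetical names for illustration, not the crate's API.

```rust
/// Hypothetical codec tag, standing in for whatever MediaInfo carries.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    Aac,
    Mp3,
}

/// Returns the offset of the first byte pair that looks like an MPEG audio
/// frame sync: 0xFF followed by a byte whose top three bits are set.
pub fn find_mpeg_sync(bytes: &[u8]) -> Option<usize> {
    bytes
        .windows(2)
        .position(|w| w[0] == 0xFF && (w[1] & 0xE0) == 0xE0)
}

/// Naive probe: claims MP3 as soon as any sync-like pattern appears.
pub fn naive_probe(bytes: &[u8]) -> Option<Codec> {
    find_mpeg_sync(bytes).map(|_| Codec::Mp3)
}

/// Metadata-driven choice: trusts the carried codec and never inspects bytes.
/// A missing codec is an error that propagates; there is no probe fallback.
pub fn codec_for_recreate(carried: Option<Codec>) -> Result<Codec, &'static str> {
    carried.ok_or("media info carries no codec")
}
```

With payload bytes such as 21 1A FF FB 90 64, the naive probe reports MP3 at offset 2 even if the stream was declared as AAC, while the metadata-driven path returns the declared codec or an error.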
Gapless playback
DecoderConfig::gapless is enabled by default. Decoders report engine-level trim
metadata through DecoderTrackInfo::gapless: Option<GaplessInfo>, where
leading_frames and trailing_frames are PCM frame counts.
The contract has one owner for actual trimming:
- Some(GaplessInfo) means the backend decoded the untrimmed PCM region and the kithara-audio pipeline must apply GaplessTrimmer before effects.
- None means no engine trim should run. This covers files with no gapless metadata and backend paths that already applied gapless trim internally.
- GaplessTrimmer::notify_seek() drops seek-sensitive state: leading trim, pending fade-in, and the buffered tail. Trailing trim still applies at EOF.
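The Some / None contract can be illustrated with an offline trim over a fully decoded buffer. This is a simplified sketch: the real GaplessTrimmer is streaming and seek-aware, and the GaplessInfo struct here only mirrors the two fields named above. The function apply_trim is hypothetical.

```rust
/// Mirrors the two frame counts named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct GaplessInfo {
    pub leading_frames: u64,
    pub trailing_frames: u64,
}

/// Trims interleaved PCM according to the contract:
/// Some(info) => the engine trims, None => the buffer passes through.
/// Counts larger than the buffer are clamped rather than panicking.
pub fn apply_trim(pcm: &[f32], channels: usize, gapless: Option<GaplessInfo>) -> &[f32] {
    let Some(info) = gapless else {
        return pcm;
    };
    if channels == 0 {
        return pcm;
    }
    let total_frames = pcm.len() / channels;
    let lead = (info.leading_frames as usize).min(total_frames);
    let trail = (info.trailing_frames as usize).min(total_frames - lead);
    // Frame counts are converted to sample offsets by multiplying by channels.
    &pcm[lead * channels..(total_frames - trail) * channels]
}
```

For a 5-frame stereo buffer with 1 leading and 2 trailing frames, the result is frames 1 and 2, that is 4 samples.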
When metadata is absent, kithara-audio's AudioConfig::gapless_mode can select
heuristic behaviour via GaplessMode:
- GaplessMode::CodecPriming: GaplessTrimmer::codec_priming(frames, sample_rate) is built from a static codec table (codec_priming_frames). AAC LC is 2112, HE-AAC 3072, MP3 LAME-default 1105, Opus 312, and lossless codecs are 0. Predictable and zero-latency.
- GaplessMode::SilenceTrim(SilenceTrimParams): GaplessTrimmer::silence_trim walks the leading buffer until the first sample above a configurable dB threshold and trims everything before it. Optionally trims the trailing silence at EOF too.
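Both heuristics are small enough to sketch. The table values below are the ones quoted above; the Codec enum, the function signatures, and the choice to treat the threshold as dBFS on absolute sample values are assumptions made for illustration.

```rust
/// Hypothetical codec tag used only by this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    AacLc,
    HeAac,
    Mp3,
    Opus,
    Flac,
}

/// Static priming table with the frame counts quoted in the text.
pub fn codec_priming_frames(codec: Codec) -> u64 {
    match codec {
        Codec::AacLc => 2112,
        Codec::HeAac => 3072,
        Codec::Mp3 => 1105,
        Codec::Opus => 312,
        Codec::Flac => 0, // lossless codecs carry no priming
    }
}

/// Counts leading frames in which no channel exceeds the threshold.
/// threshold_db is interpreted as dBFS (0 dB == full scale 1.0).
/// Returns the total frame count when the whole buffer is silent.
pub fn leading_silence_frames(pcm: &[f32], channels: usize, threshold_db: f32) -> usize {
    if channels == 0 {
        return 0;
    }
    let threshold = 10f32.powf(threshold_db / 20.0);
    pcm.chunks_exact(channels)
        .position(|frame| frame.iter().any(|s| s.abs() > threshold))
        .unwrap_or(pcm.len() / channels)
}
```

The table lookup needs no audio at all, which is why it adds no latency; the silence scan has to see samples before it can decide where the boundary is.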
See also GaplessMode::Disabled and GaplessMode::MediaOnly on AudioConfig.
Both fallbacks apply a short raised-cosine fade-in (~3 ms) at the trim boundary. The metadata-driven path does not — the boundary lands on a sample-accurate count.
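A raised-cosine fade-in of this kind could look roughly like the following. The function name, its parameters, and the exact gain curve 0.5 * (1 - cos(pi * t)) are assumptions; the source only states the shape and the approximate 3 ms length.

```rust
/// Applies a raised-cosine fade-in to the start of an interleaved buffer.
/// At 48 kHz a 3 ms fade covers 144 frames.
pub fn fade_in_raised_cosine(pcm: &mut [f32], channels: usize, sample_rate: u32, fade_ms: f32) {
    if channels == 0 {
        return;
    }
    let fade_frames = ((sample_rate as f32) * fade_ms / 1000.0).round() as usize;
    // Never fade more frames than the buffer holds.
    let fade_frames = fade_frames.min(pcm.len() / channels);
    for (i, frame) in pcm.chunks_exact_mut(channels).take(fade_frames).enumerate() {
        // phase runs from 0.0 up to (but not including) 1.0
        let phase = i as f32 / fade_frames as f32;
        // gain rises smoothly from 0.0 towards 1.0
        let gain = 0.5 * (1.0 - (std::f32::consts::PI * phase).cos());
        for sample in frame {
            *sample *= gain;
        }
    }
}
```

The fade masks the click that a heuristic boundary can produce when it lands mid-waveform, which is why the sample-accurate metadata path can skip it.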
Current metadata sources:
- AAC in MP4/M4A/fMP4: MP4 probe reads edts/elst first, then falls back to iTunSMPB.
- MP3, FLAC, Vorbis, and Opus through Symphonia rely on the backend's own gapless behavior and therefore expose None for engine trim.
- Apple AudioToolbox captures AudioConverterPrimeInfo when available.
- Android MediaCodec reads encoder-delay / encoder-padding from MediaFormat and falls back to the MP4 probe for AAC MP4 containers.
Feature Flags
When symphonia is disabled (default-features = false + only apple / android), the factory has no software fallback — it errors if the active hardware backend cannot handle a codec/container.
Module layout
- src/traits.rs: public Decoder trait plus typed outcomes (DecoderChunkOutcome, DecoderSeekOutcome, InputReadOutcome) and the DecoderInput source supertrait.
- src/factory/: DecoderConfig, DecoderFactory, and the DecoderBackend selector enum. The factory boxes every backend into Box<dyn Decoder> so callers stay codec-agnostic.
- src/composed.rs: internal ComposedDecoder<D: Demuxer, C: FrameCodec> that implements Decoder by pairing a demuxer with a frame-level codec.
- src/demuxer/: internal Demuxer trait and concrete demuxers (fMP4, …).
- src/fmp4/, src/mp4.rs: fMP4/MP4 container helpers.
- src/symphonia/ (feature symphonia): Symphonia Decoder implementation; probe and direct paths; ReadSeekAdapter.
- src/apple/ (feature apple, macOS / iOS): Apple AudioToolbox backend over AudioFile / AudioConverter FFI.
- src/android/ (feature android, Android): MediaExtractor / MediaCodec backend over JNI.
- src/gapless/: GaplessInfo, GaplessMode, GaplessTrimmer, SilenceTrimParams, the encoder-side probe_codec_gapless, MP4 udta / iTunSMPB / MPEG-audio Xing/LAME tag parsers, and the trailing fade/silence trim heuristics.
- src/pcm_time.rs: timeline math (duration_for_frames, frames_for_duration, PTS helpers) shared across backends.
- src/types.rs, src/error.rs: shared types and DecodeError / ErrorClass.
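As an illustration of the timeline math that src/pcm_time.rs is described as holding, the two named helpers might be shaped like this. Only the function names come from the list above; the signatures, the truncating rounding, and the zero-rate behaviour are assumptions.

```rust
use std::time::Duration;

/// Converts a PCM frame count to wall-clock duration (truncating to whole ns).
/// A zero sample rate yields a zero duration instead of dividing by zero.
pub fn duration_for_frames(frames: u64, sample_rate: u32) -> Duration {
    if sample_rate == 0 {
        return Duration::ZERO;
    }
    let rate = u64::from(sample_rate);
    let secs = frames / rate;
    // The remainder is below the rate, so rem * 1e9 cannot overflow u64.
    let nanos = (frames % rate) * 1_000_000_000 / rate;
    Duration::new(secs, nanos as u32)
}

/// Converts a duration to a PCM frame count (truncating partial frames).
pub fn frames_for_duration(duration: Duration, sample_rate: u32) -> u64 {
    let rate = u128::from(sample_rate);
    (duration.as_nanos() * rate / 1_000_000_000) as u64
}
```

Splitting whole seconds from the remainder keeps the arithmetic in integers, so long tracks do not accumulate floating-point drift. Both directions truncate, so a round trip can lose one frame.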
Cross-decoder protocol test
The cross-decoder protocol test lives in kithara-integration-tests under tests/tests/kithara_decode/. It decodes the same MP3 with every available backend and asserts agreement on spec(), duration(), total frame count, post-seek timestamp, EOF semantics, and (when the apple feature is enabled on macOS/iOS) the full-decode PCM L2 norm within 2 %.
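The PCM comparison in that test can be approximated as follows. This is a sketch of an L2-norm tolerance check, not the test's actual code; measuring the difference relative to the larger of the two norms is an assumption.

```rust
/// Euclidean (L2) norm of a PCM buffer, accumulated in f64 for stability.
pub fn l2_norm(pcm: &[f32]) -> f64 {
    pcm.iter()
        .map(|&s| f64::from(s) * f64::from(s))
        .sum::<f64>()
        .sqrt()
}

/// True when the two norms differ by at most tolerance, measured relative
/// to the larger norm. Two silent buffers always agree.
pub fn norms_agree(a: &[f32], b: &[f32], tolerance: f64) -> bool {
    let (na, nb) = (l2_norm(a), l2_norm(b));
    let reference = na.max(nb);
    if reference == 0.0 {
        return true;
    }
    (na - nb).abs() / reference <= tolerance
}
```

Comparing norms rather than individual samples tolerates the small rounding differences between a software decoder and a hardware one while still catching dropped or duplicated audio.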
Gapless probe contract
The gapless pipeline splits "where does silence come from?" into two independent layers:
- Encoder-side priming / padding: silence the encoder added at compress time. probe_codec_gapless reads container metadata (iTunSMPB / elst for AAC LC inside MP4, Xing/Info+LAME for MP3 inside MPEG audio) and returns Some(GaplessInfo) when it found real values. No fallback chains: when metadata is absent the probe returns None and the pipeline falls back through GaplessMode::CodecPriming to AudioCodec::encoder_priming_frames (libmp3lame default 576, AAC LC MDCT block 1024, Opus RFC pre-skip 312, others 0).
- Decoder-side algorithmic delay: silence the decoder itself emits before it converges. Lives on FrameCodec::decoder_algo_delay and is per-backend:
  - Symphonia mpa (LAME convention): +529 leading, −529 trailing for MP3. Symphonia's own demuxer parses the LAME tag and sets track.delay = enc_delay + 528 + 1, but the 0.6.0-alpha.1 demuxer does NOT populate per-packet trim_start / trim_end, so the decoder's opts.gapless flag is a no-op for MP3 and the caller must apply the trim.
  - Apple AudioConverter MP3: +0 (internally compensated; the converter emits exactly enc_delay samples of leading silence).
  - Android MediaCodec: +0 (no surfaced priming; metadata comes from the demuxer's MP4 udta probe).
SymphoniaCodec::open_with_config folds its own algo delay into the
probed GaplessInfo before exposing it through track_info(); the
audio pipeline reads one fully-resolved trim and forgets the layered
origin. Decoder::default_priming_frames exposes the same combined
number for the CodecPriming fallback so
kithara_audio::pipeline::gapless::resolve_codec_priming does not
need to know which backend it is talking to.
Empirical justification: raw Symphonia output of a libmp3lame
sawtooth (enc_delay = 576) starts at sample 1105 = 576 + 529; raw
Apple output of the same fixture starts at sample 576. Both backends
ignore the LAME tag entirely — verified by patching enc_delay in the
tag to arbitrary values (0 / 100 / 1152 / 2400); leading silence in
output stayed constant. The probe is the only thing that reads the
tag, then the codec adds (or doesn't add) its own algorithmic delay.
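The arithmetic behind those measurements (576 + 529 = 1105 for Symphonia, 576 + 0 = 576 for Apple) can be written down as a small fold. The Backend enum and fold_algo_delay are hypothetical names for this sketch, and saturating the trailing count at zero is an assumption about how an undersized trailing value would be handled.

```rust
/// Mirrors the two frame counts carried by the probed metadata.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct GaplessInfo {
    pub leading_frames: u64,
    pub trailing_frames: u64,
}

/// Hypothetical backend tag used only by this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    SymphoniaMpa,
    AppleAudioConverter,
    AndroidMediaCodec,
}

/// Decoder-side algorithmic delay in frames, per the list above.
pub fn decoder_algo_delay(backend: Backend) -> u64 {
    match backend {
        Backend::SymphoniaMpa => 529,
        Backend::AppleAudioConverter | Backend::AndroidMediaCodec => 0,
    }
}

/// Folds the decoder delay into the encoder-side numbers: the delay shifts
/// the whole signal later, so it is added in front and removed from the tail.
pub fn fold_algo_delay(probed: GaplessInfo, backend: Backend) -> GaplessInfo {
    let delay = decoder_algo_delay(backend);
    GaplessInfo {
        leading_frames: probed.leading_frames + delay,
        trailing_frames: probed.trailing_frames.saturating_sub(delay),
    }
}
```

After the fold, downstream code sees a single pair of numbers and does not need to know which backend produced them.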
Apple AAC input format (ESDS rationale)
The Apple AudioConverter accepts AAC via a magic cookie whose layout
is the ISO/IEC 14496-1 ES_Descriptor. Demuxers can hand us either
the raw AudioSpecificConfig body (fMP4 / HLS path, first byte =
5-bit AOT << 3, e.g. 0x10–0x17 for AAC LC) or the full ESDS atom
body (AppleAudioFileDemuxer reads kAudioFilePropertyMagicCookieData
which for M4A is already a complete ES_Descriptor; first byte = ESDS
tag 0x03). A single-byte sniff disambiguates without parsing —
build_aac_input_format wraps raw ASC into the minimum ESDS chain
Apple's AudioFormat / AudioConverter APIs accept; full ESDS bodies
go through unchanged.
The ESDS shape we produce mirrors what AudioFileGetProperty(MagicCookieData)
returns for an .m4a file:
ES_Descriptor (tag 0x03):
  ES_ID (2 bytes) = 0; Flags (1 byte) = 0
  DecoderConfigDescriptor (tag 0x04):
    OTI (1 byte) = 0x40 (MPEG-4 Audio)
    StreamType (1 byte) = 0x15 (Audio << 2 | reserved bit)
    BufferSizeDB (3 bytes) = 0
    MaxBitrate (4 bytes) = 0
    AvgBitrate (4 bytes) = 0
    DecoderSpecificInfo (tag 0x05): <ASC bytes>
  SLConfigDescriptor (tag 0x06): predefined (1 byte) = 0x02
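The sniff-and-wrap step can be sketched directly from that layout. This is not the crate's build_aac_input_format: the function names are hypothetical, and the sketch assumes every descriptor fits the single-byte length form (bodies under 128 bytes), whereas real cookies may use the multi-byte length encoding.

```rust
/// True when the cookie already starts with the ES_Descriptor tag (0x03).
/// A raw AudioSpecificConfig cannot start with 0x03 for a valid AOT, because
/// its top five bits would then encode object type 0.
pub fn is_full_esds(cookie: &[u8]) -> bool {
    cookie.first() == Some(&0x03)
}

/// Appends tag, one-byte length, and body. Returns None when the body is too
/// long for the single-byte length form this sketch supports.
fn push_descriptor(out: &mut Vec<u8>, tag: u8, body: &[u8]) -> Option<()> {
    if body.len() >= 0x80 {
        return None;
    }
    out.push(tag);
    out.push(body.len() as u8);
    out.extend_from_slice(body);
    Some(())
}

/// Wraps a raw AudioSpecificConfig in the minimal ESDS chain shown above.
pub fn wrap_asc_in_esds(asc: &[u8]) -> Option<Vec<u8>> {
    // DecoderConfigDescriptor body: OTI, StreamType, then zeroed sizes/rates.
    let mut dcd = vec![0x40, 0x15];
    dcd.extend_from_slice(&[0; 3]); // BufferSizeDB
    dcd.extend_from_slice(&[0; 4]); // MaxBitrate
    dcd.extend_from_slice(&[0; 4]); // AvgBitrate
    push_descriptor(&mut dcd, 0x05, asc)?; // DecoderSpecificInfo

    // ES_Descriptor body: ES_ID (2 bytes) and flags (1 byte), all zero.
    let mut es = vec![0x00, 0x00, 0x00];
    push_descriptor(&mut es, 0x04, &dcd)?;
    push_descriptor(&mut es, 0x06, &[0x02])?; // SLConfigDescriptor

    let mut out = Vec::new();
    push_descriptor(&mut out, 0x03, &es)?;
    Some(out)
}

/// Single-byte sniff: full ESDS bodies pass through, raw ASC gets wrapped.
pub fn aac_cookie(cookie: &[u8]) -> Option<Vec<u8>> {
    if is_full_esds(cookie) {
        Some(cookie.to_vec())
    } else {
        wrap_asc_in_esds(cookie)
    }
}
```

For the two-byte ASC 12 10 (AAC LC, 44.1 kHz, stereo) the wrapped cookie is 27 bytes long and starts with 03 19.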
After cookie installation we ask AudioFormatGetProperty(FormatList)
to derive the canonical ASBD for the first format item — that is the
authoritative mSampleRate / mChannelsPerFrame /
mFramesPerPacket, not whatever the demuxer reported in TrackInfo
(HE-AAC v2 doubles the rate vs the container declaration; FormatList
returns the upsampled rate).
Integration
Consumed by kithara-audio which wraps it in a threaded pipeline with effects and resampling. Accepts any R: Read + Seek + Send + Sync + 'static -- works with Stream<File>, Stream<Hls>, Cursor<Vec<u8>>, or plain files.
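To show what that bound admits, here is a small generic helper with the same constraints. The helper peek_magic is hypothetical and not part of the crate; it only demonstrates that an in-memory Cursor<Vec<u8>> satisfies Read + Seek + Send + Sync + 'static.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Reads the first four bytes at the current position, then restores the
/// position so a later consumer sees the stream untouched.
pub fn peek_magic<R>(reader: &mut R) -> io::Result<[u8; 4]>
where
    R: Read + Seek + Send + Sync + 'static,
{
    let start = reader.stream_position()?;
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic)?;
    reader.seek(SeekFrom::Start(start))?;
    Ok(magic)
}
```

The Send + Sync + 'static part is what lets a threaded pipeline move the reader onto a worker thread; borrowed readers such as Cursor<&[u8]> over a local buffer would be rejected by the 'static bound.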