whisper.cpp captions agent for LVQR.
Tier 4 item 4.5, session B. First concrete Agent impl
that drops into the session-97-A lvqr-agent scaffold: a
WhisperCaptionsAgent that subscribes to a broadcast’s
audio track, decodes raw AAC frames out of each fragment’s
moof + mdat payload via symphonia, buffers ~5 s of PCM,
and runs whisper.cpp inference on a worker thread to emit
TranscribedCaption values for downstream consumption
(session 99 C wires those into the LL-HLS subtitle rendition
group).
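The "buffer ~5 s of PCM" step can be pictured with a small accumulator. This is a minimal sketch, not the crate's actual buffer: the type `PcmWindow` and its methods are hypothetical, and it assumes the decoded audio is already 16 kHz mono `f32`, which is the input format whisper.cpp expects.

```rust
/// Sample rate whisper.cpp expects for its input.
pub const SAMPLE_RATE_HZ: usize = 16_000;
/// Window length taken from the description above (~5 s).
pub const WINDOW_SECS: usize = 5;

/// Hypothetical PCM accumulator (an assumption for illustration only).
pub struct PcmWindow {
    samples: Vec<f32>,
    window_len: usize,
}

impl PcmWindow {
    pub fn new(window_len: usize) -> Self {
        Self { samples: Vec::with_capacity(window_len), window_len }
    }

    /// A window sized for ~5 s of 16 kHz mono audio (80 000 samples).
    pub fn five_seconds() -> Self {
        Self::new(SAMPLE_RATE_HZ * WINDOW_SECS)
    }

    /// Append decoded samples and return every full window now available.
    pub fn push(&mut self, decoded: &[f32]) -> Vec<Vec<f32>> {
        self.samples.extend_from_slice(decoded);
        let mut ready = Vec::new();
        while self.samples.len() >= self.window_len {
            // `split_off` keeps the head in place and returns the tail.
            let rest = self.samples.split_off(self.window_len);
            ready.push(std::mem::replace(&mut self.samples, rest));
        }
        ready
    }

    /// Take whatever is left, e.g. for a final inference pass at shutdown.
    pub fn flush(&mut self) -> Vec<f32> {
        std::mem::take(&mut self.samples)
    }
}
```

Each returned window would be handed to inference; `flush` covers the partial window left when the broadcast ends.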
§Crate shape
Two surfaces, gated by the whisper Cargo feature:
- Always available (cargo build without features): WhisperCaptionsFactory, WhisperCaptionsAgent, TranscribedCaption, plus the always-pure helpers in mdat and asc. Without the feature the agent’s on_fragment is a structured-tracing no-op, so the trait contract still holds and cargo test --workspace builds the crate without paying the whisper.cpp build cost.
- Feature whisper (cargo build --features whisper): pulls in whisper-rs 0.16 (bindgen + cmake against whisper.cpp) and symphonia 0.6.0-alpha.2 (pure-Rust AAC-LC decoder). The agent’s on_fragment then forwards each AAC frame to a worker thread that decodes via symphonia, buffers PCM, and runs WhisperContext::full on the buffered window.
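The two surfaces can be expressed with ordinary cfg attributes. The sketch below only illustrates the gating pattern: `CaptionsAgentSketch` and its `forwarded` counter are hypothetical stand-ins, and the real agent implements the lvqr-agent trait rather than an inherent method.

```rust
/// Hypothetical stand-in for the agent (an assumption for illustration).
pub struct CaptionsAgentSketch {
    /// Counts frames that would have been forwarded to the worker.
    pub forwarded: usize,
}

impl CaptionsAgentSketch {
    /// Built with `--features whisper`: forward the frame to the worker.
    /// Here the forwarding is reduced to a counter.
    #[cfg(feature = "whisper")]
    pub fn on_fragment(&mut self, _payload: &[u8]) {
        self.forwarded += 1;
    }

    /// Built without the feature: same signature, no work done.
    /// The real crate is described as emitting a tracing event here.
    #[cfg(not(feature = "whisper"))]
    pub fn on_fragment(&mut self, _payload: &[u8]) {}
}
```

Because both variants share one signature, callers compile unchanged whichever way the feature is set.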
cargo test --workspace deliberately runs the no-feature
variant so the workspace gate stays fast and CI runners
without Xcode CLT / cmake / libclang do not have to compile
whisper.cpp on every push. To exercise the inference path:
WHISPER_MODEL_PATH=/path/to/ggml-tiny.en.bin \
cargo test -p lvqr-agent-whisper --features whisper -- --ignored
§Lifecycle
- WhisperCaptionsFactory::build returns Some(agent) only for tracks named "1.mp4" (the LVQR audio-track convention) and None for every other track. Video / catalog / future captions tracks see no agent spawn.
- WhisperCaptionsAgent::on_start (with whisper feature) spawns one OS worker thread that owns the WhisperContext, the symphonia decoder, and the PCM ring. The agent itself is a thin handle holding a bounded tokio::sync::mpsc::Sender<WorkerMessage> and the captions-output tokio::sync::broadcast::Sender.
- WhisperCaptionsAgent::on_fragment extracts the raw AAC frame bytes from the fragment’s moof + mdat payload via mdat::extract_first_mdat and pushes them down the worker channel. The drain task is never blocked: a full channel logs warn! and drops the frame (caption gaps beat per-broadcast back-pressure).
- WhisperCaptionsAgent::on_stop closes the worker channel. The worker thread drains its remaining PCM, runs one final inference pass, then exits.
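The drop-on-full and drain-on-close behaviour can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's code: it swaps std::sync::mpsc::sync_channel in for the bounded tokio channel, and `WorkerMessageSketch`, `AgentHandleSketch`, and `spawn_worker` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::thread::{self, JoinHandle};

/// Hypothetical message type; the crate's `WorkerMessage` is not shown here.
pub enum WorkerMessageSketch {
    Frame(Vec<u8>),
}

/// Thin handle held by the agent: a bounded sender plus a drop counter.
pub struct AgentHandleSketch {
    tx: SyncSender<WorkerMessageSketch>,
    pub dropped: u64,
}

impl AgentHandleSketch {
    pub fn new(capacity: usize) -> (Self, Receiver<WorkerMessageSketch>) {
        let (tx, rx) = sync_channel(capacity);
        (Self { tx, dropped: 0 }, rx)
    }

    /// Never blocks. Returns true if the frame was queued, false if dropped.
    pub fn on_fragment(&mut self, aac_frame: &[u8]) -> bool {
        match self.tx.try_send(WorkerMessageSketch::Frame(aac_frame.to_vec())) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) => {
                // The real agent is described as logging `warn!` here.
                self.dropped += 1;
                false
            }
            Err(TrySendError::Disconnected(_)) => false,
        }
    }

    /// Dropping the only sender closes the channel.
    pub fn on_stop(self) {}
}

/// Worker loop: consume frames until the channel closes, then finish up.
pub fn spawn_worker(rx: Receiver<WorkerMessageSketch>) -> JoinHandle<usize> {
    thread::spawn(move || {
        let mut buffered_bytes = 0usize;
        // `recv` yields queued frames first and errors once all senders are gone.
        while let Ok(WorkerMessageSketch::Frame(bytes)) = rx.recv() {
            buffered_bytes += bytes.len();
        }
        // A final inference pass over the remaining PCM would run here.
        buffered_bytes
    })
}
```

Frames queued before `on_stop` are still delivered, so the worker sees everything that was accepted and nothing that was dropped.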
The agent is registered against an existing
lvqr_agent::AgentRunner; session 100 D will thread the
factory through lvqr_cli::start behind a --whisper-model <path> CLI flag. This session leaves the CLI untouched.
§Anti-scope (session 98 B)
- No CLI wiring. Session 100 D does that.
- No HLS subtitle rendition. Session 99 C wires TranscribedCaption into the lvqr-hls MultiHlsServer’s subtitle rendition group; this session leaves the captions on a public tokio::sync::broadcast::Receiver that session 99 subscribes to.
- No multi-language tuning. English only. The factory accepts a WhisperConfig so language can be plumbed in later, but the inference path always uses English in this session (per section 4.5 anti-scope).
- No GPU acceleration. whisper-rs ships metal / cuda / coreml features; LVQR keeps them off so the default CI build stays portable. Operators with GPUs enable them via their own lvqr-cli build with the appropriate whisper-rs feature pinned.
Re-exports§
pub use caption::CaptionStream;
pub use caption::TranscribedCaption;
pub use factory::WhisperCaptionsFactory;
pub use factory::WhisperConfig;
Modules§
- asc: Extract the AAC AudioSpecificConfig (ASC) bytes from a CMAF audio init segment.
- caption: TranscribedCaption + CaptionStream.
- factory: WhisperCaptionsFactory + WhisperConfig.
- mdat: Minimal moof + mdat parser: extract the first mdat payload bytes from a CMAF audio fragment.
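For readers unfamiliar with the box layout, extracting the first mdat payload amounts to walking top-level ISO BMFF boxes. The function below is a sketch with the hypothetical name `first_mdat_sketch`; the crate's own mdat::extract_first_mdat may differ in signature and error handling. It relies only on the standard box header: a 32-bit big-endian size, a four-byte type, and the size values 1 (64-bit size follows) and 0 (box runs to the end of the buffer).

```rust
/// Read a big-endian u32 from the first four bytes of `b`.
fn be_u32(b: &[u8]) -> u32 {
    u32::from_be_bytes([b[0], b[1], b[2], b[3]])
}

/// Return the payload of the first top-level `mdat` box, if one is present
/// and fully contained in `buf`. Hypothetical helper for illustration.
pub fn first_mdat_sketch(buf: &[u8]) -> Option<&[u8]> {
    let mut pos = 0usize;
    while buf.len() - pos >= 8 {
        let size32 = be_u32(&buf[pos..pos + 4]);
        let kind = &buf[pos + 4..pos + 8];
        let (header_len, box_len) = match size32 {
            // Size 0: the box extends to the end of the buffer.
            0 => (8usize, buf.len() - pos),
            // Size 1: a 64-bit size follows the type field.
            1 => {
                let raw = buf.get(pos + 8..pos + 16)?;
                let large = raw.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b));
                if large > usize::MAX as u64 {
                    return None;
                }
                (16usize, large as usize)
            }
            n => (8usize, n as usize),
        };
        if box_len < header_len {
            return None; // malformed size
        }
        let end = pos.checked_add(box_len)?;
        if end > buf.len() {
            return None; // truncated box
        }
        if kind == b"mdat" {
            return Some(&buf[pos + header_len..end]);
        }
        pos = end; // skip moof (or any other box) without descending into it
    }
    None
}
```

The walk never descends into moof; it only skips over it, which is all that is needed to reach the frame bytes.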
Structs§
- WhisperCaptionsAgent: Per-broadcast captions agent. One instance per (broadcast, "1.mp4") pair built by crate::WhisperCaptionsFactory. Each agent runs on its own drain task on the tokio runtime that lvqr_agent::AgentRunner::install was called from.
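The one-agent-per-(broadcast, "1.mp4") rule reduces to a track-name check in the factory. A minimal sketch, assuming a simplified build signature: `build_sketch` and `AgentSketch` are hypothetical, and the real factory goes through the lvqr-agent factory trait.

```rust
/// Track name the docs give as the LVQR audio-track convention.
pub const AUDIO_TRACK: &str = "1.mp4";

/// Hypothetical stand-in for the per-broadcast agent.
#[derive(Debug, PartialEq)]
pub struct AgentSketch {
    pub broadcast: String,
}

/// Build an agent only for the audio track; every other track gets `None`.
pub fn build_sketch(broadcast: &str, track: &str) -> Option<AgentSketch> {
    if track == AUDIO_TRACK {
        Some(AgentSketch { broadcast: broadcast.to_owned() })
    } else {
        None
    }
}
```

Returning `None` for non-audio tracks means the runner spawns no drain task for them at all.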