lvqr_agent_whisper/lib.rs
//! whisper.cpp captions agent for LVQR.
//!
//! **Tier 4 item 4.5, session B.** First concrete `Agent` impl
//! that drops into the session-97-A `lvqr-agent` scaffold: a
//! `WhisperCaptionsAgent` that subscribes to a broadcast's
//! audio track, decodes raw AAC frames out of each fragment's
//! `moof + mdat` payload via symphonia, buffers ~5 s of PCM,
//! and runs whisper.cpp inference on a worker thread to emit
//! `TranscribedCaption` values for downstream consumption
//! (session 99 C wires those into the LL-HLS subtitle rendition
//! group).
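//!
//! As a rough illustration of the `moof + mdat` step above, the
//! sketch below walks top-level ISO BMFF boxes and returns the
//! payload of the first `mdat`. `find_first_mdat` is a
//! hypothetical name; this is not the body of
//! [`mdat::extract_first_mdat`], whose signature and edge-case
//! handling may differ.
//!
//! ```rust
//! /// Hypothetical helper: payload of the first top-level `mdat` box.
//! /// Returns `None` if no `mdat` is found or a box header is malformed.
//! pub fn find_first_mdat(buf: &[u8]) -> Option<&[u8]> {
//!     let mut pos = 0usize;
//!     // Every box starts with a 4-byte big-endian size and a 4-byte type.
//!     while buf.len() - pos >= 8 {
//!         let size32 = u32::from_be_bytes(buf[pos..pos + 4].try_into().ok()?);
//!         let kind = &buf[pos + 4..pos + 8];
//!         let (header, size) = match size32 {
//!             // size == 0: the box runs to the end of the buffer.
//!             0 => (8, buf.len() - pos),
//!             // size == 1: a 64-bit size follows the type field.
//!             1 => {
//!                 let large = buf.get(pos + 8..pos + 16)?;
//!                 let large = u64::from_be_bytes(large.try_into().ok()?);
//!                 (16, usize::try_from(large).ok()?)
//!             }
//!             n => (8, n as usize),
//!         };
//!         if size < header {
//!             return None; // malformed: box smaller than its own header
//!         }
//!         let end = pos.checked_add(size)?;
//!         if end > buf.len() {
//!             return None; // truncated box
//!         }
//!         if kind == b"mdat" {
//!             return Some(&buf[pos + header..end]);
//!         }
//!         pos = end; // skip `moof` and any other box
//!     }
//!     None
//! }
//! ```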
//!
//! # Crate shape
//!
//! Two surfaces, gated by the `whisper` Cargo feature:
//!
//! * **Always available** (`cargo build` without features):
//!   `WhisperCaptionsFactory`, `WhisperCaptionsAgent`,
//!   [`TranscribedCaption`], plus the always-pure helpers in
//!   [`mdat`] and [`asc`]. Without the feature the agent's
//!   `on_fragment` is a structured-tracing no-op so the trait
//!   contract still holds and `cargo test --workspace` builds
//!   the crate without paying the whisper.cpp build cost.
//! * **Feature `whisper`** (`cargo build --features whisper`):
//!   pulls in `whisper-rs 0.16` (bindgen + cmake against
//!   whisper.cpp) and `symphonia 0.6.0-alpha.2` (pure-Rust
//!   AAC-LC decoder). The agent's `on_fragment` then forwards
//!   each AAC frame to a worker thread that decodes via
//!   symphonia, buffers PCM, and runs `WhisperContext::full`
//!   on the buffered window.
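//!
//! The two-surface split can be pictured with the sketch below:
//! one function name, two bodies, selected at compile time by the
//! feature. `on_fragment_sketch` and `FragmentOutcome` are
//! hypothetical names used only for illustration; the real hook
//! is a trait method and its signature is not shown here.
//!
//! ```rust
//! /// Hypothetical result type for the sketch.
//! #[derive(Debug, PartialEq, Eq)]
//! pub enum FragmentOutcome {
//!     /// Feature off: the fragment is observed and ignored.
//!     Skipped,
//!     /// Feature on: the AAC bytes would go to the worker thread.
//!     Forwarded,
//! }
//!
//! /// Built only without `--features whisper`: a no-op that keeps
//! /// the API shape intact.
//! #[cfg(not(feature = "whisper"))]
//! pub fn on_fragment_sketch(_payload: &[u8]) -> FragmentOutcome {
//!     FragmentOutcome::Skipped
//! }
//!
//! /// Built only with `--features whisper`: stands in for handing
//! /// the frame to the inference worker.
//! #[cfg(feature = "whisper")]
//! pub fn on_fragment_sketch(_payload: &[u8]) -> FragmentOutcome {
//!     FragmentOutcome::Forwarded
//! }
//! ```
//!
//! Callers compile against the same name either way, which is why
//! the no-feature build can still satisfy the trait contract.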
//!
//! `cargo test --workspace` deliberately runs the no-feature
//! variant so the workspace gate stays fast and CI runners
//! without Xcode CLT / cmake / libclang do not have to compile
//! whisper.cpp on every push. To exercise the inference path:
//!
//! ```bash
//! WHISPER_MODEL_PATH=/path/to/ggml-tiny.en.bin \
//!   cargo test -p lvqr-agent-whisper --features whisper -- --ignored
//! ```
//!
//! # Lifecycle
//!
//! * `WhisperCaptionsFactory::build` returns `Some(agent)` only
//!   for tracks named `"1.mp4"` (the LVQR audio-track convention)
//!   and `None` for every other track. Video / catalog / future
//!   captions tracks see no agent spawn.
//! * `WhisperCaptionsAgent::on_start` (with `whisper` feature)
//!   spawns one OS worker thread that owns the
//!   `WhisperContext`, the symphonia decoder, and the PCM
//!   ring. The agent itself is a thin handle holding a bounded
//!   `tokio::sync::mpsc::Sender<WorkerMessage>` and the
//!   captions-output `tokio::sync::broadcast::Sender`.
//! * `WhisperCaptionsAgent::on_fragment` extracts the raw AAC
//!   frame bytes from the fragment's `moof + mdat` payload via
//!   [`mdat::extract_first_mdat`] and pushes them down the
//!   worker channel. The drain task is never blocked: a full
//!   channel logs `warn!` and drops the frame (caption gaps
//!   beat per-broadcast back-pressure).
//! * `WhisperCaptionsAgent::on_stop` closes the worker channel.
//!   The worker thread drains its remaining PCM, runs one final
//!   inference pass, then exits.
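//!
//! The lifecycle above can be sketched with the standard library
//! alone. This is an illustration under stated assumptions, not
//! the crate's implementation: `std::sync::mpsc::sync_channel`
//! stands in for the bounded `tokio::sync::mpsc` channel, and
//! `AgentHandle`, `build`, and `worker_loop` are hypothetical
//! names. The worker only counts bytes where the real one would
//! decode and run inference.
//!
//! ```rust
//! use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
//! use std::thread::{self, JoinHandle};
//!
//! /// Hypothetical thin handle: the sending half plus the worker's join handle.
//! pub struct AgentHandle {
//!     tx: Option<SyncSender<Vec<u8>>>,
//!     worker: Option<JoinHandle<usize>>,
//! }
//!
//! /// Mirrors the factory rule: only the `"1.mp4"` track gets an agent.
//! pub fn build(track: &str, capacity: usize) -> Option<AgentHandle> {
//!     if track != "1.mp4" {
//!         return None;
//!     }
//!     let (tx, rx) = sync_channel::<Vec<u8>>(capacity);
//!     let worker = thread::spawn(move || worker_loop(rx));
//!     Some(AgentHandle { tx: Some(tx), worker: Some(worker) })
//! }
//!
//! /// Stand-in worker: totals the bytes received, then exits.
//! fn worker_loop(rx: Receiver<Vec<u8>>) -> usize {
//!     let mut buffered = 0usize;
//!     // `recv` fails only once the sender is dropped and the queue is empty,
//!     // so frames already queued are still drained after `on_stop`.
//!     while let Ok(frame) = rx.recv() {
//!         buffered += frame.len();
//!     }
//!     // A final inference pass over the remaining PCM would run here.
//!     buffered
//! }
//!
//! impl AgentHandle {
//!     /// Never blocks. Returns `false` when the frame was dropped.
//!     pub fn on_fragment(&self, frame: Vec<u8>) -> bool {
//!         match self.tx.as_ref().map(|tx| tx.try_send(frame)) {
//!             Some(Ok(())) => true,
//!             // Channel full: drop the frame (the real agent logs `warn!`).
//!             Some(Err(TrySendError::Full(_))) => false,
//!             // Worker gone or agent already stopped.
//!             Some(Err(TrySendError::Disconnected(_))) | None => false,
//!         }
//!     }
//!
//!     /// Closes the channel and waits for the worker to drain and exit.
//!     pub fn on_stop(&mut self) -> usize {
//!         self.tx.take(); // dropping the only sender closes the channel
//!         self.worker
//!             .take()
//!             .map(|handle| handle.join().unwrap_or(0))
//!             .unwrap_or(0)
//!     }
//! }
//! ```
//!
//! The std channel is used only to keep the sketch free of
//! dependencies; with `tokio::sync::mpsc` the analogous
//! non-blocking call is `Sender::try_send`.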
//!
//! The agent is registered against an existing
//! `lvqr_agent::AgentRunner`; session 100 D will thread the
//! factory through `lvqr_cli::start` behind a
//! `--whisper-model <path>` CLI flag. This session leaves the
//! CLI untouched.
//!
//! # Anti-scope (session 98 B)
//!
//! * **No CLI wiring.** Session 100 D does that.
//! * **No HLS subtitle rendition.** Session 99 C wires
//!   `TranscribedCaption` into the `lvqr-hls` MultiHlsServer's
//!   subtitle rendition group; this session leaves the captions
//!   on a public `tokio::sync::broadcast::Receiver` that
//!   session 99 subscribes to.
//! * **No multi-language tuning.** English only. The factory
//!   accepts a `WhisperConfig` so language can be plumbed in
//!   later, but the inference path always uses English in this
//!   session (per section 4.5 anti-scope).
//! * **No GPU acceleration.** whisper-rs ships `metal` /
//!   `cuda` / `coreml` features; LVQR keeps them off so the
//!   default CI build stays portable. Operators with GPUs
//!   enable them via their own `lvqr-cli` build with the
//!   appropriate whisper-rs feature pinned.

pub mod asc;
pub mod caption;
pub mod factory;
pub mod mdat;

mod agent;

#[cfg(feature = "whisper")]
mod decode;
#[cfg(feature = "whisper")]
mod worker;

pub use agent::WhisperCaptionsAgent;
pub use caption::{CaptionStream, TranscribedCaption};
pub use factory::{WhisperCaptionsFactory, WhisperConfig};