Module transcriber


Transcriber trait and event types per issues #799 and #801.

Transcriber is the contract every speech-to-text backend implements, whether batch (#801) or streaming (#806). It takes an AudioInput that produces 16 kHz mono signed-PCM chunks and returns an EventStream of TranscriptEvents.
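The contract can be sketched in Rust as below. Everything here is an assumption for illustration: the `next_chunk` method, the `transcribe` signature, the single `Final { text }` variant, the `String` error type, and the `Chunks` and `SampleCounter` types are hypothetical stand-ins, not this crate's API. The toy backend drains its input and reports the audio duration at 16,000 samples per second instead of running ASR.

```rust
// Hypothetical, simplified shapes; the crate's real signatures may differ.
pub type AudioChunk = Vec<i16>;
pub type Result<T> = std::result::Result<T, String>;

#[derive(Debug, Clone, PartialEq)]
pub enum TranscriptEvent {
    // Assumed variant carrying settled text.
    Final { text: String },
}

/// Source of 16 kHz mono i16 audio; `None` signals end of input.
pub trait AudioInput: Send {
    fn next_chunk(&mut self) -> Option<Result<AudioChunk>>;
}

/// Any Send iterator of events counts as an event stream.
pub trait EventStream: Iterator<Item = Result<TranscriptEvent>> + Send {}
impl<I> EventStream for I where I: Iterator<Item = Result<TranscriptEvent>> + Send {}

pub trait Transcriber {
    fn transcribe(&mut self, input: Box<dyn AudioInput>) -> Result<Box<dyn EventStream>>;
}

/// Hypothetical input that yields pre-built chunks in order.
pub struct Chunks(pub Vec<AudioChunk>);

impl AudioInput for Chunks {
    fn next_chunk(&mut self) -> Option<Result<AudioChunk>> {
        if self.0.is_empty() {
            None
        } else {
            Some(Ok(self.0.remove(0)))
        }
    }
}

/// Toy batch backend: drains all audio, then emits one Final event
/// stating how many milliseconds of 16 kHz audio it received.
pub struct SampleCounter;

impl Transcriber for SampleCounter {
    fn transcribe(&mut self, mut input: Box<dyn AudioInput>) -> Result<Box<dyn EventStream>> {
        let mut total = 0usize;
        while let Some(chunk) = input.next_chunk() {
            // Propagate the first input error to the caller.
            total += chunk?.len();
        }
        let ms = total * 1000 / 16_000;
        let events = vec![Ok(TranscriptEvent::Final { text: format!("{ms} ms of audio") })];
        let stream: Box<dyn EventStream> = Box::new(events.into_iter());
        Ok(stream)
    }
}
```

With two 8,000-sample chunks, `SampleCounter` sees 16,000 samples and emits a single `Final` event reading `1000 ms of audio`.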

The separation from crate::voice::AudioSource (#800) is deliberate: AudioSource is the hardware-capture seam (variable rate, variable channels, f32, intentionally !Send on macOS per ADR-0031); AudioInput is the post-mixdown-and-resample seam (16 kHz mono i16, Send) that ASR engines consume natively. See ADR-0032 for the rationale.
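To make the seam concrete, here is a minimal sketch of the conversion that sits between the two traits: average interleaved `f32` channels to mono, resample to 16 kHz, and quantise to `i16`. It assumes linear interpolation, a deliberately naive choice with no anti-aliasing filter; the name `to_asr_input` is hypothetical, and this page does not say how the crate performs this step.

```rust
/// Sample rate expected by AudioInput consumers.
pub const TARGET_RATE_HZ: u32 = 16_000;

/// Naive mixdown + resample + quantise (illustrative only, not the crate's code).
pub fn to_asr_input(interleaved: &[f32], channels: usize, rate_hz: u32) -> Vec<i16> {
    assert!(channels > 0 && rate_hz > 0, "channels and rate must be non-zero");

    // 1. Mixdown: average each interleaved frame into one mono sample.
    let mono: Vec<f32> = interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect();
    if mono.is_empty() {
        return Vec::new();
    }

    // 2. Resample: output length scales by target/source rate.
    let out_len = (mono.len() as u64 * TARGET_RATE_HZ as u64 / rate_hz as u64) as usize;
    let step = rate_hz as f64 / TARGET_RATE_HZ as f64;

    (0..out_len)
        .map(|i| {
            // Linear interpolation between the two nearest source samples.
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = mono[idx];
            let b = *mono.get(idx + 1).unwrap_or(&a);
            let s = a + (b - a) * frac;
            // 3. Quantise: clamp to [-1, 1] and scale to the i16 range.
            (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16
        })
        .collect()
}
```

Because downsampling without a low-pass filter folds frequencies above 8 kHz back into the speech band, a production pipeline would normally use a filtered resampler in step 2.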

Structs

VecAudioInput: In-memory AudioInput adapter: reads a 16 kHz mono 16-bit PCM WAV from disk (or accepts an in-memory Vec<i16>) and yields it in fixed-size chunks.
Word: First-class word-level alignment, optionally returned by backends that expose it. The batch backend in #801 always emits None; word-level alignment is a backend opt-in, not a guarantee.
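The fixed-size chunking that VecAudioInput performs can be sketched as a plain iterator over an in-memory buffer. WAV decoding is omitted, and the `from_samples` constructor, the `chunk_len` parameter, and the choice of `Iterator` rather than the AudioInput trait are hypothetical simplifications.

```rust
/// 16 kHz mono signed 16-bit PCM samples.
pub type AudioChunk = Vec<i16>;

/// Hypothetical stand-in for VecAudioInput: holds samples in memory and
/// hands them out in fixed-size chunks; the final chunk may be shorter.
pub struct VecAudioInput {
    samples: Vec<i16>,
    chunk_len: usize,
    pos: usize,
}

impl VecAudioInput {
    pub fn from_samples(samples: Vec<i16>, chunk_len: usize) -> Self {
        assert!(chunk_len > 0, "chunk_len must be non-zero");
        Self { samples, chunk_len, pos: 0 }
    }
}

impl Iterator for VecAudioInput {
    type Item = AudioChunk;

    fn next(&mut self) -> Option<AudioChunk> {
        if self.pos >= self.samples.len() {
            return None;
        }
        // Clamp the window to the end of the buffer for the tail chunk.
        let end = (self.pos + self.chunk_len).min(self.samples.len());
        let chunk = self.samples[self.pos..end].to_vec();
        self.pos = end;
        Some(chunk)
    }
}
```

At 16 kHz, a `chunk_len` of 1,600 samples corresponds to 100 ms of audio per chunk.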

Enums

EndpointKind: What ended a speech region.
TranscriptEvent: One event emitted by a Transcriber.
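A hypothetical shape for the two enums, plus a consumer that assembles settled text from a stream of events, might look like the sketch below. The variant names `Partial`, `Final`, `Endpoint`, `Silence`, and `EndOfInput`, their fields, and the `settle` helper are assumptions; this page only implies that `Final` events and endpoint kinds exist.

```rust
/// Assumed variants; the real EndpointKind may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum EndpointKind {
    Silence,
    EndOfInput,
}

#[derive(Debug, Clone, PartialEq)]
pub enum TranscriptEvent {
    /// Assumed: provisional text that a later event may revise.
    Partial { text: String },
    /// Settled text for one segment.
    Final { id: u64, text: String },
    /// A speech region ended, and why.
    Endpoint(EndpointKind),
}

/// Joins Final text with spaces, ignoring Partial events, and reports
/// how the most recent speech region ended.
pub fn settle(events: &[TranscriptEvent]) -> (String, Option<EndpointKind>) {
    let mut parts = Vec::new();
    let mut last_end = None;
    for ev in events {
        match ev {
            TranscriptEvent::Partial { .. } => {}
            TranscriptEvent::Final { text, .. } => parts.push(text.as_str()),
            TranscriptEvent::Endpoint(kind) => last_end = Some(*kind),
        }
    }
    (parts.join(" "), last_end)
}
```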

Traits

AudioInput: Source of 16 kHz mono signed-PCM audio for transcription.
EventStream: Stream of transcription events. A blanket impl is provided for any iterator producing Result<TranscriptEvent> that is also Send.
Transcriber: Speech-to-text backend.
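The blanket impl means a backend can return ordinary iterator adapters without a wrapper type. The sketch below reproduces that pattern with placeholder types: `TranscriptEvent` as a one-field struct, `Result` with a `String` error, and the helpers `collect_text` and `from_words` are all hypothetical.

```rust
/// Placeholder event and error types (assumptions for a self-contained sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct TranscriptEvent(pub String);
pub type Result<T> = std::result::Result<T, String>;

/// Any Send iterator of Result<TranscriptEvent> is an EventStream.
pub trait EventStream: Iterator<Item = Result<TranscriptEvent>> + Send {}
impl<I> EventStream for I where I: Iterator<Item = Result<TranscriptEvent>> + Send {}

/// Consumes any event stream, stopping at the first error.
pub fn collect_text(stream: impl EventStream) -> Result<Vec<String>> {
    stream.map(|ev| ev.map(|TranscriptEvent(t)| t)).collect()
}

/// A backend can box an adapter chain directly as a trait object.
pub fn from_words(words: &[&str]) -> Box<dyn EventStream> {
    let owned: Vec<String> = words.iter().map(|w| w.to_string()).collect();
    Box::new(owned.into_iter().map(|w| Ok::<_, String>(TranscriptEvent(w))))
}
```

Since `Box<dyn EventStream>` is itself a `Send` iterator of the same item type, the blanket impl covers it too, so boxed and unboxed streams can be passed to the same generic consumer.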

Type Aliases

AudioChunk: 16 kHz mono signed 16-bit PCM samples, in capture order.
EventId: Monotonic, unique identifier for a Final event, used by downstream consumers (commit-message generation, history merging) to deduplicate across overlapping streaming windows.
SpeakerId: Diarisation tag attached to a segment when speaker labelling is on (#805). Always None for the batch backend in #801.
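Deduplicating across overlapping windows can be sketched as keeping the first `Final` seen for each `EventId`. This assumes `EventId = u64`, `SpeakerId = u32`, and a hypothetical `FinalSegment` record standing in for a `Final` event; none of these concrete types are stated on this page.

```rust
use std::collections::HashSet;

// Assumed concrete types for the aliases.
pub type EventId = u64;
pub type SpeakerId = u32;

/// Hypothetical record standing in for a Final event.
#[derive(Debug, Clone, PartialEq)]
pub struct FinalSegment {
    pub id: EventId,
    pub text: String,
    /// None unless speaker labelling is on.
    pub speaker: Option<SpeakerId>,
}

/// Keeps the first occurrence of each EventId, preserving arrival order,
/// so a Final re-emitted by an overlapping window is merged only once.
pub fn dedup_finals(segments: impl IntoIterator<Item = FinalSegment>) -> Vec<FinalSegment> {
    let mut seen = HashSet::new();
    // HashSet::insert returns false for an id already seen.
    segments.into_iter().filter(|s| seen.insert(s.id)).collect()
}
```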