Module transcriber

Transcriber trait and event types per issues #799 and #801.

Transcriber is the contract every speech-to-text backend implements, whether batch (#801) or streaming (#806). It takes an AudioInput that produces 16 kHz mono signed-PCM chunks and returns an EventStream of TranscriptEvents.
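The contract described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual API: the method names `next_chunk` and `transcribe`, the `String` error type, the `Final { text }` variant shape, and the toy `SampleCounter` and `OneShot` types are all assumptions made for the example.

```rust
// Hypothetical shapes only; the real signatures in this crate may differ.

/// 16 kHz mono signed 16-bit PCM samples (assumed here to be a Vec<i16>).
type AudioChunk = Vec<i16>;

#[derive(Debug, Clone, PartialEq)]
enum TranscriptEvent {
    /// Assumed variant: a finished piece of transcript text.
    Final { text: String },
}

/// Assumed shape of the post-resample audio seam.
trait AudioInput: Send {
    /// Returns the next chunk, or None once the input is exhausted.
    fn next_chunk(&mut self) -> Option<AudioChunk>;
}

/// Stand-in for the event stream: a boxed, Send iterator of results.
type BoxedEvents = Box<dyn Iterator<Item = Result<TranscriptEvent, String>> + Send>;

/// Assumed shape of the backend contract.
trait Transcriber {
    fn transcribe(&self, input: Box<dyn AudioInput>) -> BoxedEvents;
}

/// Toy "backend" that reports how many samples it consumed.
struct SampleCounter;

impl Transcriber for SampleCounter {
    fn transcribe(&self, mut input: Box<dyn AudioInput>) -> BoxedEvents {
        let mut total = 0usize;
        // Drain the input chunk by chunk, as a batch backend would.
        while let Some(chunk) = input.next_chunk() {
            total += chunk.len();
        }
        let event = TranscriptEvent::Final {
            text: format!("{} samples", total),
        };
        Box::new(std::iter::once(Ok(event)))
    }
}

/// Toy input that yields a single chunk and then ends.
struct OneShot(Option<AudioChunk>);

impl AudioInput for OneShot {
    fn next_chunk(&mut self) -> Option<AudioChunk> {
        self.0.take()
    }
}
```

Calling `SampleCounter.transcribe(Box::new(OneShot(Some(vec![0i16; 160]))))` and collecting the result yields one `Final` event, which shows the batch shape: consume all audio, then emit events.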

The separation from crate::voice::AudioSource (#800) is deliberate: AudioSource is the hardware-capture seam (variable rate, variable channels, f32, intentionally !Send on macOS per ADR-0031); AudioInput is the post-mixdown-and-resample seam (16 kHz mono i16, Send) that ASR engines consume natively. See ADR-0032 for the rationale.
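To make the two seams concrete, the sketch below shows one plausible way to turn interleaved f32 capture frames into mono i16 samples. The function name `downmix_to_i16` is hypothetical, and resampling to 16 kHz is deliberately left out; the crate's real conversion path is not shown in this page and may work differently.

```rust
/// Averages each interleaved f32 frame down to one mono sample and
/// quantizes it to i16. Resampling is NOT performed here; this only
/// covers the mixdown and format conversion steps.
/// Hypothetical helper, not part of the documented API.
fn downmix_to_i16(interleaved: &[f32], channels: usize) -> Vec<i16> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        // One frame = one sample per channel; a trailing partial frame is ignored.
        .chunks_exact(channels)
        .map(|frame| {
            let mono = frame.iter().sum::<f32>() / channels as f32;
            // Clamp out-of-range input before scaling to the i16 range.
            (mono.clamp(-1.0, 1.0) * i16::MAX as f32).round() as i16
        })
        .collect()
}
```

For example, the stereo input `[1.0, 1.0, -1.0, -1.0, 0.5, -0.5]` mixes down to `[32767, -32767, 0]`.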

Structs

VecAudioInput
In-memory AudioInput adapter: reads a 16 kHz mono 16-bit PCM WAV from disk (or accepts an in-memory Vec<i16>) and yields it in fixed-size chunks.
Word
First-class word-level alignment, optionally returned by backends that expose it. The batch backend in #801 always emits None; word-level alignment is a backend opt-in, not a guarantee.
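The fixed-size chunking that VecAudioInput is described as doing could look something like the sketch below. `ChunkedSamples` is a hypothetical stand-in, not the real type; in particular, whether the real adapter pads or shortens the final chunk is not stated here, so this version simply yields a shorter last chunk.

```rust
/// Hypothetical stand-in for an in-memory chunking adapter.
struct ChunkedSamples {
    samples: Vec<i16>,
    pos: usize,
    chunk_len: usize,
}

impl ChunkedSamples {
    fn new(samples: Vec<i16>, chunk_len: usize) -> Self {
        assert!(chunk_len > 0, "chunk length must be non-zero");
        Self { samples, pos: 0, chunk_len }
    }
}

impl Iterator for ChunkedSamples {
    type Item = Vec<i16>;

    fn next(&mut self) -> Option<Vec<i16>> {
        if self.pos >= self.samples.len() {
            return None; // all samples have been handed out
        }
        // The last chunk may be shorter than chunk_len (assumption: no padding).
        let end = (self.pos + self.chunk_len).min(self.samples.len());
        let chunk = self.samples[self.pos..end].to_vec();
        self.pos = end;
        Some(chunk)
    }
}
```

With five samples and a chunk length of two, the iterator yields chunks of lengths 2, 2, and 1.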

Enums

EndpointKind
What ended a speech region.
TranscriptEvent
One event emitted by a Transcriber.

Traits

AudioInput
Source of 16 kHz mono signed-PCM audio for transcription.
EventStream
Stream of transcription events. A blanket impl is provided for any iterator producing Result<TranscriptEvent> that is also Send.
Transcriber
Speech-to-text backend.
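The blanket impl mentioned for EventStream follows a common Rust pattern, sketched below under stated assumptions. The `Partial` and `Final` variant shapes, the `String` error type (standing in for whatever error the crate's one-parameter Result alias uses), and the `final_texts` helper are all hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
enum TranscriptEvent {
    /// Assumed variant: provisional text that may still change.
    Partial(String),
    /// Assumed variant: committed text.
    Final(String),
}

/// Marker-style trait: any Send iterator of event results qualifies.
trait EventStream: Iterator<Item = Result<TranscriptEvent, String>> + Send {}

/// Blanket impl: callers never implement EventStream by hand.
impl<I> EventStream for I where I: Iterator<Item = Result<TranscriptEvent, String>> + Send {}

/// Collects the text of every Final event, stopping at the first error.
fn final_texts(stream: impl EventStream) -> Result<Vec<String>, String> {
    let mut out = Vec::new();
    for event in stream {
        if let TranscriptEvent::Final(text) = event? {
            out.push(text);
        }
    }
    Ok(out)
}
```

Because of the blanket impl, a plain `vec![...].into_iter()` of results can be passed wherever an `impl EventStream` is expected, which keeps test fixtures and simple backends free of wrapper types.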

Type Aliases

AudioChunk
16 kHz mono signed 16-bit PCM samples, in capture order.
EventId
Monotonically-unique identifier for a Final event, used by downstream consumers (commit-message generation, history merging) to deduplicate across overlapping streaming windows.
SpeakerId
Diarisation tag attached to a segment when speaker labelling is on (#805). Always None for the batch backend in #801.
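As an illustration of how a downstream consumer might use EventId to deduplicate Final events from overlapping windows, here is a minimal sketch. It assumes EventId is an alias for `u64` and uses a hypothetical `FinalEvent` struct; neither is confirmed by this page.

```rust
use std::collections::HashSet;

/// Assumption: the real alias may point at a different underlying type.
type EventId = u64;

/// Hypothetical flattened view of a Final event.
#[derive(Debug, Clone, PartialEq)]
struct FinalEvent {
    id: EventId,
    text: String,
}

/// Keeps the first occurrence of each EventId and drops later repeats,
/// preserving the original order.
fn dedup_finals(events: Vec<FinalEvent>) -> Vec<FinalEvent> {
    let mut seen = HashSet::new();
    events
        .into_iter()
        // HashSet::insert returns false when the id was already present.
        .filter(|event| seen.insert(event.id))
        .collect()
}
```

If two overlapping windows both report the event with id 1, only the first copy survives, so history merging sees each Final exactly once.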