Transcriber trait and event types per issues #799 and #801.
Transcriber is the contract every speech-to-text backend implements,
whether batch (this issue) or streaming (#806). It takes an
AudioInput producing 16 kHz mono signed-PCM chunks and returns an
EventStream of TranscriptEvents.
The separation from crate::voice::AudioSource (#800) is deliberate:
AudioSource is the hardware-capture seam (variable rate, variable
channels, f32, intentionally !Send on macOS per ADR-0031);
AudioInput is the post-mixdown-and-resample seam (16 kHz mono i16,
Send) that ASR engines consume natively. See ADR-0032 for the rationale.
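To make the two seams concrete, here is a minimal sketch of the mixdown half of the conversion from capture-side f32 frames to the mono i16 samples an AudioInput carries. The helper name mixdown_to_i16 is hypothetical and not part of this crate, and resampling to 16 kHz is left out, so the sketch assumes the input is already at the target rate.

```rust
/// Hypothetical helper (not part of this crate): averages interleaved f32
/// frames down to mono and converts them to signed 16-bit PCM.
/// Resampling to 16 kHz is deliberately omitted from this sketch.
pub fn mixdown_to_i16(interleaved: &[f32], channels: usize) -> Vec<i16> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        // Each frame holds one sample per channel; a trailing partial frame is dropped.
        .chunks_exact(channels)
        .map(|frame| {
            let mono = frame.iter().sum::<f32>() / channels as f32;
            // Clamp before scaling so out-of-range input cannot overflow i16.
            (mono.clamp(-1.0, 1.0) * i16::MAX as f32).round() as i16
        })
        .collect()
}
```

Averaging the channels is only one possible mixdown policy; the real pipeline may weight channels differently or pick a single one.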
Structs
- VecAudioInput
- In-memory AudioInput adapter: reads a 16 kHz mono 16-bit PCM WAV from disk (or accepts an in-memory Vec<i16>) and yields it in fixed-size chunks.
- Word
- First-class word-level alignment, optionally returned by backends that expose it. The batch backend in #801 always emits None; word-level alignment is a backend opt-in, not a guarantee.
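The fixed-size chunking described for VecAudioInput can be sketched as follows. ChunkedSamples, next_chunk, and the Vec<i16> definition of AudioChunk are assumptions made for illustration; the real adapter implements the AudioInput trait, whose method signatures are not shown on this page.

```rust
/// Assumed shape of the chunk alias; the crate's definition may differ.
pub type AudioChunk = Vec<i16>;

/// Hypothetical stand-in for an in-memory adapter over 16 kHz mono samples.
pub struct ChunkedSamples {
    samples: Vec<i16>,
    pos: usize,
    chunk_len: usize,
}

impl ChunkedSamples {
    pub fn new(samples: Vec<i16>, chunk_len: usize) -> Self {
        assert!(chunk_len > 0, "chunk length must be non-zero");
        Self { samples, pos: 0, chunk_len }
    }

    /// Returns the next chunk in capture order, or None once the buffer is
    /// exhausted. The final chunk may be shorter than `chunk_len`.
    pub fn next_chunk(&mut self) -> Option<AudioChunk> {
        if self.pos >= self.samples.len() {
            return None;
        }
        let end = (self.pos + self.chunk_len).min(self.samples.len());
        let chunk = self.samples[self.pos..end].to_vec();
        self.pos = end;
        Some(chunk)
    }
}
```

In this sketch a short final chunk is returned as-is; whether the real adapter pads or truncates the tail is not stated here.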
Enums
- EndpointKind
- What ended a speech region.
- TranscriptEvent
- One event emitted by a Transcriber.
Traits
- AudioInput
- Source of 16 kHz mono signed-PCM audio for transcription.
- EventStream
- Stream of transcription events. A blanket impl is provided for any iterator producing Result<TranscriptEvent> that is also Send.
- Transcriber
- Speech-to-text backend.
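The blanket impl noted for EventStream can be illustrated with a reduced sketch. The two-variant TranscriptEvent, the SttResult alias with a String error, and the collect_finals helper are stand-ins invented for this example; the crate's own event enum and Result alias are likely richer.

```rust
/// Stand-in event type (assumption): the real enum likely has more variants and fields.
#[derive(Debug, Clone, PartialEq)]
pub enum TranscriptEvent {
    Partial(String),
    Final(String),
}

/// Stand-in for the crate's Result alias; the String error type is an assumption.
pub type SttResult<T> = Result<T, String>;

/// Any Send iterator over event results counts as an event stream.
pub trait EventStream: Iterator<Item = SttResult<TranscriptEvent>> + Send {}

/// Blanket impl: no backend has to implement EventStream by hand.
impl<I> EventStream for I where I: Iterator<Item = SttResult<TranscriptEvent>> + Send {}

/// Hypothetical consumer: gathers the text of Final events and stops at the first error.
pub fn collect_finals(stream: impl EventStream) -> SttResult<Vec<String>> {
    let mut finals = Vec::new();
    for event in stream {
        if let TranscriptEvent::Final(text) = event? {
            finals.push(text);
        }
    }
    Ok(finals)
}
```

With this shape a backend can return something as simple as a Vec's into_iter(), and callers can still accept impl EventStream without naming the concrete iterator type.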
Type Aliases
- AudioChunk
- 16 kHz mono signed 16-bit PCM samples, in capture order.
- EventId
- Monotonically-unique identifier for a Final event, used by downstream consumers (commit-message generation, history merging) to deduplicate across overlapping streaming windows.
- SpeakerId
- Diarisation tag attached to a segment when speaker labelling is on (#805). Always None for the batch backend in #801.
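As an illustration of how a downstream consumer might use EventId to deduplicate Final events from overlapping streaming windows, consider the sketch below. The u64 definition of EventId, the (EventId, String) pair, and the dedup_finals helper are all assumptions for this example, not the crate's API.

```rust
use std::collections::HashSet;

/// Assumed underlying type; the crate's alias may point at something else.
pub type EventId = u64;

/// Hypothetical helper: keeps the first occurrence of each EventId and
/// preserves the original order of the remaining entries.
pub fn dedup_finals(finals: Vec<(EventId, String)>) -> Vec<(EventId, String)> {
    let mut seen = HashSet::new();
    finals
        .into_iter()
        // HashSet::insert returns false when the id was already present.
        .filter(|(id, _)| seen.insert(*id))
        .collect()
}
```

Keeping the first occurrence is a choice made for this sketch; a consumer could equally prefer the latest copy of a repeated event.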