Core types for the atomr-agents speech-to-text capability.
This crate is intentionally I/O-free: it defines the
SpeechToText and StreamingSession traits, the rich
Capabilities struct that backends advertise via a
pub const, the audio-input and transcript data types, and a
deterministic MockSpeechToText for tests.
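As an illustration of advertising capabilities through a pub const, here is a minimal std-only sketch. It is not the crate's actual definition: only the field names languages and supported_audio_formats come from this page, while the streaming field, the CAPABILITIES constant, the string-typed format list, and the hand-written to_json (standing in for a serde Serialize impl) are assumptions made for the example.

```rust
// Sketch only. Apart from `languages` and `supported_audio_formats`,
// every name here is hypothetical; the real crate relies on serde
// rather than a hand-written `to_json`.

/// Capabilities built entirely from `'static` data, so a value can live
/// in a `pub const` with no allocation or runtime initialization.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Capabilities {
    pub languages: &'static [&'static str],
    /// Plain strings for brevity (an assumption of this sketch).
    pub supported_audio_formats: &'static [&'static str],
    /// Hypothetical flag, included only to show a non-slice field.
    pub streaming: bool,
}

/// What a backend crate might export (hypothetical name and values).
pub const CAPABILITIES: Capabilities = Capabilities {
    languages: &["en", "de"],
    supported_audio_formats: &["wav", "flac"],
    streaming: false,
};

impl Capabilities {
    /// One-way serialization; there is deliberately no `from_json`.
    /// Assumes the strings contain nothing that needs JSON escaping.
    pub fn to_json(&self) -> String {
        fn array(items: &[&str]) -> String {
            let quoted: Vec<String> =
                items.iter().map(|s| format!("\"{}\"", s)).collect();
            format!("[{}]", quoted.join(","))
        }
        format!(
            "{{\"languages\":{},\"supported_audio_formats\":{},\"streaming\":{}}}",
            array(self.languages),
            array(self.supported_audio_formats),
            self.streaming
        )
    }
}
```

Because every field is Copy and 'static, callers can read CAPABILITIES without constructing a backend, and CAPABILITIES.to_json() yields a JSON object that telemetry or a Python binding could consume. The borrowed 'static slices are also why a deserializer has nowhere to put freshly parsed data.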
Concrete backends live in sibling crates:
- atomr-agents-stt-runtime-openai: OpenAI Whisper REST.
- atomr-agents-stt-runtime-deepgram: Deepgram REST + WS.
- atomr-agents-stt-runtime-assemblyai: AssemblyAI REST + WS.
- atomr-agents-stt-runtime-whisper: local whisper-rs.
Audio I/O (symphonia, cpal) lives in atomr-agents-stt-audio,
the higher-level voice-session abstraction in
atomr-agents-stt-voice, and the agent-framework adapters in
atomr-agents-stt-tool.
Structs
- Capabilities
  Serialize-only by design: the slice fields (languages and supported_audio_formats) are &'static, which serde can't deserialize into. Capabilities flow outward (to JSON / Python / telemetry) and are never round-tripped back into Rust.
- MockSpeechToText
  Deterministic mock STT. transcribe returns a transcript whose text is a hash digest of the input length so tests can assert on stability without caring about the exact string.
- PcmBuffer
  Decoded PCM. Backends that resample (e.g. whisper-rs needs 16 kHz mono f32) take this and convert.
- Segment
- SpeakerTag
- StreamOptions
  Per-call options for opening a streaming session. Mirrors the “common knobs” across the four MVP backends.
- TranscribeOptions
  Per-call options shared across batch transcription backends.
- Transcript
- Word
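The determinism described for MockSpeechToText can be sketched as follows. This is an assumption-laden illustration rather than the crate's implementation: SketchMock, the one-field Transcript, the byte-slice input, the choice of FNV-1a as the hash, and the "mock-" prefix are all hypothetical. The page only states that the text is a hash digest of the input length.

```rust
/// FNV-1a over a byte slice. Chosen for this sketch because it is tiny,
/// dependency-free, and yields the same value on every run and platform.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV-1a 64-bit offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3); // FNV-1a 64-bit prime
    }
    hash
}

/// Hypothetical stand-in for the crate's `Transcript` type.
pub struct Transcript {
    pub text: String,
}

/// Hypothetical mock: the output depends only on how many bytes were
/// passed in, never on their content.
pub struct SketchMock;

impl SketchMock {
    pub fn transcribe(&self, audio: &[u8]) -> Transcript {
        // Hash the length (as 8 little-endian bytes), not the audio itself.
        let digest = fnv1a64(&(audio.len() as u64).to_le_bytes());
        Transcript {
            text: format!("mock-{:016x}", digest),
        }
    }
}
```

With this shape, two inputs of equal length always produce the same text and inputs of different lengths produce different text, so a test can assert on equality or inequality of transcripts without hard-coding the digest string.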
Enums
- AudioFormat
- AudioInput
- BackendKind
- DiarizationSupport
- Languages
- SampleType
- StreamEvent
- SttError
- TransportKind
Traits
- SpeechToText
  Speech-to-text backend. Implementations live in sibling stt-runtime-* crates.
- StreamingSession
  Active streaming session. The caller alternates push_audio/finish with consuming the stream returned from events.
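The push_audio / finish / events interplay might look roughly like the synchronous sketch below. It assumes a great deal: the real trait is not shown on this page and is likely async, so SketchSession, its method signatures, the two StreamEvent variants, and the use of std::sync::mpsc as the event stream are all hypothetical stand-ins.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical event variants; the crate's `StreamEvent` may differ.
#[derive(Debug, PartialEq)]
pub enum StreamEvent {
    Partial(String),
    Final(String),
}

/// Hypothetical in-memory session that reports how many samples it saw.
pub struct SketchSession {
    tx: Option<Sender<StreamEvent>>,
    rx: Option<Receiver<StreamEvent>>,
    samples_seen: usize,
}

impl SketchSession {
    pub fn open() -> Self {
        let (tx, rx) = channel();
        SketchSession { tx: Some(tx), rx: Some(rx), samples_seen: 0 }
    }

    /// Hands out the event stream. It can be taken only once; later
    /// calls return `None`.
    pub fn events(&mut self) -> Option<Receiver<StreamEvent>> {
        self.rx.take()
    }

    /// Feeds one chunk and emits a partial result. Returns `false` if
    /// the session has already been finished.
    pub fn push_audio(&mut self, chunk: &[f32]) -> bool {
        match &self.tx {
            Some(tx) => {
                self.samples_seen += chunk.len();
                let msg = format!("{} samples", self.samples_seen);
                // A send error only means the consumer went away.
                let _ = tx.send(StreamEvent::Partial(msg));
                true
            }
            None => false,
        }
    }

    /// Emits the final event and drops the sender, which ends the
    /// consumer's iteration over the receiver.
    pub fn finish(&mut self) {
        if let Some(tx) = self.tx.take() {
            let msg = format!("{} samples total", self.samples_seen);
            let _ = tx.send(StreamEvent::Final(msg));
        }
    }
}
```

A caller would take the receiver from events() once, push chunks, call finish(), and then drain the receiver; iteration stops after the Final event because finish() drops the only sender. In an async backend the same shape would presumably use an async channel or stream in place of mpsc.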