Crate atomr_agents_stt_core

Core types for the atomr-agents speech-to-text capability.

This crate is intentionally I/O-free: it defines the SpeechToText and StreamingSession traits, the rich Capabilities struct that backends advertise via a pub const, the audio-input and transcript data types, and a deterministic MockSpeechToText for tests.
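The "advertise via a pub const" pattern can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's actual definition: only the field names languages and supported_audio_formats come from the documentation, while their element types, the streaming field, the constant name, and the hand-written to_json helper (standing in for a serde Serialize derive) are hypothetical.

```rust
// Hypothetical, simplified stand-in for the crate's `Capabilities` struct.
// Only `languages` and `supported_audio_formats` are named in the docs;
// the element types and the `streaming` field are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Capabilities {
    pub languages: &'static [&'static str],
    pub supported_audio_formats: &'static [&'static str],
    pub streaming: bool,
}

// A backend advertises what it supports as a compile-time constant.
// `&'static` slices make this possible without any allocation.
pub const EXAMPLE_BACKEND_CAPABILITIES: Capabilities = Capabilities {
    languages: &["en", "de"],
    supported_audio_formats: &["wav", "flac"],
    streaming: false,
};

// Renders a slice of strings as a JSON array.
// Assumes the strings contain no characters that need JSON escaping.
fn json_array(items: &[&str]) -> String {
    let quoted: Vec<String> = items.iter().map(|s| format!("\"{}\"", s)).collect();
    format!("[{}]", quoted.join(","))
}

// Hand-rolled, outward-only serialization standing in for `serde::Serialize`.
// There is deliberately no inverse: parsed JSON is owned, short-lived data,
// so it cannot be borrowed as a `&'static` slice without leaking memory.
pub fn to_json(caps: &Capabilities) -> String {
    format!(
        "{{\"languages\":{},\"supported_audio_formats\":{},\"streaming\":{}}}",
        json_array(caps.languages),
        json_array(caps.supported_audio_formats),
        caps.streaming
    )
}

fn main() {
    println!("{}", to_json(&EXAMPLE_BACKEND_CAPABILITIES));
}
```

Because the constant holds only 'static data, callers can read it without constructing a backend, which suits a capability table that is exported to JSON or telemetry and never read back.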

Concrete backends live in sibling crates:

atomr-agents-stt-runtime-openai — OpenAI Whisper REST.
atomr-agents-stt-runtime-deepgram — Deepgram REST + WS.
atomr-agents-stt-runtime-assemblyai — AssemblyAI REST + WS.
atomr-agents-stt-runtime-whisper — local whisper-rs.

Audio I/O (symphonia, cpal) lives in atomr-agents-stt-audio, the higher-level voice-session abstraction in atomr-agents-stt-voice, and the agent-framework adapters in atomr-agents-stt-tool.

Structs§

Capabilities: Serialize-only by design. The slice fields (languages and supported_audio_formats) are &'static, and serde cannot deserialize into a &'static slice. Capabilities flow outward (to JSON, Python, or telemetry) and are never round-tripped back into Rust.
MockSpeechToText: Deterministic mock STT. transcribe returns a transcript whose text is a hash digest of the input length, so tests can assert on stability without depending on the exact string.
PcmBuffer: Decoded PCM audio. Backends that need a specific layout (e.g. whisper-rs requires 16 kHz mono f32) take this buffer and convert it.
Segment
SpeakerTag
StreamOptions: Per-call options for opening a streaming session. Mirrors the “common knobs” across the four MVP backends.
TranscribeOptions: Per-call options shared across batch transcription backends.
Transcript
Word
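The determinism described for MockSpeechToText can be illustrated with a small stand-in. This is a sketch, not the crate's implementation: the SketchMock type, the byte-slice signature of transcribe, the "mock-" prefix, and the choice of FNV-1a as the hash are all assumptions. The documentation says only that the text is a hash digest of the input length.

```rust
// 64-bit FNV-1a, chosen here because it is tiny and gives the same result
// on every platform and Rust version. The real mock may use another hash.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

// Hypothetical stand-in for `MockSpeechToText`.
pub struct SketchMock;

impl SketchMock {
    // Returns text derived only from the input *length*, never its content.
    // The signature is an assumption; the real trait method likely differs.
    pub fn transcribe(&self, audio: &[u8]) -> String {
        let digest = fnv1a_64(&(audio.len() as u64).to_le_bytes());
        format!("mock-{:016x}", digest)
    }
}

fn main() {
    let mock = SketchMock;
    let a = mock.transcribe(&[0u8; 320]);
    let b = mock.transcribe(&[7u8; 320]); // same length, different samples
    let c = mock.transcribe(&[0u8; 640]); // different length

    // Tests can assert on stability without knowing the exact string.
    assert_eq!(a, b);
    assert_ne!(a, c);
    println!("{}", a);
}
```

A test written against such a mock would compare two transcripts with each other rather than against a hard-coded string, which keeps the test valid even if the digest format changes.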

Enums§

AudioFormat
AudioInput
BackendKind
DiarizationSupport
Languages
SampleType
StreamEvent
SttError
TransportKind

Traits§

SpeechToText: Speech-to-text backend. Implementations live in sibling stt-runtime-* crates.
StreamingSession: Active streaming session. The caller alternates between feeding audio (push_audio, then finish once the input ends) and consuming the stream returned from events.
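The push_audio / finish / events flow can be sketched as follows. This is a synchronous, standard-library-only illustration under stated assumptions: the real trait is not shown on this page and may well be async, and SketchSession, SketchEvent, and every signature below are hypothetical. Only the three method names come from the documentation.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

// Hypothetical stand-in for the crate's `StreamEvent` enum.
#[derive(Debug, PartialEq)]
pub enum SketchEvent {
    Partial(String),
    Final(String),
}

// Hypothetical session that reports how many bytes it has seen.
pub struct SketchSession {
    tx: Option<Sender<SketchEvent>>,
    rx: Option<Receiver<SketchEvent>>,
    bytes_seen: usize,
}

impl SketchSession {
    pub fn new() -> Self {
        let (tx, rx) = channel();
        Self { tx: Some(tx), rx: Some(rx), bytes_seen: 0 }
    }

    // Hands out the event stream. It can be taken only once.
    pub fn events(&mut self) -> Option<Receiver<SketchEvent>> {
        self.rx.take()
    }

    // Feeds one chunk of audio and emits a partial result for it.
    pub fn push_audio(&mut self, chunk: &[u8]) {
        self.bytes_seen += chunk.len();
        if let Some(tx) = &self.tx {
            // A send error only means the consumer went away; ignore it.
            let _ = tx.send(SketchEvent::Partial(format!("{} bytes", self.bytes_seen)));
        }
    }

    // Signals end of input, emits the final result, and closes the stream.
    pub fn finish(&mut self) {
        if let Some(tx) = self.tx.take() {
            let _ = tx.send(SketchEvent::Final(format!("{} bytes total", self.bytes_seen)));
            // `tx` is dropped here, which ends the receiver's iteration.
        }
    }
}

fn main() {
    let mut session = SketchSession::new();
    let events = session.events().expect("event stream already taken");

    session.push_audio(&[0u8; 160]);
    session.push_audio(&[0u8; 160]);
    session.finish();

    // Drains the buffered events; the loop ends once the sender is dropped.
    for event in events {
        println!("{:?}", event);
    }
}
```

The sketch pushes all audio before draining events only because the channel is unbounded and the example is single-threaded. A real caller would interleave the two, for example by consuming events on a separate task while audio is still being pushed.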

Type Aliases§

DynSpeechToText
Result