//! # AutoAgents Speech
//!
//! Speech (TTS/STT) provider abstractions for the AutoAgents framework.
//!
//! This crate provides trait-based abstraction layers for speech providers, allowing
//! different backends to be used interchangeably within the AutoAgents ecosystem.
//!
//! ## Features
//!
//! ### TTS (Text-to-Speech)
//! - **Speech Generation**: Generate audio from text
//! - **Voice Management**: Use predefined voices
//! - **Streaming Support**: Optional streaming for real-time audio generation
//! - **Model Management**: Support for multiple models and languages
//!
//! ### STT (Speech-to-Text)
//! - **Transcription**: Convert audio to text
//! - **Streaming Support**: Real-time audio transcription
//! - **Timestamp Support**: Token-level timestamps for transcriptions
//! - **Multilingual**: Support for multiple languages with auto-detection
//!
//! ## Architecture
//!
//! The crate follows a trait-based design with provider implementations in the `providers` module:
//!
//! ### TTS Traits
//! - `TTSProvider`: Marker trait combining all TTS capabilities
//! - `TTSSpeechProvider`: Speech generation capabilities
//! - `TTSModelsProvider`: Model and language support
//!
//! ### STT Traits
//! - `STTProvider`: Marker trait combining all STT capabilities
//! - `STTSpeechProvider`: Transcription capabilities
//! - `STTModelsProvider`: Model and language support
//!
//! ## Providers
//!
//! Enable providers using feature flags:
//! - `pocket-tts`: Pocket-TTS model support (TTS)
//! - `parakeet`: Parakeet (NVIDIA) model support (STT)
//! - `vad`: Silero VAD support (speech segmentation)
//!
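/*!
## Example: marker-trait sketch

The trait layout listed under Architecture can be illustrated with a minimal, self-contained sketch. The names `DemoSpeech`, `DemoModels`, `DemoTts`, and `Silence` are hypothetical stand-ins, not this crate's API; the real traits live in the `provider` module and their method signatures are not shown here. The blanket `impl` is likewise an assumption about how a marker trait might be wired up, not a statement of how `TTSProvider` is implemented.

```rust
// Hypothetical capability trait: speech generation.
trait DemoSpeech {
    fn generate(&self, text: &str) -> Vec<f32>;
}

// Hypothetical capability trait: model listing.
trait DemoModels {
    fn models(&self) -> Vec<String>;
}

// Marker trait: no methods of its own, it only bundles the capability traits.
trait DemoTts: DemoSpeech + DemoModels {}

// Assumed wiring: any type with both capabilities is a `DemoTts` automatically.
impl<T: DemoSpeech + DemoModels> DemoTts for T {}

// Toy backend that produces silent audio.
struct Silence;

impl DemoSpeech for Silence {
    fn generate(&self, text: &str) -> Vec<f32> {
        // One zero-valued sample per input character.
        vec![0.0; text.chars().count()]
    }
}

impl DemoModels for Silence {
    fn models(&self) -> Vec<String> {
        vec!["silence-v0".to_string()]
    }
}

// Callers depend on the combined trait object, not on a concrete backend.
fn describe(provider: &dyn DemoTts, text: &str) -> (usize, Vec<String>) {
    (provider.generate(text).len(), provider.models())
}

fn main() {
    let (samples, models) = describe(&Silence, "hello");
    println!("{samples} samples, models: {models:?}");
}
```

Because callers only name the combined trait, swapping in a different backend would not require changes at the call site, which is the interchangeability described at the top of these docs.
*/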

pub mod error;
pub mod model_source;
mod provider;
pub mod types;

// Provider implementations
pub mod providers;

// TTS utilities (sentence chunking, streaming pipeline)
pub mod tts;

// Re-export main TTS types
pub use error::{TTSError, TTSResult};
pub use provider::{TTSModelsProvider, TTSProvider, TTSSpeechProvider};
pub use tts::{ChunkerConfig, SentenceChunker, StreamingTtsPipeline};
pub use types::{
    AudioChunk, AudioData, AudioFormat, ModelInfo, SharedAudioData, SpeechRequest, SpeechResponse,
    VoiceIdentifier,
};

// Re-export main STT types
pub use error::{STTError, STTResult};
pub use model_source::ModelSource;
pub use provider::{STTModelsProvider, STTProvider, STTSpeechProvider};
pub use types::{TextChunk, TokenTimestamp, TranscriptionRequest, TranscriptionResponse};

#[cfg(feature = "playback")]
pub mod playback;

#[cfg(feature = "audio-capture")]
pub mod audio_capture;

#[cfg(feature = "vad")]
pub mod vad;