//! # AutoAgents Speech
//!
//! Speech (TTS/STT) provider abstractions for the AutoAgents framework.
//!
//! This crate provides trait-based abstraction layers for speech providers, allowing
//! different backends to be used interchangeably within the AutoAgents ecosystem.
//!
//! ## Features
//!
//! ### TTS (Text-to-Speech)
//! - **Speech Generation**: Generate audio from text
//! - **Voice Management**: Use predefined voices
//! - **Streaming Support**: Optional streaming for real-time audio generation
//! - **Model Management**: Support for multiple models and languages
//!
//! ### STT (Speech-to-Text)
//! - **Transcription**: Convert audio to text
//! - **Streaming Support**: Real-time audio transcription
//! - **Timestamp Support**: Token-level timestamps for transcriptions
//! - **Multilingual**: Support for multiple languages with auto-detection
//!
//! ## Architecture
//!
//! The crate follows a trait-based design with provider implementations in the `providers` module:
//!
//! ### TTS Traits
//! - `TTSProvider`: Marker trait combining all TTS capabilities
//! - `TTSSpeechProvider`: Speech generation capabilities
//! - `TTSModelsProvider`: Model and language support
//!
//! ### STT Traits
//! - `STTProvider`: Marker trait combining all STT capabilities
//! - `STTSpeechProvider`: Transcription capabilities
//! - `STTModelsProvider`: Model and language support
//!
//! ## Providers
//!
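//! Enable providers and optional modules using Cargo feature flags:
//! - `pocket-tts`: Pocket-TTS model support (TTS)
//! - `parakeet`: Parakeet (NVIDIA) model support (STT)
//! - `vad`: Silero VAD support (speech segmentation), exposed as the `vad` module
//! - `playback`: Enables the `playback` module
//! - `audio-capture`: Enables the `audio_capture` module
//!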
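//! ## Example: marker-trait sketch
//!
//! The trait layout described under "Architecture" (capability traits plus a marker
//! trait that combines them) can be illustrated with a minimal, self-contained sketch.
//! Every name below (`SpeechSketch`, `ModelsSketch`, `ProviderSketch`, `Silence`,
//! `describe`) is hypothetical and exists only for illustration. The sketch does not
//! reproduce the method signatures of `TTSSpeechProvider` or `TTSModelsProvider`,
//! which are not shown in this file, and it is synchronous for simplicity.
//!
//! ```rust
//! /// Hypothetical capability trait: turns text into audio samples.
//! trait SpeechSketch {
//!     fn generate(&self, text: &str) -> Vec<f32>;
//! }
//!
//! /// Hypothetical capability trait: reports supported languages.
//! trait ModelsSketch {
//!     fn languages(&self) -> Vec<&'static str>;
//! }
//!
//! /// Marker trait: no methods of its own, only the combined bounds.
//! trait ProviderSketch: SpeechSketch + ModelsSketch {}
//!
//! /// Toy backend that emits one silent sample per input character.
//! struct Silence;
//!
//! impl SpeechSketch for Silence {
//!     fn generate(&self, text: &str) -> Vec<f32> {
//!         vec![0.0; text.chars().count()]
//!     }
//! }
//!
//! impl ModelsSketch for Silence {
//!     fn languages(&self) -> Vec<&'static str> {
//!         vec!["en"]
//!     }
//! }
//!
//! // Opting in to the marker trait is a one-line, empty impl.
//! impl ProviderSketch for Silence {}
//!
//! /// Callers depend only on the marker trait, so backends can be swapped freely.
//! fn describe(provider: &dyn ProviderSketch, text: &str) -> (usize, Vec<&'static str>) {
//!     (provider.generate(text).len(), provider.languages())
//! }
//!
//! fn main() {
//!     let (samples, langs) = describe(&Silence, "hi");
//!     assert_eq!(samples, 2);
//!     assert_eq!(langs, vec!["en"]);
//! }
//! ```
//!
//! Under this layout, code that needs every capability can name the single marker
//! bound instead of repeating each capability trait.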

pub mod error;
pub mod model_source;
mod provider;
pub mod types;

// Provider implementations
pub mod providers;

// Re-export main TTS types
pub use error::{TTSError, TTSResult};
pub use provider::{TTSModelsProvider, TTSProvider, TTSSpeechProvider};
pub use types::{
    AudioChunk, AudioData, AudioFormat, ModelInfo, SharedAudioData, SpeechRequest, SpeechResponse,
    VoiceIdentifier,
};

// Re-export main STT types
pub use error::{STTError, STTResult};
pub use model_source::ModelSource;
pub use provider::{STTModelsProvider, STTProvider, STTSpeechProvider};
pub use types::{TextChunk, TokenTimestamp, TranscriptionRequest, TranscriptionResponse};

#[cfg(feature = "playback")]
pub mod playback;

#[cfg(feature = "audio-capture")]
pub mod audio_capture;

#[cfg(feature = "vad")]
pub mod vad;