AutoAgents Speech

Speech (TTS/STT) provider abstractions for the AutoAgents framework.

This crate provides trait-based abstraction layers for speech providers, allowing different backends to be used interchangeably within the AutoAgents ecosystem.

Features

TTS (Text-to-Speech)

Speech Generation: Generate audio from text
Voice Management: Use predefined voices
Streaming Support: Optional streaming for real-time audio generation
Model Management: Support for multiple models and languages

STT (Speech-to-Text)

Transcription: Convert audio to text
Streaming Support: Real-time audio transcription
Timestamp Support: Token-level timestamps for transcriptions
Multilingual: Support for multiple languages with auto-detection

Architecture

The crate follows a trait-based design with provider implementations in the providers module:

TTS Traits

TTSProvider: Marker trait combining all TTS capabilities
TTSSpeechProvider: Speech generation capabilities
TTSModelsProvider: Model and language support

STT Traits

STTProvider: Marker trait combining all STT capabilities
STTSpeechProvider: Transcription capabilities
STTModelsProvider: Model and language support

Providers

Enable providers using feature flags:

pocket-tts: Pocket-TTS model support (TTS)
parakeet: Parakeet (NVIDIA) model support (STT)
vad: Silero VAD support (speech segmentation)

autoagents-speech 0.3.5