autoagents-speech 0.3.5

Speech (TTS/STT) provider abstractions for AutoAgents

AutoAgents Speech

Speech (TTS/STT) provider abstractions for the AutoAgents framework.

This crate provides trait-based abstraction layers for speech providers, allowing different backends to be used interchangeably within the AutoAgents ecosystem.
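To illustrate what interchangeable backends mean in practice, the sketch below defines a hypothetical SpeechBackend trait with two toy implementations. The trait is not part of autoagents-speech. Code written against the trait runs unchanged no matter which backend is plugged in.

```rust
use std::f32::consts::PI;

// Hypothetical trait for illustration only; not the autoagents-speech API.
trait SpeechBackend {
    fn name(&self) -> &str;
    /// Returns mono PCM samples for `text`.
    fn synthesize(&self, text: &str) -> Vec<f32>;
}

/// Toy backend: emits silence, 100 samples per character.
struct SilenceBackend;

/// Toy backend: emits a sine tone, 100 samples per character.
struct ToneBackend {
    freq_hz: f32,
    sample_rate: u32,
}

impl SpeechBackend for SilenceBackend {
    fn name(&self) -> &str {
        "silence"
    }
    fn synthesize(&self, text: &str) -> Vec<f32> {
        vec![0.0; text.chars().count() * 100]
    }
}

impl SpeechBackend for ToneBackend {
    fn name(&self) -> &str {
        "tone"
    }
    fn synthesize(&self, text: &str) -> Vec<f32> {
        let n = text.chars().count() * 100;
        (0..n)
            .map(|i| (2.0 * PI * self.freq_hz * i as f32 / self.sample_rate as f32).sin())
            .collect()
    }
}

/// Application code depends only on the trait, so backends are interchangeable.
fn speak(backend: &dyn SpeechBackend, text: &str) -> usize {
    backend.synthesize(text).len()
}

fn main() {
    let backends: Vec<Box<dyn SpeechBackend>> = vec![
        Box::new(SilenceBackend),
        Box::new(ToneBackend { freq_hz: 440.0, sample_rate: 16_000 }),
    ];
    for backend in &backends {
        println!("{}: {} samples", backend.name(), speak(backend.as_ref(), "hello"));
    }
}
```

Boxing the trait object lets the backend be chosen at runtime, for example from configuration. When the backend is fixed at compile time, a generic parameter works just as well.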

Features

TTS (Text-to-Speech)

  • Speech Generation: Generate audio from text
  • Voice Management: Select from predefined voices
  • Streaming Support: Optional streaming for real-time audio generation
  • Model Management: Support for multiple models and languages
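The TTS features above include voice selection and both one-shot and streaming generation, and they can be sketched with plain Rust types. Everything here is hypothetical. The SpeechRequest struct, the voice ids, and both functions are illustrative stand-ins that emit silent placeholder audio. They are not the crate's API.

```rust
// Hypothetical request type; the crate's real types may differ.
struct SpeechRequest {
    text: String,
    voice: String,
}

// Hypothetical voice ids standing in for a provider's predefined voices.
const PREDEFINED_VOICES: &[&str] = &["voice-a", "voice-b"];

fn validate_voice(voice: &str) -> Result<(), String> {
    if PREDEFINED_VOICES.contains(&voice) {
        Ok(())
    } else {
        Err(format!("unknown voice: {voice}"))
    }
}

/// One-shot generation: returns all samples at once (silent placeholder audio).
fn generate(req: &SpeechRequest) -> Result<Vec<f32>, String> {
    validate_voice(&req.voice)?;
    Ok(vec![0.0; req.text.len() * 160])
}

/// Streaming generation: yields chunks as they become ready,
/// here one placeholder chunk per whitespace-separated word.
fn generate_stream(req: &SpeechRequest) -> Result<impl Iterator<Item = Vec<f32>> + '_, String> {
    validate_voice(&req.voice)?;
    Ok(req.text.split_whitespace().map(|word| vec![0.0f32; word.len() * 160]))
}

fn main() {
    let req = SpeechRequest {
        text: "hello streaming world".to_string(),
        voice: "voice-a".to_string(),
    };
    let full = generate(&req).unwrap();
    println!("one-shot: {} samples", full.len());
    for (i, chunk) in generate_stream(&req).unwrap().enumerate() {
        // A real player could start playback here, before later chunks exist.
        println!("chunk {i}: {} samples", chunk.len());
    }
    let bad = SpeechRequest { text: "hi".into(), voice: "unknown".into() };
    println!("{:?}", generate(&bad).err());
}
```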

STT (Speech-to-Text)

  • Transcription: Convert audio to text
  • Streaming Support: Real-time audio transcription
  • Timestamp Support: Token-level timestamps for transcriptions
  • Multilingual: Support for multiple languages with auto-detection
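Token-level timestamps and language detection typically appear as fields on the transcription result. The sketch below assumes a hypothetical Transcription type, which is not the crate's own. For simplicity it treats each token as a whole word. Real STT models often emit sub-word tokens, and those must be merged into words.

```rust
// Hypothetical result types; not the crate's actual structs.
struct TimedToken {
    text: String,
    start_s: f32, // start time in seconds
    end_s: f32,   // end time in seconds
}

struct Transcription {
    language: Option<String>, // detected (or requested) language code
    tokens: Vec<TimedToken>,  // simplification: one token per word
}

impl Transcription {
    /// Full transcript, tokens joined with spaces.
    fn text(&self) -> String {
        self.tokens.iter().map(|t| t.text.as_str()).collect::<Vec<_>>().join(" ")
    }

    /// Tokens that overlap the time window [from_s, to_s).
    fn tokens_between(&self, from_s: f32, to_s: f32) -> Vec<&TimedToken> {
        self.tokens
            .iter()
            .filter(|t| t.end_s > from_s && t.start_s < to_s)
            .collect()
    }
}

fn token(text: &str, start_s: f32, end_s: f32) -> TimedToken {
    TimedToken { text: text.to_string(), start_s, end_s }
}

fn sample_transcription() -> Transcription {
    Transcription {
        language: Some("en".to_string()),
        tokens: vec![
            token("hello", 0.00, 0.40),
            token("speech", 0.45, 0.90),
            token("world", 1.00, 1.40),
        ],
    }
}

fn main() {
    let t = sample_transcription();
    println!("[{}] {}", t.language.as_deref().unwrap_or("unknown"), t.text());
    for tok in t.tokens_between(0.5, 1.2) {
        println!("{:.2}-{:.2}s {}", tok.start_s, tok.end_s, tok.text);
    }
}
```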

Architecture

The crate follows a trait-based design: each capability is expressed as a trait, and concrete provider implementations live in the providers module.

TTS Traits

  • TTSProvider: Marker trait combining all TTS capabilities
  • TTSSpeechProvider: Speech generation capabilities
  • TTSModelsProvider: Model and language support
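A marker trait like TTSProvider is commonly implemented as a supertrait with a blanket impl, so that any type implementing both capability traits qualifies automatically. The sketch below assumes that pattern. Only the trait names come from this README. The method names, signatures, and the MockTts type are hypothetical.

```rust
// Trait names from the README; method signatures are illustrative assumptions.
trait TTSSpeechProvider {
    fn generate_speech(&self, text: &str, voice: &str) -> Result<Vec<f32>, String>;
}

trait TTSModelsProvider {
    fn supported_models(&self) -> Vec<String>;
    fn supported_languages(&self) -> Vec<String>;
}

// Marker trait: anything implementing both capability traits is a TTSProvider.
trait TTSProvider: TTSSpeechProvider + TTSModelsProvider {}
impl<T: TTSSpeechProvider + TTSModelsProvider> TTSProvider for T {}

/// Hypothetical provider that returns silent placeholder audio.
struct MockTts;

impl TTSSpeechProvider for MockTts {
    fn generate_speech(&self, text: &str, _voice: &str) -> Result<Vec<f32>, String> {
        if text.is_empty() {
            return Err("empty input".to_string());
        }
        Ok(vec![0.0; text.len() * 160])
    }
}

impl TTSModelsProvider for MockTts {
    fn supported_models(&self) -> Vec<String> {
        vec!["mock-v1".to_string()]
    }
    fn supported_languages(&self) -> Vec<String> {
        vec!["en".to_string()]
    }
}

// Callers can require the full capability set with a single bound.
fn describe(p: &dyn TTSProvider) -> String {
    format!("models={:?} langs={:?}", p.supported_models(), p.supported_languages())
}

fn main() {
    let provider = MockTts;
    println!("{}", describe(&provider));
    match provider.generate_speech("hello", "default") {
        Ok(samples) => println!("generated {} samples", samples.len()),
        Err(e) => println!("error: {e}"),
    }
}
```

With the blanket impl, provider authors never implement TTSProvider directly. They implement the capability traits and get the marker for free.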

STT Traits

  • STTProvider: Marker trait combining all STT capabilities
  • STTSpeechProvider: Transcription capabilities
  • STTModelsProvider: Model and language support
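On the STT side, splitting the traits into STTSpeechProvider and STTModelsProvider lets a caller check language support before transcribing. The signatures in the sketch below are hypothetical. It also assumes that passing no language means "auto-detect". That fits the multilingual auto-detection feature listed above, but it is not a confirmed detail of the crate's API.

```rust
// Trait names from the README; signatures are illustrative assumptions.
trait STTSpeechProvider {
    /// `language: None` requests auto-detection (assumed convention).
    fn transcribe(&self, samples: &[f32], sample_rate: u32, language: Option<&str>) -> Result<String, String>;
}

trait STTModelsProvider {
    fn supported_languages(&self) -> Vec<String>;
}

trait STTProvider: STTSpeechProvider + STTModelsProvider {}
impl<T: STTSpeechProvider + STTModelsProvider> STTProvider for T {}

/// Rejects unsupported languages up front; `None` is passed through for auto-detection.
fn transcribe_checked<P: STTProvider + ?Sized>(
    p: &P,
    samples: &[f32],
    sample_rate: u32,
    language: Option<&str>,
) -> Result<String, String> {
    if let Some(lang) = language {
        if !p.supported_languages().iter().any(|l| l.as_str() == lang) {
            return Err(format!("unsupported language: {lang}"));
        }
    }
    p.transcribe(samples, sample_rate, language)
}

/// Hypothetical provider that only reports audio duration and language.
struct MockStt;

impl STTSpeechProvider for MockStt {
    fn transcribe(&self, samples: &[f32], sample_rate: u32, language: Option<&str>) -> Result<String, String> {
        let secs = samples.len() as f32 / sample_rate as f32;
        let lang = language.unwrap_or("auto");
        Ok(format!("[{lang}] {secs:.1}s of audio"))
    }
}

impl STTModelsProvider for MockStt {
    fn supported_languages(&self) -> Vec<String> {
        vec!["en".to_string(), "de".to_string()]
    }
}

fn main() {
    let audio = vec![0.0f32; 32_000]; // 2 s of silence at 16 kHz
    for lang in [None, Some("de"), Some("fr")] {
        match transcribe_checked(&MockStt, &audio, 16_000, lang) {
            Ok(text) => println!("ok: {text}"),
            Err(e) => println!("error: {e}"),
        }
    }
}
```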

Providers

Enable providers with Cargo feature flags:

  • pocket-tts: Pocket-TTS model support (TTS)
  • parakeet: Parakeet (NVIDIA) model support (STT)
  • vad: Silero VAD (voice activity detection) support for speech segmentation
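A VAD model such as Silero VAD outputs a speech probability for each short audio frame. A post-processing step then groups those frames into speech segments. The sketch below shows one simple approach: a probability threshold combined with a minimum-silence rule. The function and its parameters are hypothetical and do not come from the crate. The vad feature may segment audio differently.

```rust
/// A detected speech segment, as frame indices [start, end).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Segment {
    start: usize,
    end: usize,
}

/// Groups per-frame speech probabilities into segments (hypothetical helper).
/// - `threshold`: a frame with probability >= threshold counts as speech.
/// - `min_silence_frames`: silent gaps shorter than this are bridged,
///   so brief pauses do not split a segment.
fn segment(probs: &[f32], threshold: f32, min_silence_frames: usize) -> Vec<Segment> {
    let mut segments = Vec::new();
    let mut current: Option<usize> = None; // start frame of the open segment
    let mut silence_run = 0;

    for (i, &p) in probs.iter().enumerate() {
        if p >= threshold {
            if current.is_none() {
                current = Some(i);
            }
            silence_run = 0;
        } else if let Some(start) = current {
            silence_run += 1;
            if silence_run >= min_silence_frames {
                // Close the segment at the first frame of the silent run.
                segments.push(Segment { start, end: i + 1 - silence_run });
                current = None;
                silence_run = 0;
            }
        }
    }
    // Close a segment still open at the end, trimming trailing silence.
    if let Some(start) = current {
        segments.push(Segment { start, end: probs.len() - silence_run });
    }
    segments
}

fn main() {
    let probs: [f32; 10] = [0.1, 0.9, 0.8, 0.2, 0.9, 0.1, 0.1, 0.1, 0.7, 0.6];
    // Prints: [Segment { start: 1, end: 5 }, Segment { start: 8, end: 10 }]
    println!("{:?}", segment(&probs, 0.5, 2));
}
```

To convert frame indices to seconds, multiply by the frame length the VAD model uses (its frame size divided by the sample rate).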