§adk-audio
Audio intelligence and pipeline orchestration for ADK-Rust agents.
Provides unified traits for Text-to-Speech (TTS), Speech-to-Text (STT), music generation, audio FX/DSP processing, and Voice Activity Detection (VAD), with a composable pipeline system for building voice agent loops, podcast production, transcription, and generative soundscapes.
§Features
- tts (default): Cloud TTS providers (ElevenLabs, OpenAI, Gemini, Cartesia)
- stt (default): Cloud STT providers (Whisper API, Deepgram, AssemblyAI, Gemini)
- music: Music generation providers
- fx: DSP processors (normalizer, resampler, noise, compressor)
- vad: Voice Activity Detection
- mlx: Local inference model loading (tokenizers + HF Hub, cross-platform)
- onnx: ONNX Runtime local inference (cross-platform)
- livekit: adk-realtime bridge
- all: All features (safe for any platform and CI)
§Quick Start
use adk_audio::{AudioPipelineBuilder, AudioFrame};

let handle = AudioPipelineBuilder::new()
    .tts(my_tts_provider)
    .build_tts()?;

Re-exports§
pub use codec::AudioFormat;
pub use codec::decode;
pub use codec::encode;
pub use error::AudioError;
pub use error::AudioResult;
pub use frame::AudioFrame;
pub use frame::merge_frames;
pub use mixer::Mixer;
pub use pipeline::AudioPipelineBuilder;
pub use pipeline::PipelineControl;
pub use pipeline::PipelineHandle;
pub use pipeline::PipelineInput;
pub use pipeline::PipelineMetrics;
pub use pipeline::PipelineOutput;
pub use pipeline::SentenceChunker;
pub use tools::ApplyFxTool;
pub use tools::GenerateMusicTool;
pub use tools::SpeakTool;
pub use tools::TranscribeTool;
pub use traits::AudioProcessor;
pub use traits::Emotion;
pub use traits::FxChain;
pub use traits::MusicProvider;
pub use traits::MusicRequest;
pub use traits::Speaker;
pub use traits::SpeechSegment;
pub use traits::SttOptions;
pub use traits::SttProvider;
pub use traits::Transcript;
pub use traits::TtsProvider;
pub use traits::TtsRequest;
pub use traits::VadProcessor;
pub use traits::Voice;
pub use traits::Word;
pub use providers::tts::CartesiaTts;
pub use providers::tts::CloudTtsConfig;
pub use providers::tts::ElevenLabsTts;
pub use providers::tts::GeminiTts;
pub use providers::tts::OpenAiTts;
pub use providers::tts::SpeakerConfig;
pub use providers::stt::AssemblyAiStt;
pub use providers::stt::DeepgramStt;
pub use providers::stt::GeminiStt;
pub use providers::stt::WhisperApiStt;
pub use registry::LocalModelRegistry;
Modules§
- codec: Audio codec conversion between PCM16 internal format and external formats.
- error: Error types for the adk-audio crate.
- frame: Canonical audio buffer type used throughout the crate.
- mixer: Multi-track audio mixer with per-track volume control.
- pipeline: Composable audio pipeline system.
- providers: Cloud and native provider implementations.
- registry: Local model registry for downloading and caching model weights.
- tools: Audio tools for LlmAgent integration.
- traits: Core provider and processor traits.
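
To make the idea behind the mixer module concrete, here is a minimal, standalone sketch of mixing mono PCM16 tracks with a per-track volume. It does not use adk-audio's `Mixer` API, whose signatures are not shown on this page; the `Track` struct and `mix` function are hypothetical names introduced only for illustration, and the approach (sum in floating point, then clamp to the 16-bit range) is one common way to do this, not necessarily what the crate does.

```rust
/// Hypothetical track type for illustration only; not adk-audio's `Mixer` API.
struct Track {
    samples: Vec<i16>, // mono PCM16 samples
    volume: f32,       // linear gain, 1.0 leaves the track unchanged
}

/// Mixes tracks sample by sample: each sample is scaled by its track's
/// volume, the scaled values are summed, and the sum is clamped to the
/// i16 range so loud passages saturate instead of wrapping around.
fn mix(tracks: &[Track]) -> Vec<i16> {
    // The output is as long as the longest track; shorter tracks
    // contribute silence once they run out of samples.
    let len = tracks.iter().map(|t| t.samples.len()).max().unwrap_or(0);
    let mut out = Vec::with_capacity(len);
    for i in 0..len {
        let mut acc = 0.0f32;
        for t in tracks {
            if let Some(&s) = t.samples.get(i) {
                acc += s as f32 * t.volume;
            }
        }
        let clamped = acc.round().clamp(i16::MIN as f32, i16::MAX as f32);
        out.push(clamped as i16);
    }
    out
}

fn main() {
    let voice = Track { samples: vec![1000, -2000, 30000], volume: 1.0 };
    let music = Track { samples: vec![500, 500, 20000, 100], volume: 0.5 };
    let mixed = mix(&[voice, music]);
    println!("{:?}", mixed); // prints [1250, -1750, 32767, 50]
}
```

The third output sample shows the clamp at work: 30000 + 0.5 × 20000 = 40000 exceeds the PCM16 maximum, so it saturates at 32767.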