Crate adk_audio


adk-audio

Audio intelligence and pipeline orchestration for ADK-Rust agents.

Provides unified traits for Text-to-Speech (TTS), Speech-to-Text (STT), music generation, audio FX/DSP processing, and Voice Activity Detection (VAD), with a composable pipeline system for building voice agent loops, podcast production, transcription, and generative soundscapes.

Features

  • tts (default) — Cloud TTS providers (ElevenLabs, OpenAI, Gemini, Cartesia)
  • stt (default) — Cloud STT providers (Whisper API, Deepgram, AssemblyAI, Gemini)
  • music — Music generation providers
  • fx — DSP processors (normalizer, resampler, noise, compressor)
  • vad — Voice Activity Detection
  • mlx — Local inference model loading (tokenizers + HF Hub, cross-platform)
  • onnx — ONNX Runtime local inference (cross-platform)
  • livekit — adk-realtime bridge
  • all — All features (safe for any platform and CI)
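As a rough illustration of the kind of processing the vad feature covers, here is a minimal energy-threshold sketch in plain Rust. It does not use the crate's VadProcessor trait, whose signature is not shown on this page; the function names `rms_level` and `is_speech` and the threshold parameter are hypothetical, and a real detector is likely more sophisticated.

```rust
/// Root-mean-square level of a PCM16 frame, scaled to the range 0.0..=1.0.
/// Hypothetical helper for illustration; not part of adk-audio.
fn rms_level(samples: &[i16]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = samples
        .iter()
        .map(|&s| {
            // Scale each sample to -1.0..1.0 before squaring.
            let x = f64::from(s) / 32768.0;
            x * x
        })
        .sum();
    (sum_sq / samples.len() as f64).sqrt() as f32
}

/// Treats a frame as speech when its RMS level exceeds `threshold`.
/// The threshold is an assumption the caller must tune for their input.
fn is_speech(samples: &[i16], threshold: f32) -> bool {
    rms_level(samples) > threshold
}
```

A caller would typically run `is_speech` over short fixed-size frames (for example 10 to 30 ms of audio) and smooth the per-frame decisions before acting on them.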

Quick Start

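use adk_audio::AudioPipelineBuilder;

// `my_tts_provider` is a placeholder for a TTS provider you have already constructed.
let handle = AudioPipelineBuilder::new()
    .tts(my_tts_provider)
    .build_tts()?; // `?` needs an enclosing function that returns a Result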

Re-exports

pub use codec::AudioFormat;
pub use codec::decode;
pub use codec::encode;
pub use error::AudioError;
pub use error::AudioResult;
pub use frame::AudioFrame;
pub use frame::merge_frames;
pub use mixer::Mixer;
pub use pipeline::AudioPipelineBuilder;
pub use pipeline::PipelineControl;
pub use pipeline::PipelineHandle;
pub use pipeline::PipelineInput;
pub use pipeline::PipelineMetrics;
pub use pipeline::PipelineOutput;
pub use pipeline::SentenceChunker;
pub use tools::ApplyFxTool;
pub use tools::GenerateMusicTool;
pub use tools::SpeakTool;
pub use tools::TranscribeTool;
pub use traits::AudioProcessor;
pub use traits::Emotion;
pub use traits::FxChain;
pub use traits::MusicProvider;
pub use traits::MusicRequest;
pub use traits::Speaker;
pub use traits::SpeechSegment;
pub use traits::SttOptions;
pub use traits::SttProvider;
pub use traits::Transcript;
pub use traits::TtsProvider;
pub use traits::TtsRequest;
pub use traits::VadProcessor;
pub use traits::Voice;
pub use traits::Word;
pub use providers::tts::CartesiaTts;
pub use providers::tts::CloudTtsConfig;
pub use providers::tts::ElevenLabsTts;
pub use providers::tts::GeminiTts;
pub use providers::tts::OpenAiTts;
pub use providers::tts::SpeakerConfig;
pub use providers::stt::AssemblyAiStt;
pub use providers::stt::DeepgramStt;
pub use providers::stt::GeminiStt;
pub use providers::stt::WhisperApiStt;
pub use registry::LocalModelRegistry;

Modules

codec
Audio codec conversion between PCM16 internal format and external formats.
error
Error types for the adk-audio crate.
frame
Canonical audio buffer type used throughout the crate.
mixer
Multi-track audio mixer with per-track volume control.
pipeline
Composable audio pipeline system.
providers
Cloud and native provider implementations.
registry
Local model registry for downloading and caching model weights.
tools
Audio tools for LlmAgent integration.
traits
Core provider and processor traits.
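To make the mixer description concrete, the following is a minimal sketch of multi-track mixing with per-track volume over PCM16 samples, written in plain Rust. It is not the crate's Mixer type, whose API is not shown on this page; the function `mix_tracks` and its tuple-based input are hypothetical. The sketch assumes all tracks share one sample rate and channel layout, and it clips rather than normalizes when the sum overflows.

```rust
/// Mixes PCM16 tracks into a single buffer, scaling each track by its own volume.
/// Each input is a `(samples, volume)` pair; shorter tracks are treated as
/// silent past their end. Hypothetical helper, not part of adk-audio.
fn mix_tracks(tracks: &[(&[i16], f32)]) -> Vec<i16> {
    // The output is as long as the longest track.
    let len = tracks.iter().map(|(s, _)| s.len()).max().unwrap_or(0);

    // Accumulate in f32 so intermediate sums cannot wrap around.
    let mut acc = vec![0.0f32; len];
    for (samples, volume) in tracks {
        for (slot, &s) in acc.iter_mut().zip(samples.iter()) {
            *slot += f32::from(s) * volume;
        }
    }

    // Clamp to the i16 range (hard clipping) and convert back to PCM16.
    acc.into_iter()
        .map(|v| v.round().clamp(f32::from(i16::MIN), f32::from(i16::MAX)) as i16)
        .collect()
}
```

Accumulating in floating point and clamping at the end is one simple design choice; a production mixer might instead apply a limiter or reduce gain ahead of time to avoid audible clipping.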