Available on crate feature audio only.
Audio processing pipeline (experimental — labs preset).
Provides audio capabilities for agents:
- TtsProvider - Text-to-speech synthesis
- SttProvider - Speech-to-text transcription
- AudioProcessor - Audio effects processing
- AudioPipeline - Composable audio pipelines
- Cloud providers: ElevenLabs, OpenAI, Gemini, Cartesia, Deepgram, AssemblyAI
- Local inference: MLX (Apple Silicon), ONNX Runtime
Available with feature: audio
Modules§
- codec
- Audio codec conversion between PCM16 internal format and external formats.
- error
- Error types for the adk-audio crate.
- frame
- Canonical audio buffer type used throughout the crate.
- mixer
- Multi-track audio mixer with per-track volume control.
- pipeline
- Composable audio pipeline system.
- providers
- Cloud and native provider implementations.
- registry
- Local model registry for downloading and caching model weights.
- tools
- Audio tools for LlmAgent integration.
- traits
- Core provider and processor traits.
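To make the mixer module's description ("per-track volume control") more concrete, here is a minimal sketch of mixing PCM16 tracks. The `Track` struct and `mix_tracks` function are hypothetical names invented for illustration and are not the crate's actual API. The sketch also assumes that all tracks share one sample rate and channel layout.

```rust
/// Hypothetical track type: PCM16 samples plus a linear gain (1.0 = unchanged).
/// This is an illustration only, not a type from the crate.
pub struct Track {
    pub samples: Vec<i16>,
    pub volume: f32,
}

/// Mix tracks sample by sample, scaling each by its volume and clamping
/// the sum to the i16 range. Shorter tracks are padded with silence.
pub fn mix_tracks(tracks: &[Track]) -> Vec<i16> {
    let len = tracks.iter().map(|t| t.samples.len()).max().unwrap_or(0);
    let mut out = Vec::with_capacity(len);
    for i in 0..len {
        let sum: f32 = tracks
            .iter()
            .map(|t| t.samples.get(i).copied().unwrap_or(0) as f32 * t.volume)
            .sum();
        // Clamp before converting so loud mixes saturate instead of wrapping.
        out.push(sum.clamp(i16::MIN as f32, i16::MAX as f32) as i16);
    }
    out
}
```

Summing in floating point and clamping once at the end avoids integer overflow when several loud tracks overlap.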
Structs§
- ApplyFxTool
- Tool that applies a named FX chain to audio data.
- AssemblyAiStt
- AssemblyAI Universal STT provider.
- AudioFrame
- The canonical audio buffer: raw PCM-16 LE samples with metadata.
- AudioPipelineBuilder
- Builder for constructing audio pipelines.
- CartesiaTts
- Cartesia Sonic TTS provider.
- CloudTtsConfig
- Shared configuration for cloud TTS providers.
- DeepgramStt
- Deepgram Nova STT provider.
- ElevenLabsTts
- ElevenLabs TTS provider.
- FxChain
- An ordered chain of AudioProcessor stages applied in series.
- GeminiStt
- Gemini STT provider using generateContent with audio input.
- GeminiTts
- Gemini TTS provider using generateContent with audio response modality.
- GenerateMusicTool
- Tool that generates music from a text prompt.
- LocalModelRegistry
- Registry for managing local model downloads and caching.
- Mixer
- Multi-track audio mixer.
- MusicRequest
- Request parameters for music generation.
- OpenAiTts
- OpenAI TTS provider using the /v1/audio/speech endpoint.
- PipelineHandle
- Handle to a running audio pipeline.
- PipelineMetrics
- Real-time latency and quality metrics from pipeline stages.
- SentenceChunker
- Buffers LLM tokens and emits complete sentences at delimiter boundaries.
- SpeakTool
- Tool that synthesizes text to speech audio.
- Speaker
- An identified speaker.
- SpeakerConfig
- Speaker configuration for multi-speaker TTS.
- SpeechSegment
- A detected speech segment within an audio frame.
- SttOptions
- Options for speech-to-text transcription.
- TranscribeTool
- Tool that transcribes audio to text.
- Transcript
- A transcription result.
- TtsRequest
- Request parameters for TTS synthesis.
- Voice
- Descriptor for an available voice.
- WhisperApiStt
- OpenAI Whisper API STT provider.
- Word
- A single word with timing and confidence.
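The SentenceChunker entry above describes buffering LLM tokens and emitting complete sentences at delimiter boundaries. The following is a minimal sketch of that idea, not the crate's implementation: `SimpleChunker`, `push`, and `flush` are hypothetical names, and the delimiter set (`.`, `!`, `?`) is an assumption.

```rust
/// Hypothetical stand-in for a sentence chunker; not the crate's SentenceChunker.
#[derive(Default)]
pub struct SimpleChunker {
    buffer: String,
}

impl SimpleChunker {
    /// Append one token and return any sentences it completes.
    /// Assumption: a sentence ends at '.', '!' or '?'.
    pub fn push(&mut self, token: &str) -> Vec<String> {
        let mut done = Vec::new();
        for ch in token.chars() {
            self.buffer.push(ch);
            if matches!(ch, '.' | '!' | '?') {
                let sentence = self.buffer.trim().to_string();
                if !sentence.is_empty() {
                    done.push(sentence);
                }
                self.buffer.clear();
            }
        }
        done
    }

    /// Return whatever text remains once the token stream has ended.
    pub fn flush(&mut self) -> Option<String> {
        let rest = self.buffer.trim().to_string();
        self.buffer.clear();
        if rest.is_empty() { None } else { Some(rest) }
    }
}
```

Emitting whole sentences lets a TTS stage start speaking before the LLM has finished generating. This sketch does not handle abbreviations, decimals, or ellipses, which a production chunker would likely need to.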
Enums§
- AudioError
- Errors produced by audio subsystems.
- AudioFormat
- Supported audio formats for encode/decode at transport edges.
- Emotion
- Emotion hint for TTS synthesis.
- PipelineControl
- Pipeline control commands.
- PipelineInput
- Messages that can be sent into a pipeline.
- PipelineOutput
- Messages produced by a pipeline.
Traits§
- AudioProcessor
- Trait for stateless or stateful DSP transforms on audio frames.
- MusicProvider
- Unified trait for music generation providers.
- SttProvider
- Unified trait for speech-to-text providers.
- TtsProvider
- Unified trait for text-to-speech providers.
- VadProcessor
- Trait for Voice Activity Detection processors.
Functions§
- decode
- Decode encoded bytes into a PCM16 AudioFrame.
- encode
- Encode an AudioFrame to the target format.
- merge_frames
- Merge multiple AudioFrame values into a single contiguous frame.
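As an illustration of what merging frames into a single contiguous frame can look like, here is a minimal sketch. The `Frame` struct and `merge` function are hypothetical and simplified: the crate's AudioFrame and merge_frames may have different fields, signatures, and error handling. Rejecting mismatched formats is an assumption made for this sketch.

```rust
/// Hypothetical simplified frame; the crate's AudioFrame may carry other fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Frame {
    pub samples: Vec<i16>,
    pub sample_rate: u32,
    pub channels: u8,
}

/// Concatenate frames in order. Returns None when the input is empty or
/// when frames disagree on sample rate or channel count (an assumption;
/// a real implementation might resample or return an error instead).
pub fn merge(frames: &[Frame]) -> Option<Frame> {
    let first = frames.first()?;
    let total = frames.iter().map(|f| f.samples.len()).sum();
    let mut samples = Vec::with_capacity(total);
    for f in frames {
        if f.sample_rate != first.sample_rate || f.channels != first.channels {
            return None;
        }
        samples.extend_from_slice(&f.samples);
    }
    Some(Frame {
        samples,
        sample_rate: first.sample_rate,
        channels: first.channels,
    })
}
```

Reserving the total capacity up front keeps the merge to a single allocation, which matters when many small streaming chunks are combined.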
Type Aliases§
- AudioResult
- Convenience result type for audio operations.