Module audio

Available on crate feature audio only.
Audio processing pipeline (experimental — labs preset).

Provides audio capabilities for agents:

  • TtsProvider - Text-to-speech synthesis
  • SttProvider - Speech-to-text transcription
  • AudioProcessor - Audio effects processing
  • AudioPipeline - Composable audio pipelines
  • Cloud providers: ElevenLabs, OpenAI, Gemini, Cartesia, Deepgram, AssemblyAI
  • Local inference: MLX (Apple Silicon), ONNX Runtime

Available with feature: audio

Modules§

codec
Audio codec conversion between PCM16 internal format and external formats.
error
Error types for the adk-audio crate.
frame
Canonical audio buffer type used throughout the crate.
mixer
Multi-track audio mixer with per-track volume control.
pipeline
Composable audio pipeline system.
providers
Cloud and native provider implementations.
registry
Local model registry for downloading and caching model weights.
tools
Audio tools for LlmAgent integration.
traits
Core provider and processor traits.
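
The mixer module is described as a multi-track mixer with per-track volume control. As a rough illustration of that idea only, the sketch below scales each track by its volume, sums the tracks sample by sample, and saturates the result to the PCM16 range. The function `mix_tracks` and its signature are hypothetical and are not taken from this crate; the real `Mixer` type may work differently.

```rust
/// Hypothetical helper for illustration; not part of this crate's API.
/// Each track is a slice of PCM16 samples paired with a linear volume factor.
pub fn mix_tracks(tracks: &[(&[i16], f32)]) -> Vec<i16> {
    // The output is as long as the longest track; shorter tracks
    // contribute silence past their end.
    let len = tracks.iter().map(|(samples, _)| samples.len()).max().unwrap_or(0);

    // Accumulate in f32 so intermediate sums can exceed the i16 range.
    let mut acc = vec![0.0f32; len];
    for (samples, volume) in tracks {
        for (slot, sample) in acc.iter_mut().zip(samples.iter()) {
            *slot += *sample as f32 * *volume;
        }
    }

    // Round and saturate back into the PCM16 range.
    acc.into_iter()
        .map(|v| v.round().clamp(i16::MIN as f32, i16::MAX as f32) as i16)
        .collect()
}
```

Accumulating in floating point and clamping once at the end avoids the wrap-around that summing `i16` values directly would produce on loud passages.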

Structs§

ApplyFxTool
Tool that applies a named FX chain to audio data.
AssemblyAiStt
AssemblyAI Universal STT provider.
AudioFrame
The canonical audio buffer — raw PCM-16 LE samples with metadata.
AudioPipelineBuilder
Builder for constructing audio pipelines.
CartesiaTts
Cartesia Sonic TTS provider.
CloudTtsConfig
Shared configuration for cloud TTS providers.
DeepgramStt
Deepgram Nova STT provider.
ElevenLabsTts
ElevenLabs TTS provider.
FxChain
An ordered chain of AudioProcessor stages applied in series.
GeminiStt
Gemini STT provider using generateContent with audio input.
GeminiTts
Gemini TTS provider using generateContent with audio response modality.
GenerateMusicTool
Tool that generates music from a text prompt.
LocalModelRegistry
Registry for managing local model downloads and caching.
Mixer
Multi-track audio mixer.
MusicRequest
Request parameters for music generation.
OpenAiTts
OpenAI TTS provider using the /v1/audio/speech endpoint.
PipelineHandle
Handle to a running audio pipeline.
PipelineMetrics
Real-time latency and quality metrics from pipeline stages.
SentenceChunker
Buffers LLM tokens and emits complete sentences at delimiter boundaries.
SpeakTool
Tool that synthesizes text to speech audio.
Speaker
An identified speaker.
SpeakerConfig
Speaker configuration for multi-speaker TTS.
SpeechSegment
A detected speech segment within an audio frame.
SttOptions
Options for speech-to-text transcription.
TranscribeTool
Tool that transcribes audio to text.
Transcript
A transcription result.
TtsRequest
Request parameters for TTS synthesis.
Voice
Descriptor for an available voice.
WhisperApiStt
OpenAI Whisper API STT provider.
Word
A single word with timing and confidence.
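
SentenceChunker is described above as buffering LLM tokens and emitting complete sentences at delimiter boundaries. The sketch below shows one way such buffering could work. `SimpleSentenceChunker`, `push`, and `flush` are hypothetical names chosen for illustration, and the delimiter set (`.`, `!`, `?`) is an assumption; the crate's own type may use a different interface and different delimiters.

```rust
/// Hypothetical stand-in for illustration; not the crate's `SentenceChunker`.
#[derive(Default)]
pub struct SimpleSentenceChunker {
    buffer: String,
}

impl SimpleSentenceChunker {
    pub fn new() -> Self {
        Self::default()
    }

    /// Append a token and return every sentence completed by it.
    pub fn push(&mut self, token: &str) -> Vec<String> {
        self.buffer.push_str(token);
        let mut sentences = Vec::new();

        // Assumed delimiter set: '.', '!' and '?'. All are one byte in
        // UTF-8, so `idx + 1` is always a valid char boundary.
        while let Some(idx) = self.buffer.find(|c: char| matches!(c, '.' | '!' | '?')) {
            let rest = self.buffer.split_off(idx + 1);
            let sentence = std::mem::replace(&mut self.buffer, rest);
            let sentence = sentence.trim();
            if !sentence.is_empty() {
                sentences.push(sentence.to_string());
            }
        }
        sentences
    }

    /// Return whatever text is left once the token stream has ended.
    pub fn flush(&mut self) -> Option<String> {
        let rest = std::mem::take(&mut self.buffer);
        let rest = rest.trim();
        if rest.is_empty() {
            None
        } else {
            Some(rest.to_string())
        }
    }
}
```

Emitting whole sentences lets a TTS stage start synthesizing before the LLM has finished its reply. This naive version also splits on periods inside numbers and abbreviations (for example `3.14`), which a production chunker would need to handle.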

Enums§

AudioError
Errors produced by audio subsystems.
AudioFormat
Supported audio formats for encode/decode at transport edges.
Emotion
Emotion hint for TTS synthesis.
PipelineControl
Pipeline control commands.
PipelineInput
Messages that can be sent into a pipeline.
PipelineOutput
Messages produced by a pipeline.

Traits§

AudioProcessor
Trait for stateless or stateful DSP transforms on audio frames.
MusicProvider
Unified trait for music generation providers.
SttProvider
Unified trait for speech-to-text providers.
TtsProvider
Unified trait for text-to-speech providers.
VadProcessor
Trait for Voice Activity Detection processors.
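
FxChain is described as an ordered chain of AudioProcessor stages applied in series. The sketch below illustrates that composition pattern with a simplified, synchronous trait. `SampleProcessor`, `Gain`, `HardClip`, and `SimpleChain` are all hypothetical names invented for this example; the crate's `AudioProcessor` trait and `FxChain` type very likely have different signatures (for instance, they may operate on `AudioFrame` or be async).

```rust
/// Hypothetical, simplified processor trait; not the crate's `AudioProcessor`.
pub trait SampleProcessor {
    fn process(&mut self, samples: &mut [i16]);
}

/// Multiplies every sample by a linear gain factor, saturating at the i16 range.
pub struct Gain(pub f32);

impl SampleProcessor for Gain {
    fn process(&mut self, samples: &mut [i16]) {
        for s in samples.iter_mut() {
            let scaled = (*s as f32 * self.0).round();
            *s = scaled.clamp(i16::MIN as f32, i16::MAX as f32) as i16;
        }
    }
}

/// Clamps every sample to the range [-limit, limit].
pub struct HardClip(pub i16);

impl SampleProcessor for HardClip {
    fn process(&mut self, samples: &mut [i16]) {
        // Treat a negative limit as zero so `clamp` never sees min > max.
        let limit = self.0.max(0);
        for s in samples.iter_mut() {
            *s = (*s).clamp(-limit, limit);
        }
    }
}

/// Hypothetical chain: stages run in the order they were added.
#[derive(Default)]
pub struct SimpleChain {
    stages: Vec<Box<dyn SampleProcessor>>,
}

impl SimpleChain {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn with_stage(mut self, stage: impl SampleProcessor + 'static) -> Self {
        self.stages.push(Box::new(stage));
        self
    }

    pub fn process(&mut self, samples: &mut [i16]) {
        for stage in self.stages.iter_mut() {
            stage.process(samples);
        }
    }
}
```

Stage order matters: applying gain before clipping bounds the final peak at the clip limit, while clipping before gain lets the gain stage push samples past that limit again.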

Functions§

decode
Decode encoded bytes into a PCM16 AudioFrame.
encode
Encode an AudioFrame to the target format.
merge_frames
Merge multiple AudioFrame values into a single contiguous frame.
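
To make the idea behind merge_frames concrete, the sketch below concatenates several PCM16 buffers into one contiguous buffer. `PcmFrame`, `MergeError`, and `merge_pcm_frames` are hypothetical stand-ins, not this crate's `AudioFrame` or `merge_frames`. Rejecting empty input and mismatched sample rates or channel counts is an assumption of this sketch; the real function may handle those cases differently.

```rust
/// Hypothetical stand-in for the crate's `AudioFrame`.
#[derive(Debug, Clone, PartialEq)]
pub struct PcmFrame {
    pub samples: Vec<i16>,
    pub sample_rate: u32,
    pub channels: u16,
}

#[derive(Debug, PartialEq)]
pub enum MergeError {
    /// No frames were supplied.
    Empty,
    /// Frames disagree on sample rate or channel count.
    FormatMismatch,
}

/// Concatenate frames in order, keeping the format of the first frame.
pub fn merge_pcm_frames(frames: &[PcmFrame]) -> Result<PcmFrame, MergeError> {
    let first = frames.first().ok_or(MergeError::Empty)?;

    // Reserve the final size up front to avoid repeated reallocation.
    let total: usize = frames.iter().map(|f| f.samples.len()).sum();
    let mut samples = Vec::with_capacity(total);

    for frame in frames {
        if frame.sample_rate != first.sample_rate || frame.channels != first.channels {
            return Err(MergeError::FormatMismatch);
        }
        samples.extend_from_slice(&frame.samples);
    }

    Ok(PcmFrame {
        samples,
        sample_rate: first.sample_rate,
        channels: first.channels,
    })
}
```

Checking the format before appending keeps the merged buffer interpretable: samples recorded at different rates cannot be concatenated without resampling first.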

Type Aliases§

AudioResult
Convenience result type for audio operations.