# adk-realtime
Real-time bidirectional audio streaming for Rust Agent Development Kit (ADK-Rust) agents.
## Overview
adk-realtime provides a unified interface for building voice-enabled AI agents using real-time streaming APIs. It follows the OpenAI Agents SDK pattern with a separate, decoupled implementation that integrates seamlessly with the ADK agent ecosystem.
## Features

- **RealtimeAgent** — implements `adk_core::Agent` with full callback/tool/instruction support
- **Multiple Providers** — OpenAI Realtime API, Gemini Live API, Vertex AI Live API
- **Multiple Transports** — WebSocket, WebRTC (OpenAI), LiveKit bridge
- **Audio Streaming** — bidirectional audio with PCM16, G711, and Opus formats
- **Voice Activity Detection** — server-side VAD for natural conversation flow
- **Tool Calling** — real-time function/tool execution during voice conversations
- **Agent Handoff** — transfer between agents via `sub_agents`
- **Feature Flags** — pay only for what you use; all transports are opt-in
## Architecture

```text
┌─────────────────────────────────────────┐
│               Agent Trait               │
│  (name, description, run, sub_agents)   │
└──────────────────────────┬──────────────┘
                           │
       ┌───────────────────┼────────────────────┐
       │                   │                    │
┌──────▼───────┐   ┌───────▼───────┐   ┌────────▼────────┐
│   LlmAgent   │   │ RealtimeAgent │   │ SequentialAgent │
│ (text-based) │   │ (voice-based) │   │   (workflow)    │
└──────────────┘   └───────────────┘   └─────────────────┘
```

### Transport Layer

```text
┌──────────────────────────────────────────────────────────────┐
│                    RealtimeSession trait                     │
├──────────────┬──────────────┬──────────────┬─────────────────┤
│  OpenAI WS   │ OpenAI WebRTC│ Gemini Live  │ Vertex AI Live  │
│  (openai)    │ (openai-     │  (gemini)    │  (vertex-live)  │
│              │  webrtc)     │              │                 │
└──────────────┴──────────────┴──────────────┴─────────────────┘

┌──────────────────────────────────────────────────────────────┐
│               LiveKit WebRTC Bridge (livekit)                │
│  LiveKitEventHandler · bridge_input · bridge_gemini_input    │
└──────────────────────────────────────────────────────────────┘
```
## Supported Providers & Transports

| Provider | Model | Transport | Feature Flag | Description |
|---|---|---|---|---|
| OpenAI | `gpt-4o-realtime-preview-2024-12-17` | WebSocket | `openai` | Stable realtime model |
| OpenAI | `gpt-realtime` | WebSocket | `openai` | Latest model with improved speech & function calling |
| OpenAI | `gpt-4o-realtime-*` | WebRTC | `openai-webrtc` | Browser-grade transport with Opus codec |
| Gemini | `gemini-live-2.5-flash-native-audio` | WebSocket | `gemini` | Gemini Live API |
| Gemini via Vertex AI | — | WebSocket + OAuth2 | `vertex-live` | Vertex AI Live with ADC authentication |
| LiveKit | Any (bridge) | WebRTC | `livekit` | Production WebRTC bridge to Gemini/OpenAI |
## Quick Start

Add to your `Cargo.toml`:

```toml
[dependencies]
adk-realtime = { version = "0.3", features = ["openai"] }
```
### Using RealtimeAgent (Recommended)

```rust
use adk_realtime::RealtimeAgent;
use std::sync::Arc; // for Arc<dyn Tool> / Arc<dyn Agent> registrations

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Builder methods mirror the feature tables below;
    // the constructor shown here is illustrative.
    let agent = RealtimeAgent::builder("voice_assistant")
        .instruction("You are a helpful voice assistant.")
        .voice("alloy")
        .server_vad()
        .build();
    // Drive the agent with your runner and audio I/O of choice.
    Ok(())
}
```
### Using the Low-Level Session API

```rust
use adk_realtime::OpenAIRealtimeModel;

async fn run() -> Result<(), Box<dyn std::error::Error>> {
    // connect() opens the realtime session (WebSocket by default)
    let model = OpenAIRealtimeModel::new(/* API key, model name */);
    let session = model.connect().await?;
    // Send client events (AudioAppend, ResponseCreate, ...) and
    // consume server events (AudioDelta, TextDelta, ...).
    Ok(())
}
```
## Transport Guides

### Vertex AI Live

Connect to the Gemini Live API via Vertex AI with Application Default Credentials:

```toml
[dependencies]
adk-realtime = { version = "0.3", features = ["vertex-live"] }
```

```rust
// Type names below are illustrative; see the crate docs for exact items.

// Convenience constructor — auto-discovers ADC credentials
let backend = GeminiBackend::vertex_adc(/* project, location */)?;

// Or construct the backend from explicit credentials
let credentials = Credentials::default().await?;
let backend = GeminiBackend::Vertex { /* credentials, project, location */ };

let model = GeminiLiveModel::new(backend);
let session = model.connect().await?;
```

Prerequisites:

- Google Cloud project with the Vertex AI API enabled
- ADC configured (`gcloud auth application-default login`)
### OpenAI WebRTC

Lower-latency audio transport using Sans-IO WebRTC with the Opus codec:

```toml
[dependencies]
adk-realtime = { version = "0.3", features = ["openai-webrtc"] }
```

```rust
use adk_realtime::OpenAIRealtimeModel;

let model = OpenAIRealtimeModel::new(/* API key, model name */)
    .with_transport(/* WebRTC transport */);
let session = model.connect().await?;
```

**Build requirement:** `cmake` must be installed (the `audiopus` crate builds the Opus C library from source). With cmake >= 4.0, set the environment variable `CMAKE_POLICY_VERSION_MINIMUM=3.5`.
### LiveKit WebRTC Bridge

Bridge any `EventHandler` to a LiveKit room for production voice apps:

```toml
[dependencies]
adk-realtime = { version = "0.3", features = ["livekit", "openai"] }
```

```rust
use adk_realtime::{bridge_input, LiveKitEventHandler};

// Wrap your event handler to publish model audio to LiveKit
let lk_handler = LiveKitEventHandler::new(/* room, inner handler */);

// Bridge participant audio from LiveKit into the RealtimeRunner
tokio::spawn(bridge_input(/* room, runner */));
```

For Gemini's 16 kHz format, use `bridge_gemini_input` instead.
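`bridge_gemini_input` exists because Gemini consumes 16 kHz input while OpenAI-style streams run at 24 kHz. The following sketch shows the conversion with a naive linear-interpolation downsampler; it is not part of the crate, and a production bridge would use a proper windowed-sinc resampler:

```rust
/// Naive 24 kHz -> 16 kHz PCM16 mono downsampler using linear interpolation.
/// Illustrative only; not adk-realtime's resampler.
fn resample_24k_to_16k(input: &[i16]) -> Vec<i16> {
    // 24000 / 16000 = 3/2, so each output sample advances 1.5 input samples.
    let out_len = input.len() * 2 / 3;
    let mut out = Vec::with_capacity(out_len);
    for n in 0..out_len {
        let pos = n as f32 * 1.5;          // position in the input timeline
        let i = pos as usize;
        let frac = pos - i as f32;
        let a = input[i] as f32;
        let b = input[(i + 1).min(input.len() - 1)] as f32;
        out.push((a + (b - a) * frac).round() as i16);
    }
    out
}

fn main() {
    // A linear ramp resamples to a linear ramp at the new rate.
    let input: Vec<i16> = vec![0, 100, 200, 300, 400, 500];
    assert_eq!(resample_24k_to_16k(&input), vec![0, 150, 300, 450]);
    println!("ok");
}
```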
## Feature Flags

| Flag | Dependencies | Description |
|---|---|---|
| `openai` | `async-openai`, `tokio-tungstenite` | OpenAI Realtime API (WebSocket) |
| `gemini` | `tokio-tungstenite`, `adk-gemini` | Gemini Live API (AI Studio) |
| `vertex-live` | `gemini` + `google-cloud-auth` | Vertex AI Live API (OAuth2/ADC) |
| `livekit` | `livekit`, `livekit-api` | LiveKit WebRTC bridge |
| `openai-webrtc` | `openai` + `str0m`, `audiopus`, `reqwest` | OpenAI WebRTC transport (requires cmake) |
| `full` | all of the above except `openai-webrtc` | Everything that doesn't require cmake |
| `full-webrtc` | `full` + `openai-webrtc` | Everything including WebRTC (requires cmake) |

Default features: none. You opt in to exactly what you need.

### Feature Flag Dependency Graph

```text
vertex-live   ──► gemini + google-cloud-auth
openai-webrtc ──► openai + str0m + audiopus + reqwest
livekit       ──► livekit + livekit-api
full          ──► openai + gemini + vertex-live + livekit
full-webrtc   ──► full + openai-webrtc
```
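Flags compose through this graph, so enabling a high-level flag pulls in its prerequisites automatically. For example, a Vertex-backed agent fronted by LiveKit needs only:

```toml
[dependencies]
adk-realtime = { version = "0.3", features = ["vertex-live", "livekit"] }
```

Here `vertex-live` transitively enables `gemini`, so it does not need to be listed.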
## RealtimeAgent Features

### Shared with LlmAgent

| Feature | Description |
|---|---|
| `instruction(str)` | Static system instruction |
| `instruction_provider(fn)` | Dynamic instruction based on context |
| `global_instruction(str)` | Global instruction (prepended) |
| `tool(Arc<dyn Tool>)` | Register a tool |
| `sub_agent(Arc<dyn Agent>)` | Register a sub-agent for handoffs |
| `before_agent_callback` | Called before the agent runs |
| `after_agent_callback` | Called after the agent completes |
| `before_tool_callback` | Called before tool execution |
| `after_tool_callback` | Called after tool execution |

### Realtime-Specific

| Feature | Description |
|---|---|
| `voice(str)` | Voice selection (`"alloy"`, `"coral"`, `"sage"`, etc.) |
| `server_vad()` | Enable server-side VAD with defaults |
| `vad(VadConfig)` | Custom VAD configuration |
| `modalities(vec)` | Output modalities (`["text", "audio"]`) |
| `on_audio(callback)` | Callback for audio output events |
| `on_transcript(callback)` | Callback for transcript events |
| `on_speech_started(callback)` | Callback when speech is detected |
| `on_speech_stopped(callback)` | Callback when speech ends |
## Event Types

### Server Events

| Event | Description |
|---|---|
| `SessionCreated` | Connection established |
| `AudioDelta` | Audio chunk (base64 PCM or Opus) |
| `TextDelta` | Text response chunk |
| `TranscriptDelta` | Input audio transcript |
| `FunctionCallDone` | Tool call request |
| `ResponseDone` | Response completed |
| `SpeechStarted` | VAD detected speech |
| `SpeechStopped` | VAD detected silence |
| `Error` | Error occurred |
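A consumer loop typically matches on these variants and accumulates output until the response completes. The sketch below uses a simplified local enum to show the shape of such a loop; it is not the crate's actual event type:

```rust
// Simplified stand-ins for a few of the server events above.
// Illustrative only; not adk-realtime's actual types.
enum ServerEvent {
    AudioDelta(Vec<u8>), // raw audio bytes after base64 decoding
    TextDelta(String),
    ResponseDone,
}

/// Accumulate audio byte count and text until the response completes.
fn drain(events: Vec<ServerEvent>) -> (usize, String) {
    let mut audio_bytes = 0;
    let mut text = String::new();
    for ev in events {
        match ev {
            ServerEvent::AudioDelta(chunk) => audio_bytes += chunk.len(),
            ServerEvent::TextDelta(t) => text.push_str(&t),
            ServerEvent::ResponseDone => break, // stop at end of response
        }
    }
    (audio_bytes, text)
}

fn main() {
    let events = vec![
        ServerEvent::TextDelta("Hel".into()),
        ServerEvent::AudioDelta(vec![0; 480]),
        ServerEvent::TextDelta("lo".into()),
        ServerEvent::ResponseDone,
    ];
    assert_eq!(drain(events), (480, "Hello".to_string()));
    println!("ok");
}
```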
### Client Events

| Event | Description |
|---|---|
| `AudioAppend` | Send an audio chunk |
| `AudioCommit` | Commit the audio buffer |
| `ItemCreate` | Send text or a tool response |
| `ResponseCreate` | Request a response |
| `ResponseCancel` | Interrupt the current response |
| `SessionUpdate` | Update session configuration |
## Audio Formats
| Format | Sample Rate | Bits | Channels | Provider |
|---|---|---|---|---|
| PCM16 | 24000 Hz | 16 | Mono | OpenAI |
| PCM16 | 16000 Hz | 16 | Mono | Gemini (input) |
| PCM16 | 24000 Hz | 16 | Mono | Gemini (output) |
| Opus | 24000 Hz | — | Mono | OpenAI WebRTC |
| G711 u-law | 8000 Hz | 8 | Mono | OpenAI |
| G711 A-law | 8000 Hz | 8 | Mono | OpenAI |
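The G711 rows compress 16-bit samples to 8 bits logarithmically. A from-scratch encoder for the μ-law variant illustrates the scheme; this is the standard G.711 algorithm, not the crate's internal codec:

```rust
/// Encode one 16-bit linear PCM sample as 8-bit G.711 mu-law.
/// Standard algorithm; illustrative, not adk-realtime's implementation.
fn linear_to_ulaw(sample: i16) -> u8 {
    const BIAS: i32 = 0x84;   // shifts values so the exponent search works
    const CLIP: i32 = 32_635; // max encodable magnitude before biasing
    let sign: u8 = if sample < 0 { 0x80 } else { 0x00 };
    let s = (sample as i32).abs().min(CLIP) + BIAS;
    // Find the segment (exponent): highest set bit above bit 7.
    let mut exponent: u8 = 7;
    let mut mask = 0x4000;
    while exponent > 0 && (s & mask) == 0 {
        exponent -= 1;
        mask >>= 1;
    }
    let mantissa = ((s >> (exponent + 3)) & 0x0F) as u8;
    // G.711 transmits the code word bit-inverted.
    !(sign | (exponent << 4) | mantissa)
}

fn main() {
    assert_eq!(linear_to_ulaw(0), 0xFF);       // silence encodes to 0xFF
    assert_eq!(linear_to_ulaw(32_767), 0x80);  // positive full scale
    assert_eq!(linear_to_ulaw(-32_768), 0x00); // negative full scale
    println!("ok");
}
```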
## Error Types

Transport-specific error variants with actionable context:

| Variant | Feature | Description |
|---|---|---|
| `OpusCodecError` | `openai-webrtc` | Opus encoding/decoding failures |
| `WebRTCError` | `openai-webrtc` | WebRTC connection and signaling failures |
| `LiveKitError` | `livekit` | LiveKit bridge failures |
| `AuthError` | `vertex-live` | OAuth2/ADC credential failures |
| `ConfigError` | all | Missing or invalid configuration |
| `ConnectionError` | all | Transport connection failures |
## Examples

- Vertex AI Live voice assistant (requires ADC + a GCP project)
- LiveKit bridge with an OpenAI model (requires a LiveKit server)
- OpenAI WebRTC low-latency session (requires cmake + an API key; set `CMAKE_POLICY_VERSION_MINIMUM=3.5` with cmake >= 4.0)
## Testing

```bash
# Property tests (no credentials needed)
CMAKE_POLICY_VERSION_MINIMUM=3.5 cargo test

# All features
CMAKE_POLICY_VERSION_MINIMUM=3.5 cargo test --features full-webrtc

# Integration tests (require real credentials, marked #[ignore])
CMAKE_POLICY_VERSION_MINIMUM=3.5 cargo test -- --ignored
```

### Compilation Verification

```bash
CMAKE_POLICY_VERSION_MINIMUM=3.5 \
  cargo check --all-features
```
## License
Apache-2.0
## Part of ADK-Rust
This crate is part of the [ADK-Rust](https://github.com/zavora-ai/adk-rust) framework for building AI agents in Rust.