adk-realtime 0.3.2

Real-time bidirectional audio/video streaming for Rust Agent Development Kit (ADK-Rust) agents

Overview

adk-realtime provides a unified interface for building voice-enabled AI agents using real-time streaming APIs. It follows the OpenAI Agents SDK pattern: realtime support lives in its own decoupled crate, yet integrates seamlessly with the ADK agent ecosystem.

Features

  • RealtimeAgent — Implements adk_core::Agent with full callback/tool/instruction support
  • Multiple Providers — OpenAI Realtime API, Gemini Live API, Vertex AI Live API
  • Multiple Transports — WebSocket, WebRTC (OpenAI), LiveKit bridge
  • Audio Streaming — Bidirectional audio with PCM16, G711, Opus formats
  • Voice Activity Detection — Server-side VAD for natural conversation flow
  • Tool Calling — Real-time function/tool execution during voice conversations
  • Agent Handoff — Transfer between agents using sub_agents
  • Feature Flags — Pay only for what you use; all transports are opt-in

Architecture

              ┌─────────────────────────────────────────┐
              │               Agent Trait               │
              │  (name, description, run, sub_agents)   │
              └────────────────┬────────────────────────┘
                               │
       ┌───────────────────────┼───────────────────────┐
       │                       │                       │
┌──────▼───────┐     ┌─────────▼─────────┐   ┌─────────▼─────────┐
│   LlmAgent   │     │  RealtimeAgent    │   │  SequentialAgent  │
│ (text-based) │     │  (voice-based)    │   │   (workflow)      │
└──────────────┘     └───────────────────┘   └───────────────────┘

Transport Layer

┌──────────────────────────────────────────────────────────────┐
│                    RealtimeSession trait                     │
├──────────────┬──────────────┬──────────────┬─────────────────┤
│ OpenAI WS    │ OpenAI WebRTC│ Gemini Live  │ Vertex AI Live  │
│ (openai)     │ (openai-     │ (gemini)     │ (vertex-live)   │
│              │  webrtc)     │              │                 │
└──────────────┴──────────────┴──────────────┴─────────────────┘

┌──────────────────────────────────────────────────────────────┐
│               LiveKit WebRTC Bridge (livekit)                │
│  LiveKitEventHandler · bridge_input · bridge_gemini_input    │
└──────────────────────────────────────────────────────────────┘

Supported Providers & Transports

| Provider | Model | Transport | Feature Flag | Description |
|----------|-------|-----------|--------------|-------------|
| OpenAI | gpt-4o-realtime-preview-2024-12-17 | WebSocket | `openai` | Stable realtime model |
| OpenAI | gpt-realtime | WebSocket | `openai` | Latest model with improved speech & function calling |
| OpenAI | gpt-4o-realtime-* | WebRTC | `openai-webrtc` | Browser-grade transport with Opus codec |
| Google | gemini-live-2.5-flash-native-audio | WebSocket | `gemini` | Gemini Live API |
| Google | Gemini via Vertex AI | WebSocket + OAuth2 | `vertex-live` | Vertex AI Live with ADC authentication |
| LiveKit | Any (bridge) | WebRTC | `livekit` | Production WebRTC bridge to Gemini/OpenAI |

Quick Start

Add to your Cargo.toml:

[dependencies]
adk-realtime = { version = "0.3", features = ["openai"] }

Using RealtimeAgent (Recommended)

use adk_realtime::{RealtimeAgent, openai::OpenAIRealtimeModel};
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api_key = std::env::var("OPENAI_API_KEY")?;
    let model = Arc::new(OpenAIRealtimeModel::new(&api_key, "gpt-4o-realtime-preview-2024-12-17"));

    let agent = RealtimeAgent::builder("voice_assistant")
        .model(model)
        .instruction("You are a helpful voice assistant.")
        .voice("alloy")
        .server_vad()
        .build()?;

    // RealtimeAgent implements the Agent trait — use with ADK runner
    Ok(())
}

Using Low-Level Session API

use adk_realtime::{RealtimeModel, RealtimeConfig, ServerEvent};
use adk_realtime::openai::OpenAIRealtimeModel;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let model = OpenAIRealtimeModel::new(
        std::env::var("OPENAI_API_KEY")?,
        "gpt-4o-realtime-preview-2024-12-17",
    );

    let config = RealtimeConfig::default()
        .with_instruction("You are a helpful voice assistant.")
        .with_voice("alloy");

    let session = model.connect(config).await?;
    session.send_text("Hello!").await?;
    session.create_response().await?;

    while let Some(event) = session.next_event().await {
        match event? {
            ServerEvent::AudioDelta { delta, .. } => { /* play audio */ }
            ServerEvent::TextDelta { delta, .. } => print!("{}", delta),
            _ => {}
        }
    }
    Ok(())
}
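The audio in `AudioDelta` events arrives as a base64-encoded string of raw PCM. The sketch below shows how such a payload can be turned into `i16` samples; the `b64_decode` and `pcm16_samples` helpers are illustrative, not part of this crate (a real app would use the `base64` crate rather than hand-rolling the decoder):

```rust
/// Decode a standard-alphabet base64 string (illustrative helper only).
fn b64_decode(s: &str) -> Option<Vec<u8>> {
    const ALPHABET: &[u8] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let sextets: Vec<u32> = s
        .bytes()
        .filter(|c| !c.is_ascii_whitespace() && *c != b'=')
        .map(|c| ALPHABET.iter().position(|&a| a == c).map(|i| i as u32))
        .collect::<Option<_>>()?;
    let mut out = Vec::with_capacity(sextets.len() * 3 / 4);
    for chunk in sextets.chunks(4) {
        if chunk.len() < 2 {
            return None; // a lone trailing sextet cannot encode a byte
        }
        let mut acc = 0u32;
        for (i, &v) in chunk.iter().enumerate() {
            acc |= v << (18 - 6 * i);
        }
        for i in 0..chunk.len() - 1 {
            out.push((acc >> (16 - 8 * i)) as u8);
        }
    }
    Some(out)
}

/// Reinterpret raw PCM16 bytes as little-endian i16 samples.
fn pcm16_samples(bytes: &[u8]) -> Vec<i16> {
    bytes
        .chunks_exact(2)
        .map(|b| i16::from_le_bytes([b[0], b[1]]))
        .collect()
}

fn main() {
    // "AAD/fw==" encodes the bytes [0x00, 0x00, 0xFF, 0x7F]:
    // two little-endian samples, 0 and i16::MAX.
    let bytes = b64_decode("AAD/fw==").unwrap();
    println!("{:?}", pcm16_samples(&bytes)); // [0, 32767]
}
```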

Transport Guides

Vertex AI Live

Connect to Gemini Live API via Vertex AI with Application Default Credentials:

adk-realtime = { version = "0.3", features = ["vertex-live"] }
use adk_realtime::gemini::{GeminiLiveBackend, GeminiRealtimeModel};

// Convenience constructor — auto-discovers ADC credentials
let backend = GeminiLiveBackend::vertex_adc("my-project", "us-central1")?;

// Or manual credentials construction
let credentials = google_cloud_auth::credentials::Credentials::default().await?;
let backend = GeminiLiveBackend::Vertex {
    credentials,
    region: "us-central1".into(),
    project_id: std::env::var("GOOGLE_CLOUD_PROJECT")?,
};

let model = GeminiRealtimeModel::new(backend, "models/gemini-live-2.5-flash-native-audio");
let session = model.connect(config).await?;

Prerequisites:

  • Google Cloud project with Vertex AI API enabled
  • ADC configured (gcloud auth application-default login)

OpenAI WebRTC

Lower-latency audio transport using Sans-IO WebRTC with Opus codec:

adk-realtime = { version = "0.3", features = ["openai-webrtc"] }
use adk_realtime::openai::{OpenAIRealtimeModel, OpenAITransport};

let model = OpenAIRealtimeModel::new(api_key, "gpt-4o-realtime-preview-2024-12-17")
    .with_transport(OpenAITransport::WebRTC);
let session = model.connect(config).await?;

Build requirement: cmake must be installed (the audiopus crate builds the Opus C library from source). With cmake >= 4.0, set the environment variable:

export CMAKE_POLICY_VERSION_MINIMUM=3.5

LiveKit WebRTC Bridge

Bridge any EventHandler to a LiveKit room for production voice apps:

adk-realtime = { version = "0.3", features = ["livekit", "openai"] }
use adk_realtime::livekit::{LiveKitEventHandler, bridge_input};

// Wrap your event handler to publish model audio to LiveKit
let lk_handler = LiveKitEventHandler::new(inner_handler, audio_source, 24000, 1);

// Bridge participant audio from LiveKit into the RealtimeRunner
tokio::spawn(bridge_input(remote_track, runner));

For Gemini's 16 kHz format, use bridge_gemini_input instead.
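The two bridge variants exist because OpenAI models emit 24 kHz PCM16 while Gemini expects 16 kHz input (see the Audio Formats table below). A naive linear-interpolation resampler illustrates the rate conversion; this is a sketch only and not a crate API — a production bridge should use a band-limited (windowed-sinc) resampler to avoid aliasing:

```rust
/// Naive 24 kHz -> 16 kHz resampler for mono PCM16 via linear interpolation.
/// Illustrative only; not part of adk-realtime.
fn resample_24k_to_16k(input: &[i16]) -> Vec<i16> {
    let ratio = 24_000.0 / 16_000.0; // 1.5 input samples per output sample
    let out_len = (input.len() as f64 / ratio).floor() as usize;
    let mut out = Vec::with_capacity(out_len);
    for n in 0..out_len {
        let pos = n as f64 * ratio;
        let i = pos as usize;
        let frac = pos - i as f64;
        let a = input[i] as f64;
        // Clamp at the last sample when interpolating past the end.
        let b = *input.get(i + 1).unwrap_or(&input[i]) as f64;
        out.push((a + (b - a) * frac).round() as i16);
    }
    out
}

fn main() {
    // Three input samples become two output samples (2/3 of the rate);
    // the second output lands halfway between samples 300 and 600.
    println!("{:?}", resample_24k_to_16k(&[0, 300, 600])); // [0, 450]
}
```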

Feature Flags

| Flag | Dependencies | Description |
|------|--------------|-------------|
| `openai` | async-openai, tokio-tungstenite | OpenAI Realtime API (WebSocket) |
| `gemini` | tokio-tungstenite, adk-gemini | Gemini Live API (AI Studio) |
| `vertex-live` | gemini + google-cloud-auth | Vertex AI Live API (OAuth2/ADC) |
| `livekit` | livekit, livekit-api | LiveKit WebRTC bridge |
| `openai-webrtc` | openai + str0m, audiopus, reqwest | OpenAI WebRTC transport (requires cmake) |
| `full` | all of the above except openai-webrtc | Everything that doesn't require cmake |
| `full-webrtc` | full + openai-webrtc | Everything including WebRTC (requires cmake) |

Default features: none. You opt in to exactly what you need.

Feature Flag Dependency Graph

vertex-live  ──► gemini + google-cloud-auth
openai-webrtc ──► openai + str0m + audiopus + reqwest
livekit      ──► livekit + livekit-api
full         ──► openai + gemini + vertex-live + livekit
full-webrtc  ──► full + openai-webrtc

RealtimeAgent Features

Shared with LlmAgent

| Feature | Description |
|---------|-------------|
| `instruction(str)` | Static system instruction |
| `instruction_provider(fn)` | Dynamic instruction based on context |
| `global_instruction(str)` | Global instruction (prepended) |
| `tool(Arc<dyn Tool>)` | Register a tool |
| `sub_agent(Arc<dyn Agent>)` | Register sub-agent for handoffs |
| `before_agent_callback` | Called before agent runs |
| `after_agent_callback` | Called after agent completes |
| `before_tool_callback` | Called before tool execution |
| `after_tool_callback` | Called after tool execution |

Realtime-Specific

| Feature | Description |
|---------|-------------|
| `voice(str)` | Voice selection ("alloy", "coral", "sage", etc.) |
| `server_vad()` | Enable server-side VAD with defaults |
| `vad(VadConfig)` | Custom VAD configuration |
| `modalities(vec)` | Output modalities (["text", "audio"]) |
| `on_audio(callback)` | Callback for audio output events |
| `on_transcript(callback)` | Callback for transcript events |
| `on_speech_started(callback)` | Callback when speech detected |
| `on_speech_stopped(callback)` | Callback when speech ends |

Event Types

Server Events

| Event | Description |
|-------|-------------|
| `SessionCreated` | Connection established |
| `AudioDelta` | Audio chunk (base64 PCM or Opus) |
| `TextDelta` | Text response chunk |
| `TranscriptDelta` | Input audio transcript |
| `FunctionCallDone` | Tool call request |
| `ResponseDone` | Response completed |
| `SpeechStarted` | VAD detected speech |
| `SpeechStopped` | VAD detected silence |
| `Error` | Error occurred |

Client Events

| Event | Description |
|-------|-------------|
| `AudioAppend` | Send audio chunk |
| `AudioCommit` | Commit audio buffer |
| `ItemCreate` | Send text or tool response |
| `ResponseCreate` | Request a response |
| `ResponseCancel` | Interrupt response |
| `SessionUpdate` | Update configuration |
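On the OpenAI WebSocket transport, these client events correspond to JSON messages on the socket. The `type` strings below follow OpenAI's published Realtime API event names; the exact mapping from this crate's event enum to the wire is an assumption here, since the crate builds and sends these messages internally:

```rust
/// Sketch of the wire shape behind AudioAppend on the OpenAI transport.
/// `input_audio_buffer.append` is OpenAI's documented event type; the
/// crate normally constructs this JSON for you.
fn audio_append_json(base64_audio: &str) -> String {
    format!(
        r#"{{"type":"input_audio_buffer.append","audio":"{}"}}"#,
        base64_audio
    )
}

/// Sketch of the wire shape behind AudioCommit.
fn audio_commit_json() -> String {
    r#"{"type":"input_audio_buffer.commit"}"#.to_string()
}

fn main() {
    println!("{}", audio_append_json("UklGRg=="));
    println!("{}", audio_commit_json());
}
```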

Audio Formats

| Format | Sample Rate | Bits | Channels | Provider |
|--------|-------------|------|----------|----------|
| PCM16 | 24000 Hz | 16 | Mono | OpenAI |
| PCM16 | 16000 Hz | 16 | Mono | Gemini (input) |
| PCM16 | 24000 Hz | 16 | Mono | Gemini (output) |
| Opus | 24000 Hz | n/a | Mono | OpenAI WebRTC |
| G711 u-law | 8000 Hz | 8 | Mono | OpenAI |
| G711 A-law | 8000 Hz | 8 | Mono | OpenAI |
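Two bits of arithmetic implied by the table: frame sizing for PCM16 streaming, and the G.711 u-law companding behind the 8 kHz rows. Both helpers are illustrative sketches, not crate APIs; the u-law encoder follows the classic segmented reference algorithm:

```rust
/// Bytes in one `ms`-millisecond frame of mono PCM16 at `sample_rate`.
fn pcm16_frame_bytes(sample_rate: u32, ms: u32) -> usize {
    (sample_rate as usize * ms as usize / 1000) * 2 // 2 bytes per sample
}

/// Encode one 16-bit linear PCM sample as a G.711 u-law byte
/// (classic segmented companding; illustrative, not a crate API).
fn linear_to_ulaw(sample: i16) -> u8 {
    const BIAS: i32 = 0x84; // shifts the magnitude so segment edges align
    const CLIP: i32 = 32635;
    let mut pcm = i32::from(sample);
    let sign = if pcm < 0 {
        pcm = -pcm;
        0x80
    } else {
        0x00
    };
    if pcm > CLIP {
        pcm = CLIP;
    }
    pcm += BIAS;
    // Locate the segment (exponent) of the biased magnitude.
    let mut exponent = 7;
    let mut mask = 0x4000;
    while exponent > 0 && pcm & mask == 0 {
        exponent -= 1;
        mask >>= 1;
    }
    let mantissa = (pcm >> (exponent + 3)) & 0x0F;
    // u-law bytes are transmitted bit-inverted.
    !((sign | (exponent << 4) | mantissa) as u8)
}

fn main() {
    // A 20 ms frame at OpenAI's 24 kHz rate is 480 samples = 960 bytes.
    println!("{}", pcm16_frame_bytes(24_000, 20)); // 960
    println!("{:#04x}", linear_to_ulaw(0)); // 0xff
}
```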

Error Types

Transport-specific error variants with actionable context:

| Variant | Feature | Description |
|---------|---------|-------------|
| `OpusCodecError` | openai-webrtc | Opus encoding/decoding failures |
| `WebRTCError` | openai-webrtc | WebRTC connection and signaling failures |
| `LiveKitError` | livekit | LiveKit bridge failures |
| `AuthError` | vertex-live | OAuth2/ADC credential failures |
| `ConfigError` | all | Missing or invalid configuration |
| `ConnectionError` | all | Transport connection failures |

Examples

# Vertex AI Live voice assistant (requires ADC + GCP project)
cargo run --example vertex_live_voice --features vertex-live

# LiveKit bridge with OpenAI model (requires LiveKit server)
cargo run --example livekit_bridge --features "livekit,openai"

# OpenAI WebRTC low-latency session (requires cmake + API key)
CMAKE_POLICY_VERSION_MINIMUM=3.5 cargo run --example openai_webrtc --features openai-webrtc

Testing

# Property tests (no credentials needed)
cargo test -p adk-realtime --test error_context_tests
cargo test -p adk-realtime --features vertex-live --test vertex_url_property_tests
cargo test -p adk-realtime --features livekit --test livekit_delegation_tests
CMAKE_POLICY_VERSION_MINIMUM=3.5 cargo test -p adk-realtime --features openai-webrtc --test opus_roundtrip_tests
CMAKE_POLICY_VERSION_MINIMUM=3.5 cargo test -p adk-realtime --features openai-webrtc --test sdp_offer_tests

# All features
CMAKE_POLICY_VERSION_MINIMUM=3.5 cargo test -p adk-realtime --features full

# Integration tests (require real credentials, marked #[ignore])
cargo test -p adk-realtime --features vertex-live -- --ignored

Compilation Verification

cargo check -p adk-realtime                          # default (no deps)
cargo check -p adk-realtime --features openai        # OpenAI WebSocket
cargo check -p adk-realtime --features gemini        # Gemini Live
cargo check -p adk-realtime --features vertex-live   # Vertex AI Live
cargo check -p adk-realtime --features livekit       # LiveKit bridge
CMAKE_POLICY_VERSION_MINIMUM=3.5 \
  cargo check -p adk-realtime --features openai-webrtc  # OpenAI WebRTC
CMAKE_POLICY_VERSION_MINIMUM=3.5 \
  cargo check -p adk-realtime --features full            # everything


## License

Apache-2.0

## Part of ADK-Rust

This crate is part of the [ADK-Rust](https://github.com/zavora-ai/adk-rust) framework for building AI agents in Rust.