§adk-realtime
Real-time bidirectional audio/video streaming for ADK agents.
This crate provides a unified interface for building voice-enabled AI agents using real-time streaming APIs from various providers (OpenAI, Gemini, etc.).
§Architecture
adk-realtime follows the same pattern as OpenAI’s Agents SDK, providing both
a low-level session interface and a high-level RealtimeAgent that implements
the standard ADK Agent trait.
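The layering described here can be pictured with a deliberately simplified sketch. The `Agent` trait below is a hypothetical, synchronous stand-in with only `name` and `description`; the real `adk_core::Agent` trait also has `run` and `sub_agents`, and its exact signatures are not shown on this page. `TextAgent`, `VoiceAgent`, and `describe` are illustrative names, not part of the crate.

```rust
/// Hypothetical, simplified stand-in for the ADK `Agent` trait.
/// The real trait also exposes `run` and `sub_agents`; they are omitted here.
trait Agent {
    fn name(&self) -> &str;
    fn description(&self) -> String;
}

/// Illustrative text-based agent (not the real `LlmAgent`).
struct TextAgent {
    name: String,
}

/// Illustrative voice-based agent (not the real `RealtimeAgent`).
struct VoiceAgent {
    name: String,
    voice: String,
}

impl Agent for TextAgent {
    fn name(&self) -> &str {
        &self.name
    }

    fn description(&self) -> String {
        "text-based agent".to_string()
    }
}

impl Agent for VoiceAgent {
    fn name(&self) -> &str {
        &self.name
    }

    fn description(&self) -> String {
        format!("voice-based agent using voice '{}'", self.voice)
    }
}

/// Code written against the trait does not care which kind of agent it gets.
fn describe(agents: &[Box<dyn Agent>]) -> Vec<String> {
    agents
        .iter()
        .map(|agent| format!("{}: {}", agent.name(), agent.description()))
        .collect()
}
```

Because both types sit behind the same trait object, a caller such as a runner can hold text and voice agents in one collection and treat them uniformly.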
                ┌─────────────────────────────────────────┐
                │               Agent Trait               │
                │  (name, description, run, sub_agents)   │
                └────────────────┬────────────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
┌────────▼────────┐    ┌─────────▼─────────┐   ┌─────────▼─────────┐
│    LlmAgent     │    │   RealtimeAgent   │   │  SequentialAgent  │
│  (text-based)   │    │   (voice-based)   │   │    (workflow)     │
└─────────────────┘    └───────────────────┘   └───────────────────┘

§Features
- RealtimeAgent: Implements adk_core::Agent with callbacks, tools, and instructions
- Multiple Providers: OpenAI Realtime API and Gemini Live API support
- Audio Streaming: Bidirectional audio with various formats (PCM16, G711)
- Voice Activity Detection: Server-side VAD for natural conversations
- Tool Calling: Real-time function execution during voice sessions
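To make the PCM16 format in the list above concrete, here is a minimal sketch of converting between normalized `f32` samples and 16-bit signed PCM bytes using only the standard library. The function names are hypothetical and are not part of adk-realtime. Little-endian byte order is an assumption; check the provider's documentation for the byte order and sample rate it expects.

```rust
/// Convert normalized f32 samples (nominally in [-1.0, 1.0]) to 16-bit
/// signed PCM bytes. Little-endian order is an assumption of this sketch.
/// Out-of-range samples are clamped rather than wrapped.
fn f32_to_pcm16_le(samples: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(samples.len() * 2);
    for &sample in samples {
        let clamped = sample.clamp(-1.0, 1.0);
        let value = (clamped * i16::MAX as f32).round() as i16;
        out.extend_from_slice(&value.to_le_bytes());
    }
    out
}

/// Convert little-endian PCM16 bytes back to normalized f32 samples.
/// A trailing odd byte, if any, is ignored.
fn pcm16_le_to_f32(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(2)
        .map(|pair| i16::from_le_bytes([pair[0], pair[1]]) as f32 / i16::MAX as f32)
        .collect()
}
```

Clamping before the cast keeps loud input from wrapping around into the opposite sign, which would otherwise be audible as clicks.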
§Example - Using RealtimeAgent (Recommended)
use std::sync::Arc;

use adk_realtime::{RealtimeAgent, openai::OpenAIRealtimeModel};
use adk_runner::Runner;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // `api_key`, `weather_tool`, `session`, and `content` are assumed to be defined elsewhere.
    let model = OpenAIRealtimeModel::new(api_key, "gpt-4o-realtime-preview-2024-12-17");

    // Configure the voice agent: model, instructions, voice, server-side VAD, and tools.
    let agent = RealtimeAgent::builder("voice_assistant")
        .model(Arc::new(model))
        .instruction("You are a helpful voice assistant.")
        .voice("alloy")
        .server_vad()
        .tool(Arc::new(weather_tool))
        .build()?;

    // Use with standard ADK runner
    let runner = Runner::new(Arc::new(agent));
    runner.run(session, content).await?;
    Ok(())
}

§Example - Using Low-Level Session API
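use adk_realtime::{RealtimeModel, RealtimeConfig, ServerEvent};
use adk_realtime::openai::OpenAIRealtimeModel;

// `api_key` and `config` (presumably a `RealtimeConfig`) are assumed to be defined elsewhere.
// This fragment must run inside an async function that returns a `Result`.
let model = OpenAIRealtimeModel::new(api_key, "gpt-4o-realtime-preview-2024-12-17");
let session = model.connect(config).await?;

// Handle server events as they arrive; `event?` propagates session errors.
while let Some(event) = session.next_event().await {
    match event? {
        ServerEvent::AudioDelta { delta, .. } => { /* play audio */ }
        ServerEvent::TextDelta { delta, .. } => println!("{}", delta),
        _ => {}
    }
}

Re-exports§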
pub use agent::RealtimeAgent;
pub use agent::RealtimeAgentBuilder;
pub use audio::AudioEncoding;
pub use audio::AudioFormat;
pub use config::RealtimeConfig;
pub use config::RealtimeConfigBuilder;
pub use config::VadConfig;
pub use config::VadMode;
pub use error::RealtimeError;
pub use error::Result;
pub use events::ClientEvent;
pub use events::ServerEvent;
pub use events::ToolCall;
pub use events::ToolResponse;
pub use model::BoxedModel;
pub use model::RealtimeModel;
pub use runner::RealtimeRunner;
pub use session::BoxedSession;
pub use session::RealtimeSession;
pub use session::RealtimeSessionExt;
Modules§
- agent: RealtimeAgent, an Agent implementation for real-time voice interactions.
- audio: Audio format definitions and utilities.
- config: Configuration types for realtime sessions.
- error: Error types for the realtime module.
- events: Event types for realtime communication.
- model: Core RealtimeModel trait definition.
- runner: RealtimeRunner for integrating realtime sessions with agents.
- session: Core RealtimeSession trait definition.
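The event-matching pattern from the low-level session example can be exercised offline with a small, self-contained sketch. The `Event` enum below is a hypothetical, trimmed-down mirror of the two `ServerEvent` variants used in that example. The payload types (`Vec<u8>` for audio, `String` for text), the `String` error type, and the `Collected` and `collect` names are all assumptions made for illustration, not the crate's actual definitions.

```rust
/// Hypothetical, trimmed-down mirror of the `ServerEvent` variants used in
/// the low-level example. The real enum has more variants and fields.
enum Event {
    AudioDelta { delta: Vec<u8> },
    TextDelta { delta: String },
    Other,
}

/// Output accumulated from a stream of events.
#[derive(Default)]
struct Collected {
    audio: Vec<u8>,
    text: String,
}

/// Fold a sequence of fallible events into accumulated audio and text,
/// stopping at the first error (the same role `event?` plays in the example).
fn collect(
    events: impl IntoIterator<Item = Result<Event, String>>,
) -> Result<Collected, String> {
    let mut out = Collected::default();
    for event in events {
        match event? {
            Event::AudioDelta { delta } => out.audio.extend_from_slice(&delta),
            Event::TextDelta { delta } => out.text.push_str(&delta),
            // Unhandled event kinds are ignored, like the `_ => {}` arm.
            Event::Other => {}
        }
    }
    Ok(out)
}
```

Keeping the matching logic in a plain function like this makes it testable with a hand-built list of events, without opening a connection to a provider.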