gemini-live 0.1.6

High-performance Rust client for the Gemini Multimodal Live API

gemini-live-rs

High-performance Rust client for the Gemini Multimodal Live API — real-time, bidirectional audio/video/text streaming over WebSocket.

Features

  • Strongly typed: every wire message has a Rust struct; serde handles the JSON mapping
  • Session management: automatic reconnection with exponential backoff, session resumption, GoAway handling
  • Streaming-first: send_audio / send_video / send_text for real-time input; event stream for output
  • Performance-conscious: zero-allocation AudioEncoder for the hot path; buffer-reuse design throughout
  • Tool calling: built-in support for function calls, cancellations, and scheduling modes
  • Clone-friendly sessions: Session is cheaply cloneable; multiple tasks can send and receive concurrently
  • Vertex-ready transport: first-class Vertex AI Live routing via regional endpoints and bearer-token auth
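
To make "reconnection with exponential backoff" concrete, here is a minimal sketch of how a retry delay could be computed. The `Backoff` struct, its fields, and the values used with it are hypothetical and are not the crate's `ReconnectPolicy`; jitter is left out for brevity.

```rust
use std::time::Duration;

/// Hypothetical backoff settings (an assumption, not the crate's `ReconnectPolicy`).
#[derive(Debug, Clone, Copy)]
pub struct Backoff {
    /// Delay before the first retry.
    pub base: Duration,
    /// Multiplier applied for each further attempt.
    pub factor: u32,
    /// Upper bound for any single delay.
    pub max: Duration,
}

impl Backoff {
    /// Delay before retry number `attempt` (0-based):
    /// `base * factor^attempt`, capped at `max`.
    pub fn delay(&self, attempt: u32) -> Duration {
        // `checked_pow` and `checked_mul` return `None` on overflow,
        // which is treated the same as exceeding the cap.
        let scaled = self
            .factor
            .checked_pow(attempt)
            .and_then(|multiplier| self.base.checked_mul(multiplier));
        match scaled {
            Some(d) if d < self.max => d,
            _ => self.max,
        }
    }
}
```

With `base = 500 ms`, `factor = 2`, and `max = 30 s`, successive delays double from 500 ms until they reach the 30 s cap and stay there.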

Demo

https://github.com/user-attachments/assets/745ef771-bae7-41ef-bd4f-baa994723a75

Quick Start

Add to your Cargo.toml:

[dependencies]
gemini-live = "0.1"
tokio = { version = "1", features = ["full"] }

use gemini_live::session::{Session, SessionConfig, ReconnectPolicy};
use gemini_live::transport::{Auth, TransportConfig};
use gemini_live::types::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut session = Session::connect(SessionConfig {
        transport: TransportConfig {
            auth: Auth::ApiKey(std::env::var("GEMINI_API_KEY")?),
            ..Default::default()
        },
        setup: SetupConfig {
            model: "models/gemini-3.1-flash-live-preview".into(),
            generation_config: Some(GenerationConfig {
                response_modalities: Some(vec![Modality::Text]),
                ..Default::default()
            }),
            ..Default::default()
        },
        reconnect: ReconnectPolicy::default(),
    }).await?;

    session.send_text("Hello!").await?;

    while let Some(event) = session.next_event().await {
        match event {
            ServerEvent::ModelText(text) => print!("{text}"),
            ServerEvent::TurnComplete => println!("\n--- turn done ---"),
            _ => {}
        }
    }
    Ok(())
}

Vertex AI

Use the Vertex transport endpoint with an OAuth access token. setup.model must be the full Vertex model resource name.

use gemini_live::session::{ReconnectPolicy, Session, SessionConfig};
use gemini_live::transport::{Auth, Endpoint, TransportConfig};
use gemini_live::types::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut session = Session::connect(SessionConfig {
        transport: TransportConfig {
            endpoint: Endpoint::VertexAi {
                location: "us-central1".into(),
            },
            auth: Auth::BearerToken(std::env::var("VERTEX_AI_ACCESS_TOKEN")?),
            ..Default::default()
        },
        setup: SetupConfig {
            model: std::env::var("VERTEX_MODEL")?,
            generation_config: Some(GenerationConfig {
                response_modalities: Some(vec![Modality::Text]),
                ..Default::default()
            }),
            ..Default::default()
        },
        reconnect: ReconnectPolicy::default(),
    }).await?;

    session.send_text("Hello from Vertex!").await?;
    drop(session);
    Ok(())
}
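
Because setup.model must be the full resource name on Vertex, a small helper can assemble it and catch a bare model ID early. This is only a sketch: both function names are hypothetical, and the path layout is taken from the VERTEX_MODEL example in the CLI section of this README.

```rust
/// Builds `projects/{project}/locations/{location}/publishers/google/models/{model}`.
/// Hypothetical helper; the layout mirrors the VERTEX_MODEL example in this README.
pub fn vertex_model_resource(project: &str, location: &str, model: &str) -> String {
    format!("projects/{project}/locations/{location}/publishers/google/models/{model}")
}

/// Returns true if `name` has the shape of a full resource name
/// (eight non-empty path segments with the expected keywords).
pub fn is_full_vertex_model(name: &str) -> bool {
    let parts: Vec<&str> = name.split('/').collect();
    matches!(
        parts.as_slice(),
        ["projects", project, "locations", location, "publishers", publisher, "models", model]
            if !project.is_empty()
                && !location.is_empty()
                && !publisher.is_empty()
                && !model.is_empty()
    )
}
```

A caller could run `is_full_vertex_model` on the VERTEX_MODEL value before connecting, so that a short name such as `models/MODEL_ID` is rejected locally instead of by the server.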

If you want reconnect-safe token refresh from Google Cloud Application Default Credentials, enable the optional vertex-auth feature:

[dependencies]
gemini-live = { version = "0.1", features = ["vertex-auth"] }
tokio = { version = "1", features = ["full"] }

Then build the auth mode from ADC instead of injecting a static token:

use gemini_live::transport::Auth;

let auth = Auth::vertex_ai_application_default()?;

Architecture

Session  →  Transport  →  Codec  →  Types / Audio / Errors
| Layer     | Module       | What it does                                                   |
|-----------|--------------|----------------------------------------------------------------|
| Session   | session.rs   | Connection lifecycle, auto-reconnect, typed send/receive       |
| Transport | transport.rs | WebSocket + rustls, frame I/O                                  |
| Codec     | codec.rs     | JSON ↔ Rust conversion; ServerMessage → ServerEvent decomposition |
| Audio     | audio.rs     | Zero-allocation PCM encoder, format constants                  |
| Types     | types/       | All wire-format structs and enums                              |
| Errors    | error.rs     | Layered error types per architectural layer                    |

Each layer's public API and design notes are documented in source code doc comments — start from lib.rs and drill into modules.
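
The codec row mentions decomposing a ServerMessage into ServerEvents. The sketch below illustrates the idea with simplified stand-in types: one decoded message may carry several optional parts, and each part becomes its own event. The field names and the `decompose` function are assumptions for illustration; the crate's real types live in types/ and codec.rs and will differ.

```rust
/// Simplified stand-in for one decoded wire message (hypothetical fields).
#[derive(Debug, Default)]
pub struct ServerMessage {
    pub model_text: Option<String>,
    pub tool_call_names: Vec<String>,
    pub turn_complete: bool,
}

/// Simplified stand-in for the events surfaced to the application.
#[derive(Debug, PartialEq)]
pub enum ServerEvent {
    ModelText(String),
    ToolCall(Vec<String>),
    TurnComplete,
}

/// Appends the events contained in `msg` to `out`, in a fixed order.
/// Taking `out` by reference lets the caller reuse one buffer across messages.
pub fn decompose(msg: ServerMessage, out: &mut Vec<ServerEvent>) {
    if let Some(text) = msg.model_text {
        out.push(ServerEvent::ModelText(text));
    }
    if !msg.tool_call_names.is_empty() {
        out.push(ServerEvent::ToolCall(msg.tool_call_names));
    }
    if msg.turn_complete {
        out.push(ServerEvent::TurnComplete);
    }
}
```

Splitting at this layer means application code can match on one flat event enum rather than inspecting every optional field of every message.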

Audio Streaming

For convenience:

session.send_audio(&pcm_i16_le_bytes).await?;

For maximum performance (zero allocation on the hot path):

let mut enc = AudioEncoder::new();
loop {
    let b64 = enc.encode_i16_le(&pcm_chunk);
    let msg = ClientMessage::RealtimeInput(RealtimeInput {
        audio: Some(Blob { data: b64.to_owned(), mime_type: "audio/pcm;rate=16000".into() }),
        video: None, text: None, activity_start: None, activity_end: None,
        audio_stream_end: None,
    });
    session.send_raw(msg).await?;
}
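
To show what a buffer-reuse design can look like, here is a self-contained sketch of a base64 encoder that writes into one internal String and hands back a borrowed view. `PcmBase64Encoder` is a hypothetical type written for this example; it is not the crate's AudioEncoder, whose internals are not shown in this README.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Maps the low six bits of `n` to a base64 character.
fn sextet(n: u32) -> char {
    ALPHABET[(n & 63) as usize] as char
}

/// Hypothetical encoder that reuses one output buffer across calls.
#[derive(Default)]
pub struct PcmBase64Encoder {
    out: String,
}

impl PcmBase64Encoder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Current capacity of the reused buffer, exposed for inspection.
    pub fn capacity(&self) -> usize {
        self.out.capacity()
    }

    /// Encodes raw bytes (for example little-endian 16-bit PCM) as padded
    /// standard base64. The returned `&str` borrows the internal buffer,
    /// so once the buffer has grown, later calls do not allocate.
    pub fn encode(&mut self, bytes: &[u8]) -> &str {
        self.out.clear(); // keeps the existing capacity
        self.out.reserve((bytes.len() + 2) / 3 * 4);
        for chunk in bytes.chunks(3) {
            let b0 = u32::from(chunk[0]);
            let b1 = u32::from(chunk.get(1).copied().unwrap_or(0));
            let b2 = u32::from(chunk.get(2).copied().unwrap_or(0));
            let n = (b0 << 16) | (b1 << 8) | b2;
            self.out.push(sextet(n >> 18));
            self.out.push(sextet(n >> 12));
            self.out.push(if chunk.len() > 1 { sextet(n >> 6) } else { '=' });
            self.out.push(if chunk.len() > 2 { sextet(n) } else { '=' });
        }
        &self.out
    }
}
```

The borrow returned by `encode` must be dropped before the next call, which is the trade-off of this design: no allocation per chunk, but the caller has to consume or copy the text before encoding again.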

Tool Calling

while let Some(event) = session.next_event().await {
    if let ServerEvent::ToolCall(calls) = event {
        let responses = calls.iter().map(|call| {
            let result = handle_function(&call.name, &call.args);
            FunctionResponse {
                id: call.id.clone(),
                name: call.name.clone(),
                response: result,
            }
        }).collect();
        session.send_tool_response(responses).await?;
    }
}
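
The snippet above calls handle_function, which the application has to supply. One possible shape is a dispatcher keyed on the function name, sketched below. The `Args` alias, the `Result<String, String>` return type, and the `echo` and `add` functions are all assumptions made for this example; the crate's actual argument and response types are not shown in this README and will differ.

```rust
use std::collections::HashMap;

/// Hypothetical argument type: a flat map of string values.
pub type Args = HashMap<String, String>;

/// Looks up `key` in `args` and parses it as an integer.
fn parse_num(args: &Args, key: &str) -> Result<i64, String> {
    args.get(key)
        .ok_or_else(|| format!("missing argument: {key}"))?
        .parse::<i64>()
        .map_err(|e| format!("invalid {key}: {e}"))
}

/// Dispatches a tool call by name. Unknown names produce an error
/// that can be reported back to the model instead of panicking.
pub fn handle_function(name: &str, args: &Args) -> Result<String, String> {
    match name {
        "echo" => args
            .get("text")
            .cloned()
            .ok_or_else(|| "missing argument: text".to_string()),
        "add" => {
            let a = parse_num(args, "a")?;
            let b = parse_num(args, "b")?;
            Ok((a + b).to_string())
        }
        other => Err(format!("unknown function: {other}")),
    }
}
```

Returning an error value for an unknown or malformed call keeps the event loop running and lets every call receive a response.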

CLI

An interactive TUI client with microphone, speaker, screen sharing, and file sending support. See docs/cli.md for full usage.

Install

Pre-built binary (Linux / macOS):

curl -fsSL https://raw.githubusercontent.com/jacoblincool/gemini-live-rs/main/install.sh | bash

Or via Cargo:

cargo install gemini-live-cli

Build without audio/screen features for a minimal binary:

cargo install gemini-live-cli --no-default-features

Usage

export GEMINI_API_KEY=your-key
gemini-live

Override the model:

GEMINI_MODEL=models/gemini-2.5-flash-native-audio-latest gemini-live

Run the CLI against Vertex AI with a static bearer token:

LIVE_BACKEND=vertex \
VERTEX_LOCATION=us-central1 \
VERTEX_MODEL='projects/PROJECT_ID/locations/us-central1/publishers/google/models/MODEL_ID' \
VERTEX_AI_ACCESS_TOKEN="$(gcloud auth application-default print-access-token)" \
gemini-live

Run the CLI against Vertex AI with Application Default Credentials:

LIVE_BACKEND=vertex \
VERTEX_LOCATION=us-central1 \
VERTEX_MODEL='projects/PROJECT_ID/locations/us-central1/publishers/google/models/MODEL_ID' \
VERTEX_AUTH=adc \
cargo run -p gemini-live-cli --features vertex-auth
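
The environment variables above can be read as a small decision tree. The sketch below models that selection under stated assumptions: the `Backend` and `VertexAuth` types and the `backend_from_env` function are hypothetical, and treating any LIVE_BACKEND value other than `vertex` as the API-key backend is a guess, not documented CLI behavior. The lookup is passed in as a closure so the logic can be exercised without touching the real environment.

```rust
/// Hypothetical model of how Vertex credentials are chosen.
#[derive(Debug, PartialEq)]
pub enum VertexAuth {
    StaticToken(String),
    Adc,
}

/// Hypothetical model of the backend selected by the environment.
#[derive(Debug, PartialEq)]
pub enum Backend {
    Gemini { api_key: String },
    Vertex { location: String, model: String, auth: VertexAuth },
}

/// Resolves a backend from an environment lookup function.
pub fn backend_from_env(get: impl Fn(&str) -> Option<String>) -> Result<Backend, String> {
    let require = |key: &str| get(key).ok_or_else(|| format!("{key} is not set"));
    match get("LIVE_BACKEND").as_deref() {
        Some("vertex") => {
            // VERTEX_AUTH=adc selects Application Default Credentials;
            // otherwise a static bearer token is expected.
            let auth = match get("VERTEX_AUTH").as_deref() {
                Some("adc") => VertexAuth::Adc,
                _ => VertexAuth::StaticToken(require("VERTEX_AI_ACCESS_TOKEN")?),
            };
            Ok(Backend::Vertex {
                location: require("VERTEX_LOCATION")?,
                model: require("VERTEX_MODEL")?,
                auth,
            })
        }
        // Assumption: anything else falls back to the API-key backend.
        _ => Ok(Backend::Gemini { api_key: require("GEMINI_API_KEY")? }),
    }
}
```

In a real program the lookup would be `|key| std::env::var(key).ok()`.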

Commands

| Input                         | Action                              |
|-------------------------------|-------------------------------------|
| hello                         | Send text to the model              |
| @photo.jpg                    | Send an image file                  |
| @recording.wav                | Send a WAV audio file               |
| @photo.jpg describe this      | Send image + text together          |
| /mic                          | Toggle microphone input (with AEC)  |
| /speak                        | Toggle speaker output (with AEC)    |
| /share-screen list            | List available capture targets      |
| /share-screen <id> [interval] | Start sharing a monitor or window   |
| /share-screen                 | Stop screen sharing                 |
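
The input grammar in the table can be sketched as a small parser. The `Input` enum and `parse_input` function below are hypothetical and written only to illustrate the rules; the CLI's real parser may differ, for example in how it treats paths containing spaces or what unit the interval uses (kept as an opaque string here).

```rust
/// Hypothetical classification of one line typed into the CLI.
#[derive(Debug, PartialEq)]
pub enum Input {
    Text(String),
    File { path: String, text: Option<String> },
    ToggleMic,
    ToggleSpeaker,
    ShareScreenList,
    ShareScreenStart { target: String, interval: Option<String> },
    ShareScreenStop,
    Unknown(String),
}

pub fn parse_input(line: &str) -> Input {
    let line = line.trim();
    if let Some(rest) = line.strip_prefix('@') {
        // The first whitespace-delimited token is the path;
        // anything after it is optional accompanying text.
        let (path, text) = match rest.split_once(char::is_whitespace) {
            Some((p, t)) => (p, Some(t.trim().to_string()).filter(|t| !t.is_empty())),
            None => (rest, None),
        };
        return Input::File { path: path.to_string(), text };
    }
    let mut words = line.split_whitespace();
    match words.next() {
        Some("/mic") => Input::ToggleMic,
        Some("/speak") => Input::ToggleSpeaker,
        Some("/share-screen") => match (words.next(), words.next()) {
            (None, _) => Input::ShareScreenStop,
            (Some("list"), _) => Input::ShareScreenList,
            (Some(id), interval) => Input::ShareScreenStart {
                target: id.to_string(),
                interval: interval.map(str::to_string),
            },
        },
        Some(cmd) if cmd.starts_with('/') => Input::Unknown(cmd.to_string()),
        // Anything that is not a command or file reference is plain text.
        _ => Input::Text(line.to_string()),
    }
}
```

Note that `/share-screen` with no argument stops sharing, so the parser has to check for the bare command before interpreting the first argument as a capture target.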

Self-update

gemini-live update

Feature Flags

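| Feature                | Dependencies                  | Enables                 |
|------------------------|-------------------------------|-------------------------|
| mic (default)          | cpal, webrtc-audio-processing | /mic command with AEC   |
| speak (default)        | cpal, webrtc-audio-processing | /speak command with AEC |
| share-screen (default) | xcap, image                   | /share-screen command   |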

Documentation

| File             | Purpose                                                                               |
|------------------|---------------------------------------------------------------------------------------|
| docs/cli.md      | CLI usage, commands, feature flags, and architecture                                  |
| docs/protocol.md | Upstream API reference (endpoints, lifecycle, VAD, session limits, model differences) |
| docs/design.md   | Architecture decisions and performance goals                                          |
| docs/roadmap.md  | Planned work, known gaps, tech debt                                                   |
| docs/testing.md  | Test inventory and instructions                                                       |

License

MIT


Author's Note

This repository is also an experiment in how to design a set of guiding principles that enable AI agents to autonomously maintain a client library over time.

Maintaining a client library is not a one-shot code generation problem — it is an ongoing engineering challenge. The library must track upstream API changes, keep documentation in sync, preserve backward compatibility, expand test coverage, and maintain design consistency. These are exactly the kinds of tasks where AI agents could contribute meaningfully, if given the right structure to work within.

The core idea behind this project is to explore what that structure looks like: which conventions, workflows, and constraints help an AI agent maintain stable, extensible, and high-quality output with minimal human intervention. The documentation architecture here — AGENTS.md for general principles, protocol.md for upstream facts, design.md for our decisions, roadmap.md for tracking gaps — is designed so that an agent can orient itself, identify what needs to change, and act accordingly.

If these principles can be defined clearly enough, an AI agent becomes more than a tool that executes instructions — it becomes a collaborator capable of participating in long-term maintenance.