gemini-live-rs
High-performance Rust client for the Gemini Multimodal Live API — real-time, bidirectional audio/video/text streaming over WebSocket.
Features
- Strongly typed: every wire message has a Rust struct; serde handles the JSON mapping
- Session management: automatic reconnection with exponential backoff, session resumption, GoAway handling
- Streaming-first: `send_audio`/`send_video`/`send_text` for real-time input; event stream for output
- Performance-conscious: zero-allocation `AudioEncoder` for the hot path; buffer-reuse design throughout
- Tool calling: built-in support for function calls, cancellations, and scheduling modes
- Clone-friendly sessions: `Session` is cheaply cloneable; multiple tasks can send and receive concurrently
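
To make the reconnection bullet concrete, here is a minimal, std-only sketch of exponential backoff. The function name `backoff_delay` and its parameters are hypothetical and are not taken from this crate's API; the crate's real policy and defaults may differ.

```rust
use std::time::Duration;

/// Hypothetical helper (not part of this crate): the delay to wait before
/// reconnect attempt `attempt` (0-based). The delay doubles with every
/// attempt and is clamped to `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // 2^attempt, saturating so that large attempt counts cannot overflow.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    // If the multiplication itself overflows, fall back to the cap.
    base.checked_mul(factor).map_or(max, |delay| delay.min(max))
}
```

With a 500 ms base and a 30 s cap, attempts 0, 1, 2, 3 wait 0.5 s, 1 s, 2 s, 4 s, and later attempts settle at the cap. Real clients often add random jitter on top so that many clients do not reconnect in lockstep.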
Quick Start
Add to your Cargo.toml:

    [dependencies]
    … = "0.1"
    tokio = { version = "1", features = ["full"] }

    use …;
    use …;
    use …*;
    async …

Architecture
Session → Transport → Codec → Types / Audio / Errors
| Layer | Module | What it does |
|---|---|---|
| Session | `session.rs` | Connection lifecycle, auto-reconnect, typed send/receive |
| Transport | `transport.rs` | WebSocket + rustls, frame I/O |
| Codec | `codec.rs` | JSON ↔ Rust conversion; `ServerMessage` → `ServerEvent` decomposition |
| Audio | `audio.rs` | Zero-allocation PCM encoder, format constants |
| Types | `types/` | All wire-format structs and enums |
| Errors | `error.rs` | Layered error types per architectural layer |
Each layer's public API and design notes are documented in source code doc comments — start from lib.rs and drill into modules.
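
The codec layer's `ServerMessage` → `ServerEvent` decomposition can be pictured with a simplified sketch. Only the two type names come from the table above; the fields, variants, and the `decompose` function are hypothetical stand-ins, not the crate's real definitions.

```rust
/// Hypothetical, simplified stand-in for a decoded wire message.
/// One message may carry several independent pieces of information.
#[derive(Debug, Default)]
struct ServerMessage {
    text: Option<String>,
    audio: Option<Vec<u8>>,
    turn_complete: bool,
}

/// Hypothetical flat event type handed to the application.
#[derive(Debug, PartialEq)]
enum ServerEvent {
    Text(String),
    Audio(Vec<u8>),
    TurnComplete,
}

/// Split one message into zero or more events, always in the same order.
fn decompose(msg: ServerMessage) -> Vec<ServerEvent> {
    let mut events = Vec::new();
    if let Some(text) = msg.text {
        events.push(ServerEvent::Text(text));
    }
    if let Some(audio) = msg.audio {
        events.push(ServerEvent::Audio(audio));
    }
    if msg.turn_complete {
        events.push(ServerEvent::TurnComplete);
    }
    events
}
```

Flattening a message this way lets application code match on one event at a time instead of probing several optional fields per message.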
Audio Streaming
For convenience:

    session.send_audio(…).await?;

For maximum performance (zero allocation on the hot path):

    let mut enc = AudioEncoder::new(…);
    loop { … }

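
The buffer-reuse idea behind a zero-allocation hot path can be sketched with the standard library alone. This is not the crate's `AudioEncoder`: the type `ReusableEncoder` and its methods are hypothetical, and the sketch assumes the audio payload is sent as standard base64 text over raw PCM bytes.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Hypothetical encoder that owns its output buffer and reuses it per chunk.
struct ReusableEncoder {
    out: String,
}

impl ReusableEncoder {
    fn new() -> Self {
        Self { out: String::new() }
    }

    /// Base64-encode `pcm` into the internal buffer and borrow the result.
    /// Once the buffer has grown to fit the largest chunk seen so far,
    /// encoding chunks of that size or smaller does not allocate again.
    fn encode(&mut self, pcm: &[u8]) -> &str {
        self.out.clear(); // resets the length but keeps the capacity
        self.out.reserve((pcm.len() + 2) / 3 * 4);
        for chunk in pcm.chunks(3) {
            // Pack up to three bytes into 24 bits, zero-padding a short tail.
            let b1 = chunk.get(1).copied().unwrap_or(0);
            let b2 = chunk.get(2).copied().unwrap_or(0);
            let n = (u32::from(chunk[0]) << 16) | (u32::from(b1) << 8) | u32::from(b2);
            self.out.push(ALPHABET[(n >> 18) as usize & 63] as char);
            self.out.push(ALPHABET[(n >> 12) as usize & 63] as char);
            // Missing input bytes are represented by '=' padding.
            self.out.push(if chunk.len() > 1 {
                ALPHABET[(n >> 6) as usize & 63] as char
            } else {
                '='
            });
            self.out.push(if chunk.len() > 2 {
                ALPHABET[n as usize & 63] as char
            } else {
                '='
            });
        }
        &self.out
    }
}
```

Returning a borrowed `&str` instead of a fresh `String` is what keeps the steady state allocation-free: the caller must finish with one chunk's output before encoding the next.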
Tool Calling

    while let Some(…) = session.next_event().await { … }

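
One detail an event loop like this has to handle is cancellation: a tool call may be cancelled while its result is still being computed, and a cancelled result should not be sent back. The sketch below models that bookkeeping with the standard library only. The `Event` enum and the `PendingCalls` type are hypothetical and are not the crate's event types.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified events; the crate's real event type differs.
enum Event {
    ToolCall { id: String, name: String },
    ToolCallCancellation { ids: Vec<String> },
}

/// Tracks which tool calls are still awaiting a response.
#[derive(Default)]
struct PendingCalls {
    pending: HashSet<String>,
}

impl PendingCalls {
    /// Record an event. Returns the tool name to run for a new call,
    /// or `None` for a cancellation.
    fn on_event<'e>(&mut self, event: &'e Event) -> Option<&'e str> {
        match event {
            Event::ToolCall { id, name } => {
                self.pending.insert(id.clone());
                Some(name.as_str())
            }
            Event::ToolCallCancellation { ids } => {
                for id in ids {
                    self.pending.remove(id);
                }
                None
            }
        }
    }

    /// Call when a tool finishes. Returns `true` if its result should still
    /// be sent, `false` if the call was cancelled or is unknown.
    fn finish(&mut self, id: &str) -> bool {
        self.pending.remove(id)
    }
}
```

In an async loop, `on_event` would be called for each incoming event and `finish` just before sending a function response.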
CLI
An interactive TUI client with microphone, speaker, screen sharing, and file sending support. See docs/cli.md for full usage.
Install
Pre-built binary (Linux / macOS):
|
Or via Cargo:
Build without audio/screen features for a minimal binary:
Usage
Override the model:
GEMINI_MODEL=models/gemini-2.5-flash-native-audio-latest
Commands
| Input | Action |
|---|---|
| `hello` | Send text to the model |
| `@photo.jpg` | Send an image file |
| `@recording.wav` | Send a WAV audio file |
| `@photo.jpg describe this` | Send image + text together |
| `/mic` | Toggle microphone input (with AEC) |
| `/speak` | Toggle speaker output (with AEC) |
| `/share-screen list` | List available capture targets |
| `/share-screen <id> [interval]` | Start sharing a monitor or window |
| `/share-screen` | Stop screen sharing |
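
The command grammar in the table is small enough to sketch as a parser. The following is an illustration only, not the CLI's actual implementation: the `Input` enum and the `parse` function are hypothetical names, and the handling of unknown commands is an assumption.

```rust
/// Hypothetical parsed form of one line typed into the CLI.
#[derive(Debug, PartialEq)]
enum Input<'a> {
    Text(&'a str),
    File { path: &'a str, text: Option<&'a str> },
    Mic,
    Speak,
    ShareScreenList,
    ShareScreenStart { id: &'a str, interval: Option<&'a str> },
    ShareScreenStop,
    Unknown(&'a str),
}

fn parse(line: &str) -> Input<'_> {
    let line = line.trim();
    // `@path [text]`: a file, optionally followed by a text prompt.
    if let Some(rest) = line.strip_prefix('@') {
        let (path, text) = match rest.split_once(char::is_whitespace) {
            Some((p, t)) => (p, Some(t.trim()).filter(|t| !t.is_empty())),
            None => (rest, None),
        };
        return Input::File { path, text };
    }
    // `/command [args]`: look at up to three whitespace-separated words.
    if let Some(cmd) = line.strip_prefix('/') {
        let mut parts = cmd.split_whitespace();
        return match (parts.next(), parts.next(), parts.next()) {
            (Some("mic"), None, _) => Input::Mic,
            (Some("speak"), None, _) => Input::Speak,
            (Some("share-screen"), None, _) => Input::ShareScreenStop,
            (Some("share-screen"), Some("list"), _) => Input::ShareScreenList,
            (Some("share-screen"), Some(id), interval) => {
                Input::ShareScreenStart { id, interval }
            }
            _ => Input::Unknown(line),
        };
    }
    // Anything else is plain text for the model.
    Input::Text(line)
}
```

Borrowing from the input line keeps the parser allocation-free; the interval is left as a string here because its unit is not specified in the table.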
Self-update
Feature Flags
| Feature | Dependencies | Enables |
|---|---|---|
| `mic` (default) | cpal, webrtc-audio-processing | `/mic` command with AEC |
| `speak` (default) | cpal, webrtc-audio-processing | `/speak` command with AEC |
| `share-screen` (default) | xcap, image | `/share-screen` command |
Documentation
| File | Purpose |
|---|---|
| `docs/cli.md` | CLI usage, commands, feature flags, and architecture |
| `docs/protocol.md` | Upstream API reference (endpoints, lifecycle, VAD, session limits, model differences) |
| `docs/design.md` | Architecture decisions and performance goals |
| `docs/roadmap.md` | Planned work, known gaps, tech debt |
| `docs/testing.md` | Test inventory and instructions |
License
MIT
Author's Note
This repository is also an experiment in how to design a set of guiding principles that enable AI agents to autonomously maintain a client library over time.
Maintaining a client library is not a one-shot code generation problem — it is an ongoing engineering challenge. The library must track upstream API changes, keep documentation in sync, preserve backward compatibility, expand test coverage, and maintain design consistency. These are exactly the kinds of tasks where AI agents could contribute meaningfully, if given the right structure to work within.
The core idea behind this project is to explore what that structure looks like: which conventions, workflows, and constraints help an AI agent maintain stable, extensible, and high-quality output with minimal human intervention. The documentation architecture here — AGENTS.md for general principles, protocol.md for upstream facts, design.md for our decisions, roadmap.md for tracking gaps — is designed so that an agent can orient itself, identify what needs to change, and act accordingly.
If these principles can be defined clearly enough, an AI agent becomes more than a tool that executes instructions — it becomes a collaborator capable of participating in long-term maintenance.