Vona
Vona is the Rust runtime layer for the next wave of voice-native products: fast, composable, provider-neutral speech-to-speech infrastructure you can actually ship.
It gives teams the durable core that most voice prototypes end up rebuilding by hand: realtime session orchestration, audio transport boundaries, backend adapters, tool/context hooks, fallback policy, and deterministic harnesses for the moments that matter most, like interruption, first audio latency, tool calls, and event ordering.
Bring your own product surface, model strategy, deployment topology, and user experience. Vona owns the hard runtime boundary between microphones, transports, speech-to-speech models, local/cloud providers, skills, and policy so your application can move across backends without rewriting its voice stack.
Why Vona
Most speech-to-speech projects start as a model demo, a provider SDK wrapper, or a tangle of application-specific voice-agent glue. That works until you need to swap models, run locally, move to a hosted realtime API, test interruptions, or prove latency before the launch window closes.
Vona is built for that inflection point. It is not another assistant template; it is the runtime substrate underneath one. The goal is simple: make voice systems feel as modular, testable, and backend-portable as the rest of a modern AI stack.
Use Vona when you want:
- a Rust-native boundary between audio transports and speech-to-speech backends
- first-class contracts for both step-oriented STS and event-stream realtime voice
- deterministic tests for interruption, tool-call, context-injection, and fallback behavior
- the option to run model backends in-process, behind HTTP, or behind local IPC
- provider-neutral traits that let one host application try multiple STS backends
- a small core crate that does not own your product policy or UX
Do not use Vona if you need a turnkey assistant, hosted model service, wake-word engine, audio device stack, or production WebRTC integration out of the box.
What Is In This Repository
| Crate | Purpose |
|---|---|
| `vona` | Umbrella crate that re-exports `vona-core` and optional adapter crates through features. |
| `vona-core` | Core traits, event types, session driver, runtime policy, skill registry, and passthrough backend. |
| `vona-openai-realtime` | OpenAI Realtime protocol mapping for Vona realtime sessions. |
| `vona-gemini-live` | Gemini Live protocol mapping for Vona realtime sessions. |
| `vona-azure-speech` | Azure Voice Live plus Azure Speech STT/TTS helper surfaces. |
| `vona-elevenlabs` | ElevenLabs streaming text-to-speech helper surface for cascaded voice backends. |
| `vona-deepgram` | Deepgram Flux/listen STT and Aura streaming TTS helper surfaces. |
| `vona-model-provisioning` | Local model manifest and cache planning for Vona-owned model provisioning. |
| `vona-seamless` | Seamless M4T-style local ONNX and HTTP sidecar backend adapters. |
| `vona-moshi` | Kyutai Moshi backend surface using WebSocket and Opus framing. |
| `vona-transport-local` | Local HTTP/IPC transport helpers and length-prefixed CBOR framing. |
| `vona-sidecar` | Sidecar binary exposing Vona backends over HTTP and Unix-socket IPC. |
| `vona-test-harness` | Deterministic mock backend, scripted transport, fixtures, and benchmark harnesses. |
The workspace is backend-agnostic by design. Provider-specific integrations live in adapter crates; the vona-core crate stays focused on stable contracts, while vona is the crates.io facade for applications that want one dependency with opt-in features.
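A minimal sketch of what a provider-neutral boundary buys a host application, assuming a hypothetical synchronous trait `StsBackend` and two made-up adapters (`LocalEcho`, `HostedStub`); none of these are Vona types. The host picks a backend at runtime and drives it only through the trait, so swapping providers does not touch the calling code.

```rust
/// Hypothetical provider-neutral contract (illustration only, not Vona's API).
trait StsBackend {
    fn name(&self) -> &str;
    /// Consume one chunk of input samples and return output samples.
    fn step(&mut self, input: &[i16]) -> Vec<i16>;
}

/// Hypothetical local adapter: echoes audio unchanged.
struct LocalEcho;

impl StsBackend for LocalEcho {
    fn name(&self) -> &str {
        "local-echo"
    }
    fn step(&mut self, input: &[i16]) -> Vec<i16> {
        input.to_vec()
    }
}

/// Hypothetical "hosted" adapter: halves the amplitude to stand in for a remote model.
struct HostedStub;

impl StsBackend for HostedStub {
    fn name(&self) -> &str {
        "hosted-stub"
    }
    fn step(&mut self, input: &[i16]) -> Vec<i16> {
        input.iter().map(|s| s / 2).collect()
    }
}

/// Host code depends only on the trait, so the backend is a runtime choice.
fn pick_backend(kind: &str) -> Box<dyn StsBackend> {
    match kind {
        "hosted" => Box::new(HostedStub),
        _ => Box::new(LocalEcho),
    }
}

/// The calling code never names a concrete provider.
fn run_once(backend: &mut dyn StsBackend, frame: &[i16]) -> Vec<i16> {
    backend.step(frame)
}

fn main() {
    let frame = [100i16, -200, 300];
    for kind in ["local", "hosted"] {
        let mut backend = pick_backend(kind);
        let out = run_once(backend.as_mut(), &frame);
        println!("{} -> {:?}", backend.name(), out);
    }
}
```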
Current Status
Vona is pre-1.0 and suitable for integration experiments, adapter development, and deterministic runtime testing. The public APIs may still change before a stable release.
Implemented today:
- step-oriented speech-to-speech backend trait
- event-stream realtime voice backend trait for hosted APIs, Moshi-family dialogue, and open realtime voice models
- audio transport trait
- session driver with metrics for first audio, tool calls, interruptions, and fallback decisions
- skill execution registry with schema validation and audit events
- context injection through `ExternalContextEvent`
- passthrough, Seamless M4T-style, Moshi, HTTP sidecar, and local IPC surfaces
- protocol crates for OpenAI Realtime, Gemini Live, Azure Voice Live/Speech, ElevenLabs TTS, and Deepgram STT/TTS
- local model provisioning manifests and cache inspection for local model adapters
- deterministic realtime voice harness for tool-call, interruption, latency-mark, and event-order testing
- deterministic test harnesses and release-gate benchmarks
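The session-driver metrics listed above can be pictured as a small recorder. This is a hypothetical sketch: `SessionMetrics` and its `on_*` methods are illustrative names, not Vona's API. Timestamps are passed in as offsets from session start instead of being read from a clock, which is one way to keep latency assertions deterministic.

```rust
use std::time::Duration;

/// Hypothetical metrics recorder (illustrative; not Vona's own metrics type).
/// Offsets from session start are passed in explicitly so tests stay deterministic.
#[derive(Debug, Default)]
struct SessionMetrics {
    first_audio: Option<Duration>,
    tool_calls: u32,
    interruptions: u32,
    fallback_decisions: Vec<String>,
}

impl SessionMetrics {
    /// Record an output frame; only the first one sets first-audio latency.
    fn on_output_frame(&mut self, since_start: Duration) {
        if self.first_audio.is_none() {
            self.first_audio = Some(since_start);
        }
    }

    fn on_tool_call(&mut self) {
        self.tool_calls += 1;
    }

    fn on_interruption(&mut self) {
        self.interruptions += 1;
    }

    /// Keep the reason for each fallback decision for later inspection.
    fn on_fallback(&mut self, reason: &str) {
        self.fallback_decisions.push(reason.to_string());
    }
}

fn main() {
    let mut m = SessionMetrics::default();
    m.on_output_frame(Duration::from_millis(180));
    m.on_output_frame(Duration::from_millis(200)); // later frames do not overwrite
    m.on_tool_call();
    m.on_interruption();
    m.on_fallback("primary backend timed out");
    println!("{m:?}");
}
```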
Known limits:
- production transport adapters such as LiveKit are not included yet
- the Seamless local ONNX path still needs operator-supplied model artifacts until downloader policy is enabled on top of `vona-model-provisioning`
- text-conditioned local generation does not yet have parity across all deployment modes
- cloud provider crates currently implement configuration and protocol mapping only; they are not yet covered by live, credentialed CI tests
- performance SLOs beyond the deterministic release gate should be measured in your target environment
Prerequisites
Vona is a Rust workspace. Install a recent Rust toolchain with Cargo.
vona-moshi links against Opus:
- macOS: `brew install opus`
- Debian/Ubuntu: `sudo apt-get install libopus-dev`
If Opus is installed in a non-standard prefix, set LIBOPUS_LIB_DIR to the prefix path, not the raw lib directory:
Quick Start
Clone the repository and, from its root, run the deterministic release gate with `bash scripts/release_gate.sh`.
For a faster inner loop while developing:
Run the deterministic mock harness:
Installation
For most applications, depend on the facade crate and enable the surfaces you need:
Available facade features:
- `seamless`: re-export `vona-seamless`
- `moshi`: re-export `vona-moshi`
- `transport-local`: re-export `vona-transport-local` and enable `seamless`
- `test-harness`: re-export `vona-test-harness`
- `all`: enable every facade feature
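Enabling `transport-local` also turns on `seamless`, and `all` turns on everything, following the usual Cargo rule that enabling a feature enables every feature it lists. The sketch below models that expansion for the facade features named above. The implication table is transcribed from this list; the resolver itself is a hypothetical illustration of Cargo's behavior, not code that ships with Vona.

```rust
use std::collections::BTreeSet;

/// Feature implications transcribed from the facade feature list.
/// (Illustration only; Cargo performs this expansion itself.)
fn implied(feature: &str) -> Vec<&'static str> {
    match feature {
        "transport-local" => vec!["seamless"],
        "all" => vec!["seamless", "moshi", "transport-local", "test-harness"],
        _ => vec![],
    }
}

/// Expand requested features to the full enabled set (transitive closure).
fn resolve(requested: &[&str]) -> BTreeSet<String> {
    let mut enabled = BTreeSet::new();
    let mut stack: Vec<String> = requested.iter().map(|f| f.to_string()).collect();
    while let Some(f) = stack.pop() {
        // `insert` returns false if the feature is already enabled,
        // so each feature is expanded at most once.
        if enabled.insert(f.clone()) {
            for next in implied(&f) {
                stack.push(next.to_string());
            }
        }
    }
    enabled
}

fn main() {
    println!("{:?}", resolve(&["transport-local"]));
}
```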
You can also depend on lower-level crates directly:
From a source checkout, use path dependencies:
[dependencies]
vona = { path = "crates/vona", features = ["seamless"] }
Minimal Backend Example
The core backend contract is step-oriented. A backend receives an AudioInputFrame, returns zero or more AudioOutputFrames, and may emit control events for the runtime to handle.
use async_trait;
use ;
;
For a ready-made deterministic implementation, use PassthroughStsBackend from the vona crate or MockBackend from vona-test-harness.
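As a rough, self-contained sketch of the step-oriented shape described above (one input frame in, zero or more output frames plus optional control events out), the code below uses synchronous, std-only stand-ins. `InputFrame` and `OutputFrame` only mirror the roles of `AudioInputFrame` and `AudioOutputFrame`; their fields, the `ControlEvent` enum, `StepResult`, and the `StepBackend` trait are assumptions made for illustration, and the real contract in `vona-core` may be async and shaped differently.

```rust
/// Hypothetical stand-in for an input frame (mirrors the role of AudioInputFrame).
#[derive(Debug, Clone, PartialEq)]
struct InputFrame {
    sample_rate_hz: u32,
    samples: Vec<i16>,
}

/// Hypothetical stand-in for an output frame (mirrors the role of AudioOutputFrame).
#[derive(Debug, Clone, PartialEq)]
struct OutputFrame {
    sample_rate_hz: u32,
    samples: Vec<i16>,
}

/// Hypothetical control events a backend might hand back to the runtime.
#[derive(Debug, Clone, PartialEq)]
enum ControlEvent {
    ToolCall { name: String, arguments: String },
    EndOfTurn,
}

/// Result of one step: zero or more output frames plus control events.
#[derive(Debug, Default, PartialEq)]
struct StepResult {
    frames: Vec<OutputFrame>,
    events: Vec<ControlEvent>,
}

/// Hypothetical synchronous version of a step-oriented backend contract.
trait StepBackend {
    fn step(&mut self, input: InputFrame) -> StepResult;
}

/// Passthrough backend: echoes audio and signals end of turn on silence.
struct Passthrough;

impl StepBackend for Passthrough {
    fn step(&mut self, input: InputFrame) -> StepResult {
        if input.samples.iter().all(|&s| s == 0) {
            // Silence (or an empty frame): no audio out, just a control event.
            return StepResult { frames: vec![], events: vec![ControlEvent::EndOfTurn] };
        }
        StepResult {
            frames: vec![OutputFrame { sample_rate_hz: input.sample_rate_hz, samples: input.samples }],
            events: vec![],
        }
    }
}

fn main() {
    let mut backend = Passthrough;
    let out = backend.step(InputFrame { sample_rate_hz: 16_000, samples: vec![1, 2, 3] });
    println!("{out:?}");
}
```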
Runtime Model
The runtime loop connects four surfaces:
- `AudioTransport`: receives input frames, sends output frames, and clears buffered output on interruption
- `SpeechToSpeechBackend`: owns provider/model session state and performs each audio step
- `VonaRuntime`: applies policy to backend control events
- `SkillExecutor`: resolves tool calls and injects external context back into the backend
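The sketch below shows how the transport and backend surfaces might meet in a runtime loop, including dropping buffered output when the user interrupts. It is a simplified, synchronous, std-only illustration: `Transport`, `Backend`, `Incoming`, `ScriptedTransport`, `SplitEcho`, and `run_session` are hypothetical names, and runtime policy and skill execution are left out.

```rust
use std::collections::VecDeque;

/// Hypothetical events a transport can deliver (illustration, not Vona's API).
#[derive(Debug, Clone, PartialEq)]
enum Incoming {
    Audio(Vec<i16>),
    /// The user started speaking over the assistant (barge-in).
    Interrupt,
}

/// Hypothetical transport surface: input in, output out, clear on interruption.
trait Transport {
    fn recv(&mut self) -> Option<Incoming>;
    fn send(&mut self, frame: Vec<i16>);
    fn clear_output(&mut self);
}

/// Hypothetical backend surface: one step per input frame.
trait Backend {
    fn step(&mut self, input: &[i16]) -> Vec<Vec<i16>>;
}

/// Scripted transport that replays a fixed input sequence and buffers output.
struct ScriptedTransport {
    script: VecDeque<Incoming>,
    pending_output: Vec<Vec<i16>>,
    cleared: usize,
}

impl Transport for ScriptedTransport {
    fn recv(&mut self) -> Option<Incoming> {
        self.script.pop_front()
    }
    fn send(&mut self, frame: Vec<i16>) {
        self.pending_output.push(frame);
    }
    fn clear_output(&mut self) {
        // Count what was dropped so a test can assert on it.
        self.cleared += self.pending_output.len();
        self.pending_output.clear();
    }
}

/// Echo backend that splits each input into two output frames.
struct SplitEcho;

impl Backend for SplitEcho {
    fn step(&mut self, input: &[i16]) -> Vec<Vec<i16>> {
        let mid = input.len() / 2;
        vec![input[..mid].to_vec(), input[mid..].to_vec()]
    }
}

/// Minimal loop: step the backend per audio frame; on interruption, drop buffered output.
fn run_session(transport: &mut dyn Transport, backend: &mut dyn Backend) -> usize {
    let mut steps = 0;
    while let Some(event) = transport.recv() {
        match event {
            Incoming::Audio(samples) => {
                for frame in backend.step(&samples) {
                    transport.send(frame);
                }
                steps += 1;
            }
            Incoming::Interrupt => transport.clear_output(),
        }
    }
    steps
}

fn main() {
    let mut transport = ScriptedTransport {
        script: VecDeque::from(vec![
            Incoming::Audio(vec![1, 2, 3, 4]),
            Incoming::Interrupt,
            Incoming::Audio(vec![5, 6]),
        ]),
        pending_output: Vec::new(),
        cleared: 0,
    };
    let steps = run_session(&mut transport, &mut SplitEcho);
    println!("steps={steps} cleared={} pending={:?}", transport.cleared, transport.pending_output);
}
```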
The important integration primitive is ExternalContextEvent. It carries transcript overrides, tool results, planner output, precomputed replies, or other application-owned context without forcing the core backend trait to know about any one product.
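A rough model of how a tool call can become injected context: a hypothetical `ContextEvent` enum whose variants mirror the kinds of payload listed above, and a hypothetical `SkillRegistry` that checks required arguments (a deliberately simplified stand-in for real schema validation), records an audit line, and returns a `ToolResult` event for the backend. None of these names come from Vona's API, and the `get_weather` skill is made up.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the kinds of context ExternalContextEvent is described as carrying.
#[derive(Debug, Clone, PartialEq)]
enum ContextEvent {
    TranscriptOverride(String),
    ToolResult { call_id: String, output: String },
    PlannerOutput(String),
    PrecomputedReply(String),
}

/// A registered skill: required argument names (stand-in for a schema) and a handler.
struct Skill {
    required: Vec<&'static str>,
    handler: fn(&HashMap<String, String>) -> String,
}

/// Hypothetical skill registry with argument validation and an audit trail.
#[derive(Default)]
struct SkillRegistry {
    skills: HashMap<String, Skill>,
    audit: Vec<String>,
}

impl SkillRegistry {
    fn register(&mut self, name: &str, skill: Skill) {
        self.skills.insert(name.to_string(), skill);
    }

    /// Resolve a tool call into a context event the backend can consume.
    fn execute(
        &mut self,
        call_id: &str,
        name: &str,
        args: &HashMap<String, String>,
    ) -> Result<ContextEvent, String> {
        let skill = self.skills.get(name).ok_or_else(|| format!("unknown skill: {name}"))?;
        // Reject calls that omit any required argument, and audit the rejection.
        if let Some(missing) = skill.required.iter().find(|k| !args.contains_key(**k)) {
            self.audit.push(format!("rejected {name}: missing {missing}"));
            return Err(format!("missing argument: {missing}"));
        }
        let output = (skill.handler)(args);
        self.audit.push(format!("ran {name} ({call_id})"));
        Ok(ContextEvent::ToolResult { call_id: call_id.to_string(), output })
    }
}

/// Made-up handler for the example.
fn weather(args: &HashMap<String, String>) -> String {
    format!("sunny in {}", args["city"])
}

fn main() {
    let mut registry = SkillRegistry::default();
    registry.register("get_weather", Skill { required: vec!["city"], handler: weather });
    let args = HashMap::from([("city".to_string(), "Oslo".to_string())]);
    let event = registry.execute("call-1", "get_weather", &args);
    println!("{event:?}");
    println!("{:?}", registry.audit);
}
```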
See docs/architecture.md for the sidecar contract and request/response shapes.
See docs/sts-model-coverage.md for how Vona distinguishes translation STS, full-duplex dialogue, hosted realtime APIs, open realtime voice models, and cascaded ASR+LLM+TTS systems.
Sidecar And Local Backends
The vona-sidecar binary exposes the Seamless M4T-style backend over HTTP and, on Unix platforms, a local IPC socket.
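The crate table lists length-prefixed CBOR framing for `vona-transport-local`. The sketch below shows only the framing layer over an arbitrary byte payload (the CBOR bytes would go in the payload); the 4-byte big-endian length prefix and the 16 MiB cap are assumptions made for illustration, not Vona's documented wire format.

```rust
use std::io::{self, Cursor, Read, Write};

/// Assumed upper bound on one frame, so a corrupt length cannot trigger a huge allocation.
const MAX_FRAME_LEN: usize = 16 * 1024 * 1024;

/// Write one frame: a 4-byte big-endian length followed by the payload bytes.
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame exceeds limit"));
    }
    // The cast cannot truncate because MAX_FRAME_LEN fits in a u32.
    w.write_all(&(payload.len() as u32).to_be_bytes())?;
    w.write_all(payload)
}

/// Read one frame; returns Ok(None) if the stream ends before a full header is read.
fn read_frame<R: Read>(r: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut header = [0u8; 4];
    match r.read_exact(&mut header) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    let len = u32::from_be_bytes(header) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds limit"));
    }
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(Some(payload))
}

fn main() -> io::Result<()> {
    let mut wire: Vec<u8> = Vec::new();
    write_frame(&mut wire, b"hello")?;
    write_frame(&mut wire, b"")?;
    let mut reader = Cursor::new(wire);
    while let Some(frame) = read_frame(&mut reader)? {
        println!("{} bytes: {:?}", frame.len(), frame);
    }
    Ok(())
}
```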
Default HTTP bind:
VONA_STS_SIDECAR_BIND=127.0.0.1:9090
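A host or test harness that wants the same default could resolve the bind address as sketched below. The helper name `parse_bind` is hypothetical and the sidecar's own parsing may differ; it falls back to `127.0.0.1:9090` when `VONA_STS_SIDECAR_BIND` is unset and rejects values that do not parse as a socket address. Keeping the parsing separate from `env::var` lets it be tested without mutating the process environment.

```rust
use std::env;
use std::net::SocketAddr;

/// Default bind from this README; the environment variable overrides it.
const DEFAULT_BIND: &str = "127.0.0.1:9090";

/// Hypothetical helper: resolve the sidecar bind address from an optional override.
fn parse_bind(value: Option<&str>) -> Result<SocketAddr, String> {
    let raw = value.unwrap_or(DEFAULT_BIND);
    raw.parse::<SocketAddr>()
        .map_err(|e| format!("invalid VONA_STS_SIDECAR_BIND {raw:?}: {e}"))
}

fn main() {
    // `ok()` treats an unset (or non-UTF-8) variable as "use the default".
    let from_env = env::var("VONA_STS_SIDECAR_BIND").ok();
    match parse_bind(from_env.as_deref()) {
        Ok(addr) => println!("binding sidecar HTTP on {addr}"),
        Err(e) => eprintln!("{e}"),
    }
}
```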
Health check:
Local Seamless M4T ONNX configuration:
See docs/production-backends.md for operational expectations and current limitations.
Adapter maturity is tracked in docs/adapter-maturity.md.
Model-Free Demo
You can run a complete Vona session without model weights, network access, or audio hardware:
The demo drives a scripted audio frame through the runtime, emits a mock skill call, handles an interruption, injects tool context back into the backend, and prints the resulting session metrics.
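The demo's sequence (skill call, interruption, context injection) is the kind of ordering the deterministic harness is meant to pin down. A minimal sketch of such a check, assuming a hypothetical `Event` enum rather than Vona's event types: it asserts that the expected events appear in the observed log in the same relative order, allowing unrelated events in between (a greedy subsequence match).

```rust
/// Hypothetical session events (illustrative names, not Vona's event types).
#[derive(Debug, Clone, PartialEq)]
enum Event {
    InputFrame,
    FirstAudio,
    ToolCall(String),
    Interrupted,
    ContextInjected,
    OutputFrame,
}

/// True if every event in `expected` appears in `observed` in the same relative order.
fn occurs_in_order(observed: &[Event], expected: &[Event]) -> bool {
    let mut remaining = expected.iter().peekable();
    for event in observed {
        // Advance through `expected` only when the next wanted event shows up.
        if remaining.peek() == Some(&event) {
            remaining.next();
        }
    }
    remaining.peek().is_none()
}

fn main() {
    let observed = vec![
        Event::InputFrame,
        Event::FirstAudio,
        Event::ToolCall("lookup".into()),
        Event::OutputFrame,
        Event::Interrupted,
        Event::ContextInjected,
    ];
    let expected = [Event::ToolCall("lookup".into()), Event::Interrupted, Event::ContextInjected];
    assert!(occurs_in_order(&observed, &expected));
    println!("event order ok");
}
```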
Release Gate
The release gate, `bash scripts/release_gate.sh`, is the source of truth for pre-release validation.
It runs:
- locked workspace checks
- deterministic per-crate tests
- all-target compile checks
- clippy with `-D warnings`
- deterministic transport smoke benchmarks
- benchmark result generation in docs/benchmark-results.md
Read the full checklist in docs/release-readiness-checklist.md.
Repository Layout
crates/
  vona/                   facade crate with optional adapter features
  vona-core/              core runtime contracts
  vona-seamless/          Seamless M4T-style backend adapters
  vona-moshi/             Moshi backend surface
  vona-transport-local/   local IPC and transport helpers
  vona-sidecar/           sidecar binary
  vona-test-harness/      deterministic tests and benchmarks
docs/                     architecture, backend, benchmark, and release docs
examples/                 example slots and fixture-driven demos
tests/fixtures/           deterministic waveform fixtures
scripts/                  release and maintenance scripts
Contributing
Contributions are welcome. Please read CONTRIBUTING.md before opening a pull request.
Useful rules of thumb:
- keep core contracts provider-neutral
- put provider integrations in adapter crates
- include deterministic tests for runtime, transport, or backend behavior
- keep `bash scripts/release_gate.sh` green
This project follows the Contributor Covenant Code of Conduct.
The current roadmap is in docs/roadmap.md.
Publishing
The crates are intended to publish in dependency order:
- `vona-core`
- `vona-model-provisioning`
- `vona-openai-realtime`
- `vona-gemini-live`
- `vona-azure-speech`
- `vona-elevenlabs`
- `vona-deepgram`
- `vona-seamless`
- `vona-moshi`
- `vona-test-harness`
- `vona-transport-local`
- `vona-sidecar`
- `vona`
The order matters because the facade crate depends on the adapter crates, and adapter crates depend on vona-core.
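Publishing in dependency order means computing a topological order of the crate graph. A sketch using Kahn's algorithm, with ties broken alphabetically so the result is deterministic, is below. The edge list in `main` is a simplified, hypothetical subset chosen for illustration; the real manifests and `scripts/release_crates.sh` are authoritative, and the alphabetical tie-break will not reproduce the exact order listed above, only an order that is equally valid for the edges given.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Return crates so that every crate comes after all of its dependencies.
/// `deps` maps a crate to the crates it depends on. Returns Err on a cycle.
fn publish_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Result<Vec<&'a str>, String> {
    // Count unpublished dependencies per crate, and record reverse edges.
    let mut pending: BTreeMap<&str, usize> = BTreeMap::new();
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (&krate, ds) in deps {
        pending.entry(krate).or_insert(0);
        for &d in ds {
            pending.entry(d).or_insert(0);
            *pending.get_mut(krate).unwrap() += 1;
            dependents.entry(d).or_default().push(krate);
        }
    }
    // Crates with no outstanding dependencies are ready; BTreeSet keeps ties alphabetical.
    let mut ready: BTreeSet<&str> =
        pending.iter().filter(|&(_, &n)| n == 0).map(|(&k, _)| k).collect();
    let mut order = Vec::new();
    while let Some(next) = ready.pop_first() {
        order.push(next);
        for &dependent in dependents.get(next).map(|v| v.as_slice()).unwrap_or(&[]) {
            let n = pending.get_mut(dependent).unwrap();
            *n -= 1;
            if *n == 0 {
                ready.insert(dependent);
            }
        }
    }
    // Anything left unpublished must be part of a cycle.
    if order.len() == pending.len() {
        Ok(order)
    } else {
        Err("dependency cycle detected".to_string())
    }
}

fn main() {
    // Simplified, hypothetical edges for illustration only.
    let deps = BTreeMap::from([
        ("vona-core", vec![]),
        ("vona-seamless", vec!["vona-core"]),
        ("vona-moshi", vec!["vona-core"]),
        ("vona-transport-local", vec!["vona-core", "vona-seamless"]),
        ("vona-sidecar", vec!["vona-seamless", "vona-transport-local"]),
        ("vona", vec!["vona-moshi", "vona-seamless", "vona-transport-local"]),
    ]);
    println!("{:?}", publish_order(&deps));
}
```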
Use scripts/release_crates.sh --release current|patch|minor|major to update release metadata, run the release gate, package crates in order, and optionally publish with --publish.
Security
Please do not open public issues for security vulnerabilities. Report them using the process in SECURITY.md.
License
Vona is licensed under the MIT License.