vtt-rs
A Rust library and command-line utility for real-time audio transcription using OpenAI-compatible APIs. Perfect for adding situational awareness to AI agents through speech recognition.
Documentation
- API Documentation - Full API reference (auto-generated from code)
- Integration Guide - Comprehensive guide for AI agent integration
Or build locally:
Configuration
- If your endpoint requires authentication, set OPENAI_API_KEY in your environment. For local OpenAI-compatible servers that don't require auth, this can be omitted.
- The binary expects an optional JSON configuration file (default vtt.config.json in the current directory, or pass an alternate path as the first argument).
- Supported keys (all optional; sensible defaults exist):
  - chunk_duration_secs: duration of each captured audio block that is transcribed.
  - model: which OpenAI transcription model to hit.
  - endpoint: custom transcription endpoint for e.g. a proxy service.
  - out_file: path to append every transcription (chunk ID + contents).
  - on_device: optional block to turn on the bundled Candle Whisper runner.
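The config file lookup described above (first CLI argument, else vtt.config.json in the current directory, else defaults) can be sketched as follows. This is a minimal illustration using only the standard library, not the crate's actual implementation; the function name resolve_config_path and the constant DEFAULT_CONFIG_FILE are hypothetical.

```rust
use std::path::PathBuf;

/// Default config file name, as stated in the list above.
const DEFAULT_CONFIG_FILE: &str = "vtt.config.json";

/// Pick the config path: the first CLI argument if present, otherwise the default.
/// `args` must not include the program name (use `std::env::args().skip(1)`).
/// Hypothetical helper for illustration only.
fn resolve_config_path(args: &[String]) -> PathBuf {
    match args.first() {
        Some(path) => PathBuf::from(path),
        None => PathBuf::from(DEFAULT_CONFIG_FILE),
    }
}

fn main() {
    let args: Vec<String> = std::env::args().skip(1).collect();
    let path = resolve_config_path(&args);

    // A missing file is not an error: every key is optional.
    if path.exists() {
        println!("loading configuration from {}", path.display());
    } else {
        println!("{} not found, running with defaults", path.display());
    }
}
```

Treating a missing file as "use defaults" rather than as an error matches the statement that all keys are optional.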
On-Device Whisper
Set on_device.enabled to true in your config to run Whisper locally without
calling the OpenAI API. You can pick from the built-in model shortcuts
("tiny", "small", etc.), force CPU execution, and optionally select a
specific input device.
Local MLX Parakeet (no API key)
You can use a local OpenAI-compatible server that serves the MLX model mlx-community/parakeet-tdt-0.6b-v2. Point endpoint to your server and set model accordingly. No OPENAI_API_KEY is required when the server does not enforce auth.
Example config snippet:
Then run the CLI without setting OPENAI_API_KEY:
Notes:
- Ensure your local server implements an OpenAI-compatible audio transcription endpoint and understands the model identifier.
- On-device mode in this repo currently supports Whisper via Candle. Parakeet support is provided via the remote endpoint path as shown above.
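To illustrate why OPENAI_API_KEY can be left unset for a local server, here is a small sketch of optional authentication. It assumes the usual OpenAI convention of sending the key as an Authorization: Bearer header; the helper name authorization_header is hypothetical and is not taken from the vtt-rs source.

```rust
/// Build the value of the `Authorization` header, if a usable key is available.
/// Returns `None` when the key is missing or blank, so no header is sent.
/// Hypothetical helper for illustration only.
fn authorization_header(api_key: Option<&str>) -> Option<String> {
    api_key
        .map(str::trim)
        .filter(|key| !key.is_empty())
        .map(|key| format!("Bearer {key}"))
}

fn main() {
    // `ok()` turns "variable not set" into `None` instead of an error.
    let key = std::env::var("OPENAI_API_KEY").ok();

    match authorization_header(key.as_deref()) {
        Some(_) => println!("sending requests with an Authorization header"),
        None => println!("OPENAI_API_KEY not set, sending unauthenticated requests"),
    }
}
```

With this shape, the same request path can serve both hosted and local endpoints; only the presence of the header differs.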
Usage as a Library
Add vtt-rs to your Cargo.toml:
[dependencies]
vtt-rs = { git = "https://github.com/geoffsee/vtt-rs" }
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
Basic Example
use ;
async
AI Agent Integration
The library is designed to give AI agents "ears" - the ability to perceive and respond to their audio environment. Check out the examples:
- examples/ai_agent.rs - Basic AI agent with audio awareness
- examples/streaming_agent.rs - Advanced agent with temporal context
Run examples with:
OPENAI_API_KEY=sk-... cargo run --example ai_agent
Usage as a CLI
# With OpenAI or any endpoint requiring auth
OPENAI_API_KEY=sk-...
# With a local server that does not require auth
- Omit the CLI argument to let the tool load vtt.config.json from the current directory if it exists; otherwise it runs with defaults.
- Transcripts are printed live and, when out_file is set, appended to that file in addition to the console output.
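The append behaviour of out_file can be sketched with the standard library alone. The README only says that each entry holds a chunk ID and the transcription contents, so the tab-separated line layout and the function name append_transcript below are assumptions made for illustration.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Append one transcription to `out_file`, creating the file if it does not exist.
/// The `chunk_id<TAB>text` layout is hypothetical; the real format may differ.
fn append_transcript(out_file: &Path, chunk_id: u64, text: &str) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .create(true) // create the file on first use
        .append(true) // never truncate earlier transcripts
        .open(out_file)?;
    writeln!(file, "{chunk_id}\t{text}")
}

fn main() -> io::Result<()> {
    let out_file = std::env::temp_dir().join("vtt-transcripts-example.log");

    for (chunk_id, text) in ["hello", "world"].iter().enumerate() {
        println!("{text}"); // live console output
        append_transcript(&out_file, chunk_id as u64, text)?; // persistent log
    }
    Ok(())
}
```

Opening the file in append mode for each write keeps earlier transcripts intact across restarts, at the cost of one open call per chunk.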
Features
- Real-time transcription: Continuously captures and transcribes audio
- Event-driven API: React to transcriptions as they happen
- Configurable chunking: Adjust audio chunk duration for your needs
- OpenAI compatible: Works with OpenAI Whisper and compatible APIs
- Async/await: Built on Tokio for efficient async processing
- Type-safe: Strongly typed events and configuration
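As a rough illustration of what configurable chunking means in practice, the sketch below converts a chunk duration into a sample count and splits a buffer accordingly. The 16 kHz mono format is an assumption (it is the input rate Whisper models expect), and both function names are hypothetical; the crate's real capture pipeline may work differently.

```rust
/// Number of interleaved samples in one chunk of the given duration.
fn samples_per_chunk(sample_rate_hz: u32, channels: u16, chunk_duration_secs: u32) -> usize {
    sample_rate_hz as usize * channels as usize * chunk_duration_secs as usize
}

/// Split a buffer of samples into chunk-sized slices.
/// The final slice may be shorter if the buffer does not divide evenly.
fn split_into_chunks(samples: &[f32], chunk_len: usize) -> Vec<&[f32]> {
    if chunk_len == 0 {
        return Vec::new(); // `chunks(0)` would panic, so guard against it
    }
    samples.chunks(chunk_len).collect()
}

fn main() {
    // Assumed capture format: 16 kHz, mono, 5-second chunks.
    let chunk_len = samples_per_chunk(16_000, 1, 5);

    // 12 seconds of silence stands in for captured audio.
    let audio = vec![0.0_f32; 16_000 * 12];
    let chunks = split_into_chunks(&audio, chunk_len);

    println!("{} samples per chunk, {} chunks", chunk_len, chunks.len());
}
```

Shorter chunks lower the delay before text appears but give the model less context per request; longer chunks trade latency for context.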