# vtt-rs
A Rust library and command-line utility for real-time audio transcription using OpenAI-compatible APIs. It is designed to give AI agents situational awareness through speech recognition.
## Documentation
- **[API Documentation](https://geoffsee.github.io/vtt-rs/vtt_rs/)** - Full API reference (auto-generated from code)
- **[Integration Guide](INTEGRATION_GUIDE.md)** - Comprehensive guide for AI agent integration
Or build the API documentation locally:
```bash
cargo doc --no-deps --open
```
## Configuration
- If your endpoint requires authentication, set `OPENAI_API_KEY` in your environment. For local OpenAI-compatible servers that don't require auth, this can be omitted.
- The binary reads an optional JSON configuration file: by default `vtt.config.json` in the current directory, or an alternate path passed as the first argument.
- Supported keys (all optional; sensible defaults exist):
```json
{
  "chunk_duration_secs": 5,
  "model": "whisper-1",
  "endpoint": "https://api.openai.com/v1/audio/transcriptions",
  "out_file": "transcripts.log",
  "on_device": {
    "enabled": false,
    "model": "tiny.en",
    "cpu": true
  }
}
```
- `chunk_duration_secs`: duration, in seconds, of each captured audio block sent for transcription.
- `model`: name of the transcription model to request from the endpoint.
- `endpoint`: transcription endpoint URL; override it to use a proxy or a local server.
- `out_file`: path of a file to which every transcription (chunk ID and text) is appended.
- `on_device`: optional block that enables the bundled Candle Whisper runner.
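
To make the effect of `chunk_duration_secs` concrete, the sketch below computes how many samples one chunk holds and splits a buffer accordingly. It is an illustration only, not the crate's capture code: the function names are hypothetical, and the 16 kHz mono format used in the usage note is an assumption, since the real format depends on the input device.

```rust
/// Number of interleaved samples in one chunk of audio.
/// `sample_rate_hz` and `channels` are illustrative parameters; the values
/// vtt-rs actually uses depend on the selected input device.
fn samples_per_chunk(chunk_duration_secs: u32, sample_rate_hz: u32, channels: u16) -> usize {
    chunk_duration_secs as usize * sample_rate_hz as usize * channels as usize
}

/// Split a buffer of samples into chunk-sized slices.
/// The last slice may be shorter than a full chunk.
fn split_into_chunks(samples: &[f32], chunk_len: usize) -> Vec<&[f32]> {
    if chunk_len == 0 {
        // `chunks(0)` would panic, so treat a zero length as "no chunks".
        return Vec::new();
    }
    samples.chunks(chunk_len).collect()
}
```

With these assumptions, `samples_per_chunk(5, 16_000, 1)` is 80,000 samples per request. Shorter chunks lower the delay before text appears but increase the number of requests and give the model less context per request.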
### On-Device Whisper
Set `on_device.enabled` to `true` in your config to run Whisper locally without
calling a remote API. Set `on_device.model` to one of the built-in model
shortcuts (`"tiny"`, `"small"`, etc.) and `on_device.cpu` to `true` to force
CPU execution. You can also optionally select a specific input device.
### Local MLX Parakeet (no API key)
You can use a local OpenAI-compatible server that serves the MLX model `mlx-community/parakeet-tdt-0.6b-v2`. Point `endpoint` to your server and set `model` accordingly. No `OPENAI_API_KEY` is required when the server does not enforce auth.
Example config snippet:
```json
{
  "chunk_duration_secs": 3,
  "model": "mlx-community/parakeet-tdt-0.6b-v2",
  "endpoint": "http://localhost:8000/v1/audio/transcriptions",
  "out_file": "transcripts.log"
}
```
Then run the CLI without setting `OPENAI_API_KEY`:
```bash
cargo run -- vtt.config.json
```
Notes:
- Ensure your local server implements an OpenAI-compatible audio transcription endpoint and understands the `model` identifier.
- On-device mode in this repo currently supports Whisper via Candle. Parakeet support is provided via the remote endpoint path as shown above.
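
If you write your own client against such a server, the API key can be treated as optional in the same way. The sketch below is a minimal illustration, not the crate's internal code: both helper names are hypothetical. It relies only on the fact that OpenAI-style endpoints expect the key in an `Authorization: Bearer ...` header.

```rust
use std::env;

/// Read `OPENAI_API_KEY` without failing when it is unset.
/// (Hypothetical helper, not part of the vtt-rs API.)
fn api_key_from_env() -> Option<String> {
    env::var("OPENAI_API_KEY").ok()
}

/// Build the value of the `Authorization` header, if a usable key exists.
/// Returns `None` for a missing or blank key so the caller can skip the header.
fn bearer_header(api_key: Option<&str>) -> Option<String> {
    match api_key.map(str::trim) {
        Some(key) if !key.is_empty() => Some(format!("Bearer {key}")),
        _ => None,
    }
}
```

A caller would combine the two as `bearer_header(api_key_from_env().as_deref())` and attach the header only when the result is `Some`.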
## Usage as a Library
Add vtt-rs to your `Cargo.toml`:
```toml
[dependencies]
vtt-rs = { git = "https://github.com/geoffsee/vtt-rs" }
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
# Used by the example below for `anyhow::Result`
anyhow = "1"
```
### Basic Example
```rust
use vtt_rs::{Config, TranscriptionEvent, TranscriptionService};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let config = Config::default();
    let api_key = std::env::var("OPENAI_API_KEY")?;

    let mut service = TranscriptionService::new(config, api_key)?;
    // Keep `_stream` bound for as long as events are being received.
    let (mut receiver, _stream) = service.start().await?;

    // Process transcription events as they arrive
    while let Some(event) = receiver.recv().await {
        match event {
            // `chunk_id` is also available on both variants
            TranscriptionEvent::Transcription { text, .. } => {
                println!("Heard: {}", text);
                // Feed this to your AI agent for situational awareness
            }
            TranscriptionEvent::Error { error, .. } => {
                eprintln!("Error: {}", error);
            }
        }
    }

    Ok(())
}
```
### AI Agent Integration
The library is designed to give AI agents "ears" - the ability to perceive and respond to their audio environment. Check out the examples:
- `examples/ai_agent.rs` - Basic AI agent with audio awareness
- `examples/streaming_agent.rs` - Advanced agent with temporal context
Run examples with:
```bash
OPENAI_API_KEY=sk-... cargo run --example ai_agent
```
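
One way an agent can keep temporal context is a rolling window over the most recent transcriptions. The sketch below shows the idea with the standard library only. It is not the code in `examples/streaming_agent.rs`: `ContextWindow` is a hypothetical type, and representing the chunk ID as `usize` is an assumption.

```rust
use std::collections::VecDeque;

/// Hypothetical rolling buffer of the most recent transcriptions.
struct ContextWindow {
    max_entries: usize,
    /// (chunk ID, text) pairs, oldest first. The `usize` ID type is assumed.
    entries: VecDeque<(usize, String)>,
}

impl ContextWindow {
    fn new(max_entries: usize) -> Self {
        Self {
            max_entries,
            entries: VecDeque::with_capacity(max_entries),
        }
    }

    /// Record a transcription, dropping the oldest entry when the window is full.
    /// Blank transcriptions (for example, silence) are ignored.
    fn push(&mut self, chunk_id: usize, text: &str) {
        let text = text.trim();
        if text.is_empty() || self.max_entries == 0 {
            return;
        }
        if self.entries.len() == self.max_entries {
            self.entries.pop_front();
        }
        self.entries.push_back((chunk_id, text.to_string()));
    }

    /// Render the window as a prompt fragment, one line per chunk, oldest first.
    fn as_prompt(&self) -> String {
        self.entries
            .iter()
            .map(|(id, text)| format!("[{id}] {text}"))
            .collect::<Vec<_>>()
            .join("\n")
    }
}
```

In an event loop like the one in the basic example, each `Transcription` event would call `push`, and `as_prompt` would supply the recent history whenever the agent needs to respond.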
## Usage as a CLI
```bash
# With OpenAI or any endpoint requiring auth
OPENAI_API_KEY=sk-... cargo run -- vtt.config.json
# With a local server that does not require auth
cargo run -- vtt.config.json
```
- Omit the CLI argument to let the tool load `vtt.config.json` from the current directory if it exists; otherwise it runs with defaults.
- Transcripts are printed live to the console and, when `out_file` is set, also appended to that file.
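
The config lookup described above can be sketched as follows. This mirrors the documented behavior rather than reproducing the binary's source: the names are hypothetical, and treating an explicitly passed path as required (even if the file is missing) is an assumption.

```rust
use std::path::{Path, PathBuf};

const DEFAULT_CONFIG: &str = "vtt.config.json";

/// Where the configuration should come from at startup.
#[derive(Debug, PartialEq)]
enum ConfigSource {
    File(PathBuf),
    Defaults,
}

/// Decide the config source from the CLI arguments (program name excluded).
/// `exists` is injected so the logic can be exercised without touching the disk.
fn config_source(args: &[String], exists: impl Fn(&Path) -> bool) -> ConfigSource {
    match args.first() {
        // Assumption: an explicit path is always used; a missing file would
        // surface as an error when the caller tries to read it.
        Some(path) => ConfigSource::File(PathBuf::from(path.as_str())),
        None => {
            let default = PathBuf::from(DEFAULT_CONFIG);
            if exists(&default) {
                ConfigSource::File(default)
            } else {
                ConfigSource::Defaults
            }
        }
    }
}
```

A real entry point would collect `std::env::args().skip(1)` into a `Vec<String>` and pass `|p| p.exists()` as the second argument.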
## Features
- **Real-time transcription**: Continuously captures and transcribes audio
- **Event-driven API**: React to transcriptions as they happen
- **Configurable chunking**: Adjust audio chunk duration for your needs
- **OpenAI compatible**: Works with OpenAI Whisper and compatible APIs
- **Async/await**: Built on Tokio for efficient async processing
- **Type-safe**: Strongly typed events and configuration