vtt-rs 0.1.1

Library and CLI for streaming microphone input to OpenAI-compatible transcription APIs

vtt-rs

A Rust library and command-line utility for real-time audio transcription using OpenAI-compatible APIs. Perfect for adding situational awareness to AI agents through speech recognition.


Documentation

Build the API documentation locally:

cargo doc --no-deps --open

Configuration

  • Set OPENAI_API_KEY in your environment before running anything.

  • The binary accepts an optional JSON configuration file. By default it looks for vtt.config.json in the current directory; pass an alternate path as the first argument to override this.

  • Supported keys (all optional; sensible defaults exist):

    {
      "chunk_duration_secs": 5,
      "model": "whisper-1",
      "endpoint": "https://api.openai.com/v1/audio/transcriptions",
      "out_file": "transcripts.log"
    }
    
    • chunk_duration_secs: duration, in seconds, of each captured audio block that is sent for transcription.
    • model: which transcription model to request.
    • endpoint: custom transcription endpoint, for example a proxy service.
    • out_file: path of the file to which every transcription (chunk ID + contents) is appended.
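
To make the effect of chunk_duration_secs concrete, here is a minimal sketch of how a chunk duration could map onto a buffer of captured samples. It is not taken from the vtt-rs source: the helper names, the mono f32 sample format, and the 16 kHz sample rate used in the note below are all assumptions for illustration.

```rust
/// Hypothetical helper (not part of vtt-rs): number of mono samples in one chunk.
/// The sample rate is an assumption; the README does not state the capture rate.
fn samples_per_chunk(chunk_duration_secs: u64, sample_rate_hz: u32) -> usize {
    (chunk_duration_secs as usize) * (sample_rate_hz as usize)
}

/// Hypothetical helper: split a sample buffer into (chunk_id, samples) pairs.
/// The final chunk may be shorter than the others.
fn split_into_chunks(samples: &[f32], chunk_len: usize) -> Vec<(usize, &[f32])> {
    // `chunks` panics on a zero length, so clamp to at least one sample.
    samples.chunks(chunk_len.max(1)).enumerate().collect()
}
```

Under these assumptions, a 5 second chunk at 16 kHz holds 80,000 samples, so 200,000 captured samples would be split into three chunks, the last one half-length. Shorter chunks lower latency but produce more API requests.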

Usage as a Library

Add vtt-rs to your Cargo.toml:

[dependencies]
vtt-rs = { git = "https://github.com/geoffsee/vtt-rs" }
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
anyhow = "1" # used by the example below for its anyhow::Result return type

Basic Example

use vtt_rs::{Config, TranscriptionEvent, TranscriptionService};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let config = Config::default();
    let api_key = std::env::var("OPENAI_API_KEY")?;

    let mut service = TranscriptionService::new(config, api_key)?;
    let (mut receiver, _stream) = service.start().await?;

    // Process transcription events
    while let Some(event) = receiver.recv().await {
        match event {
            // `..` ignores fields the example does not use (such as chunk_id)
            TranscriptionEvent::Transcription { text, .. } => {
                println!("Heard: {}", text);
                // Feed this to your AI agent for situational awareness
            }
            TranscriptionEvent::Error { error, .. } => {
                eprintln!("Error: {}", error);
            }
        }
    }

    Ok(())
}

AI Agent Integration

The library is designed to give AI agents "ears": the ability to perceive and respond to their audio environment. The repository includes two examples:

  • examples/ai_agent.rs - Basic AI agent with audio awareness
  • examples/streaming_agent.rs - Advanced agent with temporal context

Run examples with:

OPENAI_API_KEY=sk-... cargo run --example ai_agent
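
One way an agent might keep "temporal context" is a bounded window of recent transcripts that is rendered into its prompt. The sketch below shows that idea using only the standard library. It is not the code from examples/streaming_agent.rs; the RollingContext type, its methods, the usize chunk ID, and the "[id] text" line format are all hypothetical.

```rust
use std::collections::VecDeque;

/// Hypothetical rolling window of recent transcripts; not part of vtt-rs.
struct RollingContext {
    max_entries: usize,
    entries: VecDeque<(usize, String)>,
}

impl RollingContext {
    fn new(max_entries: usize) -> Self {
        Self { max_entries, entries: VecDeque::new() }
    }

    /// Record one transcribed chunk, evicting the oldest entry when full.
    fn push(&mut self, chunk_id: usize, text: &str) {
        if self.max_entries == 0 {
            return; // a zero-sized window stores nothing
        }
        if self.entries.len() == self.max_entries {
            self.entries.pop_front();
        }
        self.entries.push_back((chunk_id, text.to_string()));
    }

    /// Render the window oldest-first, one line per chunk (assumed format).
    fn as_prompt(&self) -> String {
        self.entries
            .iter()
            .map(|(id, text)| format!("[{id}] {text}"))
            .collect::<Vec<_>>()
            .join("\n")
    }
}
```

In the event loop from the Basic Example, each Transcription event would call push, and the agent would read as_prompt() whenever it needs to decide what to do next.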

Usage as a CLI

OPENAI_API_KEY=sk-... cargo run -- vtt.config.json
  • Omit the CLI argument to let the tool load vtt.config.json from the current directory if it exists; otherwise it runs with defaults.
  • Transcripts are printed live and, when out_file is set, appended to that file in addition to the console output.
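
The two behaviors above can be sketched with the standard library alone. This is an illustration of the described logic, not the binary's actual implementation: the function names are hypothetical, and the tab-separated "chunk_id, text" line format is an assumption, since the README only says the chunk ID and contents are appended.

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::{Path, PathBuf};

const DEFAULT_CONFIG: &str = "vtt.config.json";

/// Hypothetical: returns the config file to load, or None to run with defaults.
fn resolve_config_path(first_arg: Option<&str>, cwd: &Path) -> Option<PathBuf> {
    match first_arg {
        // An explicit path always wins.
        Some(path) => Some(PathBuf::from(path)),
        // Otherwise use the default file only if it exists.
        None => {
            let candidate = cwd.join(DEFAULT_CONFIG);
            candidate.exists().then_some(candidate)
        }
    }
}

/// Hypothetical: append one transcript line, creating the file if needed.
fn append_transcript(out_file: &Path, chunk_id: usize, text: &str) -> std::io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(out_file)?;
    writeln!(file, "{chunk_id}\t{text}") // assumed line format
}
```

Opening the file in append mode means earlier transcripts survive restarts, which matches the "append" wording for out_file.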

Features

  • Real-time transcription: Continuously captures and transcribes audio
  • Event-driven API: React to transcriptions as they happen
  • Configurable chunking: Adjust audio chunk duration for your needs
  • OpenAI-compatible: Works with OpenAI Whisper and compatible APIs
  • Async/await: Built on Tokio for efficient async processing
  • Type-safe: Strongly typed events and configuration