scribble 0.2.0

High-level Rust API for audio transcription using Whisper
Documentation

Scribble   Build Status Latest Version

Scribble is a fast, lightweight transcription engine written in Rust, with a built-in Whisper backend and a backend trait for custom implementations.

billboard

Scribble will demux/decode audio or video containers (MP4, MP3, WAV, FLAC, OGG, WebM, MKV, etc.), downmix to mono, and resample to 16 kHz — no preprocessing required.

Goals

  • Provide a clean, idiomatic Rust API for audio transcription
  • Support multiple output formats (JSON, VTT, plain text, etc.)
  • Work equally well as a CLI tool or embedded service
  • Be streaming-first: designed to support incremental, chunk-based transcription pipelines (live audio, long-running streams, and low-latency workflows)
  • Enable composable pipelines: VAD → transcription → encoding, with clear extension points for streaming and real-time use cases
  • Keep the core simple, explicit, and easy to extend

Scribble is built with streaming and real-time transcription in mind, even when operating on static files today.

Installation

Clone the repository and build the binaries:

cargo build --release

This will produce the following binaries:

  • scribble-cli — transcribe audio/video (decodes + normalizes to mono 16 kHz)
  • model-downloader — download Whisper and VAD models

model-downloader

model-downloader is a small helper CLI for downloading known-good Whisper and Whisper-VAD models into a local directory.

List available models

cargo run --bin model-downloader -- --list

Example output:

Whisper models:
  - tiny
  - base.en
  - large-v3-turbo
  - large-v3-turbo-q8_0
  ...

VAD models:
  - silero-v5.1.2
  - silero-v6.2.0

Download a model

cargo run --bin model-downloader -- --name large-v3-turbo

By default, models are downloaded into ./models.

Download into a custom directory

cargo run --bin model-downloader -- \
  --name silero-v6.2.0 \
  --dir /opt/scribble/models

Downloads are performed safely:

  • written to *.part
  • fsynced
  • atomically renamed into place

scribble-cli

scribble-cli is the main transcription CLI.

It accepts audio or video containers and normalizes them to Whisper’s required mono 16 kHz internally. Provide:

  • an input media path (e.g. MP4, MP3, WAV, FLAC, OGG, WebM, MKV) or - to stream from stdin
  • a Whisper model
  • a Whisper-VAD model (used when --enable-vad is set)

Basic transcription (VTT output)

cargo run --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.mp4

Output is written to stdout in WebVTT format by default.

JSON output

cargo run --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.wav \
  --output-type json

Enable voice activity detection (VAD)

cargo run --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --enable-vad \
  --input ./input.wav

When VAD is enabled:

  • non-speech regions are suppressed
  • if no speech is detected, no output is produced

Specify language explicitly

cargo run --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.wav \
  --language en

If --language is omitted, Whisper will auto-detect.

Write output to a file

cargo run --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.wav \
  --output-type vtt \
  > transcript.vtt

Library usage

Scribble is also designed to be embedded as a library.

High-level usage looks like:

use scribble::{opts::Opts, output_type::OutputType, scribble::Scribble};
use std::fs::File;

let mut scribble = Scribble::new(
    "./models/ggml-large-v3-turbo.bin",
    "./models/ggml-silero-v6.2.0.bin",
)?;

let mut input = File::open("audio.wav")?;
let mut output = Vec::new();

let opts = Opts {
    enable_translate_to_english: false,
    enable_voice_activity_detection: true,
    language: None,
    output_type: OutputType::Json,
    incremental_min_window_seconds: 1,
};

scribble.transcribe(&mut input, &mut output, &opts)?;

let json = String::from_utf8(output)?;
println!("{json}");

TODOs

  • Expand testing (goal of 80%+ test coverage)
  • Update VAD to utilize streaming approach
  • Implement the webserver
  • Streaming / incremental transcription support

Status

Scribble is under active development. The API is not yet stable, but the foundations are in place and evolving quickly.

License

MIT