scribble 0.5.1

High-level Rust API for audio transcription using Whisper

Scribble is a fast, lightweight transcription engine written in Rust, with a built-in Whisper backend and a backend trait for custom implementations.


Scribble will demux/decode audio or video containers (MP4, MP3, WAV, FLAC, OGG, WebM, MKV, etc.), downmix to mono, and resample to 16 kHz — no preprocessing required.
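The normalization step can be pictured with a minimal sketch (illustrative only; Scribble's real pipeline uses a proper decoder and resampler): average interleaved stereo channels into mono, then linearly interpolate down to the target rate.

```rust
// Illustrative sketch of the normalization Scribble performs internally.
// This only shows the idea: average stereo channels to mono, then
// resample to 16 kHz by linear interpolation.

/// Average interleaved stereo frames into mono samples.
fn downmix_stereo_to_mono(interleaved: &[f32]) -> Vec<f32> {
    interleaved
        .chunks_exact(2)
        .map(|lr| (lr[0] + lr[1]) / 2.0)
        .collect()
}

/// Naive linear-interpolation resampler from `src_rate` to `dst_rate`.
fn resample_linear(samples: &[f32], src_rate: u32, dst_rate: u32) -> Vec<f32> {
    if samples.is_empty() {
        return Vec::new();
    }
    let ratio = src_rate as f64 / dst_rate as f64;
    let out_len = (samples.len() as f64 / ratio).floor() as usize;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * ratio;
            let idx = pos as usize;
            let frac = (pos - idx as f64) as f32;
            let a = samples[idx];
            let b = samples[(idx + 1).min(samples.len() - 1)];
            a + (b - a) * frac
        })
        .collect()
}

fn main() {
    // 96 interleaved stereo samples at 48 kHz -> 48 mono -> 16 resampled
    let stereo: Vec<f32> = (0..96).map(|i| (i as f32 * 0.01).sin()).collect();
    let mono = downmix_stereo_to_mono(&stereo);
    let at_16k = resample_linear(&mono, 48_000, 16_000);
    println!("{} mono samples -> {} resampled", mono.len(), at_16k.len());
}
```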


Project goals

  • Provide a clean, idiomatic Rust API for audio transcription
  • Support multiple output formats (JSON, VTT, plain text, etc.)
  • Work equally well as a CLI tool or embedded library
  • Be streaming-first: designed to support incremental, chunk-based transcription pipelines (live audio, long-running streams, and low-latency workflows)
  • Enable composable pipelines: VAD → transcription → encoding, with clear extension points for streaming and real-time use cases
  • Keep the core simple, explicit, and easy to extend

Scribble is built with streaming and real-time transcription in mind, even when operating on static files today.

Installation

Rust toolchain

Scribble targets Rust stable (tracked via rust-toolchain.toml).

Clone the repository and build the binaries:

cargo build --release --all-features

Or build a single binary to a target directory:

./scripts/build.sh scribble-cli ./dist

This will produce the following binaries:

  • scribble-cli — transcribe audio/video (decodes + normalizes to mono 16 kHz)
  • scribble-server — HTTP server for transcription
  • model-downloader — download Whisper and VAD models

model-downloader

model-downloader is a small helper CLI for downloading known-good Whisper and Whisper-VAD models into a local directory.

List available models

cargo run --features bin-model-downloader --bin model-downloader -- --list

Example output:

Whisper models:
  - tiny
  - base.en
  - large-v3-turbo
  - large-v3-turbo-q8_0
  ...

VAD models:
  - silero-v5.1.2
  - silero-v6.2.0

Download a model

cargo run --features bin-model-downloader --bin model-downloader -- --name large-v3-turbo

By default, models are downloaded into ./models.

Download into a custom directory

cargo run --features bin-model-downloader --bin model-downloader -- \
  --name silero-v6.2.0 \
  --dir /opt/scribble/models

Downloads are performed safely:

  • written to *.part
  • fsynced
  • atomically renamed into place
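The pattern looks roughly like this in plain Rust (a simplified sketch, not model-downloader's actual code):

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::{Path, PathBuf};

/// Stage bytes in a sibling `*.part` file, fsync, then rename into place,
/// so a crash mid-download never leaves a truncated file at `dest`.
fn write_atomic(dest: &Path, bytes: &[u8]) -> std::io::Result<()> {
    let part = PathBuf::from(format!("{}.part", dest.display()));
    let mut file = File::create(&part)?;
    file.write_all(bytes)?;
    file.sync_all()?; // fsync: flush data to disk before the rename
    fs::rename(&part, dest)?; // atomic on POSIX filesystems
    Ok(())
}

fn main() -> std::io::Result<()> {
    let dest = std::env::temp_dir().join("scribble-demo-model.bin");
    write_atomic(&dest, b"model bytes")?;
    println!("wrote {}", dest.display());
    fs::remove_file(&dest)
}
```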

scribble-cli

scribble-cli is the main transcription CLI.

It accepts audio or video containers and normalizes them to Whisper’s required mono 16 kHz internally. Provide:

  • an input media path (e.g. MP4, MP3, WAV, FLAC, OGG, WebM, MKV) or - to stream from stdin
  • a Whisper model
  • a Whisper-VAD model (used when --enable-vad is set)

Basic transcription (VTT output)

cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.mp4

Output is written to stdout in WebVTT format by default.
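For reference, WebVTT output has the standard cue structure (timestamps and text below are illustrative, not actual output):

```text
WEBVTT

00:00:00.000 --> 00:00:02.500
Hello, and welcome to the stream.

00:00:02.500 --> 00:00:05.000
Today we're looking at Rust.
```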

Stream a live URL into scribble-cli (via ffmpeg)

If you have a live audio stream URL (MP3/AAC/etc.), you can decode it to Whisper-friendly WAV and pipe it into scribble-cli via stdin:

ffmpeg -re -loglevel error -nostats \
  -i "https://stream.example.com/live.mp3?session-id=REDACTED" \
  -f wav -ac 1 -ar 16000 - \
| scribble-cli \
    --model ./models/ggml-tiny.bin \
    --vad-model ./models/ggml-silero-v6.2.0.bin \
    --enable-vad \
    --input -

Stream a Twitch channel into scribble-cli (via streamlink + ffmpeg)

If you have streamlink installed, you can pull a Twitch stream to stdout and feed it through ffmpeg:

streamlink --stdout https://www.twitch.tv/dougdoug best \
| ffmpeg -hide_banner -loglevel error -i pipe:0 -vn -ac 1 -ar 16000 -f wav pipe:1 \
| scribble-cli \
    --model ./models/ggml-tiny.bin \
    --vad-model ./models/ggml-silero-v6.2.0.bin \
    --enable-vad \
    --input -

scribble-server

scribble-server is a long-running HTTP server that loads models once and accepts transcription requests over HTTP.

Start the server

cargo run --features bin-scribble-server --bin scribble-server -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --host 127.0.0.1 \
  --port 8080

Transcribe via HTTP (raw body upload)

curl -sS --data-binary @./input.mp4 \
  "http://127.0.0.1:8080/transcribe?output=vtt" \
  > transcript.vtt

For JSON output:

curl -sS --data-binary @./input.wav \
  "http://127.0.0.1:8080/transcribe?output=json" \
  > transcript.json

Example using all query params:

curl -sS --data-binary @./input.mp4 \
  "http://127.0.0.1:8080/transcribe?output=json&model_key=ggml-large-v3-turbo.bin&enable_vad=true&translate_to_english=true&language=en" \
  > transcript.json

Prometheus metrics

scribble-server exposes Prometheus metrics at GET /metrics.

curl -sS "http://127.0.0.1:8080/metrics"

Key metrics:

  • scribble_http_requests_total (labels: status)
  • scribble_http_request_duration_seconds (labels: status)
  • scribble_http_in_flight_requests
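With a Prometheus server scraping /metrics, typical queries look like the following (the `_bucket` suffix assumes the duration metric is exposed as a histogram):

```promql
# Request rate over the last 5 minutes, by status
rate(scribble_http_requests_total[5m])

# Approximate p95 request latency
histogram_quantile(0.95, rate(scribble_http_request_duration_seconds_bucket[5m]))
```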

Logging

All binaries emit structured JSON logs to stderr.

  • Default level: error
  • Override with SCRIBBLE_LOG (e.g. SCRIBBLE_LOG=info)

JSON output

cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.wav \
  --output-type json

Enable voice activity detection (VAD)

cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --enable-vad \
  --input ./input.wav

When VAD is enabled:

  • non-speech regions are suppressed
  • if no speech is detected, no output is produced
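The gating behavior can be sketched as follows (the types, field names, and threshold are hypothetical, not Scribble's actual API):

```rust
/// Hypothetical segment type for illustration only.
struct Segment {
    start_ms: u64,
    end_ms: u64,
    speech_prob: f32, // probability assigned by the VAD model
}

/// Keep only segments the VAD considers speech. If nothing clears the
/// threshold, the result is empty and no transcript is emitted.
fn gate_segments(segments: Vec<Segment>, threshold: f32) -> Vec<Segment> {
    segments
        .into_iter()
        .filter(|s| s.speech_prob >= threshold)
        .collect()
}

fn main() {
    let segments = vec![
        Segment { start_ms: 0, end_ms: 800, speech_prob: 0.95 },
        Segment { start_ms: 800, end_ms: 1_600, speech_prob: 0.10 }, // silence
    ];
    let kept = gate_segments(segments, 0.5);
    println!("{} speech segment(s) kept", kept.len());
}
```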

Specify language explicitly

cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.wav \
  --language en

If --language is omitted, Whisper will auto-detect.

Write output to a file

cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.wav \
  --output-type vtt \
  > transcript.vtt

Library usage

Scribble is also designed to be embedded as a library.

High-level usage looks like:

use scribble::{Opts, OutputType, Scribble};
use std::fs::File;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut scribble = Scribble::new(
        ["./models/ggml-large-v3-turbo.bin"],
        "./models/ggml-silero-v6.2.0.bin",
    )?;

    let mut input = File::open("audio.wav")?;
    let mut output = Vec::new();

    let opts = Opts {
        model_key: None,
        enable_translate_to_english: false,
        enable_voice_activity_detection: true,
        language: None,
        output_type: OutputType::Json,
        incremental_min_window_seconds: 1,
    };

    scribble.transcribe(&mut input, &mut output, &opts)?;

    let json = String::from_utf8(output)?;
    println!("{json}");
    Ok(())
}

Roadmap

  • Make VAD streaming-capable
  • Support streaming and incremental transcription
  • Select the primary audio track in multi-track video containers
  • Implement a web server (done: scribble-server)
  • Add Prometheus metrics endpoint (done: GET /metrics)
  • Add structured logs (done: JSON logs via tracing)
  • Expand test coverage to 80%+

Coverage

This project uses cargo-llvm-cov for coverage locally and in CI.

One-time setup:

rustup component add llvm-tools-preview
cargo install cargo-llvm-cov

Run coverage locally:

# Print a summary to stdout
cargo llvm-cov --all-features --all-targets

# Generate an HTML report (writes to ./target/llvm-cov/html)
cargo llvm-cov --all-features --all-targets --html

Status

Scribble is under active development. The API is not yet stable, but the foundations are in place and evolving quickly.

Release notes live in CHANGELOG.md (and GitHub Releases).

Contributing

See STYLEGUIDE.md for code style, verification conventions, and repo-level checklists.

License

MIT
