Scribble

Scribble is a fast, lightweight transcription engine written in Rust, with a built-in Whisper backend and a backend trait for custom implementations.

Scribble demuxes and decodes audio or video containers (MP4, MP3, WAV, FLAC, OGG, WebM, MKV, etc.), downmixes to mono, and resamples to 16 kHz, so no preprocessing is required.
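As a rough illustration of the normalization step, the sketch below downmixes interleaved samples to mono by averaging channels and resamples with plain linear interpolation. The function names are hypothetical and this is not Scribble's actual implementation; a production resampler would normally also apply an anti-aliasing filter.

```rust
/// Downmix interleaved multi-channel samples to mono by averaging each frame.
/// Hypothetical helper, not part of Scribble's API.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Resample mono audio with linear interpolation.
/// Illustrative only: no anti-aliasing filter is applied when downsampling.
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == to_hz {
        return input.to_vec();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64;
    (0..out_len)
        .map(|i| {
            // Position of output sample `i` on the input timeline.
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx.min(input.len() - 1)];
            let b = input[(idx + 1).min(input.len() - 1)];
            a + (b - a) * frac
        })
        .collect()
}

fn main() {
    // One second of 48 kHz stereo silence.
    let stereo = vec![0.0_f32; 48_000 * 2];
    let mono = downmix_to_mono(&stereo, 2);
    let resampled = resample_linear(&mono, 48_000, 16_000);
    println!("{} -> {} -> {}", stereo.len(), mono.len(), resampled.len());
}
```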
Project goals
- Provide a clean, idiomatic Rust API for audio transcription
- Support multiple output formats (JSON, VTT, plain text, etc.)
- Work equally well as a CLI tool or embedded library
- Be streaming-first: designed to support incremental, chunk-based transcription pipelines (live audio, long-running streams, and low-latency workflows)
- Enable composable pipelines: VAD → transcription → encoding, with clear extension points for streaming and real-time use cases
- Keep the core simple, explicit, and easy to extend
Scribble is built with streaming and real-time transcription in mind, even when operating on static files today.
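To make the VAD → transcription → encoding idea concrete, here is a minimal sketch of a chunk-based pipeline built from three traits. Every name in it (`Vad`, `Backend`, `Encoder`, `Segment`, `run_pipeline`, and the stub implementations) is hypothetical and chosen for illustration; Scribble's real backend trait may look quite different.

```rust
/// A transcribed span of audio (hypothetical type, for illustration only).
#[derive(Debug, Clone, PartialEq)]
struct Segment {
    start_ms: u64,
    end_ms: u64,
    text: String,
}

/// Stage 1: decides whether a chunk of samples contains speech.
trait Vad {
    fn is_speech(&mut self, chunk: &[f32]) -> bool;
}

/// Stage 2: turns one chunk into zero or more segments.
trait Backend {
    fn transcribe_chunk(&mut self, chunk: &[f32], start_ms: u64, end_ms: u64) -> Vec<Segment>;
}

/// Stage 3: serializes a segment into the output buffer.
trait Encoder {
    fn encode(&mut self, segment: &Segment, out: &mut String);
}

/// Drive the three stages over a stream of fixed-duration chunks.
fn run_pipeline<'a, I>(
    chunks: I,
    chunk_ms: u64,
    vad: &mut dyn Vad,
    backend: &mut dyn Backend,
    encoder: &mut dyn Encoder,
) -> String
where
    I: IntoIterator<Item = &'a [f32]>,
{
    let mut out = String::new();
    for (i, chunk) in chunks.into_iter().enumerate() {
        let start_ms = i as u64 * chunk_ms;
        if !vad.is_speech(chunk) {
            continue; // non-speech chunks never reach the backend
        }
        for segment in backend.transcribe_chunk(chunk, start_ms, start_ms + chunk_ms) {
            encoder.encode(&segment, &mut out);
        }
    }
    out
}

// --- Stub stages so the sketch runs without any model ---

/// Treats a chunk as speech if any sample reaches the threshold.
struct PeakVad(f32);
impl Vad for PeakVad {
    fn is_speech(&mut self, chunk: &[f32]) -> bool {
        chunk.iter().any(|s| s.abs() >= self.0)
    }
}

/// Emits a placeholder segment instead of running a real model.
struct StubBackend;
impl Backend for StubBackend {
    fn transcribe_chunk(&mut self, chunk: &[f32], start_ms: u64, end_ms: u64) -> Vec<Segment> {
        vec![Segment { start_ms, end_ms, text: format!("<{} samples>", chunk.len()) }]
    }
}

/// Writes one plain-text line per segment.
struct PlainEncoder;
impl Encoder for PlainEncoder {
    fn encode(&mut self, segment: &Segment, out: &mut String) {
        out.push_str(&format!("[{}-{}] {}\n", segment.start_ms, segment.end_ms, segment.text));
    }
}

fn main() {
    let silence = [0.0_f32; 4];
    let speech = [0.5_f32; 4];
    let chunks = vec![&silence[..], &speech[..]];
    let out = run_pipeline(chunks, 250, &mut PeakVad(0.1), &mut StubBackend, &mut PlainEncoder);
    print!("{}", out);
}
```

Because each stage sits behind a trait, a different VAD, backend, or output format can be swapped in without touching the loop that drives the chunks.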
Installation
Rust toolchain
Scribble targets Rust stable (tracked via rust-toolchain.toml).
Clone the repository and build the binaries:
Or build a single binary to a target directory:
This will produce the following binaries:
- scribble-cli: transcribe audio/video (decodes + normalizes to mono 16 kHz)
- scribble-server: HTTP server for transcription
- model-downloader: download Whisper and VAD models
model-downloader
model-downloader is a small helper CLI for downloading known-good Whisper and Whisper-VAD models into a local directory.
List available models
Example output:
Whisper models:
- tiny
- base.en
- large-v3-turbo
- large-v3-turbo-q8_0
...
VAD models:
- silero-v5.1.2
- silero-v6.2.0
Download a model
By default, models are downloaded into ./models.
Download into a custom directory
Downloads are performed safely:
- written to *.part
- fsynced
- atomically renamed into place
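The write, fsync, rename sequence above can be sketched with the standard library alone. The helper name `write_atomically` and the choice to append `.part` to the full file name are assumptions made for this example; the downloader's own code may differ.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Write `bytes` to `dest` via a sibling `*.part` file, fsync it, then rename.
/// Hypothetical helper illustrating the pattern, not model-downloader's code.
fn write_atomically(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    // "model.bin" -> "model.bin.part" in the same directory, so the final
    // rename never crosses a filesystem boundary.
    let mut part_name = dest.as_os_str().to_owned();
    part_name.push(".part");
    let part = PathBuf::from(part_name);

    let mut file = File::create(&part)?;
    file.write_all(bytes)?;
    file.sync_all()?; // flush file contents and metadata to disk
    drop(file);

    // Readers see either the old file or the complete new one, never a
    // half-written download.
    fs::rename(&part, dest)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("scribble-atomic-demo");
    fs::create_dir_all(&dir)?;
    let dest = dir.join("model.bin");
    write_atomically(&dest, b"fake model bytes")?;
    println!("wrote {} bytes", fs::read(&dest)?.len());
    Ok(())
}
```

An interrupted download leaves only a stray `.part` file behind, so a partially written model is never mistaken for a complete one.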
scribble-cli
scribble-cli is the main transcription CLI.
It accepts audio or video containers and normalizes them to Whisper’s required mono 16 kHz internally. Provide:
- an input media path (e.g. MP4, MP3, WAV, FLAC, OGG, WebM, MKV) or - to stream from stdin
- a Whisper model
- a Whisper-VAD model (used when --enable-vad is set)
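The "path or - for stdin" convention is commonly implemented by returning a boxed reader. The sketch below shows one way to do that; `open_input` is a hypothetical name, and this is not taken from scribble-cli's source.

```rust
use std::fs::File;
use std::io::{self, Read};

/// Open `arg` for reading; the literal "-" selects stdin.
/// Hypothetical helper, shown only to illustrate the convention.
fn open_input(arg: &str) -> io::Result<Box<dyn Read>> {
    if arg == "-" {
        Ok(Box::new(io::stdin()))
    } else {
        Ok(Box::new(File::open(arg)?))
    }
}

fn main() -> io::Result<()> {
    // Use a temporary file so the demo does not block waiting on stdin.
    let path = std::env::temp_dir().join("scribble-open-input-demo.bin");
    std::fs::write(&path, b"RIFF....WAVE")?;

    let mut reader = open_input(path.to_str().expect("temp path is valid UTF-8"))?;
    let mut bytes = Vec::new();
    reader.read_to_end(&mut bytes)?;
    println!("read {} bytes", bytes.len());
    Ok(())
}
```

Downstream decoding code then only needs a `Read` implementation and does not care whether the bytes come from a file or a pipe.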
Basic transcription (VTT output)
Output is written to stdout in WebVTT format by default.
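For readers unfamiliar with WebVTT, the sketch below shows how timed segments map onto the format: a `WEBVTT` header, then one cue per segment with `HH:MM:SS.mmm --> HH:MM:SS.mmm` timing. The function names and the tuple-based cue type are assumptions for this example, not Scribble's encoder.

```rust
/// Format milliseconds as a WebVTT timestamp (HH:MM:SS.mmm).
fn vtt_timestamp(ms: u64) -> String {
    let (h, rem) = (ms / 3_600_000, ms % 3_600_000);
    let (m, rem) = (rem / 60_000, rem % 60_000);
    let (s, ms) = (rem / 1_000, rem % 1_000);
    format!("{:02}:{:02}:{:02}.{:03}", h, m, s, ms)
}

/// Render `(start_ms, end_ms, text)` cues as a WebVTT document.
/// Hypothetical helper; cue text is written as-is, without escaping.
fn to_webvtt(cues: &[(u64, u64, &str)]) -> String {
    let mut out = String::from("WEBVTT\n");
    for (start, end, text) in cues {
        // A blank line separates the header and each cue block.
        out.push('\n');
        out.push_str(&format!(
            "{} --> {}\n{}\n",
            vtt_timestamp(*start),
            vtt_timestamp(*end),
            text
        ));
    }
    out
}

fn main() {
    let cues = [(0, 2_500, "Hello, world."), (2_500, 5_000, "This is a test.")];
    print!("{}", to_webvtt(&cues));
}
```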
Stream a live URL into scribble-cli (via ffmpeg)
If you have a live audio stream URL (MP3/AAC/etc.), you can decode it to Whisper-friendly WAV and pipe it into scribble-cli via stdin:
Stream a Twitch channel into scribble-cli (via streamlink + ffmpeg)
If you have streamlink installed, you can pull a Twitch stream to stdout and feed it through ffmpeg:
scribble-server
scribble-server is a long-running HTTP server that loads models once and accepts transcription requests over HTTP.
Start the server
Transcribe via HTTP (multipart upload)
For JSON output:
Example using all query params:
Prometheus metrics
scribble-server exposes Prometheus metrics at GET /metrics.
Key metrics:
- scribble_http_requests_total (labels: status)
- scribble_http_request_duration_seconds (labels: status)
- scribble_http_in_flight_requests
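To show roughly what a scrape of these metrics could contain, the sketch below tracks a per-status request counter and an in-flight gauge and renders them in the Prometheus text exposition format. The `Metrics` struct is hypothetical, the `status` label is assumed to hold the numeric HTTP status code, and the duration histogram is omitted for brevity; the server itself presumably relies on a metrics library rather than hand-rolled formatting.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Hand-rolled metrics registry (hypothetical, for illustration only).
#[derive(Default)]
struct Metrics {
    requests_by_status: BTreeMap<u16, u64>,
    in_flight: i64,
}

impl Metrics {
    fn request_started(&mut self) {
        self.in_flight += 1;
    }

    fn request_finished(&mut self, status: u16) {
        self.in_flight -= 1;
        *self.requests_by_status.entry(status).or_insert(0) += 1;
    }

    /// Render the metrics in the Prometheus text exposition format.
    fn render(&self) -> String {
        let mut out = String::new();
        out.push_str("# TYPE scribble_http_requests_total counter\n");
        for (status, count) in &self.requests_by_status {
            writeln!(
                out,
                "scribble_http_requests_total{{status=\"{}\"}} {}",
                status, count
            )
            .unwrap();
        }
        out.push_str("# TYPE scribble_http_in_flight_requests gauge\n");
        writeln!(out, "scribble_http_in_flight_requests {}", self.in_flight).unwrap();
        out
    }
}

fn main() {
    let mut metrics = Metrics::default();
    // Three requests arrive; two complete, one is still running.
    metrics.request_started();
    metrics.request_started();
    metrics.request_started();
    metrics.request_finished(200);
    metrics.request_finished(500);
    print!("{}", metrics.render());
}
```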
Logging
All binaries emit structured JSON logs to stderr.
- Default level: error
- Override with SCRIBBLE_LOG (e.g. SCRIBBLE_LOG=info)
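The default-to-error behaviour can be pictured with a small sketch that reads SCRIBBLE_LOG, falls back to error when the variable is unset or unrecognized, and writes JSON lines to stderr. The `Level` enum, `parse_level`, and `log_line` are hypothetical, and only bare level names are handled here; the real binaries may accept a richer filter syntax.

```rust
use std::env;

/// Log levels ordered from least to most verbose.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

impl Level {
    fn as_str(self) -> &'static str {
        match self {
            Level::Error => "error",
            Level::Warn => "warn",
            Level::Info => "info",
            Level::Debug => "debug",
            Level::Trace => "trace",
        }
    }
}

/// Parse a bare level name; missing or unrecognized values fall back to `error`.
fn parse_level(raw: Option<&str>) -> Level {
    match raw.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        Some("warn") => Level::Warn,
        Some("info") => Level::Info,
        Some("debug") => Level::Debug,
        Some("trace") => Level::Trace,
        _ => Level::Error,
    }
}

/// Render one JSON log line, or `None` if `level` is more verbose than `threshold`.
fn log_line(threshold: Level, level: Level, message: &str) -> Option<String> {
    if level > threshold {
        return None;
    }
    // Minimal escaping for the sketch: backslashes and double quotes only.
    let escaped = message.replace('\\', "\\\\").replace('"', "\\\"");
    Some(format!(
        "{{\"level\":\"{}\",\"message\":\"{}\"}}",
        level.as_str(),
        escaped
    ))
}

fn main() {
    let threshold = parse_level(env::var("SCRIBBLE_LOG").ok().as_deref());
    let events = [
        (Level::Error, "model file missing"),
        (Level::Info, "server started"),
    ];
    for &(level, message) in &events {
        if let Some(line) = log_line(threshold, level, message) {
            eprintln!("{}", line); // logs go to stderr, keeping stdout for transcripts
        }
    }
}
```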
JSON output
Enable voice activity detection (VAD)
When VAD is enabled:
- non-speech regions are suppressed
- if no speech is detected, no output is produced
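The two behaviours above can be illustrated with a simple energy gate. This sketch swaps in an RMS threshold as a stand-in for the Silero-based Whisper-VAD models Scribble actually uses, and the function names and placeholder transcription are hypothetical.

```rust
/// Root-mean-square energy of a frame.
fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt()
}

/// Keep only frames whose RMS meets the threshold; drop everything else.
/// An energy gate is a crude stand-in for a model-based VAD.
fn suppress_non_speech(samples: &[f32], frame_len: usize, threshold: f32) -> Vec<f32> {
    assert!(frame_len > 0, "frame length must be non-zero");
    samples
        .chunks(frame_len)
        .filter(|frame| rms(frame) >= threshold)
        .flat_map(|frame| frame.iter().copied())
        .collect()
}

/// Return `None` when no speech survives the gate, so the caller emits nothing.
/// The returned string is a placeholder for a real transcription.
fn transcribe_gated(samples: &[f32], frame_len: usize, threshold: f32) -> Option<String> {
    let speech = suppress_non_speech(samples, frame_len, threshold);
    if speech.is_empty() {
        None
    } else {
        Some(format!("<{} speech samples>", speech.len()))
    }
}

fn main() {
    let silence = [0.0_f32; 8];
    let mixed = [0.0, 0.0, 0.5, 0.5, 0.0, 0.0, 0.4, -0.4];

    println!("{:?}", transcribe_gated(&silence, 2, 0.1));
    println!("{:?}", transcribe_gated(&mixed, 2, 0.1));
}
```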
Specify language explicitly
If --language is omitted, Whisper will auto-detect.
Write output to a file
Library usage
Scribble is also designed to be embedded as a library.
High-level usage looks like:
use /* … */;
use std::fs::File;

let mut scribble = /* … */::new(/* … */)?;
let mut input = File::open(/* … */)?;
let mut output = Vec::new();
let opts = Opts { /* … */ };
scribble.transcribe(/* … */)?;
let json = String::from_utf8(/* … */)?;
println!(/* … */);
Goals
- Make VAD streaming-capable
- Support streaming and incremental transcription
- Select the primary audio track in multi-track video containers
- Implement a web server
- Add Prometheus metrics endpoint
- Add structured logs (tracing)
- Expand test coverage to 80%+
Coverage
This project uses cargo-llvm-cov for coverage locally and in CI.
One-time setup:
Run coverage locally:
# Print a summary to stdout
# Generate an HTML report (writes to ./target/llvm-cov/html)
Status
Scribble is under active development. The API is not yet stable, but the foundations are in place and evolving quickly.
Release notes live in CHANGELOG.md (and GitHub Releases).
Contributing
See STYLEGUIDE.md for code style, verification conventions, and repo-level checklists.
License
MIT
