Transcribe any audio locally on Apple Silicon in seconds, not minutes.

whisper-macos-cli

Versão em Português Brasileiro

What is it

Local audio transcription CLI for macOS Apple Silicon
Powered by whisper.cpp with Metal GPU acceleration
Strict stdin/stdout JSON contract for AI agents
Zero telemetry, zero cloud calls, zero setup beyond cargo install

Why

Audio transcription as a service locks your data with a third party
Whisper models in Python are 10x slower and 5x heavier than whisper.cpp
Most CLIs treat stdout as a dumping ground; we treat it as a contract

Superpowers

Discoverable: whisper-macos-cli commands emits the full command tree
Self-describing: whisper-macos-cli schema returns the full JSON Schema
Traceable: every output carries a UUID v7 correlation_id
Versioned: every output carries a schema_version for safe evolution
Resilient: SIGINT and SIGTERM trigger clean shutdown; double Ctrl+C forces exit
Safe: model downloads are SHA256-verified and TLS-enforced
Composable: behaves like any other Unix tool — pipes, NDJSON, jaq, xargs

Quick Start

cargo install whisper-macos-cli
whisper-macos-cli models download
whisper-macos-cli transcribe voice.ogg

The first transcription is slower because the model loads into unified memory. Subsequent transcriptions reuse the cached context.

Installation

Prerequisites

macOS 13 or later
Apple Silicon (M1, M2, M3, M4)
Xcode Command Line Tools: xcode-select --install
cmake: brew install cmake
Rust 1.88 or later: rustup install stable

From crates.io

cargo install whisper-macos-cli

From source

git clone https://github.com/daniloaguiarbr/whisper-macos-cli
cd whisper-macos-cli
cargo build --release
./target/release/whisper-macos-cli --version

Pre-built binaries

Download the appropriate binary for your architecture from the GitHub Releases page. Verify the SHA256 hash against SHA256SUMS before installing.

Usage

# Single file
whisper-macos-cli transcribe recording.ogg

# From stdin
cat audio.mp3 | whisper-macos-cli transcribe

# Batch as NDJSON
whisper-macos-cli transcribe *.ogg --ndjson --concurrency 4

# Force a language
whisper-macos-cli transcribe --language pt audio.wav

# Use a smaller model for speed
whisper-macos-cli transcribe --model small audio.wav

# Get JSON Schema for downstream validation
whisper-macos-cli schema > schema.json
whisper-macos-cli transcribe audio.ogg | jsonschema -i schema.json

Commands

Subcommand	Purpose
transcribe	Transcribe one or more audio files
models	Download, list, locate, or remove models
doctor	Diagnose environment and dependencies
schema	Emit the full JSON Schema envelope
config	Emit current effective configuration
commands	Emit the full command tree as JSON
init	Generate SKILL.md and AGENTS.md scaffold
licenses	Print third-party license attribution
completions	Generate shell completions
resume	Resume a previous batch (v0.1: no-op)

Run whisper-macos-cli commands --format json to see the full tree.

Environment Variables

Variable	Effect
WHISPER_MODEL	Override default model
WHISPER_LANGUAGE	Override default language
NO_COLOR	Disable colored output
CI	Disable interactive prompts (1, true, yes)
RUST_LOG	Override tracing log level filter
SOURCE_DATE_EPOCH	Unix timestamp for reproducible builds
NO_INPUT	Override --no-input flag
QUIET	Override --quiet flag

Integration Patterns

# Pipe to jaq for selective extraction
whisper-macos-cli transcribe audio.ogg | jaq -r '.text'

# Batch via fd and xargs
fd -e ogg . /path/to/audios/ \
  | xargs whisper-macos-cli transcribe --ndjson --concurrency 4

# Stream from HTTP
xh -d https://example.com/audio.ogg | whisper-macos-cli transcribe

# Validate against schema in CI
whisper-macos-cli transcribe audio.ogg \
  | jaq -e "has(\"correlation_id\") and has(\"schema_version\")"

Performance

First transcription (cold start, large-v3): 2-5 seconds warmup
Subsequent transcriptions: roughly real-time on M2 Pro
Memory: large-v3 requires ~3 GB of unified memory during inference
Concurrency: scales linearly up to --concurrency 8 on M1 Pro

Memory Requirements

Model	Peak Memory
tiny	~300 MB
base	~500 MB
small	~1 GB
medium	~3 GB
large-v3	~3.5 GB

Whisper.cpp unloads the model when the process exits.

Troubleshooting FAQ

See docs/TROUBLESHOOTING.md for the complete guide, including:

exit code 64 (no input)
exit code 65 (invalid audio)
exit code 66 (file not found)
exit code 69 (download failed)
exit code 70 (inference failed)
exit code 74 (I/O error)
exit code 78 (model not found)

Documentation

AGENTS.md — Agent integration guide
CHANGELOG.md — Release history
CONTRIBUTING.md — How to contribute
SECURITY.md — Report vulnerabilities
CODE_OF_CONDUCT.md — Community standards
PRIVACY.md — Data handling policy
INTEGRATIONS.md — Supported agents and platforms
llms.txt — LLM-friendly summary
llms-full.txt — LLM-friendly full reference
docs/HOW_TO_USE.md — Advanced recipes
docs/AGENTS.md — Author guide for agent integrators
docs/COOKBOOK.md — Twenty worked examples
docs/CROSS_PLATFORM.md — Platform matrix
docs/MIGRATION.md — Version migration
docs/TESTING.md — Test execution guide
docs/schemas/ — Machine-readable schemas
skill/ — Agent skill descriptors

Contributing

See CONTRIBUTING.md for the workflow. Every pull request must pass the 8-item checklist before merge.

Security

Report vulnerabilities via GitHub Security Advisories at https://github.com/daniloaguiarbr/whisper-macos-cli/security/advisories/new — not as public issues. SLA is 72 hours for initial triage.

Changelog

See CHANGELOG.md for the complete release history. The current development version is documented under ## [Unreleased].

License

MIT — see LICENSE. Third-party notices in NOTICE and via whisper-macos-cli licenses.

whisper-macos-cli 0.1.0