native-whisperx
native-whisperx is a Rust-first WhisperX parity workflow repository built
from the published moritzbrantner-* Rust crates.
This repository owns application composition:
- CLI commands
- workflow configuration
- output file writing
- parity checks against Python WhisperX
- setup documentation for local model bundles and Hugging Face cache resolution
It does not own the reusable ASR, transcript, model-runtime, alignment, or
speaker primitives. Those remain in
rust-packages and are
consumed here as published crates.
Crate Layout
crates/native-whisperx # reusable workflow library
crates/native-whisperx-cli # native-whisperx CLI binary
Cargo Install
The first Cargo release keeps the two-crate shape. Publish
native-whisperx first, then publish native-whisperx-cli. The CLI crate is
the Cargo install package, and it installs the native-whisperx terminal
command:
These installed-binary smoke commands are no-resource offline checks. They do not transcribe media, download models, use CUDA, call Python WhisperX, read Hugging Face credentials, or require a local smoke media root.
Installed transcription commands use the same local model bundle, cache, and
resource requirements as repository examples. Install success only proves that
the CLI package and native-whisperx command are available; transcription
readiness still depends on the requested Whisper, alignment, diarization, VAD,
translation, CUDA, Python WhisperX compatibility, and gated Hugging Face
resources. Delegated Feature paths remain delegated until replaced by explicit
Rust-Native Parity work.
Maintainer release gates and the manual library-first, CLI-second publish
checklist are in docs/publish-plan.md.
Workflow
media or samples
-> moritzbrantner-audio-analysis-transcription
-> moritzbrantner-text-transcripts::TranscriptionContract
-> default wav2vec2 alignment unless disabled
-> optional speaker diarization
-> optional segment-level post-ASR translation
-> WhisperX JSON, Native JSON, SRT, WebVTT, or plain text outputs
Feature Flags
| Feature | Purpose |
|---|---|
native |
Native Candle Whisper and wav2vec2 alignment composition. Enabled by default. |
translation |
Helsinki-NLP OPUS-MT/Marian post-ASR segment translation. Enabled by native. |
cuda |
CUDA-backed Candle execution. Opt in when a local CUDA toolchain is available. |
media-decode |
Opt-in non-WAV media/container decode through the audio I/O crate. |
diarization |
Heuristic speaker diarization composition. |
onnx-diarization |
Explicit ONNX speaker embedding diarization path. |
pyannote-diarization |
Explicit native pyannote community diarization bundle path. |
whisperx-compat |
External Python WhisperX command compatibility and parity checks. |
Commands
Import existing WhisperX JSON:
Inspect the request shape for local model bundles:
Run native transcription with an explicit Whisper bundle:
Run native transcription from a locally cached Hugging Face Whisper model:
Transcribe multiple files by passing concrete paths or app-expanded wildcard
patterns. Relative and absolute paths are accepted. Quoted patterns such as
'audio/*.wav' are expanded by native-whisperx before transcription, so they do
not depend on shell glob behavior:
When --output-dir is omitted, the default json transcript is written beside
each input file. When --output-dir is supplied, all outputs use that shared
directory and native-whisperx fails before transcription if two inputs would
write the same output basename. --basename is rejected with multiple expanded
inputs.
Run native post-ASR translation:
This path transcribes source-language segments with native Whisper, translates segment text with the configured OPUS-MT Marian model, and preserves segment timings for downstream writers.
Run the ignored manual cache-only native ASR smoke when SMOKE_ROOT contains
the required audio and Hugging Face cache layout:
Detailed setup is in docs/model-bundles.md.
Run native-vs-Python WhisperX parity:
Preflight local parity resources, generate ignored Python WhisperX 3.8.6 goldens, then run the local ASR parity fixture suite:
--format json writes WhisperX-compatible JSON. Use --format native-json
when you need the Rust transcript contract shape.
Alignment is enabled by default. Use --no-align / --no_align to skip it.
Use --whisper-bundle and --alignment-bundle for explicit local bundles.
Without --whisper-bundle, native ASR can resolve supported Whisper models
through the Hugging Face cache or download path; --model-cache-only requires
the files to already exist locally and never downloads.
--translation-model reuses --model-dir and --model-cache-only for
translation model resolution unless --translation-bundle is supplied.
Published-Crate Requirement
This workspace intentionally uses crates.io dependencies so a clean checkout can
resolve and test without a sibling ../rust-packages repository. The published
dependency closure is tracked in docs/publish-plan.md.
During local co-development, keep local Cargo patches outside commits. One
option is to put overrides in a local Cargo config and keep that config
untracked, for example by adding .cargo/ to .git/info/exclude before
creating .cargo/config.toml:
[]
= { = "../rust-packages/crates/runtime/runtime-core" }
= { = "../rust-packages/crates/audio/audio-analysis-speakers" }
= { = "../rust-packages/crates/audio/audio-analysis-transcription" }
= { = "../rust-packages/crates/runtime/model-runtime" }
= { = "../rust-packages/crates/text/text-model-runtime" }
= { = "../rust-packages/crates/text/text-transcripts" }
= { = "../rust-packages/crates/video/video-analysis-core" }
Add any transitive unpublished crates to the same local override only for local validation. Do not commit patch entries to this repository.
Pull Request CI
The default pull request workflow runs offline Rust gates on GitHub-hosted Ubuntu runners:
The feature-matrix rows are compile-only gates. They cover the external WhisperX compatibility bridge, media decode, heuristic diarization, Silero VAD, ONNX diarization, and the combined offline optional feature set without running model inference.
These checks do not require local model bundles, Python WhisperX, CUDA devices,
Hugging Face tokens, ONNX Runtime dynamic-library configuration, or self-hosted
parity resources. Real-resource parity checks remain in the opt-in
parity-fixtures workflow.
Runtime-only or intentionally constrained combinations stay outside pull request CI:
| Feature or path | Gate | Reason and expected failure mode |
|---|---|---|
silero-vad runtime transcription |
parity-preflight / parity-fixtures |
Compile is covered in PR CI, but execution requires ORT_DYLIB_PATH and a local Silero ONNX bundle. Missing resources fail preflight before model execution. |
onnx-diarization runtime transcription |
parity-preflight / parity-fixtures |
Compile is covered in PR CI, but execution requires ONNX Runtime plus local diarization model artifacts. Missing resources fail preflight before model execution. |
pyannote-vad and pyannote-diarization full-resource parity |
parity-fixtures final-full-surface |
Meaningful validation requires local pyannote bundles, expected WhisperX goldens, Python WhisperX resources, and gated Hugging Face access for the delegated reference path. Missing resources are reported by preflight. |
cuda |
manual/report-only throughput ladder | The CUDA feature requires a compatible local CUDA toolchain and device, so it is not a GitHub-hosted pull request gate. |