# scriptrs

Rust transcription with native CoreML Parakeet v2 inference.

> **Work in progress.** scriptrs is early and intentionally narrow right now:
>
> - macOS only
> - Apple CoreML only
> - Parakeet TDT v2 only
> - no CUDA
> - no non-macOS backend yet

The base crate exposes a single-chunk `TranscriptionPipeline`. Fast long-audio chunking lives behind the `long-form` feature via `LongFormTranscriptionPipeline`. VAD-backed speech region planning is an additional `long-form-vad` feature.
## Current scope
- Base pipeline for short audio
- Optional fast long-form pipeline with overlap chunking
- Optional VAD-backed long-form region planning
- Native CoreML inference on macOS
- Hugging Face download support with optional local model loading
## What it does not do yet
- Linux or Windows support
- CUDA support
- Other ASR models
- Streaming transcription
- Stable public guarantees around model layout or long-form behavior
## Install

```toml
[dependencies]
scriptrs = "0.1.0"
```
For fast long-form transcription:

```toml
[dependencies]
scriptrs = { version = "0.1.0", features = ["long-form"] }
```
For VAD-backed long-form transcription:

```toml
[dependencies]
scriptrs = { version = "0.1.0", features = ["long-form-vad"] }
```
## Model downloads

With the default `online` feature, scriptrs can resolve models automatically:

- it downloads the runtime bundle from `avencera/scriptrs-models` on Hugging Face
You can override either side of that:

- `SCRIPTRS_MODELS_DIR=/path/to/models` forces a local bundle
- `SCRIPTRS_MODELS_REPO=owner/repo` forces a specific Hugging Face model repo layout
## Local model layout

If you want to use `from_dir(...)` or `SCRIPTRS_MODELS_DIR`, the local bundle should look like this:

```
models/
  parakeet-v2/
    encoder.mlmodelc/
    decoder.mlmodelc/
    joint-decision.mlmodelc/
    vocab.txt
```
With `long-form-vad`, add:

```
models/
  vad/
    silero-vad.mlmodelc/
```
## Usage
### Short audio
Use the base pipeline when your audio already fits in a single Parakeet chunk.
With the default `online` feature, `from_pretrained()` is the intended path:

```rust
use scriptrs::TranscriptionPipeline;
```
If the input is too long for the base pipeline, it returns `AudioTooLong`.
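A minimal end-to-end sketch of that path. `from_pretrained()` and `AudioTooLong` come from the text above; the `transcribe` method name and its `&[f32]` sample input are assumptions here, so check the crate docs for the real signature:

```rust
use scriptrs::TranscriptionPipeline;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Resolves and caches the model bundle via Hugging Face
    // (default `online` feature).
    let pipeline = TranscriptionPipeline::from_pretrained()?;

    // Mono 16 kHz f32 samples; one second of silence as a stand-in input.
    // `transcribe` is an assumed method name.
    let samples: Vec<f32> = vec![0.0; 16_000];
    let text = pipeline.transcribe(&samples)?;
    println!("{text}");
    Ok(())
}
```

If the audio exceeds a single Parakeet chunk, the `?` surfaces the `AudioTooLong` error instead of a transcript.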
If you want to use a local bundle instead:

```rust
use scriptrs::TranscriptionPipeline;
```
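The local-bundle path, sketched under the same assumptions; whether `from_dir(...)` takes the bundle root or a model subdirectory is not pinned down here, so the path below is illustrative:

```rust
use scriptrs::TranscriptionPipeline;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Loads the bundle from disk instead of Hugging Face; the directory
    // should match the layout in "Local model layout".
    let pipeline = TranscriptionPipeline::from_dir("models")?;

    let samples: Vec<f32> = vec![0.0; 16_000];
    let text = pipeline.transcribe(&samples)?; // assumed method name
    println!("{text}");
    Ok(())
}
```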
### Long audio
Enable `long-form` if you want scriptrs to own long-audio chunking internally and you care most about speed on clean, dense speech:

```rust
use scriptrs::LongFormTranscriptionPipeline;
```
`LongFormConfig` defaults to the fast overlap-chunking path with 4 workers. You can tune the worker count when you want less or more parallelism:

```rust
use scriptrs::{LongFormConfig, LongFormTranscriptionPipeline};
```
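A sketch of tuning the worker count. Only `LongFormConfig` and its 4-worker default come from the text above; the `workers` field name and the `from_pretrained_with_config` constructor are hypothetical stand-ins for however the crate actually wires config into the pipeline:

```rust
use scriptrs::{LongFormConfig, LongFormTranscriptionPipeline};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // `workers` is a hypothetical field name; the crate defaults to 4.
    let config = LongFormConfig {
        workers: 2,
        ..LongFormConfig::default()
    };
    // Hypothetical constructor pairing a config with the pipeline.
    let pipeline = LongFormTranscriptionPipeline::from_pretrained_with_config(config)?;

    let samples: Vec<f32> = vec![0.0; 16_000 * 600]; // ten minutes of stand-in audio
    let text = pipeline.transcribe(&samples)?; // assumed method name
    println!("{text}");
    Ok(())
}
```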
Enable `long-form-vad` when you want VAD-backed speech region planning for sparse speech, long silences, or recordings with a lot of non-speech audio:

```rust
use scriptrs::{LongFormConfig, LongFormTranscriptionPipeline};
```
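How VAD-backed planning might be selected, again as a hypothetical sketch: the text only establishes that the `long-form-vad` feature adds VAD-backed region planning, so the `use_vad` toggle and the constructor below are illustrative names, not the crate's real API:

```rust
use scriptrs::{LongFormConfig, LongFormTranscriptionPipeline};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // `use_vad` is an illustrative toggle for the `long-form-vad` path.
    let config = LongFormConfig {
        use_vad: true,
        ..LongFormConfig::default()
    };
    // Hypothetical constructor pairing a config with the pipeline.
    let pipeline = LongFormTranscriptionPipeline::from_pretrained_with_config(config)?;

    let samples: Vec<f32> = vec![0.0; 16_000 * 600];
    println!("{}", pipeline.transcribe(&samples)?); // assumed method name
    Ok(())
}
```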
## Example

A small WAV example is included; it expects mono 16 kHz WAV input.
## Notes

- The public API is still moving
- scriptrs currently targets the exact file layout and model I/O shipped in `avencera/scriptrs-models`; if you swap in a different CoreML Parakeet export, you may need runtime code changes
- Use `long-form` for the fastest path on clean, dense speech
- Add `long-form-vad` when you need better robustness on sparse-speech or non-speech-heavy recordings