Rust transcription with native CoreML Parakeet v2 inference
scriptrs is currently a narrow macOS-first transcription crate:
- macOS only
- CoreML only
- Parakeet TDT v2 only
- mono 16kHz `&[f32]` input
The base crate exposes a single-chunk TranscriptionPipeline. Fast
long-audio chunking and overlap merging are available behind the
long-form feature via [LongFormTranscriptionPipeline]. VAD-backed speech
region planning requires the additional long-form-vad feature.
§Choosing a pipeline
Use TranscriptionPipeline when your audio already fits in one Parakeet
window. If the input is too long, it returns TranscriptionError::AudioTooLong.
Use [LongFormTranscriptionPipeline] with the long-form feature when you
want scriptrs to own long-audio chunking internally. This default path is
tuned for speed and works well on dense, mostly continuous speech.
Add long-form-vad when you need VAD-backed speech region planning for
sparse speech, long silences, or recordings with a lot of non-speech audio.
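For intuition, the chunking that the default long-form path performs can be pictured as fixed windows with a small overlap between neighbors. This is only a sketch of the general technique, not scriptrs's internal algorithm, and the window/overlap sizes below are made up for the example:

```rust
// Plan (start, end) sample ranges covering `total_samples`, where each
// window overlaps the previous one by `overlap` samples so that words cut
// at a boundary can be recovered when chunks are merged.
fn plan_chunks(total_samples: usize, window: usize, overlap: usize) -> Vec<(usize, usize)> {
    assert!(overlap < window, "overlap must be smaller than the window");
    let step = window - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < total_samples {
        let end = (start + window).min(total_samples);
        chunks.push((start, end));
        if end == total_samples {
            break;
        }
        start += step;
    }
    chunks
}

fn main() {
    // 10 s of 16kHz audio, 4 s windows, 1 s overlap (hypothetical numbers).
    let chunks = plan_chunks(10 * 16_000, 4 * 16_000, 16_000);
    println!("{chunks:?}");
    // → [(0, 64000), (48000, 112000), (96000, 160000)]
}
```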
§Model loading
With the default online feature, TranscriptionPipeline::from_pretrained
and [LongFormTranscriptionPipeline::from_pretrained] download the runtime
bundle from avencera/scriptrs-models on Hugging Face.
If you already manage models yourself, use TranscriptionPipeline::from_dir
or [LongFormTranscriptionPipeline::from_dir] with a local bundle layout:
```text
models/
  parakeet-v2/
    encoder.mlmodelc/
    decoder.mlmodelc/
    joint-decision.mlmodelc/
    vocab.txt
```

With long-form-vad, add:

```text
models/
  vad/
    silero-vad.mlmodelc/
```

You can also override model resolution with environment variables:

```text
SCRIPTRS_MODELS_DIR=/path/to/models
SCRIPTRS_MODELS_REPO=owner/repo
```
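An override like SCRIPTRS_MODELS_DIR typically means "use this directory if set, otherwise fall back to the default resolution". A minimal sketch of that pattern, assuming a hypothetical `"models"` fallback (not scriptrs's real cache location):

```rust
use std::path::PathBuf;

// Pick the models directory: an explicit non-empty override wins,
// anything else falls back to a default path.
fn resolve_models_dir(override_dir: Option<&str>) -> PathBuf {
    match override_dir {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => PathBuf::from("models"), // hypothetical default for this sketch
    }
}

fn main() {
    // Feed in the environment variable the same way scriptrs's override implies.
    let dir = resolve_models_dir(std::env::var("SCRIPTRS_MODELS_DIR").ok().as_deref());
    println!("{}", dir.display());
}
```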
§Input requirements
scriptrs expects mono 16kHz audio samples as &[f32]. File decoding stays
outside the core library on purpose.
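Since decoding is the caller's job, you need to hand scriptrs mono 16kHz f32 samples yourself. A sketch of the last step of that conversion, assuming your decoder already produced interleaved stereo i16 PCM at 16kHz (the decoder itself is out of scope here):

```rust
// Downmix interleaved stereo i16 PCM to the mono f32 samples scriptrs
// expects: average the two channels, then scale into [-1.0, 1.0].
fn stereo_i16_to_mono_f32(interleaved: &[i16]) -> Vec<f32> {
    interleaved
        .chunks_exact(2) // one [left, right] frame per chunk
        .map(|frame| {
            let mixed = (frame[0] as f32 + frame[1] as f32) / 2.0;
            mixed / i16::MAX as f32
        })
        .collect()
}

fn main() {
    // One full-scale frame followed by one silent frame.
    let samples = stereo_i16_to_mono_f32(&[i16::MAX, i16::MAX, 0, 0]);
    println!("{samples:?}"); // → [1.0, 0.0]
}
```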
§Examples
Single-chunk transcription:
```rust
use scriptrs::TranscriptionPipeline;

// `load_audio` stands in for your own decoder returning mono 16kHz f32 samples.
let audio = load_audio();
let pipeline = TranscriptionPipeline::from_pretrained()?;
let result = pipeline.run(&audio)?;
println!("{}", result.text);
```

Long-form transcription:
```rust
use scriptrs::LongFormTranscriptionPipeline;

let audio = load_audio();
let pipeline = LongFormTranscriptionPipeline::from_pretrained()?;
let result = pipeline.run(&audio)?;
println!("{}", result.text);
```

VAD-backed long-form transcription:
```rust
use scriptrs::{LongFormConfig, LongFormMode, LongFormTranscriptionPipeline};

let audio = load_audio();
let pipeline = LongFormTranscriptionPipeline::from_pretrained()?;
let config = LongFormConfig {
    mode: LongFormMode::Vad,
    ..LongFormConfig::default()
};
let result = pipeline.run_with_config(&audio, &config)?;
println!("{}", result.text);
```

Structs§
- ModelBundle - Resolved model paths for scriptrs
- ModelManager - Downloads and caches model bundles from Hugging Face
- TimedToken - A single token with timing metadata
- TranscriptChunk - A text chunk in the final transcript
- TranscriptionConfig - Configuration for single-chunk transcription
- TranscriptionPipeline - Single-chunk Parakeet v2 transcription pipeline
- TranscriptionResult - Final transcription output
Enums§
- TranscriptionError - Errors returned by scriptrs