
Crate scriptrs


Rust transcription with native CoreML Parakeet v2 inference

scriptrs is currently a narrow macOS-first transcription crate:

  • macOS only
  • CoreML only
  • Parakeet TDT v2 only
  • mono 16kHz &[f32] input

The base crate exposes a single-chunk TranscriptionPipeline. Fast long-audio chunking and overlap merging are available behind the long-form feature via [LongFormTranscriptionPipeline]. VAD-backed speech region planning is an additional long-form-vad feature.

§Choosing a pipeline

Use TranscriptionPipeline when your audio already fits in one Parakeet window. If the input is too long, it returns TranscriptionError::AudioTooLong.

Use [LongFormTranscriptionPipeline] with the long-form feature when you want scriptrs to own long-audio chunking internally. This default path is tuned for speed and works well on dense, mostly continuous speech.

Add long-form-vad when you need VAD-backed speech region planning for sparse speech, long silences, or recordings with a lot of non-speech audio.
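The choice above reduces to comparing the input duration against a single-window limit. A minimal dependency-free sketch, assuming a hypothetical `MAX_WINDOW_SECS` constant (the real limit is whatever duration makes `TranscriptionPipeline::run` return `TranscriptionError::AudioTooLong`):

```rust
/// Hypothetical single-window limit in seconds; the actual limit is the
/// one that triggers `TranscriptionError::AudioTooLong`.
const MAX_WINDOW_SECS: f32 = 60.0;
/// scriptrs expects mono 16kHz input.
const SAMPLE_RATE: u32 = 16_000;

/// Returns true when the clip fits in one Parakeet window.
fn fits_single_window(samples: &[f32]) -> bool {
    (samples.len() as f32) / (SAMPLE_RATE as f32) <= MAX_WINDOW_SECS
}

fn main() {
    // 30 seconds of audio fits; 2 minutes needs the long-form pipeline.
    let short = vec![0.0f32; 30 * 16_000];
    let long = vec![0.0f32; 120 * 16_000];
    println!("{} {}", fits_single_window(&short), fits_single_window(&long));
}
```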

§Model loading

With the default online feature, TranscriptionPipeline::from_pretrained and [LongFormTranscriptionPipeline::from_pretrained] download the runtime bundle from avencera/scriptrs-models on Hugging Face.

If you already manage models yourself, use TranscriptionPipeline::from_dir or [LongFormTranscriptionPipeline::from_dir] with a local bundle layout:

models/
  parakeet-v2/
    encoder.mlmodelc/
    decoder.mlmodelc/
    joint-decision.mlmodelc/
    vocab.txt

With long-form-vad, add:

models/
  vad/
    silero-vad.mlmodelc/
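If you stage the bundle yourself, the full layout (base plus VAD) can be sketched with a few commands. Note this only creates an empty skeleton for illustration; the actual compiled `.mlmodelc` bundles and `vocab.txt` contents come from the model repo:

```shell
# Create the directory skeleton scriptrs expects under models/.
mkdir -p models/parakeet-v2/encoder.mlmodelc \
         models/parakeet-v2/decoder.mlmodelc \
         models/parakeet-v2/joint-decision.mlmodelc \
         models/vad/silero-vad.mlmodelc
# Placeholder; the real vocab.txt ships with the model bundle.
touch models/parakeet-v2/vocab.txt
```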

You can also override model resolution with environment variables:

  • SCRIPTRS_MODELS_DIR=/path/to/models
  • SCRIPTRS_MODELS_REPO=owner/repo
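For example, to point model resolution at a locally staged bundle, or at a different Hugging Face repo (both paths here are illustrative):

```shell
# Resolve models from a local directory instead of the default cache.
export SCRIPTRS_MODELS_DIR="$HOME/models"
# Or download from a different Hugging Face repo, in owner/repo form.
export SCRIPTRS_MODELS_REPO="avencera/scriptrs-models"
```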

§Input requirements

scriptrs expects mono 16kHz audio samples as &[f32]. File decoding is deliberately kept outside the core library.
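Since decoding stays outside the crate, callers convert their decoder's PCM output themselves. A dependency-free sketch that downmixes interleaved stereo i16 samples (already at 16kHz) into the mono f32 layout scriptrs expects; resampling from other rates is out of scope here:

```rust
/// Downmix interleaved stereo 16-bit PCM (already at 16kHz) to mono f32
/// in [-1.0, 1.0], the sample layout scriptrs expects.
fn stereo_i16_to_mono_f32(interleaved: &[i16]) -> Vec<f32> {
    interleaved
        .chunks_exact(2)
        .map(|frame| {
            // Average the two channels, then normalize to [-1.0, 1.0].
            let mixed = (frame[0] as f32 + frame[1] as f32) / 2.0;
            mixed / i16::MAX as f32
        })
        .collect()
}

fn main() {
    // Two stereo frames: a full-scale frame and a silent frame.
    let pcm: [i16; 4] = [i16::MAX, i16::MAX, 0, 0];
    let mono = stereo_i16_to_mono_f32(&pcm);
    println!("{mono:?}"); // [1.0, 0.0]
}
```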

§Examples

Single-chunk transcription:

use scriptrs::TranscriptionPipeline;

// `load_audio` stands in for your own decoder returning mono 16kHz `Vec<f32>`
let audio = load_audio();
let pipeline = TranscriptionPipeline::from_pretrained()?;
let result = pipeline.run(&audio)?;

println!("{}", result.text);

Long-form transcription:

use scriptrs::LongFormTranscriptionPipeline;

let audio = load_audio();
let pipeline = LongFormTranscriptionPipeline::from_pretrained()?;
let result = pipeline.run(&audio)?;

println!("{}", result.text);

VAD-backed long-form transcription:

use scriptrs::{LongFormConfig, LongFormMode, LongFormTranscriptionPipeline};

let audio = load_audio();
let pipeline = LongFormTranscriptionPipeline::from_pretrained()?;
let config = LongFormConfig {
    mode: LongFormMode::Vad,
    ..LongFormConfig::default()
};
let result = pipeline.run_with_config(&audio, &config)?;

println!("{}", result.text);

Structs§

ModelBundle
Resolved model paths for scriptrs
ModelManager
Downloads and caches model bundles from Hugging Face
TimedToken
A single token with timing metadata
TranscriptChunk
A text chunk in the final transcript
TranscriptionConfig
Configuration for single-chunk transcription
TranscriptionPipeline
Single-chunk Parakeet v2 transcription pipeline
TranscriptionResult
Final transcription output

Enums§

TranscriptionError
Errors returned by scriptrs