Crate sherpa_onnx

Safe Rust bindings for the public sherpa-onnx inference APIs.

This crate wraps the sherpa-onnx C API with RAII-owned Rust types and idiomatic configuration structs. The main feature families are offline and streaming speech recognition, text-to-speech, voice activity detection, keyword spotting, speaker diarization and embedding extraction, audio tagging, punctuation restoration, spoken language identification, and speech denoising.

§Setup

This crate now links statically by default. If SHERPA_ONNX_LIB_DIR is not set, the build script downloads a matching prebuilt -lib archive from GitHub releases and uses it automatically during the build.

In other words, the default setup for most users is simply:

sherpa-onnx = "1.13.1"

If you want shared libraries instead, disable the default feature and enable shared:

sherpa-onnx = { version = "1.13.1", default-features = false, features = ["shared"] }

For advanced use cases, set SHERPA_ONNX_LIB_DIR to a directory that already contains sherpa-onnx libraries:

export SHERPA_ONNX_LIB_DIR=/path/to/sherpa-onnx/lib

That override works for both static and shared builds.

Shared mode is also intended to work out of the box for normal users:

  • Linux and macOS: the build script adds both absolute and relative rpath entries automatically, and copies the required shared runtime libraries next to Cargo-generated binaries and examples.
  • Windows: the build script copies the required DLLs next to the generated binaries automatically when using shared libraries.

So most users do not need to manually set LD_LIBRARY_PATH or DYLD_LIBRARY_PATH.

For v1.13.1, the build script downloads prebuilt archives from the sherpa-onnx GitHub releases page: static -lib archives by default, or shared-library archives when the shared feature is enabled.

§How the Rust API is organized

Most APIs follow the same pattern:

  1. Start with a *Config value and fill the fields for exactly one model family.
  2. Call create() to construct the runtime object.
  3. Create a stream if the API is stream-based.
  4. Feed audio or text, then fetch results with the provided accessor methods.

All runtime wrappers automatically free their underlying C resources on drop.

§Examples

The repository contains end-to-end Rust examples under rust-api-examples/examples/; condensed versions of a few good entry points follow.

§Offline recognition example

use sherpa_onnx::{
    OfflineRecognizer, OfflineRecognizerConfig, OfflineSenseVoiceModelConfig, Wave,
};

let wave = Wave::read("./test.wav").expect("read wave");

let mut config = OfflineRecognizerConfig::default();
config.model_config.sense_voice = OfflineSenseVoiceModelConfig {
    model: Some(
        "./sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/model.int8.onnx".into(),
    ),
    language: Some("auto".into()),
    use_itn: true,
};
config.model_config.tokens = Some(
    "./sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/tokens.txt".into(),
);

let recognizer = OfflineRecognizer::create(&config).expect("create recognizer");

// Offline decoding: feed the entire waveform, then decode once.
let stream = recognizer.create_stream();
stream.accept_waveform(wave.sample_rate(), wave.samples());
recognizer.decode(&stream);

let result = stream.get_result().expect("result");
println!("{}", result.text);

§Streaming recognition example

use sherpa_onnx::{OnlineRecognizer, OnlineRecognizerConfig, Wave};

let wave = Wave::read("./test.wav").expect("read wave");

let mut config = OnlineRecognizerConfig::default();
config.model_config.transducer.encoder = Some(
    "./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/encoder-epoch-99-avg-1.int8.onnx".into(),
);
config.model_config.transducer.decoder = Some(
    "./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/decoder-epoch-99-avg-1.onnx".into(),
);
config.model_config.transducer.joiner = Some(
    "./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/joiner-epoch-99-avg-1.int8.onnx".into(),
);
config.model_config.tokens = Some(
    "./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/tokens.txt".into(),
);
config.enable_endpoint = true;
config.decoding_method = Some("greedy_search".into());

let recognizer = OnlineRecognizer::create(&config).expect("create recognizer");
let stream = recognizer.create_stream();
stream.accept_waveform(wave.sample_rate(), wave.samples());
stream.input_finished();
// Decode all pending frames; partial and final hypotheses can then be
// fetched via OnlineRecognizer::get_result.
while recognizer.is_ready(&stream) {
    recognizer.decode(&stream);
}

§TTS example

use sherpa_onnx::{OfflineTts, OfflineTtsConfig, OfflineTtsModelConfig, OfflineTtsPocketModelConfig};

let config = OfflineTtsConfig {
    model: OfflineTtsModelConfig {
        pocket: OfflineTtsPocketModelConfig {
            lm_flow: Some("./sherpa-onnx-pocket-tts-int8-2026-01-26/lm_flow.int8.onnx".into()),
            lm_main: Some("./sherpa-onnx-pocket-tts-int8-2026-01-26/lm_main.int8.onnx".into()),
            encoder: Some("./sherpa-onnx-pocket-tts-int8-2026-01-26/encoder.onnx".into()),
            decoder: Some("./sherpa-onnx-pocket-tts-int8-2026-01-26/decoder.int8.onnx".into()),
            text_conditioner: Some(
                "./sherpa-onnx-pocket-tts-int8-2026-01-26/text_conditioner.onnx".into(),
            ),
            vocab_json: Some("./sherpa-onnx-pocket-tts-int8-2026-01-26/vocab.json".into()),
            token_scores_json: Some(
                "./sherpa-onnx-pocket-tts-int8-2026-01-26/token_scores.json".into(),
            ),
            ..Default::default()
        },
        ..Default::default()
    },
    ..Default::default()
};

let tts = OfflineTts::create(&config).expect("create tts");
println!("{}", tts.sample_rate());

Structs§

AudioEvent
One predicted audio event.
AudioTagging
Offline audio tagger.
AudioTaggingConfig
Top-level configuration for AudioTagging.
AudioTaggingModelConfig
Model-level configuration for audio tagging.
AudioTaggingOfflineStream
Input stream for offline audio tagging.
CircularBuffer
Circular sample buffer used by some VAD workflows.
DenoisedAudio
Denoised samples returned from an offline or online denoiser.
DisplayManager
Stores finalized sentences with timestamps and the current partial hypothesis for terminal UIs.
FastClusteringConfig
Fast clustering options used after segmentation and embedding extraction.
GeneratedAudio
Generated audio returned by OfflineTts::generate_with_config.
GenerationConfig
Per-request generation options for OfflineTts::generate_with_config.
HomophoneReplacerConfig
Optional homophone replacement resources.
KeywordResult
Decoded keyword spotting result for one stream.
KeywordSpotter
Streaming keyword spotter.
KeywordSpotterConfig
Configuration for KeywordSpotter.
LinearResampler
A linear resampler that converts audio from one sample rate to another.
OfflineCanaryModelConfig
Offline Canary model configuration.
OfflineCohereTranscribeModelConfig
Offline Cohere Transcribe model configuration.
OfflineDolphinModelConfig
Offline Dolphin model configuration.
OfflineFireRedAsrCtcModelConfig
Offline FireRed ASR CTC model configuration.
OfflineFireRedAsrModelConfig
Offline FireRed ASR transducer configuration.
OfflineFunASRNanoModelConfig
Offline FunASR Nano model configuration.
OfflineLMConfig
Optional external language model configuration for offline ASR.
OfflineMedAsrCtcModelConfig
Offline MedASR CTC model configuration.
OfflineModelConfig
Aggregate model configuration for offline recognition.
OfflineMoonshineModelConfig
For Moonshine v1, you need 4 models:
OfflineNemoEncDecCtcModelConfig
Offline NeMo CTC model configuration.
OfflineOmnilingualAsrCtcModelConfig
Offline omnilingual CTC model configuration.
OfflineParaformerModelConfig
Offline Paraformer model configuration.
OfflinePunctuation
Offline punctuation restorer.
OfflinePunctuationConfig
Top-level configuration for OfflinePunctuation.
OfflinePunctuationModelConfig
Model configuration for offline punctuation restoration.
OfflineQwen3ASRModelConfig
Offline Qwen3 ASR model configuration.
OfflineRecognizer
Offline speech recognizer.
OfflineRecognizerConfig
Top-level configuration for OfflineRecognizer.
OfflineRecognizerResult
Recognition result returned by OfflineStream::get_result.
OfflineSenseVoiceModelConfig
Offline SenseVoice model configuration.
OfflineSpeakerDiarization
Offline speaker diarizer.
OfflineSpeakerDiarizationConfig
Top-level configuration for OfflineSpeakerDiarization.
OfflineSpeakerDiarizationResult
Result object returned by OfflineSpeakerDiarization::process.
OfflineSpeakerDiarizationSegment
One diarization segment labeled with a speaker index.
OfflineSpeakerSegmentationModelConfig
Segmentation model configuration for diarization.
OfflineSpeakerSegmentationPyannoteModelConfig
Pyannote segmentation model path.
OfflineSpeechDenoiser
Offline speech denoiser.
OfflineSpeechDenoiserConfig
Top-level configuration for OfflineSpeechDenoiser.
OfflineSpeechDenoiserDpdfNetModelConfig
DPDFNet model path for offline denoising.
OfflineSpeechDenoiserGtcrnModelConfig
GTCRN model path for offline denoising.
OfflineSpeechDenoiserModelConfig
Aggregate model configuration for OfflineSpeechDenoiser.
OfflineStream
Input stream used by OfflineRecognizer.
OfflineTdnnModelConfig
Offline TDNN model configuration.
OfflineTransducerModelConfig
Offline transducer model configuration.
OfflineTts
Offline TTS engine.
OfflineTtsConfig
Top-level configuration for OfflineTts.
OfflineTtsKittenModelConfig
Kitten model configuration.
OfflineTtsKokoroModelConfig
Kokoro model configuration.
OfflineTtsMatchaModelConfig
Matcha model configuration.
OfflineTtsModelConfig
Aggregate model configuration for OfflineTts.
OfflineTtsPocketModelConfig
Pocket TTS model configuration.
OfflineTtsSupertonicModelConfig
Supertonic model configuration.
OfflineTtsVitsModelConfig
VITS model configuration.
OfflineTtsZipvoiceModelConfig
ZipVoice model configuration.
OfflineWenetCtcModelConfig
Offline WeNet CTC model configuration.
OfflineWhisperModelConfig
Offline Whisper model configuration.
OfflineZipformerAudioTaggingModelConfig
Zipformer audio tagging model path.
OfflineZipformerCtcModelConfig
Offline Zipformer CTC model configuration.
OnlineCtcFstDecoderConfig
FST decoder options for CTC models.
OnlineModelConfig
Aggregate model configuration for streaming recognition.
OnlineNemoCtcModelConfig
Online NeMo CTC model configuration.
OnlineParaformerModelConfig
Online Paraformer model configuration.
OnlinePunctuation
Online punctuation restorer.
OnlinePunctuationConfig
Top-level configuration for OnlinePunctuation.
OnlinePunctuationModelConfig
Model-level options for online punctuation restoration.
OnlineRecognizer
Streaming speech recognizer.
OnlineRecognizerConfig
Top-level configuration for OnlineRecognizer.
OnlineSpeechDenoiser
Streaming speech denoiser.
OnlineSpeechDenoiserConfig
Top-level configuration for OnlineSpeechDenoiser.
OnlineStream
Input stream used by OnlineRecognizer.
OnlineToneCtcModelConfig
Online Tone CTC model configuration.
OnlineTransducerModelConfig
Online transducer model configuration.
OnlineZipformer2CtcModelConfig
Online Zipformer2 CTC model configuration.
RecognizerResult
Streaming ASR result returned by OnlineRecognizer::get_result.
SileroVadModelConfig
Silero VAD configuration.
SpeakerEmbeddingExtractor
Embedding extractor that consumes audio through an OnlineStream.
SpeakerEmbeddingExtractorConfig
Configuration for SpeakerEmbeddingExtractor.
SpeakerEmbeddingManager
In-memory index of named speaker embeddings.
SpeakerEmbeddingMatch
One speaker search result returned by SpeakerEmbeddingManager::get_best_matches.
SpeechSegment
One detected speech segment.
SpokenLanguageIdentification
Spoken language identifier.
SpokenLanguageIdentificationConfig
Top-level configuration for SpokenLanguageIdentification.
SpokenLanguageIdentificationResult
Result returned by SpokenLanguageIdentification::compute.
SpokenLanguageIdentificationWhisperConfig
Whisper model configuration for spoken language identification.
TenVadModelConfig
Ten VAD configuration.
VadModelConfig
Top-level model configuration for VoiceActivityDetector.
VoiceActivityDetector
Voice activity detector that emits speech segments.
Wave
A WAV file loaded through sherpa-onnx.

Functions§

file_exists
Return true if filename exists according to the native helper.
git_date
Return the Git date of the native library build.
git_sha1
Return the Git SHA1 of the native library build.
version
Return the sherpa-onnx version string compiled into the native library.
write
Write normalized PCM samples to a WAV file.