Safe Rust bindings for the public sherpa-onnx inference APIs.
This crate wraps the sherpa-onnx C API with RAII-owned Rust types and idiomatic configuration structs. The main feature families are:
- offline ASR through OfflineRecognizer
- streaming ASR through OnlineRecognizer
- offline text-to-speech through OfflineTts
- voice activity detection through VoiceActivityDetector
- speaker embeddings and diarization
- online punctuation
- offline and streaming speech denoising
- audio tagging
- WAV I/O helpers through Wave and write()
§Setup
This crate now links statically by default. If SHERPA_ONNX_LIB_DIR is not
set, the build script downloads a matching prebuilt -lib archive from
GitHub releases and uses it automatically during the build.
In other words, the default setup for most users is simply:
sherpa-onnx = "1.13.1"

If you want shared libraries instead, disable the default feature and enable
shared:

sherpa-onnx = { version = "1.13.1", default-features = false, features = ["shared"] }

For advanced use cases, set SHERPA_ONNX_LIB_DIR to a directory that already
contains sherpa-onnx libraries:

export SHERPA_ONNX_LIB_DIR=/path/to/sherpa-onnx/lib

That override works for both static and shared builds.
Shared mode is also intended to work out of the box for normal users:
- Linux and macOS: the build script adds both absolute and relative rpath entries automatically, and copies the required shared runtime libraries next to Cargo-generated binaries and examples.
- Windows: the build script copies the required DLLs next to the generated binaries automatically when using shared libraries.
So most users do not need to manually set LD_LIBRARY_PATH or
DYLD_LIBRARY_PATH.
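As a copy-pasteable sketch of the SHERPA_ONNX_LIB_DIR override described above (the directory shown here is hypothetical):

```shell
# Point the build script at an existing sherpa-onnx library directory.
# The same variable is honored for both static and shared builds.
export SHERPA_ONNX_LIB_DIR=/opt/sherpa-onnx/lib

# Afterwards, build as usual with: cargo build --release
echo "SHERPA_ONNX_LIB_DIR=$SHERPA_ONNX_LIB_DIR"
```

When the variable is set, no archive is downloaded; the build links against whatever libraries the directory contains.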
Example v1.13.1 archives used by the build script:
Default static archives:
- Linux x86_64: sherpa-onnx-v1.13.1-linux-x64-static-lib.tar.bz2
- Linux aarch64: sherpa-onnx-v1.13.1-linux-aarch64-static-lib.tar.bz2
- macOS x86_64: sherpa-onnx-v1.13.1-osx-x64-static-lib.tar.bz2
- macOS arm64: sherpa-onnx-v1.13.1-osx-arm64-static-lib.tar.bz2
- Windows x64: sherpa-onnx-v1.13.1-win-x64-static-MT-Release-lib.tar.bz2
Optional shared archives:
- Linux x86_64: sherpa-onnx-v1.13.1-linux-x64-shared-lib.tar.bz2
- Linux aarch64: sherpa-onnx-v1.13.1-linux-aarch64-shared-cpu-lib.tar.bz2
- macOS x86_64: sherpa-onnx-v1.13.1-osx-x64-shared-lib.tar.bz2
- macOS arm64: sherpa-onnx-v1.13.1-osx-arm64-shared-lib.tar.bz2
- Windows x64: sherpa-onnx-v1.13.1-win-x64-shared-MT-Release-lib.tar.bz2
§How the Rust API is organized
Most APIs follow the same pattern:
- Start with a *Config value and fill the fields for exactly one model family.
- Call create() to construct the runtime object.
- Create a stream if the API is stream-based.
- Feed audio or text, then fetch results with the provided accessor methods.
All runtime wrappers automatically free their underlying C resources on drop.
§Examples
The repository contains end-to-end Rust examples under
rust-api-examples/examples/.
Good entry points are:
- sense_voice.rs
- nemo_parakeet.rs
- streaming_zipformer.rs
- pocket_tts.rs
- silero_vad_remove_silence.rs
- online_punctuation.rs
- offline_punctuation.rs
- keyword_spotter.rs
- spoken_language_identification.rs
- offline_speaker_diarization.rs
- speaker_embedding_manager.rs
§Offline recognition example
use sherpa_onnx::{
OfflineRecognizer, OfflineRecognizerConfig, OfflineSenseVoiceModelConfig, Wave,
};
let wave = Wave::read("./test.wav").expect("read wave");
let mut config = OfflineRecognizerConfig::default();
config.model_config.sense_voice = OfflineSenseVoiceModelConfig {
model: Some(
"./sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/model.int8.onnx".into(),
),
language: Some("auto".into()),
use_itn: true,
};
config.model_config.tokens = Some(
"./sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/tokens.txt".into(),
);
let recognizer = OfflineRecognizer::create(&config).expect("create recognizer");
let stream = recognizer.create_stream();
stream.accept_waveform(wave.sample_rate(), wave.samples());
recognizer.decode(&stream);
let result = stream.get_result().expect("result");
println!("{}", result.text);
§Streaming recognition example
use sherpa_onnx::{OnlineRecognizer, OnlineRecognizerConfig, Wave};
let wave = Wave::read("./test.wav").expect("read wave");
let mut config = OnlineRecognizerConfig::default();
config.model_config.transducer.encoder = Some(
"./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/encoder-epoch-99-avg-1.int8.onnx".into(),
);
config.model_config.transducer.decoder = Some(
"./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/decoder-epoch-99-avg-1.onnx".into(),
);
config.model_config.transducer.joiner = Some(
"./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/joiner-epoch-99-avg-1.int8.onnx".into(),
);
config.model_config.tokens = Some(
"./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/tokens.txt".into(),
);
config.enable_endpoint = true;
config.decoding_method = Some("greedy_search".into());
let recognizer = OnlineRecognizer::create(&config).expect("create recognizer");
let stream = recognizer.create_stream();
stream.accept_waveform(wave.sample_rate(), wave.samples());
stream.input_finished();
while recognizer.is_ready(&stream) {
    recognizer.decode(&stream);
}
// Fetch the decoded RecognizerResult once decoding is done.
let result = recognizer.get_result(&stream);
println!("{}", result.text);
§TTS example
use sherpa_onnx::{OfflineTts, OfflineTtsConfig, OfflineTtsModelConfig, OfflineTtsPocketModelConfig};
let config = OfflineTtsConfig {
model: OfflineTtsModelConfig {
pocket: OfflineTtsPocketModelConfig {
lm_flow: Some("./sherpa-onnx-pocket-tts-int8-2026-01-26/lm_flow.int8.onnx".into()),
lm_main: Some("./sherpa-onnx-pocket-tts-int8-2026-01-26/lm_main.int8.onnx".into()),
encoder: Some("./sherpa-onnx-pocket-tts-int8-2026-01-26/encoder.onnx".into()),
decoder: Some("./sherpa-onnx-pocket-tts-int8-2026-01-26/decoder.int8.onnx".into()),
text_conditioner: Some(
"./sherpa-onnx-pocket-tts-int8-2026-01-26/text_conditioner.onnx".into(),
),
vocab_json: Some("./sherpa-onnx-pocket-tts-int8-2026-01-26/vocab.json".into()),
token_scores_json: Some(
"./sherpa-onnx-pocket-tts-int8-2026-01-26/token_scores.json".into(),
),
..Default::default()
},
..Default::default()
},
..Default::default()
};
let tts = OfflineTts::create(&config).expect("create tts");
println!("{}", tts.sample_rate());
Structs§
- AudioEvent - One predicted audio event.
- AudioTagging - Offline audio tagger.
- AudioTaggingConfig - Top-level configuration for AudioTagging.
- AudioTaggingModelConfig - Model-level configuration for audio tagging.
- AudioTaggingOfflineStream - Input stream for offline audio tagging.
- CircularBuffer - Circular sample buffer used by some VAD workflows.
- DenoisedAudio - Denoised samples returned from an offline or online denoiser.
- DisplayManager - Stores finalized sentences with timestamps and the current partial hypothesis for terminal UIs.
- FastClusteringConfig - Fast clustering options used after segmentation and embedding extraction.
- GeneratedAudio - Generated audio returned by OfflineTts::generate_with_config.
- GenerationConfig - Per-request generation options for OfflineTts::generate_with_config.
- HomophoneReplacerConfig - Optional homophone replacement resources.
- KeywordResult - Decoded keyword spotting result for one stream.
- KeywordSpotter - Streaming keyword spotter.
- KeywordSpotterConfig - Configuration for KeywordSpotter.
- LinearResampler - A linear resampler that converts audio from one sample rate to another.
- OfflineCanaryModelConfig - Offline Canary model configuration.
- OfflineCohereTranscribeModelConfig - Offline Cohere Transcribe model configuration.
- OfflineDolphinModelConfig - Offline Dolphin model configuration.
- OfflineFireRedAsrCtcModelConfig - Offline FireRed ASR CTC model configuration.
- OfflineFireRedAsrModelConfig - Offline FireRed ASR transducer configuration.
- OfflineFunASRNanoModelConfig - Offline FunASR Nano model configuration.
- OfflineLMConfig - Optional external language model configuration for offline ASR.
- OfflineMedAsrCtcModelConfig - Offline MedASR CTC model configuration.
- OfflineModelConfig - Aggregate model configuration for offline recognition.
- OfflineMoonshineModelConfig - For Moonshine v1, you need 4 models.
- OfflineNemoEncDecCtcModelConfig - Offline NeMo CTC model configuration.
- OfflineOmnilingualAsrCtcModelConfig - Offline omnilingual CTC model configuration.
- OfflineParaformerModelConfig - Offline Paraformer model configuration.
- OfflinePunctuation - Offline punctuation restorer.
- OfflinePunctuationConfig - Top-level configuration for OfflinePunctuation.
- OfflinePunctuationModelConfig - Model configuration for offline punctuation restoration.
- OfflineQwen3ASRModelConfig - Offline Qwen3 ASR model configuration.
- OfflineRecognizer - Offline speech recognizer.
- OfflineRecognizerConfig - Top-level configuration for OfflineRecognizer.
- OfflineRecognizerResult - Recognition result returned by OfflineStream::get_result.
- OfflineSenseVoiceModelConfig - Offline SenseVoice model configuration.
- OfflineSpeakerDiarization - Offline speaker diarizer.
- OfflineSpeakerDiarizationConfig - Top-level configuration for OfflineSpeakerDiarization.
- OfflineSpeakerDiarizationResult - Result object returned by OfflineSpeakerDiarization::process.
- OfflineSpeakerDiarizationSegment - One diarization segment labeled with a speaker index.
- OfflineSpeakerSegmentationModelConfig - Segmentation model configuration for diarization.
- OfflineSpeakerSegmentationPyannoteModelConfig - Pyannote segmentation model path.
- OfflineSpeechDenoiser - Offline speech denoiser.
- OfflineSpeechDenoiserConfig - Top-level configuration for OfflineSpeechDenoiser.
- OfflineSpeechDenoiserDpdfNetModelConfig - DPDFNet model path for offline denoising.
- OfflineSpeechDenoiserGtcrnModelConfig - GTCRN model path for offline denoising.
- OfflineSpeechDenoiserModelConfig - Aggregate model configuration for OfflineSpeechDenoiser.
- OfflineStream - Input stream used by OfflineRecognizer.
- OfflineTdnnModelConfig - Offline TDNN model configuration.
- OfflineTransducerModelConfig - Offline transducer model configuration.
- OfflineTts - Offline TTS engine.
- OfflineTtsConfig - Top-level configuration for OfflineTts.
- OfflineTtsKittenModelConfig - Kitten model configuration.
- OfflineTtsKokoroModelConfig - Kokoro model configuration.
- OfflineTtsMatchaModelConfig - Matcha model configuration.
- OfflineTtsModelConfig - Aggregate model configuration for OfflineTts.
- OfflineTtsPocketModelConfig - Pocket TTS model configuration.
- OfflineTtsSupertonicModelConfig - Supertonic model configuration.
- OfflineTtsVitsModelConfig - VITS model configuration.
- OfflineTtsZipvoiceModelConfig - ZipVoice model configuration.
- OfflineWenetCtcModelConfig - Offline WeNet CTC model configuration.
- OfflineWhisperModelConfig - Offline Whisper model configuration.
- OfflineZipformerAudioTaggingModelConfig - Zipformer audio tagging model path.
- OfflineZipformerCtcModelConfig - Offline Zipformer CTC model configuration.
- OnlineCtcFstDecoderConfig - FST decoder options for CTC models.
- OnlineModelConfig - Aggregate model configuration for streaming recognition.
- OnlineNemoCtcModelConfig - Online NeMo CTC model configuration.
- OnlineParaformerModelConfig - Online Paraformer model configuration.
- OnlinePunctuation - Online punctuation restorer.
- OnlinePunctuationConfig - Top-level configuration for OnlinePunctuation.
- OnlinePunctuationModelConfig - Model-level options for online punctuation restoration.
- OnlineRecognizer - Streaming speech recognizer.
- OnlineRecognizerConfig - Top-level configuration for OnlineRecognizer.
- OnlineSpeechDenoiser - Streaming speech denoiser.
- OnlineSpeechDenoiserConfig - Top-level configuration for OnlineSpeechDenoiser.
- OnlineStream - Input stream used by OnlineRecognizer.
- OnlineToneCtcModelConfig - Online Tone CTC model configuration.
- OnlineTransducerModelConfig - Online transducer model configuration.
- OnlineZipformer2CtcModelConfig - Online Zipformer2 CTC model configuration.
- RecognizerResult - Streaming ASR result returned by OnlineRecognizer::get_result.
- SileroVadModelConfig - Silero VAD configuration.
- SpeakerEmbeddingExtractor - Embedding extractor that consumes audio through an OnlineStream.
- SpeakerEmbeddingExtractorConfig - Configuration for SpeakerEmbeddingExtractor.
- SpeakerEmbeddingManager - In-memory index of named speaker embeddings.
- SpeakerEmbeddingMatch - One speaker search result returned by SpeakerEmbeddingManager::get_best_matches.
- SpeechSegment - One detected speech segment.
- SpokenLanguageIdentification - Spoken language identifier.
- SpokenLanguageIdentificationConfig - Top-level configuration for SpokenLanguageIdentification.
- SpokenLanguageIdentificationResult - Result returned by SpokenLanguageIdentification::compute.
- SpokenLanguageIdentificationWhisperConfig - Whisper model configuration for spoken language identification.
- TenVadModelConfig - Ten VAD configuration.
- VadModelConfig - Top-level model configuration for VoiceActivityDetector.
- VoiceActivityDetector - Voice activity detector that emits speech segments.
- Wave - A WAV file loaded through sherpa-onnx.
Functions§
- file_exists - Return true if filename exists according to the native helper.
- git_date - Return the Git date of the native library build.
- git_sha1 - Return the Git SHA1 of the native library build.
- version - Return the sherpa-onnx version string compiled into the native library.
- write - Write normalized PCM samples to a WAV file.