Crate silero

silero

Production-oriented Rust wrapper for the Silero VAD ONNX model.

§Introduction

This crate is designed around the way we actually run VAD in services:

- one reusable ONNX session per worker
- one small stream state per active audio stream
- one optional segmenter that turns frame probabilities into speech ranges

It intentionally does not own queueing, health checks, worker counts, or ONNX thread policy. Those belong in a higher-level service crate.
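
A service built on this crate typically keeps one session per worker and a map of per-stream state next to it. The sketch below shows that ownership split with stand-in types: ModelSession, StreamMemory, and Worker are hypothetical placeholders rather than this crate’s API, so the example compiles on its own. In real code the placeholders would be Session and StreamState.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a Session: one per worker, reused by every stream.
struct ModelSession;

/// Hypothetical stand-in for a StreamState: small, one per active stream.
#[derive(Default)]
struct StreamMemory {
    samples_seen: usize,
}

/// Hypothetical service-side worker: exactly one session plus per-stream
/// memory keyed by a service-chosen stream id.
struct Worker {
    session: ModelSession,
    streams: HashMap<u64, StreamMemory>,
}

impl Worker {
    fn new(session: ModelSession) -> Self {
        Self { session, streams: HashMap::new() }
    }

    /// Creates fresh memory the first time a stream id is seen.
    fn stream_mut(&mut self, id: u64) -> &mut StreamMemory {
        self.streams.entry(id).or_default()
    }

    /// Drops per-stream memory when a stream ends; the session stays alive.
    fn close_stream(&mut self, id: u64) -> Option<StreamMemory> {
        self.streams.remove(&id)
    }
}

fn main() {
    let mut worker = Worker::new(ModelSession);
    worker.stream_mut(1).samples_seen += 512;
    worker.stream_mut(2).samples_seen += 512;
    worker.stream_mut(1).samples_seen += 512;
    assert_eq!(worker.stream_mut(1).samples_seen, 1024);
    worker.close_stream(2);
    assert_eq!(worker.streams.len(), 1);
    let _ = &worker.session; // the single session serves every stream
}
```

Closing a stream only frees its small state, which is why the session can outlive any individual stream.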

§Model layout

Silero VAD is a stateful model:

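- input audio: fixed-size chunks at 8 kHz or 16 kHz (256 samples at 8 kHz, 512 samples at 16 kHz)
- rolling context: the last 32 samples at 8 kHz or 64 samples at 16 kHz, prepended to the next chunk
- recurrent memory: the state input / stateN output tensors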
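
To make the context rule concrete, here is a minimal sketch of how one model input could be assembled, assuming the upstream Silero VAD v5 layout: the saved context is prepended to each new chunk, and the last 32 or 64 samples of that combined input become the next context. Rate and build_input are hypothetical names, and the recurrent state tensor is left out.

```rust
/// Hypothetical stand-in for the crate's SampleRate enum.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Rate {
    Hz8k,
    Hz16k,
}

impl Rate {
    /// New samples per inference call (upstream Silero VAD v5 values).
    fn chunk_samples(self) -> usize {
        match self {
            Rate::Hz8k => 256,
            Rate::Hz16k => 512,
        }
    }

    /// Samples carried over from the previous call.
    fn context_samples(self) -> usize {
        match self {
            Rate::Hz8k => 32,
            Rate::Hz16k => 64,
        }
    }
}

/// Builds one model input (context followed by the new chunk) and updates
/// `context` to the last `context_samples()` samples of that input.
fn build_input(rate: Rate, context: &mut Vec<f32>, chunk: &[f32]) -> Vec<f32> {
    assert_eq!(chunk.len(), rate.chunk_samples(), "chunk must be exact-size");
    let mut input = Vec::with_capacity(rate.context_samples() + chunk.len());
    input.extend_from_slice(context);
    input.extend_from_slice(chunk);
    let keep_from = input.len() - rate.context_samples();
    *context = input[keep_from..].to_vec();
    input
}

fn main() {
    let rate = Rate::Hz16k;
    // The first call of a stream starts from an all-zero context.
    let mut context = vec![0.0_f32; rate.context_samples()];
    let chunk: Vec<f32> = (0..rate.chunk_samples()).map(|i| i as f32).collect();
    let input = build_input(rate, &mut context, &chunk);
    assert_eq!(input.len(), 64 + 512);
    // The new context is the tail of the chunk just processed.
    assert_eq!(context[0], 448.0);
}
```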

Because of that, the crate exposes three core building blocks:

- Session: owns the ONNX Runtime session; supports exact-chunk single inference and multi-stream batch inference
- StreamState: owns per-stream model memory (recurrent state, rolling context, and tail buffer)
- SpeechSegmenter: turns frame probabilities into SpeechSegments using hysteresis and timing rules
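
The hysteresis idea behind SpeechSegmenter can be sketched on a plain slice of per-frame probabilities. This is not the crate’s implementation: Hysteresis and segment are hypothetical names, the rules are loosely modeled on upstream Silero’s get_speech_timestamps (enter at threshold, count silence only below a lower neg_threshold), and timing is counted in frames rather than milliseconds.

```rust
/// Hypothetical hysteresis settings (not the crate's SpeechOptions).
struct Hysteresis {
    /// Probability at or above which speech starts (or resumes).
    threshold: f32,
    /// Probability below which silence is counted; values between the two
    /// thresholds leave the current state unchanged.
    neg_threshold: f32,
    /// Frames elapsed since the first low frame before a segment closes.
    min_silence_frames: usize,
    /// Segments shorter than this many frames are dropped.
    min_speech_frames: usize,
}

/// Returns `(start_frame, end_frame)` pairs with exclusive ends.
fn segment(probs: &[f32], h: &Hysteresis) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let mut start: Option<usize> = None; // first frame of the open segment
    let mut silence_since: Option<usize> = None; // first low frame seen

    for (i, &p) in probs.iter().enumerate() {
        match start {
            None => {
                if p >= h.threshold {
                    start = Some(i);
                }
            }
            Some(s) => {
                if p >= h.threshold {
                    // Speech resumed before the silence countdown finished.
                    silence_since = None;
                } else if p < h.neg_threshold {
                    let since = *silence_since.get_or_insert(i);
                    if i + 1 - since >= h.min_silence_frames {
                        // Close the segment where the silence began.
                        if since - s >= h.min_speech_frames {
                            out.push((s, since));
                        }
                        start = None;
                        silence_since = None;
                    }
                }
            }
        }
    }
    // A segment still open at the end runs to the last frame, or to the
    // start of a silence that was already counting down.
    if let Some(s) = start {
        let end = silence_since.unwrap_or(probs.len());
        if end - s >= h.min_speech_frames {
            out.push((s, end));
        }
    }
    out
}

fn main() {
    let h = Hysteresis { threshold: 0.5, neg_threshold: 0.35, min_silence_frames: 2, min_speech_frames: 2 };
    let probs: [f32; 10] = [0.1, 0.9, 0.8, 0.4, 0.7, 0.2, 0.1, 0.1, 0.6, 0.1];
    // Frames 1..5 form one segment; the 0.4 dip sits inside the band.
    // The lone high frame at 8 is shorter than min_speech_frames and is dropped.
    assert_eq!(segment(&probs, &h), vec![(1, 5)]);
}
```

The band between neg_threshold and threshold is what keeps a brief dip in probability from splitting one utterance into two segments.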

§Quick start

use silero::{Session, SpeechOptions, detect_speech};

fn main() -> Result<(), silero::Error> {
    let model = include_bytes!(concat!(env!("CARGO_MANIFEST_DIR"), "/models/silero_vad.onnx"));
    // One second of silence as mono f32 samples at 16 kHz.
    let audio_16k: Vec<f32> = vec![0.0; 16_000];
    let mut session = Session::from_memory(model)?;
    let segments = detect_speech(&mut session, &audio_16k, SpeechOptions::default())?;

    println!("detected {} speech segments", segments.len());
    Ok(())
}

§Streaming usage

use silero::{Session, SpeechOptions, SpeechSegmenter, StreamState};

fn main() -> Result<(), silero::Error> {
    let model = include_bytes!(concat!(env!("CARGO_MANIFEST_DIR"), "/models/silero_vad.onnx"));
    let mut session = Session::from_memory(model)?;
    let config = SpeechOptions::default();
    let mut stream = StreamState::new(config.sample_rate());
    let mut segmenter = SpeechSegmenter::new(config.clone());
    let audio_chunk = vec![0.0_f32; config.sample_rate().chunk_samples()];

    let print = |segment: silero::SpeechSegment| {
        println!(
            "speech {:.2}s -> {:.2}s",
            segment.start_seconds(),
            segment.end_seconds()
        );
    };

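    // Feed one chunk. Each call yields at most one finished segment; pushing
    // an empty slice drains any further segments that are already pending.
    if let Some(segment) = segmenter.push_samples(&mut session, &mut stream, &audio_chunk)? {
        print(segment);
        while let Some(more) = segmenter.push_samples(&mut session, &mut stream, &[])? {
            print(more);
        }
    }
    // End of stream: close any segment that is still open, then drain.
    if let Some(segment) = segmenter.finish_stream(&mut session, &mut stream)? {
        print(segment);
        while let Some(more) = segmenter.push_samples(&mut session, &mut stream, &[])? {
            print(more);
        }
    }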

    Ok(())
}
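
StreamState’s tail buffer is what lets callers push audio in arbitrary slice sizes while the model only ever sees exact-size chunks. A minimal sketch of that buffering, assuming leftover samples are zero-padded at end of stream (as upstream Silero’s offline helper does); Chunker is a hypothetical name, not this crate’s type.

```rust
/// Hypothetical tail buffer: accepts pushes of any length and hands out
/// only exact-size chunks, keeping the remainder for the next push.
struct Chunker {
    chunk: usize,
    tail: Vec<f32>,
}

impl Chunker {
    fn new(chunk: usize) -> Self {
        assert!(chunk > 0, "chunk size must be non-zero");
        Self { chunk, tail: Vec::with_capacity(chunk) }
    }

    /// Appends `samples` and returns every complete chunk now available.
    fn push(&mut self, samples: &[f32]) -> Vec<Vec<f32>> {
        self.tail.extend_from_slice(samples);
        let full = self.tail.len() / self.chunk * self.chunk;
        let chunks: Vec<Vec<f32>> = self.tail[..full]
            .chunks_exact(self.chunk)
            .map(<[f32]>::to_vec)
            .collect();
        // Keep only the samples that did not fill a whole chunk.
        self.tail.drain(..full);
        chunks
    }

    /// At end of stream, zero-pads any leftover samples into one last chunk.
    fn finish(&mut self) -> Option<Vec<f32>> {
        if self.tail.is_empty() {
            return None;
        }
        let mut last = std::mem::take(&mut self.tail);
        last.resize(self.chunk, 0.0);
        Some(last)
    }
}

fn main() {
    let mut chunker = Chunker::new(512); // 16 kHz chunk size
    assert_eq!(chunker.push(&[0.1_f32; 700]).len(), 1); // 188 samples stay in the tail
    assert_eq!(chunker.push(&[0.1_f32; 400]).len(), 1); // 588 buffered: one chunk, 76 left
    let last = chunker.finish().expect("76 samples were pending");
    assert_eq!(last.len(), 512);
    assert_eq!(chunker.finish(), None);
}
```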

§Batch inference

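Silero’s batch dimension represents independent streams at the same sample rate, not consecutive chunks from one stream. Each BatchInput therefore pairs one exact-size chunk with that stream’s own StreamState; consecutive chunks from the same stream must go through separate calls, in order.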

use silero::{BatchInput, SampleRate, Session, StreamState};

let model = include_bytes!(concat!(env!("CARGO_MANIFEST_DIR"), "/models/silero_vad.onnx"));
let mut session = Session::from_memory(model).unwrap();
let mut a = StreamState::new(SampleRate::Rate16k);
let mut b = StreamState::new(SampleRate::Rate16k);
// One exact-size chunk per stream: 512 samples at 16 kHz.
let chunk_a = vec![0.0_f32; 512];
let chunk_b = vec![0.0_f32; 512];

let mut batch = [
    BatchInput::new(&mut a, &chunk_a),
    BatchInput::new(&mut b, &chunk_b),
];
let probabilities = session.infer_batch(&mut batch).unwrap();

assert_eq!(probabilities.len(), 2);
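
Because a batch can hold at most one chunk per stream, a scheduler has to split queued work into rounds. The sketch below shows one way to plan batches, assuming a hypothetical Pending record per queued chunk: it groups by sample rate, puts the k-th pending chunk of every stream into round k so each stream’s chunks stay in order, and caps the batch size. Scheduling like this belongs in the service layer, not in this crate.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the crate's SampleRate.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Rate {
    Hz8k,
    Hz16k,
}

/// One queued chunk, identified by a hypothetical service-side stream id.
struct Pending {
    stream_id: u64,
    rate: Rate,
}

/// Plans batches: one sample rate per batch, at most one chunk per stream per
/// batch, each stream's chunks in arrival order, and at most `max_batch` inputs.
fn plan_batches(pending: &[Pending], max_batch: usize) -> Vec<(Rate, Vec<u64>)> {
    assert!(max_batch > 0, "max_batch must be non-zero");
    // Per rate: round k holds the k-th pending chunk of each stream.
    let mut rounds: BTreeMap<Rate, Vec<Vec<u64>>> = BTreeMap::new();
    // Chunks already placed per (rate, stream).
    let mut placed: BTreeMap<(Rate, u64), usize> = BTreeMap::new();

    for p in pending {
        let k = placed.entry((p.rate, p.stream_id)).or_insert(0);
        let r = rounds.entry(p.rate).or_default();
        if r.len() <= *k {
            r.push(Vec::new());
        }
        r[*k].push(p.stream_id);
        *k += 1;
    }

    let mut batches = Vec::new();
    for (rate, rate_rounds) in rounds {
        for round in rate_rounds {
            // Earlier rounds are emitted first, so per-stream order holds.
            for part in round.chunks(max_batch) {
                batches.push((rate, part.to_vec()));
            }
        }
    }
    batches
}

fn main() {
    let pending = [
        Pending { stream_id: 1, rate: Rate::Hz16k },
        Pending { stream_id: 2, rate: Rate::Hz16k },
        Pending { stream_id: 3, rate: Rate::Hz8k },
        Pending { stream_id: 1, rate: Rate::Hz16k }, // stream 1's next chunk
        Pending { stream_id: 4, rate: Rate::Hz16k },
    ];
    for (rate, streams) in plan_batches(&pending, 2) {
        println!("{rate:?}: {streams:?}");
    }
}
```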

§Session construction

The crate bundles models/silero_vad.onnx and exposes these constructors:

- Session::bundled() (requires the bundled feature)
- Session::from_file(...)
- Session::from_memory(...)
- Session::from_ort_session(...)

SessionOptions only contains model-local options such as graph optimization. If a service needs to tune intra_threads / inter_threads, build the ORT session at the service layer and pass it into Session::from_ort_session(...).

§Notes

- Sample rates supported directly by the model: 8 kHz and 16 kHz.
- The crate does not decode or resample audio; callers must supply mono f32 samples at one of those rates.
- SpeechDetector is kept as a type alias for SpeechSegmenter.
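
Since decoding and resampling happen outside the crate, callers usually convert PCM themselves. A minimal sketch, assuming mono 16-bit PCM input and the common convention of scaling samples into [-1.0, 1.0); pcm16_to_f32 and check_rate are hypothetical helpers, not part of this crate.

```rust
/// Converts signed 16-bit PCM into f32 samples in [-1.0, 1.0).
fn pcm16_to_f32(pcm: &[i16]) -> Vec<f32> {
    pcm.iter().map(|&s| f32::from(s) / 32_768.0).collect()
}

/// Accepts only the rates the model takes directly; anything else has to be
/// resampled by the caller first.
fn check_rate(hz: u32) -> Result<u32, String> {
    match hz {
        8_000 | 16_000 => Ok(hz),
        other => Err(format!("{other} Hz must be resampled to 8 kHz or 16 kHz first")),
    }
}

fn main() {
    let rate = check_rate(16_000).expect("16 kHz is supported directly");
    let samples = pcm16_to_f32(&[0, 16_384, -32_768]);
    println!("{} samples at {rate} Hz: {samples:?}", samples.len());
    assert!(check_rate(44_100).is_err());
}
```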

§Development

cargo fmt
cargo test

§License

silero is distributed under the terms of both the MIT license and the Apache License (Version 2.0).

See LICENSE-APACHE, LICENSE-MIT for details.

§Third-party model notice

This crate bundles and redistributes models/silero_vad.onnx, a Silero VAD model from the upstream Silero project.

The bundled model is third-party content and remains subject to its upstream license terms. The upstream Silero model is distributed under the MIT license, and redistribution should retain the upstream copyright and permission notice.

See THIRD_PARTY_NOTICES.md for the bundled model’s upstream sources, attribution, and MIT notice text.

§Structs

- BatchInput: One exact-size chunk paired with the per-stream memory it belongs to.
- Session: ONNX Runtime session for Silero VAD inference.
- SessionOptions: Options for constructing an ONNX session.
- SpeechOptions: Configuration for turning frame probabilities into speech segments.
- SpeechSegment: One speech segment on the stream timeline.
- SpeechSegmenter: Streaming post-processor that turns frame probabilities into speech segments.
- StreamState: Per-stream model memory for Silero VAD.
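
Every model frame covers the same span of audio at both supported rates (256 / 8000 = 512 / 16000 = 0.032 s), so converting between frames, samples, and seconds is simple arithmetic. A sketch, assuming offsets are counted in samples from the start of the stream; the helper names are hypothetical and not SpeechSegment’s API.

```rust
/// Seconds covered by one model frame: chunk length divided by sample rate.
fn frame_seconds(chunk_samples: u32, sample_rate: u32) -> f64 {
    f64::from(chunk_samples) / f64::from(sample_rate)
}

/// Converts a sample offset on the stream timeline into seconds.
fn samples_to_seconds(offset: u64, sample_rate: u32) -> f64 {
    offset as f64 / f64::from(sample_rate)
}

/// Sample offset where frame `index` starts, counting from stream start.
fn frame_start_sample(index: u64, chunk_samples: u32) -> u64 {
    index * u64::from(chunk_samples)
}

fn main() {
    // 512 samples at 16 kHz and 256 samples at 8 kHz both cover 32 ms.
    println!("{:.3} s", frame_seconds(512, 16_000));
    println!("{:.3} s", frame_seconds(256, 8_000));
    // Frame 100 at 16 kHz starts at sample 51_200, i.e. 3.2 s into the stream.
    let start = frame_start_sample(100, 512);
    println!("{} -> {:.2} s", start, samples_to_seconds(start, 16_000));
}
```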

§Enums

- Error: Errors that can occur during Silero VAD operations.
- GraphOptimizationLevel: ONNX Runtime graph optimization level. Optimizations are graph-level transformations, ranging from small simplifications and node eliminations to more complex node fusions and layout optimizations.
- SampleRate: Sample rates directly supported by the Silero VAD model.

§Constants

- BUNDLED_MODEL (requires the bundled feature): Silero VAD ONNX model bytes, embedded in the binary.
- VERSION: Version string of the silero crate (CARGO_PKG_VERSION).

§Functions

- detect_speech: Convenience helper for one-shot offline detection on a full buffer.

§Type Aliases

- Result: Alias for results returned by Silero VAD operations, using this crate’s Error type.
- SpeechDetector: Backwards-compatible alias for callers that think in “detector” rather than “segmenter” terms.
