svod-model 0.1.0-alpha.3

Inference abstraction for pretrained models.

svod-model

High-level inference for pretrained deep learning models on top of svod-tensor. Each model is a pure-Rust port of an upstream checkpoint, fetched from HuggingFace Hub at runtime and executed through JIT-compiled plans.
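
The build-once / run-many idea behind JIT-compiled plans can be sketched in plain Rust. This is only an illustration of the pattern: `Plan`, `LazyModel`, and their methods are hypothetical stand-ins, not svod-model types, and the "compilation" here is a trivial placeholder for the expensive step.

```rust
use std::cell::{Cell, OnceCell};

/// Hypothetical stand-in for a compiled execution plan (not a svod-model type).
struct Plan {
    scale: f32,
}

impl Plan {
    /// Placeholder for the expensive compile step that should run only once.
    fn compile(scale: f32) -> Self {
        Plan { scale }
    }

    /// Cheap per-call execution that reuses the compiled state.
    fn run(&self, input: &[f32]) -> Vec<f32> {
        input.iter().map(|x| x * self.scale).collect()
    }
}

/// Hypothetical wrapper: compiles lazily on the first call, then reuses the plan.
struct LazyModel {
    plan: OnceCell<Plan>,
    compiles: Cell<usize>,
}

impl LazyModel {
    fn new() -> Self {
        LazyModel { plan: OnceCell::new(), compiles: Cell::new(0) }
    }

    fn forward(&self, input: &[f32]) -> Vec<f32> {
        // get_or_init runs the closure only if the cell is still empty.
        let plan = self.plan.get_or_init(|| {
            self.compiles.set(self.compiles.get() + 1);
            Plan::compile(2.0)
        });
        plan.run(input)
    }

    /// Number of times the plan was compiled so far.
    fn compile_count(&self) -> usize {
        self.compiles.get()
    }
}

fn main() {
    let model = LazyModel::new();
    let a = model.forward(&[1.0, 2.0]);
    let b = model.forward(&[3.0]);
    println!("{:?} {:?} compiles={}", a, b, model.compile_count());
}
```

The second `forward` call skips compilation entirely, which is the property that makes repeated inference cheap once the plan exists.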

Common infrastructure

| Module | Role |
|---|---|
| jit | jit_wrapper!-generated wrappers, JitRecurrent<J>, InputSpec, JitError. Build-once / run-many execution. See JIT Graphs. |
| audio | Log-mel spectrogram, Splitter trait for long-form chunking (default: SileroVadSplitter). |
| state | HasStateDict + state_field! macros for loading PyTorch / safetensors checkpoints into Rust weight structs. |
| blocks | Shared Conv2dWeights, BatchNormWeights, BasicBlock, Bottleneck, ResidualStage reused by every ResNet-shaped backbone. timm/torchvision key convention. |
| sentencepiece | Minimal SentencePiece .model protobuf loader (vocab piece extraction). |
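
To make the timm/torchvision key convention mentioned for blocks concrete, the sketch below lists the state-dict keys that a single ResNet BasicBlock carries in a torchvision-style checkpoint. The function `basic_block_keys` is a hypothetical helper written for this illustration; it is not part of svod-model, and how the crate actually maps keys to its weight structs may differ.

```rust
/// Per-layer tensors kept for a BatchNorm2d in a PyTorch state dict.
/// PyTorch also stores `num_batches_tracked`, which is skipped here
/// because it is not needed for inference.
const BN_FIELDS: [&str; 4] = ["weight", "bias", "running_mean", "running_var"];

/// Hypothetical helper: state-dict keys for one BasicBlock, following the
/// torchvision naming scheme `layer{stage}.{block}.<submodule>.<tensor>`.
fn basic_block_keys(stage: usize, block: usize, downsample: bool) -> Vec<String> {
    let prefix = format!("layer{}.{}", stage, block);
    let mut keys = Vec::new();

    // A BasicBlock holds two conv + batch-norm pairs.
    for i in 1..=2 {
        keys.push(format!("{}.conv{}.weight", prefix, i));
        for field in BN_FIELDS.iter() {
            keys.push(format!("{}.bn{}.{}", prefix, i, field));
        }
    }

    // The optional shortcut projection is a Sequential: index 0 is the conv,
    // index 1 is its batch norm.
    if downsample {
        keys.push(format!("{}.downsample.0.weight", prefix));
        for field in BN_FIELDS.iter() {
            keys.push(format!("{}.downsample.1.{}", prefix, field));
        }
    }
    keys
}

fn main() {
    for key in basic_block_keys(2, 0, true) {
        println!("{}", key);
    }
}
```

Sharing one naming scheme across backbones is what lets the same block structs load weights for any ResNet-shaped model without per-model key remapping.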

Models

| Name | Domain | Module | Upstream | HuggingFace |
|---|---|---|---|---|
| GigaAM v3 (CTC + RNN-T) | Speech | gigaam | salute-developers/GigaAM | vpermilp/GigaAM-v3 |
| Silero VAD 16k | Speech | silero_vad | snakers4/silero-vad | vpermilp/silero-vad |
| ResNet (18 / 34 / 50 / 101 / 152) | Vision | resnet | He et al. 2015 | timm/resnet*.a1_in1k |
| WeSpeaker ResNet34 | Speaker embedding | wespeaker | wenet-e2e/wespeaker | pyannote/wespeaker-voxceleb-resnet34-LM |
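
The HuggingFace column holds Hub repository IDs. As a rough sketch of how such an ID turns into a download location, the code below builds the Hub's standard `resolve` URL and expands the `timm/resnet*.a1_in1k` pattern for the listed depths. Both helpers are hypothetical, the filename `model.safetensors` is an assumption (the files each repository actually ships are not stated here), and svod-model's own fetching code may work differently.

```rust
/// Hypothetical helper: direct-download URL for one file in a Hub model repo.
/// Uses the Hub's `{repo}/resolve/{revision}/{filename}` URL layout.
fn hub_file_url(repo: &str, revision: &str, filename: &str) -> String {
    format!("https://huggingface.co/{}/resolve/{}/{}", repo, revision, filename)
}

/// Hypothetical helper: expand the `timm/resnet*.a1_in1k` pattern from the
/// table for a given depth (18, 34, 50, 101, or 152).
fn resnet_repo(depth: u32) -> String {
    format!("timm/resnet{}.a1_in1k", depth)
}

fn main() {
    // "model.safetensors" is an assumed filename used only for illustration.
    let filename = "model.safetensors";

    for repo in ["vpermilp/GigaAM-v3", "vpermilp/silero-vad"] {
        println!("{}", hub_file_url(repo, "main", filename));
    }
    for depth in [18, 34, 50, 101, 152] {
        println!("{}", hub_file_url(&resnet_repo(depth), "main", filename));
    }
}
```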

Examples

cargo run -p svod-model --release --example gigaam_infer -- audio.wav
cargo run -p svod-model --release --example gigaam_rnnt_infer -- audio.wav
cargo run -p svod-model --release --example test_vad -- audio.wav
cargo run -p svod-model --release --example resnet_classify -- --hub --image dog.bin --side 224
cargo run -p svod-model --release --example wespeaker_parity -- --hub --data reference.npz