gigastt turns any machine into a private Russian speech-recognition server — or embeds the same engine into a Rust app or an Android binary. It runs the open GigaAM v3 model fully on-device via ONNX Runtime: no cloud, no API keys.
&&
# WebSocket ws://127.0.0.1:9876/v1/ws
# REST http://127.0.0.1:9876/v1/transcribe
Highlights
- Real-time streaming — incremental partials over WebSocket; REST + SSE for files
- Embeddable — a single static binary, a C-ABI FFI cdylib for Android/mobile, or the gigastt-core crate
- Accurate & small — most accurate on 3 of 4 Russian domains, ties Vosk 0.54 on clean read (3.55%); INT8 model ~225 MB, real-time on CPU (RTF ~0.10); CoreML / CUDA / NNAPI acceleration
- Hardened server — loopback-only by default, origin allowlist, per-IP rate limiting, graceful drain, Prometheus metrics
- MIT-clean — gigastt (MIT) on GigaAM v3 weights (MIT) — usable in commercial on-device products
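To make the RTF figure above concrete: real-time factor is conventionally the time spent decoding divided by the duration of the audio decoded, so values below 1.0 mean faster than real time. The sketch below is illustrative only. The function names are hypothetical and not part of gigastt, and the 0.10 default is simply the approximate CPU figure quoted in the list.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return RTF: decode time divided by the duration of the decoded audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimated_processing_seconds(audio_seconds: float, rtf: float = 0.10) -> float:
    """Estimate decode time for a clip at a given RTF.

    The default of 0.10 is the approximate CPU figure from the highlights,
    not a measured value for any particular machine.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds * rtf


if __name__ == "__main__":
    # A one-minute clip at RTF 0.10 would take roughly six seconds to decode.
    print(estimated_processing_seconds(60.0))
```

Actual throughput depends on the CPU, the execution provider, and the pool size, so treat the estimate as an order-of-magnitude guide.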
Where it fits
gigastt is Russian-only and built for embedding. Its rnnt head (the v2.3 default) is the most accurate engine on 3 of 4 Russian domains (far-field, phone, YouTube) and statistically ties Vosk 0.54 on clean read (3.55% vs 2.97%, CIs overlap). For multilingual use see whisper.cpp / sherpa-onnx / NVIDIA Parakeet. gigastt's niche is the smallest Russian model with no language-model trade-off, wrapped in an embeddable single-binary / FFI / streaming server with MIT-clean weights, and competitive on spontaneous and telephony speech. Full honest comparison vs Vosk 0.54, T-one and Whisper → Benchmarks.
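The percentages quoted above are word error rates, and "CIs overlap" means that the confidence intervals of the two scores intersect. As a rough illustration of both ideas, here is a minimal sketch of the standard WER definition (word-level edit distance divided by reference length) and an interval-overlap check. It assumes plain whitespace tokenization; the text normalization used in the project's benchmarks is not described here, and both function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Row-by-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            cur[j] = min(
                prev[j] + 1,                      # deletion
                cur[j - 1] + 1,                   # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = cur
    return prev[-1] / len(ref)


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when two closed intervals (low, high) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Overlapping intervals only indicate that the data do not clearly separate the two engines. They are not a formal significance test.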
Documentation
| Guide | Contents |
|---|---|
| API | WebSocket protocol, REST + SSE, error codes, client examples (Python/Bun/Go/Kotlin) |
| Benchmarks | WER / RTF / footprint vs 6 engines across 4 Russian domains, with caveats |
| Architecture | Pipeline, model, hardware acceleration, INT8 quantization, project layout |
| Android / FFI | Embedding via the C-ABI on Android |
| CLI · Deployment · Security · Troubleshooting | Reference & ops |
Install
# Homebrew (macOS arm64 / Linux x86_64)
&&
# crates.io — needs protoc on PATH (brew install protobuf / apt install protobuf-compiler)
# Docker (CUDA: Dockerfile.cuda; bake the model with --build-arg GIGASTT_BAKE_MODEL=1)
&&
The GigaAM v3 model (~850 MB) auto-downloads on first run and is INT8-quantized to ~225 MB.
Building also fetches a prebuilt onnxruntime over the network (ort's default download-binaries feature); the on-device / no-cloud guarantee covers runtime inference, not the build. See Architecture for air-gapped builds.
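As a quick sanity check on the sizes quoted above, the arithmetic can be written out. This sketch assumes the ~850 MB download stores weights as 32-bit floats, which is not stated here. Under that assumption, 8-bit storage would shrink the weights by at most 4x. The helper names are hypothetical.

```python
def compression_ratio(original_mb: float, quantized_mb: float) -> float:
    """How many times smaller the quantized model is than the original."""
    if quantized_mb <= 0:
        raise ValueError("quantized_mb must be positive")
    return original_mb / quantized_mb


def ideal_int8_size_mb(fp32_mb: float) -> float:
    """Lower bound on size if every 32-bit weight became an 8-bit weight.

    Assumption: the original model is entirely FP32. Real quantized models
    are usually somewhat larger than this bound.
    """
    return fp32_mb / 4


if __name__ == "__main__":
    print(round(compression_ratio(850, 225), 2))  # observed ratio from the quoted sizes
    print(ideal_int8_size_mb(850))                # idealized 4x bound
```

The quoted ~225 MB sits slightly above the idealized 212.5 MB bound. The reason for the difference is not given here.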
Requirements
Rust 1.88+, protoc on PATH. macOS 14+ (Apple Silicon, CoreML) or Linux x86_64 (optional NVIDIA CUDA 12+). ~1.5 GB disk, ~790 MB RAM at the default --pool-size 2 (~400 MB single-session). The gigastt-core crate has no server dependencies — embed it directly: gigastt-core = "2.0".
License
MIT — see LICENSE.
Benchmark data under benchmark/ is not MIT: OpenSTT (openstt_*, CC BY-NC 4.0) and Golos (golos_*, Sber Public License) transcripts keep their non-commercial licenses. See NOTICE and benchmark/DATA_LICENSE.
Acknowledgments
- GigaAM by SberDevices — the speech recognition model
- onnx-asr by @istupakov — ONNX export & reference
- ONNX Runtime · ort — inference engine & Rust bindings