gigastt 2.3.0

Local STT server powered by GigaAM v3 e2e_rnnt — on-device Russian speech recognition via ONNX Runtime

gigastt turns any machine into a private Russian speech-recognition server, or lets you embed the same engine in a Rust app or an Android binary. It runs the open GigaAM v3 model fully on-device via ONNX Runtime: no cloud, no API keys.

cargo install gigastt && gigastt serve
# WebSocket  ws://127.0.0.1:9876/v1/ws
# REST       http://127.0.0.1:9876/v1/transcribe
$ gigastt transcribe recording.wav
Привет, как дела?
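
A streaming client usually sends audio to the WebSocket endpoint in short, fixed-duration frames instead of as one upload. The sketch below shows that framing step only. It assumes 16 kHz mono 16-bit little-endian PCM and 100 ms frames, and the helper name pcm_frames is hypothetical; these are illustrative assumptions, not the documented gigastt wire protocol, so check the API guide for the format the server actually expects.

```rust
/// Hypothetical helper: split mono 16-bit PCM samples into fixed-duration
/// frames of little-endian bytes, e.g. to send as binary WebSocket messages.
/// ASSUMPTION: sample rate, sample format and frame length are illustrative;
/// they are not taken from the gigastt protocol specification.
pub fn pcm_frames(samples: &[i16], sample_rate: u32, frame_ms: u32) -> Vec<Vec<u8>> {
    // Samples per frame, e.g. 16_000 Hz * 100 ms / 1000 = 1_600 samples.
    // `.max(1)` guards against a zero chunk size, which would panic.
    let per_frame = (sample_rate as usize * frame_ms as usize / 1000).max(1);
    samples
        .chunks(per_frame)
        // Each i16 becomes 2 bytes in little-endian order.
        .map(|chunk| chunk.iter().flat_map(|s| s.to_le_bytes()).collect())
        .collect()
}
```

With these assumptions, one second of audio becomes ten 3,200-byte frames; the last frame is shorter when the input is not an exact multiple of the frame length.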

Highlights

  • Real-time streaming — incremental partials over WebSocket; REST + SSE for files
  • Embeddable — a single static binary, a C-ABI FFI cdylib for Android/mobile, or the gigastt-core crate
  • Accurate & small — most accurate on 3 of 4 Russian domains and ties Vosk 0.54 on clean read speech (3.55% vs 2.97% WER); INT8 model ~225 MB, real-time on CPU (RTF ~0.10); CoreML / CUDA / NNAPI acceleration
  • Hardened server — loopback-only by default, origin allowlist, per-IP rate limiting, graceful drain, Prometheus metrics
  • MIT-clean — gigastt (MIT) runs GigaAM v3 weights (MIT), so it can be used in commercial on-device products
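
Per-IP rate limiting is commonly built as a token bucket keyed by client address: each IP gets a bucket that holds up to a burst capacity of tokens, refills at a fixed rate, and spends one token per request. The following is a generic sketch of that technique, not gigastt's actual limiter; the type name IpRateLimiter, the capacity and the refill rate are hypothetical. Time is passed in as seconds so the logic stays deterministic and easy to test.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical per-IP token-bucket limiter (illustrative, not gigastt's code).
pub struct IpRateLimiter {
    capacity: f64,                        // maximum burst size, in requests
    refill_per_sec: f64,                  // tokens added back per second
    buckets: HashMap<IpAddr, (f64, f64)>, // ip -> (tokens, last_seen_secs)
}

impl IpRateLimiter {
    pub fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, refill_per_sec, buckets: HashMap::new() }
    }

    /// Returns true if a request from `ip` at time `now_secs` is allowed.
    pub fn allow(&mut self, ip: IpAddr, now_secs: f64) -> bool {
        let cap = self.capacity;
        let rate = self.refill_per_sec;
        // A client seen for the first time starts with a full bucket.
        let (tokens, last) = self.buckets.entry(ip).or_insert((cap, now_secs));
        // Refill in proportion to the time elapsed since the last request,
        // never exceeding the burst capacity.
        let elapsed = (now_secs - *last).max(0.0);
        *tokens = (*tokens + elapsed * rate).min(cap);
        *last = now_secs;
        // Spend one token if available; otherwise reject the request.
        if *tokens >= 1.0 {
            *tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

Because each address has its own bucket, one noisy client exhausts only its own budget. A production limiter would also evict idle entries so the map does not grow without bound.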

Where it fits

gigastt is Russian-only and built for embedding. Its rnnt head (the v2.3 default) is the most accurate engine on 3 of 4 Russian domains (far-field, phone, YouTube) and statistically ties Vosk 0.54 on clean read speech (3.55% vs 2.97% WER; the confidence intervals overlap). For multilingual use, see whisper.cpp, sherpa-onnx or NVIDIA Parakeet. gigastt's niche is the smallest Russian model with no language-model trade-off, packaged as a single embeddable binary, an FFI library or a streaming server, with MIT-clean weights and competitive accuracy on spontaneous and telephony speech. For the full comparison against Vosk 0.54, T-one and Whisper, see Benchmarks.
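
The 3.55% and 2.97% figures are word error rates (WER): the word-level edit distance (substitutions + deletions + insertions) between the recognized text and the reference transcript, divided by the number of reference words. A minimal sketch of that definition follows. It deliberately skips text normalisation (for example lower-casing and stripping punctuation), which real benchmarks apply first; the exact normalisation used by the gigastt benchmarks is not reproduced here.

```rust
/// Word error rate: word-level Levenshtein distance between `reference`
/// and `hypothesis`, divided by the number of reference words.
/// Text normalisation is intentionally left out of this sketch.
pub fn wer(reference: &str, hypothesis: &str) -> f64 {
    let r: Vec<&str> = reference.split_whitespace().collect();
    let h: Vec<&str> = hypothesis.split_whitespace().collect();
    if r.is_empty() {
        // WER is undefined for an empty reference; this sketch's convention:
        // 0.0 if the hypothesis is also empty, otherwise 1.0.
        return if h.is_empty() { 0.0 } else { 1.0 };
    }
    // prev[j] = edit distance between r[..i-1] and h[..j] (previous DP row).
    let mut prev: Vec<usize> = (0..=h.len()).collect();
    for i in 1..=r.len() {
        // cur[0] = i: deleting all of the first i reference words.
        let mut cur = vec![i; h.len() + 1];
        for j in 1..=h.len() {
            let sub = prev[j - 1] + usize::from(r[i - 1] != h[j - 1]);
            let del = prev[j] + 1;
            let ins = cur[j - 1] + 1;
            cur[j] = sub.min(del).min(ins);
        }
        prev = cur;
    }
    prev[h.len()] as f64 / r.len() as f64
}
```

Under this definition, 3.55% WER means roughly 3.55 word errors per 100 reference words. WER can exceed 100% when the hypothesis contains many insertions.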

Documentation

  • API: WebSocket protocol, REST + SSE, error codes, client examples (Python/Bun/Go/Kotlin)
  • Benchmarks: WER / RTF / footprint vs 6 engines across 4 Russian domains, with caveats
  • Architecture: pipeline, model, hardware acceleration, INT8 quantization, project layout
  • Android / FFI: embedding via the C-ABI on Android
  • CLI, Deployment, Security, Troubleshooting: reference and operations
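
RTF (real-time factor) is processing time divided by audio duration, so values below 1.0 mean faster than real time. As a rough worked example with the CPU figure quoted in Highlights (RTF ~0.10): a 60-second recording would take about 6 seconds to transcribe. Real numbers depend on hardware, execution provider and load. The helper names below are illustrative, not part of gigastt.

```rust
use std::time::Duration;

/// Real-time factor: processing time / audio duration.
/// Values below 1.0 mean the engine is faster than real time.
pub fn rtf(processing: Duration, audio: Duration) -> f64 {
    processing.as_secs_f64() / audio.as_secs_f64()
}

/// Rough processing-time estimate for a clip, given an assumed RTF.
pub fn estimated_processing(audio: Duration, factor: f64) -> Duration {
    audio.mul_f64(factor)
}
```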

Install

# Homebrew (macOS arm64 / Linux x86_64)
brew tap ekhodzitsky/gigastt https://github.com/ekhodzitsky/gigastt && brew install gigastt

# crates.io — needs protoc on PATH (brew install protobuf / apt install protobuf-compiler)
cargo install gigastt

# Docker (CUDA: Dockerfile.cuda; bake the model with --build-arg GIGASTT_BAKE_MODEL=1)
docker build -t gigastt . && docker run -p 9876:9876 gigastt

The GigaAM v3 model (~850 MB) auto-downloads on first run and is INT8-quantized to ~225 MB.
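
INT8 quantization stores each weight in 1 byte instead of the 4 bytes of a 32-bit float, which is consistent with the roughly 4× reduction from ~850 MB to ~225 MB. The sketch below shows the simplest variant, symmetric per-tensor quantization with a single scale factor. It only illustrates the idea; it is not the scheme gigastt or ONNX Runtime's quantization tooling actually applies (see Architecture for that).

```rust
/// Illustrative symmetric per-tensor INT8 quantization (not gigastt's scheme):
/// scale = max|x| / 127, q = round(x / scale), clamped to [-127, 127].
pub fn quantize(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    // Avoid dividing by zero for an all-zero tensor.
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let q = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Approximate reconstruction: x ≈ q * scale (error at most scale / 2 per weight).
pub fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}
```

Per-tensor scaling is the crudest option; production quantizers often use per-channel scales to keep the rounding error lower for layers with uneven weight ranges.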

Building also downloads a prebuilt ONNX Runtime over the network (the ort crate's default download-binaries feature). The on-device, no-cloud guarantee covers runtime inference, not the build; see Architecture for air-gapped builds.

Requirements

Rust 1.88+ and protoc on PATH. macOS 14+ (Apple Silicon, CoreML) or Linux x86_64 (optional NVIDIA CUDA 12+). About 1.5 GB of disk and ~790 MB of RAM at the default --pool-size 2 (~400 MB with a single session). The gigastt-core crate has no server dependencies, so you can embed it directly by adding gigastt-core = "2.0" to your Cargo.toml dependencies.

License

MIT — see LICENSE.

Benchmark data under benchmark/ is not MIT: OpenSTT (openstt_*, CC BY-NC 4.0) and Golos (golos_*, Sber Public License) transcripts keep their non-commercial licenses. See NOTICE and benchmark/DATA_LICENSE.

Acknowledgments