gigastt 2.1.0

Local STT server powered by GigaAM v3 e2e_rnnt — on-device Russian speech recognition via ONNX Runtime

gigastt turns any machine into a private Russian speech-recognition server — or embeds the same engine into a Rust app or an Android binary. It runs the open GigaAM v3 model fully on-device via ONNX Runtime: no cloud, no API keys.

$ cargo install gigastt && gigastt serve
# WebSocket  ws://127.0.0.1:9876/v1/ws
# REST       http://127.0.0.1:9876/v1/transcribe
$ gigastt transcribe recording.wav
Привет, как дела?

Highlights

  • Real-time streaming — incremental partials over WebSocket; REST + SSE for files
  • Embeddable — a single static binary, a C-ABI FFI cdylib for Android/mobile, or the gigastt-core crate
  • Small & fast — INT8 model ~225 MB, real-time on CPU; CoreML / CUDA / NNAPI acceleration
  • Hardened server — loopback-only by default, origin allowlist, per-IP rate limiting, graceful drain, Prometheus metrics
  • MIT-clean — gigastt (MIT) on GigaAM v3 weights (MIT) — usable in commercial on-device products

Where it fits

gigastt is Russian-only and built for embedding, not for topping a WER leaderboard. On clean read speech, Vosk is more accurate; for multilingual work, use whisper.cpp, sherpa-onnx, or NVIDIA Parakeet. gigastt's niche is the smallest Russian model with no language-model trade-off, packaged as an embeddable single binary, FFI library, or streaming server with MIT-clean weights. It is also competitive on spontaneous and telephony speech. For the full, honest comparison against Vosk 0.54, T-one and Whisper, see Benchmarks.

Documentation

| Guide | Contents |
|---|---|
| API | WebSocket protocol, REST + SSE, error codes, client examples (Python/Bun/Go/Kotlin) |
| Benchmarks | WER / RTF / footprint vs 6 engines across 4 Russian domains, with caveats |
| Architecture | Pipeline, model, hardware acceleration, INT8 quantization, project layout |
| Android / FFI | Embedding via the C-ABI on Android |
| CLI · Deployment · Security · Troubleshooting | Reference & ops |

Install

# Homebrew (macOS arm64 / Linux x86_64)
brew tap ekhodzitsky/gigastt https://github.com/ekhodzitsky/gigastt && brew install gigastt

# crates.io — needs protoc on PATH (brew install protobuf / apt install protobuf-compiler)
cargo install gigastt

# Docker (CUDA: Dockerfile.cuda; bake the model with --build-arg GIGASTT_BAKE_MODEL=1)
docker build -t gigastt . && docker run -p 9876:9876 gigastt

The GigaAM v3 model (~850 MB) auto-downloads on first run and is INT8-quantized to ~225 MB.

Building also fetches a prebuilt onnxruntime over the network (ort's default download-binaries); the on-device / no-cloud guarantee covers runtime inference, not the build. See Architecture for air-gapped builds.

Requirements

Rust 1.88+, protoc on PATH. macOS 14+ (Apple Silicon, CoreML) or Linux x86_64 (optional NVIDIA CUDA 12+). ~1.5 GB disk, ~560 MB RAM. The gigastt-core crate has no server dependencies; to embed it directly, add gigastt-core = "2.0" to the [dependencies] section of your Cargo.toml.
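
The build-time requirements above can be checked before running cargo install. The following is a minimal sketch, not something shipped with gigastt: the helper names version_ge, check_tool and preflight are hypothetical, and the only facts taken from this README are the Rust 1.88 minimum and the need for protoc on PATH.

```shell
#!/bin/sh
# Hypothetical preflight helpers for building gigastt from source.
# None of these functions are part of gigastt itself.

# version_ge A B: succeed when dotted version A is >= B.
# Fields are compared numerically, left to right; missing fields count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# check_tool NAME: report whether NAME resolves on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1 found"
    else
        echo "missing: $1"
        return 1
    fi
}

# preflight: check protoc and the Rust toolchain version (1.88 per this README).
preflight() {
    rc=0
    check_tool protoc || rc=1
    if check_tool rustc; then
        # `rustc --version` prints e.g. "rustc 1.88.0 (...)"; field 2 is the version.
        have=$(rustc --version | awk '{print $2}')
        if ! version_ge "$have" 1.88; then
            echo "rustc $have is older than 1.88"
            rc=1
        fi
    else
        rc=1
    fi
    return "$rc"
}
```

Sourcing this file and running preflight && cargo install gigastt would stop early with a readable message when either prerequisite is missing. Disk and RAM are not checked here, since the portable way to query them differs between macOS and Linux.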

License

MIT — see LICENSE.

Benchmark data under benchmark/ is not MIT: OpenSTT (openstt_*, CC BY-NC 4.0) and Golos (golos_*, Sber Public License) transcripts keep their non-commercial licenses. See NOTICE and benchmark/DATA_LICENSE.

Acknowledgments