gigastt 0.1.1 - Docs.rs

gigastt-0.1.1 is not a library.

Visit the last successful build: gigastt-2.3.0

Features

Real-time streaming — partial transcription results appear as you speak via WebSocket
On-device — no cloud APIs, no API keys, zero cost, full privacy
Punctuation — automatic punctuation and text normalization (e2e model)
Russian-first — GigaAM v3 e2e_rnnt, 50% better than Whisper-large-v3 on Russian benchmarks
Fast — Conformer encoder (240M params) optimized for Apple Silicon via CoreML
Auto-download — model fetched from HuggingFace on first run

Quick Start

cargo install gigastt
gigastt serve
# Listening on ws://127.0.0.1:9876
# Model auto-downloaded (~850MB) on first run

Usage

Start STT server

gigastt serve
# Options: --port 9876 --model-dir ~/.gigastt/models

Transcribe a file

gigastt transcribe recording.wav
# Outputs transcribed Russian text to stdout
# Requires: mono PCM16 WAV at 16kHz

Download model only

gigastt download
# Downloads to ~/.gigastt/models/

WebSocket Protocol

Connect to ws://127.0.0.1:9876 and send PCM16 mono 16kHz binary frames.

Messages

Direction	Type	Fields	Description
Server	`ready`	`model`, `sample_rate`	Server is ready to accept audio
Server	`partial`	`text`, `timestamp`	Interim transcription (may change)
Server	`final`	`text`, `timestamp`	Complete utterance with punctuation
Server	`error`	`message`, `code`	Error occurred

Example

{"type": "ready", "model": "gigaam-v3-e2e-rnnt", "sample_rate": 16000}
{"type": "partial", "text": "что такое"}
{"type": "final", "text": "Что такое Node.js?"}

Client Examples

See examples/ for ready-to-use WebSocket clients:

Python: python examples/python_client.py recording.wav
JavaScript: node examples/js_client.mjs recording.wav

Model

GigaAM v3 e2e_rnnt — Conformer-based ASR model by SberDevices:

Property	Value
Parameters	240M (Conformer encoder)
Architecture	RNN-T (encoder + decoder + joiner)
Training data	700K+ hours of Russian speech
Vocabulary	1025 BPE tokens
Input	16kHz mono PCM16
License	MIT

Requirements

macOS 14+ on Apple Silicon (M1/M2/M3/M4)
~1.5GB disk space (model + binary)
~500MB RAM during inference
Rust 1.75+ (for building from source)

Architecture

Audio (PCM16 16kHz)
  │
  ▼
┌──────────────────┐
│  Mel Spectrogram  │  64 bins, FFT=320, hop=160
└────────┬─────────┘
         ▼
┌──────────────────┐
│  Conformer       │  16 layers, d=768
│  Encoder (ONNX)  │
└────────┬─────────┘
         ▼
┌──────────────────┐
│  RNN-T Decoder   │  LSTM h/c state persisted
│  + Joiner (ONNX) │  across audio chunks
└────────┬─────────┘
         ▼
  Text (with punctuation)

License

MIT

Acknowledgments

GigaAM by SberDevices — the speech recognition model
onnx-asr by @istupakov — ONNX model export and reference implementation
ort — Rust bindings for ONNX Runtime