gigastt-core
Core inference engine for gigastt: Russian speech recognition powered by GigaAM v3 via ONNX Runtime. It has no server dependencies and needs no tokio runtime for inference, so it can be embedded directly into any Rust application.
Usage
```toml
[dependencies]
gigastt-core = "2.0"
```
```rust
use …::Engine;
use …::model;

// Download model on first run (~850 MB)
let model_dir = model::default_model_dir();
model::ensure_model(…).await?;

// Load engine (pool_size controls concurrent sessions)
let engine = Engine::load(…)?;

// Transcribe a file
let mut guard = engine.pool.checkout().await?;
let text = engine.transcribe_file(…)?;
println!("{text}");
// guard is returned to the pool on drop
```
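The usage example notes that the guard is returned to the pool when it is dropped. The following is a minimal, std-only sketch of that RAII pattern, assuming a blocking `Mutex` + `Condvar` pool. `SessionPool`, `PoolGuard`, `checkout`, and `idle` here are hypothetical names chosen for illustration; gigastt-core's own pool is async and its internals may differ.

```rust
use std::ops::{Deref, DerefMut};
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical fixed-size pool of reusable items (e.g. inference sessions).
pub struct SessionPool<T> {
    items: Mutex<Vec<T>>,
    available: Condvar,
}

/// Hypothetical RAII guard: owns one item and returns it to the pool on drop.
pub struct PoolGuard<T> {
    pool: Arc<SessionPool<T>>,
    item: Option<T>,
}

impl<T> SessionPool<T> {
    pub fn new(items: Vec<T>) -> Arc<Self> {
        Arc::new(Self { items: Mutex::new(items), available: Condvar::new() })
    }

    /// Blocks until an item is free, then hands it out inside a guard.
    pub fn checkout(self: &Arc<Self>) -> PoolGuard<T> {
        let mut items = self.items.lock().unwrap();
        loop {
            if let Some(item) = items.pop() {
                return PoolGuard { pool: Arc::clone(self), item: Some(item) };
            }
            // Releases the lock while waiting; reacquires it on wake-up.
            items = self.available.wait(items).unwrap();
        }
    }

    /// Number of items currently sitting idle in the pool.
    pub fn idle(&self) -> usize {
        self.items.lock().unwrap().len()
    }
}

impl<T> Deref for PoolGuard<T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.item.as_ref().expect("item present until drop")
    }
}

impl<T> DerefMut for PoolGuard<T> {
    fn deref_mut(&mut self) -> &mut T {
        self.item.as_mut().expect("item present until drop")
    }
}

impl<T> Drop for PoolGuard<T> {
    fn drop(&mut self) {
        if let Some(item) = self.item.take() {
            self.pool.items.lock().unwrap().push(item);
            // Wake one thread blocked in `checkout`.
            self.pool.available.notify_one();
        }
    }
}

fn main() {
    let pool = SessionPool::new(vec![String::from("session-0"), String::from("session-1")]);
    {
        let mut guard = pool.checkout();
        guard.push_str("-busy"); // use the item through DerefMut
        assert_eq!(pool.idle(), 1);
    } // guard dropped here: the item goes back to the pool
    assert_eq!(pool.idle(), 2);
}
```

Because the return happens in `Drop`, a session is released even when the caller exits early via `?`, which is why the example never calls an explicit "return" method.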
Streaming recognition
```rust
use …::Engine;

let engine = Engine::load(…)?;
let mut guard = engine.pool.checkout().await?;
let mut state = engine.create_state(…)?;

// Feed PCM16 chunks (16 kHz mono)
let segments = engine.process_chunk(…)?;
for seg in &segments {
    …
}

// Flush remaining audio
let final_segments = engine.flush_state(…)?;
```
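The streaming API is fed PCM16 chunks at 16 kHz mono. When audio arrives as raw little-endian bytes (for example from a socket), it first has to be split into `i16` samples, and a chunk boundary can fall in the middle of a sample. A minimal sketch, assuming little-endian input; `pcm16_le_bytes_to_samples` and `samples_per_chunk` are hypothetical helpers, and the exact sample type `process_chunk` accepts is not stated here.

```rust
/// Sample rate expected by the streaming API (16 kHz mono).
const SAMPLE_RATE_HZ: usize = 16_000;

/// Hypothetical helper: decodes little-endian PCM16 bytes into samples.
/// A trailing odd byte (half a sample) is returned separately so the caller
/// can prepend it to the next chunk instead of dropping it.
fn pcm16_le_bytes_to_samples(bytes: &[u8]) -> (Vec<i16>, Option<u8>) {
    let mut pairs = bytes.chunks_exact(2);
    let samples: Vec<i16> = pairs
        .by_ref()
        .map(|p| i16::from_le_bytes([p[0], p[1]]))
        .collect();
    let leftover = pairs.remainder().first().copied();
    (samples, leftover)
}

/// Number of samples in a chunk of `ms` milliseconds at 16 kHz mono.
fn samples_per_chunk(ms: usize) -> usize {
    SAMPLE_RATE_HZ * ms / 1000
}

fn main() {
    // 100 ms of audio = 1600 samples = 3200 bytes of PCM16.
    assert_eq!(samples_per_chunk(100), 1600);

    let bytes = [0x01, 0x00, 0xFF, 0x7F, 0x00, 0x80, 0xAB];
    let (samples, leftover) = pcm16_le_bytes_to_samples(&bytes);
    assert_eq!(samples, vec![1, i16::MAX, i16::MIN]);
    assert_eq!(leftover, Some(0xAB));
}
```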
Features
| Feature | Description |
|---|---|
| `diarization` | Speaker identification via polyvoice (default: enabled) |
| `coreml` | CoreML + Neural Engine on macOS ARM64 |
| `cuda` | CUDA 12+ on Linux x86_64 |
| `nnapi` | Android NNAPI for NPU/DSP acceleration |
Features are selected at compile time; `coreml` and `cuda` are mutually exclusive.
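A common Rust pattern for enforcing mutually exclusive Cargo features, and for code to branch on which backend was compiled in, is `#[cfg(...)]` with `compile_error!` plus `cfg!`. The sketch below shows that general pattern; it is not taken from gigastt-core's source, and `backend_name` and the `"cpu"` fallback are hypothetical.

```rust
// Refuse to build if both accelerator backends are enabled at once.
#[cfg(all(feature = "coreml", feature = "cuda"))]
compile_error!("features `coreml` and `cuda` are mutually exclusive");

/// Hypothetical helper: reports which execution backend was compiled in.
fn backend_name() -> &'static str {
    if cfg!(feature = "coreml") {
        "coreml"
    } else if cfg!(feature = "cuda") {
        "cuda"
    } else if cfg!(feature = "nnapi") {
        "nnapi"
    } else {
        "cpu"
    }
}

fn main() {
    println!("backend: {}", backend_name());
}
```

In a dependent crate's `Cargo.toml`, a feature is switched on with `features = ["cuda"]` on the `gigastt-core` dependency entry; turning off the default `diarization` feature additionally requires `default-features = false`.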
What's included
- Inference engine — ONNX Runtime session pool, Conformer encoder, RNN-T decoder + joiner
- Mel spectrogram — 64 bins, FFT=320, hop=160, HTK scale
- BPE tokenizer — 1025 tokens with automatic punctuation
- Audio loading — WAV, M4A, MP3, OGG, FLAC via symphonia; resampling via rubato
- Model download — streaming download from Hugging Face with SHA-256 verification and atomic rename
- INT8 quantization — native Rust quantizer, auto-detected at runtime
- Protocol types — `ClientMessage`, `ServerMessage`, `TranscriptSegment` for WebSocket/REST
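The mel spectrogram parameters above translate directly into durations at the 16 kHz input rate: an FFT size of 320 samples is a 20 ms window and a hop of 160 samples is 10 ms, i.e. 100 frames per second. The sketch below computes the HTK mel scale, mel(f) = 2595 · log10(1 + f/700), the centers of 64 mel filters, and the frame count for a signal. It assumes no padding or centering and a 0 Hz to 8 kHz (Nyquist) filterbank range; gigastt-core may configure these differently, and all function names are hypothetical.

```rust
const SAMPLE_RATE_HZ: f64 = 16_000.0;
const N_FFT: usize = 320; // 20 ms window at 16 kHz
const HOP: usize = 160; // 10 ms hop at 16 kHz
const N_MELS: usize = 64;

/// HTK mel scale: mel = 2595 * log10(1 + f / 700).
fn hz_to_mel_htk(f: f64) -> f64 {
    2595.0 * (1.0 + f / 700.0).log10()
}

/// Inverse of `hz_to_mel_htk`.
fn mel_to_hz_htk(m: f64) -> f64 {
    700.0 * (10f64.powf(m / 2595.0) - 1.0)
}

/// Center frequencies of N_MELS triangular filters, evenly spaced on the mel
/// scale between 0 Hz and Nyquist (the two outer edge points are excluded).
fn mel_centers_hz() -> Vec<f64> {
    let lo = hz_to_mel_htk(0.0);
    let hi = hz_to_mel_htk(SAMPLE_RATE_HZ / 2.0);
    let step = (hi - lo) / (N_MELS + 1) as f64;
    (1..=N_MELS).map(|i| mel_to_hz_htk(lo + step * i as f64)).collect()
}

/// Frames produced without padding: 1 + (n - N_FFT) / HOP.
fn num_frames(n_samples: usize) -> usize {
    if n_samples < N_FFT { 0 } else { 1 + (n_samples - N_FFT) / HOP }
}

fn main() {
    // 1 s of 16 kHz audio: 1 + (16000 - 320) / 160 = 99 frames.
    assert_eq!(num_frames(16_000), 99);
    // 1000 Hz maps to roughly 1000 mel on the HTK scale.
    assert!((hz_to_mel_htk(1000.0) - 1000.0).abs() < 1.0);
    let centers = mel_centers_hz();
    assert_eq!(centers.len(), N_MELS);
    println!("first/last centers: {:.1} Hz / {:.1} Hz", centers[0], centers[N_MELS - 1]);
}
```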
Requirements
- Rust 1.85+ (edition 2024)
- `protoc` on `PATH` (`brew install protobuf` / `apt install protobuf-compiler`)
License
MIT