whisp-rs
Lightweight voice-to-text dictation for Linux (Wayland/COSMIC). Hold a hotkey, speak, release — text appears at your cursor.
Single binary. No local models. No bloat.
How it works
- Hold Ctrl+Space
- Speak into your microphone
- Release: your words are typed at the cursor
Audio is captured via ALSA, sent to Deepgram for transcription, and injected into the focused window using wtype or clipboard paste.
Features
- Global hotkey via evdev: works in any app, any workspace
- System tray icon shows state: idle / recording / processing
- Multiple injection methods: wtype → clipboard+paste → ydotool (auto-fallback)
- First-run setup wizard: no manual config editing needed
- Raw PCM streaming: no WAV overhead, minimal latency
- Configurable: model, language, hotkey, sample rate
- Tiny binary (~2MB release build with LTO)
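The auto-fallback between injection methods can be pictured as an ordered list of attempts that stops at the first success. The following is a minimal sketch of that idea, not the project's actual code: the `Injector` type, `inject_with_fallback`, and `via_wtype` are hypothetical names, and the wtype call simply passes the text as one argument.

```rust
use std::process::Command;

/// An injection method (hypothetical type): takes the transcript and
/// reports success or a short reason for failure.
type Injector = fn(&str) -> Result<(), String>;

/// Try each method in order and return the name of the first one that works.
/// If every method fails, return all collected errors so the log can show why.
fn inject_with_fallback(
    text: &str,
    methods: &[(&'static str, Injector)],
) -> Result<&'static str, Vec<String>> {
    let mut errors = Vec::new();
    for &(name, method) in methods {
        match method(text) {
            Ok(()) => return Ok(name),
            Err(reason) => errors.push(format!("{}: {}", name, reason)),
        }
    }
    Err(errors)
}

/// Example method (assumption): type the text with wtype, passed as a
/// single argument. Fails if wtype is missing or exits with non-zero status.
fn via_wtype(text: &str) -> Result<(), String> {
    let status = Command::new("wtype")
        .arg(text)
        .status()
        .map_err(|e| e.to_string())?;
    if status.success() {
        Ok(())
    } else {
        Err(format!("wtype exited with {}", status))
    }
}
```

A caller would build the chain in the documented order (wtype, then clipboard+paste, then ydotool) and report "All injection methods failed" only when the returned error list covers every entry.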
Requirements
| Dependency | Purpose | Install |
|---|---|---|
| arecord | Audio capture (ALSA) | sudo apt install alsa-utils |
| wtype or wl-clipboard | Text injection | sudo apt install wtype or sudo apt install wl-clipboard |
| ydotool | Fallback injection | sudo apt install ydotool |
| input group | Hotkey access | sudo usermod -aG input $USER && reboot |
OS: Linux with Wayland (tested on Pop!_OS COSMIC, should work on GNOME, KDE, Sway, Hyprland)
Install
One-shot installer (Debian/Ubuntu/Pop!_OS)
No Rust required. Downloads the pre-built binary and installs all system deps:
Pre-built binary
Download from GitHub Releases, extract, and run:
From crates.io (requires Rust)
From source
# Binary at: target/release/whisp-rs
Setup
First run (automatic wizard)
If no config exists, the setup wizard launches automatically. It asks for:
- Hotkey (default: Ctrl+Space)
- Deepgram API key (get one free, $200 credit)
- STT model (default: nova-2-general)
- Language
Manual setup
# Set API key only
whisp-rs --set-api-key KEY
# Re-run full wizard
whisp-rs --setup
Environment variable
Configuration
Config file: ~/.config/whisp-rs/config.toml
[]
= ["ctrl"] # ctrl | super | alt | shift (or combo: ["ctrl", "shift"])
= "space"
[]
= "your-deepgram-api-key"
= "nova-2-general" # nova-2-general | nova-2 | base
= "en" # en, fr, es, de, ar, zh, ja, ko, auto
[]
= 16000
= 1
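To make the shape of the configuration concrete, here is a sketch of how these settings could be held and checked in memory. All field names (`modifiers`, `key`, `api_key`, `model`, `language`, `sample_rate`, `channels`) and the `validate` helper are assumptions made for illustration; only the values and the allowed modifier and model names come from the sample above.

```rust
/// Hypothetical in-memory form of config.toml. Field names are illustrative.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    modifiers: Vec<String>, // ctrl | super | alt | shift, or a combination
    key: String,
    api_key: String,
    model: String,    // nova-2-general | nova-2 | base
    language: String, // en, fr, es, de, ar, zh, ja, ko, auto
    sample_rate: u32,
    channels: u16,
}

const MODIFIERS: [&str; 4] = ["ctrl", "super", "alt", "shift"];
const MODELS: [&str; 3] = ["nova-2-general", "nova-2", "base"];

impl Default for Config {
    /// Values mirror the sample config; the API key starts empty.
    fn default() -> Self {
        Config {
            modifiers: vec!["ctrl".to_string()],
            key: "space".to_string(),
            api_key: String::new(),
            model: "nova-2-general".to_string(),
            language: "en".to_string(),
            sample_rate: 16000,
            channels: 1,
        }
    }
}

impl Config {
    /// Reject modifier and model names that the sample config does not list.
    fn validate(&self) -> Result<(), String> {
        for m in &self.modifiers {
            if !MODIFIERS.iter().any(|known| *known == m.as_str()) {
                return Err(format!("unknown modifier: {}", m));
            }
        }
        if !MODELS.iter().any(|known| *known == self.model.as_str()) {
            return Err(format!("unknown model: {}", self.model));
        }
        if self.sample_rate == 0 || self.channels == 0 {
            return Err("sample rate and channels must be non-zero".to_string());
        }
        Ok(())
    }
}
```

Validating up front means a typo such as an unsupported model name surfaces at startup rather than as a failed transcription request.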
Models
| Model | Speed | Accuracy | Best for |
|---|---|---|---|
| nova-2-general | Fastest | Good | Dictation (default) |
| nova-2 | Fast | Best | Accuracy-critical |
| base | Fastest | Lower | Quick notes |
CLI flags
whisp-rs # Start dictation daemon
whisp-rs --set-api-key KEY # Save API key to config
whisp-rs --setup # Re-run setup wizard
whisp-rs --help # Show help
Logging
# Default: info level
whisp-rs
# Debug: see API calls, hotkey events, injection attempts
RUST_LOG=whisp_rs=debug whisp-rs
# Trace: everything
RUST_LOG=whisp_rs=trace whisp-rs
Troubleshooting
"No keyboard input devices found"
sudo usermod -aG input $USER
# Log out and back in (or reboot)
"All injection methods failed"
# Install wtype (recommended for Wayland)
# Or install wl-clipboard + ydotool
&
"Missing dependencies: arecord"
Empty transcriptions
- Check your microphone is connected and not muted
- Run with RUST_LOG=whisp_rs=debug to see audio capture sizes
- Test mic: arecord -f S16_LE -r 16000 -c 1 -t raw -d 5 /tmp/test.raw && aplay -f S16_LE -r 16000 -c 1 /tmp/test.raw
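When reading capture sizes in the debug log, it helps to know how many bytes a given recording length should produce. The sketch below works that out for raw PCM and builds the same arecord arguments as the mic test above. The function names are hypothetical, and whether whisp-rs invokes arecord with exactly these arguments is an assumption.

```rust
/// Bytes produced per second of raw PCM audio:
/// sample rate * channels * bytes per sample.
fn pcm_bytes_per_second(sample_rate: u32, channels: u16, bits_per_sample: u16) -> u32 {
    sample_rate * u32::from(channels) * u32::from(bits_per_sample / 8)
}

/// Duration in seconds represented by a raw PCM buffer of `byte_len` bytes.
fn pcm_duration_secs(
    byte_len: usize,
    sample_rate: u32,
    channels: u16,
    bits_per_sample: u16,
) -> f64 {
    byte_len as f64 / f64::from(pcm_bytes_per_second(sample_rate, channels, bits_per_sample))
}

/// arecord arguments for raw signed 16-bit little-endian capture,
/// mirroring the mic test command (without the duration and output file).
fn arecord_args(sample_rate: u32, channels: u16) -> Vec<String> {
    vec![
        "-f".to_string(),
        "S16_LE".to_string(),
        "-r".to_string(),
        sample_rate.to_string(),
        "-c".to_string(),
        channels.to_string(),
        "-t".to_string(),
        "raw".to_string(),
    ]
}
```

At 16 kHz, 16-bit, mono, one second of speech is 32000 bytes, so the 5-second mic test should write a file of about 160000 bytes. A capture size far below what the hold time implies points to a muted or wrong input device.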
Architecture
┌─────────────┐     ┌──────────┐     ┌─────────────┐
│   evdev     │────▶│ arecord  │────▶│  Deepgram   │
│   hotkey    │     │  (ALSA)  │     │    API      │
└─────────────┘     └──────────┘     └──────┬──────┘
                                            │
┌─────────────┐     ┌──────────┐            │
│    ksni     │     │ wtype /  │◀───────────┘
│  tray icon  │     │  inject  │
└─────────────┘     └──────────┘
- Hotkey: evdev reads raw keyboard events from /dev/input/
- Audio: arecord subprocess captures raw PCM (16 kHz, 16-bit, mono)
- STT: Deepgram REST API with raw PCM streaming (no WAV overhead)
- Injection: wtype → wl-copy + ydotool → ydotool type (auto-fallback)
- Tray: ksni system tray with state indicators
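The three tray states (idle, recording, processing) and the hold-to-talk flow suggest a small state machine driven by hotkey events. The following is an illustrative sketch only: the `State` and `Event` enums and the `next` function are hypothetical names, not taken from the whisp-rs source.

```rust
/// Tray states as listed in the features: idle / recording / processing.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Idle,
    Recording,
    Processing,
}

/// Events that could drive the daemon (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    HotkeyPressed,
    HotkeyReleased,
    /// Transcription finished, whether text was injected or an error occurred.
    Finished,
}

/// Compute the next state. Events that make no sense in the current
/// state (for example a release while idle) leave the state unchanged.
fn next(state: State, event: Event) -> State {
    match (state, event) {
        (State::Idle, Event::HotkeyPressed) => State::Recording,
        (State::Recording, Event::HotkeyReleased) => State::Processing,
        (State::Processing, Event::Finished) => State::Idle,
        (unchanged, _) => unchanged,
    }
}

/// Label a tray icon could display for each state.
fn label(state: State) -> &'static str {
    match state {
        State::Idle => "idle",
        State::Recording => "recording",
        State::Processing => "processing",
    }
}
```

Ignoring out-of-order events keeps a second hotkey press during processing from starting an overlapping recording, which is one reasonable policy among several.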
Support
If this project saves you time, consider buying me a coffee:
License
MIT