_ _
__ __| |__ (_)___ _ __ ___
\ \ /\ / /| '_ \| / __|| '__/ __|
\ V V / | | | | \__ \| | \__ \
\_/\_/ |_| |_|_|___/|_| |___/
speak. type. done.
whisrs
Linux-first voice-to-text dictation, written in Rust.
Press a hotkey, speak, and your words appear at the cursor — in any app, any window manager, any desktop environment. Fast, private, open source.
Why whisrs?
This project is directly inspired by xhisper — a solid tool that proved Linux dictation works. whisrs takes that concept and rebuilds it in Rust with a proper architecture, native layout support, window tracking, and multiple transcription backends.
Landscape
| whisrs | xhisper | Wispr Flow | nerd-dictation | Superwhisper | |
|---|---|---|---|---|---|
| Platform | Linux | Linux | macOS, Windows, iOS, Android | Linux | macOS, Windows, iOS |
| Language | Rust | C + Shell | Proprietary | Python | Proprietary |
| Transcription | Groq, OpenAI, local whisper.cpp | Groq only | Cloud (proprietary) | Vosk (local) | Local Whisper + cloud |
| Streaming | Yes (OpenAI Realtime) | No | Yes | No | Yes |
| Offline | Yes (whisper.cpp) | No | No | Yes | Yes |
| Open source | Yes (MIT) | Yes | No | Yes (GPL) | No |
| Price | Free | Free | Free tier / $12/mo Pro | Free | $8.49/mo or $250 lifetime |
Also worth knowing about:
- Speech Note — Linux desktop app (Flatpak) with offline STT, TTS, and translation. Supports Vosk, whisper.cpp, Faster Whisper. GUI-focused, not a CLI tool.
- VoiceInk — macOS only, local Whisper, open source (GPL). No Linux.
What whisrs adds over xhisper
| whisrs | xhisper | |
|---|---|---|
| Keyboard layout | Automatic XKB reverse lookup — works natively on any layout | Hardcoded QWERTY keycodes — non-QWERTY requires an input-switch workaround (e.g. --rightalt to toggle OS layout to QWERTY) |
| Window tracking | Captures focused window on record start, restores focus before typing | None — text goes to whatever window is focused |
| Typing | Bulk text processing in one pass through uinput | Character-by-character dispatch from shell to daemon over socket |
| Audio capture | Direct PCM via cpal (no temp files, no subprocess) | Shells out to pw-record |
| Audio backends | PipeWire, PulseAudio, ALSA (auto-detected) | PipeWire only |
| Clipboard | Save/restore around paste operations | Uses wl-copy/xclip (no restore) |
| Backends | Groq, OpenAI Realtime, OpenAI REST, local whisper.cpp | Groq only |
| Streaming | OpenAI Realtime WebSocket (text as you speak) | Not supported |
Both projects use /dev/uinput for keyboard injection and wl-copy for Unicode clipboard paste. Both have a daemon for the uinput device. The architectural difference is that xhisper's orchestration (recording, API calls, text dispatch) happens in a bash script that chains together pw-record, curl, jq, and ffmpeg, while whisrs does everything in a single async Rust process.
Performance
whisrs is noticeably faster at typing transcribed text:
- Bulk typing — whisrs processes the full transcription in one pass. xhisper's shell iterates character-by-character, dispatching each to the daemon individually over a socket.
- Single process — xhisper's bash script spawns
pw-record,curl,jq,ffmpeg, andxhispertoolas separate processes. whisrs handles audio, HTTP, and typing in one binary. - Direct audio — cpal streams PCM into memory. No subprocess, no temp WAV files on disk.
- Async — tokio runtime handles audio capture, API calls, and text typing concurrently.
Quick Start
One-line install
&& &&
The install script handles everything:
- Installs system dependencies (detects your distro)
- Builds the project (all backends included — cloud and local)
- Installs
whisrsandwhisrsdto~/.cargo/bin/ - Runs interactive setup — pick your backend, enter API key or download a local model
- Fixes
/dev/uinputpermissions (asks for sudo) - Installs and enables the systemd service
- Adds a keybinding to your compositor (Hyprland/Sway auto-detected)
After install, press your hotkey to start recording, press again to stop. Text appears at your cursor.
Want to switch backends later? Just run whisrs setup again.
1. Dependencies
# Arch Linux
# Debian/Ubuntu
# Fedora
2. Build
This builds everything — cloud backends and local whisper.cpp support are all included in a single binary.
3. Setup
The interactive setup will walk you through backend selection, API keys / model download, microphone test, uinput permissions, systemd service, and keybindings.
4. Manual uinput permissions (if you skipped during setup)
# Log out and back in
5. Manual daemon start (if you skipped during setup)
# Foreground
# Background
&
# Systemd (recommended)
6. Bind a hotkey
Example for Hyprland (~/.config/hypr/hyprland.conf):
bind = $mainMod, W, exec, whisrs toggle
Example for Sway (~/.config/sway/config):
bindsym $mod+w exec whisrs toggle
Then: press hotkey to start recording, press again to stop and transcribe. Text appears at your cursor.
Transcription Backends
| Backend | Type | Streaming | Cost | Best for |
|---|---|---|---|---|
| Groq | Cloud (HTTP POST) | Batch | Free tier available | Getting started, budget use |
| OpenAI Realtime | Cloud (WebSocket) | True streaming | Paid | Best UX — text as you speak |
| OpenAI REST | Cloud (HTTP POST) | Batch | Paid | Simple fallback |
| Local whisper.cpp | Local (CPU/GPU) | Pseudo (sliding window) | Free | Privacy, offline use |
| Local Vosk | Local (CPU) | True streaming | Free | Coming soon |
| Local Parakeet | Local (NVIDIA) | True streaming | Free | Coming soon |
Groq is the default — fast, free tier, good accuracy with whisper-large-v3-turbo.
OpenAI Realtime is the premium option — true streaming over WebSocket means text appears at your cursor while you're still speaking.
Local whisper.cpp
Run transcription entirely on your machine — no API key, no internet, no data leaves your device. Local whisper support is included in every build — no special flags needed.
# Run setup — select Local > whisper.cpp, pick a model, download automatically
Models are downloaded from HuggingFace during setup:
| Model | Size | RAM | Speed (CPU) | Accuracy |
|---|---|---|---|---|
| tiny.en | 75 MB | ~273 MB | Real-time | Decent |
| base.en | 142 MB | ~388 MB | Real-time | Good (recommended) |
| small.en | 466 MB | ~852 MB | Borderline | Very good |
Streaming works via a sliding window approach: audio is processed in overlapping 8-second windows with prompt conditioning for consistency.
Configuration
Config file: ~/.config/whisrs/config.toml
[]
= "groq" # groq | openai-realtime | openai | local-whisper
= "en" # ISO 639-1 or "auto"
= 2000 # auto-stop after silence (streaming only)
= true # desktop notifications
[]
= "default"
[]
= "gsk_..."
= "whisper-large-v3-turbo"
[]
= "sk-..."
= "gpt-4o-mini-transcribe"
[]
= "~/.local/share/whisrs/models/ggml-base.en.bin"
Environment variable overrides: WHISRS_GROQ_API_KEY, WHISRS_OPENAI_API_KEY
CLI Commands
whisrs setup # Interactive onboarding
whisrs toggle # Start/stop recording
whisrs cancel # Cancel recording, discard audio
whisrs status # Query daemon state
Supported Environments
| Component | Support |
|---|---|
| Hyprland | Tested, full support |
| Sway / i3 | Implemented, needs community testing |
| X11 (any WM) | Implemented, needs community testing |
| GNOME Wayland | Limited — requires window-calls extension for window tracking |
| KDE Wayland | Implemented via D-Bus, needs community testing |
| Audio | PipeWire, PulseAudio, ALSA (auto-detected via cpal) |
| Distros | Any Linux with the system dependencies above |
Note: whisrs has been primarily tested on Hyprland (Arch Linux). Testing on other compositors and distros is a valuable contribution — if you run into issues, please open an issue.
How It Works
Hotkey press
|
v
whisrs toggle --> Unix socket --> whisrsd (daemon)
|
v
State: Idle -> Recording
|
cpal captures audio (16kHz mono)
|
Hotkey press again |
| v
v State: Recording -> Transcribing
whisrs toggle --> Unix socket |
v
Encode WAV -> Send to API -> Get text
|
v
Restore window focus (Hyprland IPC)
|
v
Type text via uinput (XKB layout-aware)
|
v
State: Transcribing -> Idle
See CONTRIBUTING.md for development setup and project structure.
Project Status
whisrs is functional and usable for daily dictation on Hyprland. The core features work:
- Daemon + CLI architecture
- Audio capture and WAV encoding
- Groq transcription backend
- OpenAI REST transcription backend
- OpenAI Realtime WebSocket backend (needs API key testing)
- Layout-aware keyboard injection (uinput + XKB)
- Wayland/X11 clipboard with save/restore
- Window tracking (Hyprland, Sway, X11, GNOME, KDE)
- Desktop notifications
- Interactive setup
- Error UX with actionable messages
- Local whisper.cpp backend (sliding window streaming, prompt conditioning, model download)
- Local Vosk backend (true streaming, tiny model)
- Local Parakeet backend (NVIDIA, ultra-fast streaming)
- OpenAI Realtime end-to-end testing
- Multi-compositor testing
- Filler word removal
- LLM command mode
- System tray indicator
- Packaging (AUR, Nix, static binaries)
Contributing
The biggest way to help right now:
- Test on your compositor — Sway, i3, KDE, GNOME. Report what works and what doesn't.
- Test on your distro — Ubuntu, Fedora, NixOS, etc. Build issues, missing deps, etc.
- Bug reports — if text goes to the wrong window, characters get dropped, or audio doesn't capture, open an issue.
License
MIT