_ _
__ __| |__ (_)___ _ __ ___
\ \ /\ / /| '_ \| / __|| '__/ __|
\ V V / | | | | \__ \| | \__ \
\_/\_/ |_| |_|_|___/|_| |___/
speak. type. done.
whisrs
Linux-first voice-to-text dictation, written in Rust.
Press a hotkey, speak, and your words appear at the cursor — in any app, any window manager, any desktop environment. Fast, private, open source.
Why whisrs?
This project is directly inspired by xhisper — a solid tool that proved Linux dictation works. whisrs takes that concept and rebuilds it in Rust with a proper architecture, native layout support, window tracking, and multiple transcription backends.
Landscape
| whisrs | xhisper | Wispr Flow | nerd-dictation | Superwhisper | |
|---|---|---|---|---|---|
| Platform | Linux | Linux | macOS, Windows, iOS, Android | Linux | macOS, Windows, iOS |
| Language | Rust | C + Shell | Proprietary | Python | Proprietary |
| Transcription | Groq, OpenAI, local whisper.cpp | Groq only | Cloud (proprietary) | Vosk (local) | Local Whisper + cloud |
| Streaming | Yes (OpenAI Realtime) | No | Yes | No | Yes |
| Offline | Yes (whisper.cpp) | No | No | Yes | Yes |
| Open source | Yes (MIT) | Yes | No | Yes (GPL) | No |
| Price | Free | Free | Free tier / $12/mo Pro | Free | $8.49/mo or $250 lifetime |
Also worth knowing about:
- Speech Note — Linux desktop app (Flatpak) with offline STT, TTS, and translation. Supports Vosk, whisper.cpp, Faster Whisper. GUI-focused, not a CLI tool.
- VoiceInk — macOS only, local Whisper, open source (GPL). No Linux.
What whisrs adds over xhisper
| whisrs | xhisper | |
|---|---|---|
| Keyboard layout | Automatic XKB reverse lookup — works natively on any layout | Hardcoded QWERTY keycodes — non-QWERTY requires an input-switch workaround (e.g. --rightalt to toggle OS layout to QWERTY) |
| Window tracking | Captures focused window on record start, restores focus before typing | None — text goes to whatever window is focused |
| Typing | Bulk text processing in one pass through uinput | Character-by-character dispatch from shell to daemon over socket |
| Audio capture | Direct PCM via cpal (no temp files, no subprocess) | Shells out to pw-record |
| Audio backends | PipeWire, PulseAudio, ALSA (auto-detected) | PipeWire only |
| Clipboard | Save/restore around paste operations | Uses wl-copy/xclip (no restore) |
| Backends | Groq, OpenAI Realtime, OpenAI REST, local whisper.cpp | Groq only |
| Streaming | OpenAI Realtime WebSocket (text as you speak) | Not supported |
Both projects use /dev/uinput for keyboard injection and wl-copy for Unicode clipboard paste. Both have a daemon for the uinput device. The architectural difference is that xhisper's orchestration (recording, API calls, text dispatch) happens in a bash script that chains together pw-record, curl, jq, and ffmpeg, while whisrs does everything in a single async Rust process.
Performance
whisrs is noticeably faster at typing transcribed text:
- Bulk typing — whisrs processes the full transcription in one pass. xhisper's shell iterates character-by-character, dispatching each to the daemon individually over a socket.
- Single process — xhisper's bash script spawns
pw-record,curl,jq,ffmpeg, andxhispertoolas separate processes. whisrs handles audio, HTTP, and typing in one binary. - Direct audio — cpal streams PCM into memory. No subprocess, no temp WAV files on disk.
- Async — tokio runtime handles audio capture, API calls, and text typing concurrently.
Installation
Quick install (any distro)
&& &&
The install script handles everything — detects your distro, installs system dependencies, builds the project, and runs interactive setup.
After install, press your hotkey to start recording, press again to stop. Text appears at your cursor.
After install, run whisrs setup to configure your backend, API keys, permissions, and keybindings.
Requires system dependencies: alsa-lib, libxkbcommon, clang, cmake.
After install, run whisrs setup.
Or add to your flake inputs:
inputs.whisrs.url = "github:y0sif/whisrs";
1. Dependencies
# Arch Linux
# Debian/Ubuntu
# Fedora
2. Build
This builds everything — cloud backends and local whisper.cpp support are all included in a single binary.
3. Setup
The interactive setup will walk you through backend selection, API keys / model download, microphone test, uinput permissions, systemd service, and keybindings.
4. Manual uinput permissions (if you skipped during setup)
# Log out and back in
5. Manual daemon start (if you skipped during setup)
# Foreground
# Background
&
# Systemd (recommended)
6. Bind a hotkey
Example for Hyprland (~/.config/hypr/hyprland.conf):
bind = $mainMod, W, exec, whisrs toggle
Example for Sway (~/.config/sway/config):
bindsym $mod+w exec whisrs toggle
Transcription Backends
| Backend | Type | Streaming | Cost | Best for |
|---|---|---|---|---|
| Groq | Cloud (HTTP POST) | Batch | Free tier available | Getting started, budget use |
| OpenAI Realtime | Cloud (WebSocket) | True streaming | Paid | Best UX — text as you speak |
| OpenAI REST | Cloud (HTTP POST) | Batch | Paid | Simple fallback |
| Local whisper.cpp | Local (CPU/GPU) | Pseudo (sliding window) | Free | Privacy, offline use |
| Local Vosk | Local (CPU) | True streaming | Free | Coming soon |
| Local Parakeet | Local (NVIDIA) | True streaming | Free | Coming soon |
Groq is the default — fast, free tier, good accuracy with whisper-large-v3-turbo.
OpenAI Realtime is the premium option — true streaming over WebSocket means text appears at your cursor while you're still speaking.
Local whisper.cpp
Run transcription entirely on your machine — no API key, no internet, no data leaves your device. Local whisper support is included in every build — no special flags needed.
# Run setup — select Local > whisper.cpp, pick a model, download automatically
Models are downloaded from HuggingFace during setup:
| Model | Size | RAM | Speed (CPU) | Accuracy |
|---|---|---|---|---|
| tiny.en | 75 MB | ~273 MB | Real-time | Decent |
| base.en | 142 MB | ~388 MB | Real-time | Good (recommended) |
| small.en | 466 MB | ~852 MB | Borderline | Very good |
Streaming works via a sliding window approach: audio is processed in overlapping 8-second windows with prompt conditioning for consistency.
Configuration
Config file: ~/.config/whisrs/config.toml
[]
= "groq" # groq | openai-realtime | openai | local-whisper
= "en" # ISO 639-1 or "auto"
= 2000 # auto-stop after silence (streaming only)
= true # desktop notifications
[]
= "default"
[]
= "gsk_..."
= "whisper-large-v3-turbo"
[]
= "sk-..."
= "gpt-4o-mini-transcribe"
[]
= "~/.local/share/whisrs/models/ggml-base.en.bin"
Environment variable overrides: WHISRS_GROQ_API_KEY, WHISRS_OPENAI_API_KEY
CLI Commands
whisrs setup # Interactive onboarding
whisrs toggle # Start/stop recording
whisrs cancel # Cancel recording, discard audio
whisrs status # Query daemon state
Supported Environments
| Component | Support |
|---|---|
| Hyprland | Tested, full support |
| Sway / i3 | Implemented, needs community testing |
| X11 (any WM) | Implemented, needs community testing |
| GNOME Wayland | Limited — requires window-calls extension for window tracking |
| KDE Wayland | Implemented via D-Bus, needs community testing |
| Audio | PipeWire, PulseAudio, ALSA (auto-detected via cpal) |
| Distros | Any Linux with the system dependencies above |
Note: whisrs has been primarily tested on Hyprland (Arch Linux). Testing on other compositors and distros is a valuable contribution — if you run into issues, please open an issue.
How It Works
Hotkey press
|
v
whisrs toggle --> Unix socket --> whisrsd (daemon)
|
v
State: Idle -> Recording
|
cpal captures audio (16kHz mono)
|
Hotkey press again |
| v
v State: Recording -> Transcribing
whisrs toggle --> Unix socket |
v
Encode WAV -> Send to API -> Get text
|
v
Restore window focus (Hyprland IPC)
|
v
Type text via uinput (XKB layout-aware)
|
v
State: Transcribing -> Idle
See CONTRIBUTING.md for development setup and project structure.
Troubleshooting
-
/dev/uinput permission denied -- Copy the udev rule and add yourself to the
inputgroup:&&Log out and back in for the group change to take effect.
-
No microphone detected -- Verify your mic is recognized:
arecord -l. If nothing shows up, make sure ALSA or PulseAudio/PipeWire is installed and your mic is not muted. On PipeWire systems, installpipewire-alsafor ALSA compatibility. -
API key errors (401 Unauthorized) -- Double-check your key is valid and not expired. Ensure the correct environment variable is set (
WHISRS_GROQ_API_KEYorWHISRS_OPENAI_API_KEY), or that the key in~/.config/whisrs/config.tomlis correct. Re-runwhisrs setupto reconfigure. -
Text goes to the wrong window -- whisrs captures the focused window when recording starts and restores focus before typing. This requires compositor support. See the Supported Environments table above. On GNOME Wayland, the
window-callsextension is required. -
Daemon not running -- Start the daemon manually (
whisrsd) or via systemd:If it fails, check logs with
journalctl --user -u whisrs.serviceor runRUST_LOG=debug whisrsdin the foreground. -
Model download fails (local whisper) -- If automatic download during
whisrs setupfails, download the model manually from HuggingFace:https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.binPlace it in
~/.local/share/whisrs/models/and updatemodel_pathin your config.
Project Status
whisrs is functional and usable for daily dictation on Hyprland. The core features work:
- Daemon + CLI architecture
- Audio capture and WAV encoding
- Groq transcription backend
- OpenAI REST transcription backend
- OpenAI Realtime WebSocket backend (needs API key testing)
- Layout-aware keyboard injection (uinput + XKB)
- Wayland/X11 clipboard with save/restore
- Window tracking (Hyprland, Sway, X11, GNOME, KDE)
- Desktop notifications
- Interactive setup
- Error UX with actionable messages
- Local whisper.cpp backend (sliding window streaming, prompt conditioning, model download)
- Local Vosk backend (true streaming, tiny model)
- Local Parakeet backend (NVIDIA, ultra-fast streaming)
- OpenAI Realtime end-to-end testing
- Multi-compositor testing
- Filler word removal
- LLM command mode
- System tray indicator
- Packaging (AUR, Nix flake, crates.io)
Contributing
The biggest way to help right now:
- Test on your compositor — Sway, i3, KDE, GNOME. Report what works and what doesn't.
- Test on your distro — Ubuntu, Fedora, NixOS, etc. Build issues, missing deps, etc.
- Bug reports — if text goes to the wrong window, characters get dropped, or audio doesn't capture, open an issue.
License
MIT