```
            _     _
 __      __| |__ (_)___  _ __ ___
 \ \ /\ / /| '_ \| / __|| '__/ __|
  \ V  V / | | | | \__ \| |  \__ \
   \_/\_/  |_| |_|_|___/|_|  |___/

           speak. type. done.
```
# whisrs
Linux-first voice-to-text dictation, written in Rust.
Press a hotkey, speak, and your words appear at the cursor. Any app, any window manager, any desktop environment. Fast, private, open source.
## Why whisrs?
Dictation tools like Wispr Flow and Superwhisper are not available on Linux. xhisper proved the concept works, but I kept running into limitations. whisrs takes that idea and rebuilds it in Rust as a single async process with native keyboard layout support, window tracking, and multiple transcription backends.
## Installation

### Quick install (any distro)
```sh
# Clone-and-run pattern reconstructed from context; the script name
# install.sh is an assumption.
git clone https://github.com/y0sif/whisrs && cd whisrs && ./install.sh
```
The install script handles everything: it detects your distro, installs system dependencies, builds the project, and runs the interactive setup.
After install, press your hotkey to start recording, press again to stop. Text appears at your cursor.
### Arch Linux (AUR)
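Assuming the AUR package is published simply as `whisrs` (AUR packaging is listed under Project Status), installation with an AUR helper would look like:

```sh
# Package name assumed to be "whisrs"
paru -S whisrs
# or: yay -S whisrs
```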
After install, run `whisrs setup` to configure your backend, API keys, permissions, and keybindings.
### Cargo
Requires system dependencies: `alsa-lib`, `libxkbcommon`, `clang`, `cmake`.
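With those dependencies in place, the crates.io install (the crate is listed under Project Status) would be:

```sh
cargo install whisrs
```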
After install, run `whisrs setup`.
### Nix
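Assuming the flake exposes a default package, a one-off run without installing could look like:

```sh
nix run github:y0sif/whisrs
```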
Or add to your flake inputs:

```nix
inputs.whisrs.url = "github:y0sif/whisrs";
```
### Manual install
#### 1. Dependencies

The commands below are reconstructed from the dependency list above; Debian and Fedora package names are best-effort equivalents.

```sh
# Arch Linux
sudo pacman -S --needed alsa-lib libxkbcommon clang cmake

# Debian/Ubuntu
sudo apt install libasound2-dev libxkbcommon-dev clang cmake

# Fedora
sudo dnf install alsa-lib-devel libxkbcommon-devel clang cmake
```
#### 2. Build
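A standard Rust release build, assuming you're working from a checkout of the repository:

```sh
git clone https://github.com/y0sif/whisrs
cd whisrs
cargo build --release   # binary lands in target/release/whisrs
```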
#### 3. Setup
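Run the interactive onboarding:

```sh
whisrs setup
```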
The interactive setup will walk you through backend selection, API keys / model download, microphone test, uinput permissions, systemd service, and keybindings.
#### 4. Bind a hotkey

Example for Hyprland (`~/.config/hypr/hyprland.conf`):

```
bind = $mainMod, W, exec, whisrs toggle
```

Example for Sway (`~/.config/sway/config`):

```
bindsym $mod+w exec whisrs toggle
```
## Transcription Backends
| Backend | Type | Streaming | Cost | Best for |
|---|---|---|---|---|
| Groq | Cloud | Batch | Free tier available | Getting started, budget use |
| OpenAI Realtime | Cloud (WebSocket) | True streaming | Paid | Best UX, text as you speak |
| OpenAI REST | Cloud | Batch | Paid | Simple fallback |
| Local whisper.cpp | Local (CPU/GPU) | Sliding window | Free | Privacy, offline use |
**Groq** is the default: fast, with a free tier, and good accuracy using `whisper-large-v3-turbo`.

**OpenAI Realtime** is the premium option: true streaming over WebSocket means text appears at your cursor while you're still speaking.
### Local whisper.cpp
Run transcription entirely on your machine. No API key, no internet, no data leaves your device. Included in every build.
| Model | Size | RAM | Speed (CPU) | Accuracy |
|---|---|---|---|---|
| tiny.en | 75 MB | ~273 MB | Real-time | Decent |
| base.en | 142 MB | ~388 MB | Real-time | Good (recommended) |
| small.en | 466 MB | ~852 MB | Borderline real-time | Very good |
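`whisrs setup` can download a model for you; to fetch one manually instead, the ggml files are hosted in the upstream whisper.cpp repository on Hugging Face (destination path taken from the config example below; the URL is an assumption):

```sh
mkdir -p ~/.local/share/whisrs/models
curl -L -o ~/.local/share/whisrs/models/ggml-base.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
```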
## Configuration

Config file: `~/.config/whisrs/config.toml`
```toml
# Key names below are reconstructed from the inline comments; run
# `whisrs setup` for the authoritative schema.
[general]
backend = "groq"            # groq | openai-realtime | openai | local-whisper
language = "en"             # ISO 639-1 or "auto"
silence_timeout_ms = 2000   # auto-stop after silence (streaming only)
notifications = true        # desktop notifications
filler_removal = true       # strip "um", "uh", "you know", etc.
filler_words = []           # custom list (empty = use built-in defaults)
sound_feedback = true       # play tones on record start/stop/done
sound_volume = 0.5          # 0.0 to 1.0

[audio]
device = "default"

[groq]
api_key = "gsk_..."
model = "whisper-large-v3-turbo"

[openai]
api_key = "sk-..."
model = "gpt-4o-mini-transcribe"

[local_whisper]
model_path = "~/.local/share/whisrs/models/ggml-base.en.bin"
```
Environment variable overrides: `WHISRS_GROQ_API_KEY`, `WHISRS_OPENAI_API_KEY`
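For example, to supply a key for a single invocation without writing it to the config file:

```sh
WHISRS_GROQ_API_KEY="gsk_..." whisrs toggle
```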
## CLI Commands

```sh
whisrs setup    # Interactive onboarding
whisrs toggle   # Start/stop recording
whisrs cancel   # Cancel recording, discard audio
whisrs status   # Query daemon state
```
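A small scripting sketch (hypothetical: assumes `whisrs status` exits non-zero when the daemon is unreachable):

```sh
#!/bin/sh
# Toggle recording only if the whisrs daemon is reachable
if whisrs status >/dev/null 2>&1; then
  whisrs toggle
else
  notify-send "whisrs" "daemon not running"
fi
```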
## Supported Environments
| Component | Support |
|---|---|
| Hyprland | Tested, full support |
| Sway / i3 | Implemented, needs community testing |
| X11 (any WM) | Implemented, needs community testing |
| GNOME Wayland | Limited, requires window-calls extension for window tracking |
| KDE Wayland | Implemented via D-Bus, needs community testing |
| Audio | PipeWire, PulseAudio, ALSA (auto-detected via cpal) |
| Distros | Any Linux with the system dependencies above |
**Note:** whisrs has been primarily tested on Hyprland (Arch Linux). Testing on other compositors and distros is a valuable contribution. If you run into problems, please open an issue.
## Project Status
whisrs is functional and usable for daily dictation. The core features work:
- Daemon + CLI architecture
- Audio capture and WAV encoding
- Groq, OpenAI REST, and OpenAI Realtime backends
- Local whisper.cpp backend (sliding window, prompt conditioning, model download)
- Layout-aware keyboard injection (uinput + XKB)
- Wayland/X11 clipboard with save/restore
- Window tracking (Hyprland, Sway, X11, GNOME, KDE)
- Desktop notifications and audio feedback
- Interactive setup
- Filler word removal
- Packaging (AUR, Nix flake, crates.io)

Planned:

- Local Vosk backend
- Local Parakeet backend (NVIDIA)
- LLM command mode
- System tray indicator
## Troubleshooting
## Contributing

The biggest ways to help right now:
- Test on your compositor — Sway, i3, KDE, GNOME. Report what works and what doesn't.
- Test on your distro — Ubuntu, Fedora, NixOS, etc. Build issues, missing deps, etc.
- Bug reports — if text goes to the wrong window, characters get dropped, or audio doesn't capture, open an issue.
See `CONTRIBUTING.md` for development setup and project structure.
## License
MIT