            _     _
 __      __| |__ (_)___  _ __ ___
 \ \ /\ / /| '_ \| / __|| '__/ __|
  \ V  V / | | | | \__ \| |  \__ \
   \_/\_/  |_| |_|_|___/|_|  |___/

  speak. type. done.

whisrs

Linux-first voice-to-text dictation, written in Rust.

Press a hotkey, speak, and your words appear at the cursor — in any app, any window manager, any desktop environment. Fast, private, open source.

Why whisrs?

This project is directly inspired by xhisper, a solid tool that proved Linux dictation works. whisrs takes that concept and rebuilds it in Rust as a single async process, adding native keyboard layout support, window tracking, and multiple transcription backends.

Landscape

	whisrs	xhisper	Wispr Flow	nerd-dictation	Superwhisper
Platform	Linux	Linux	macOS, Windows, iOS, Android	Linux	macOS, Windows, iOS
Language	Rust	C + Shell	Proprietary	Python	Proprietary
Transcription	Groq, OpenAI, local whisper.cpp	Groq only	Cloud (proprietary)	Vosk (local)	Local Whisper + cloud
Streaming	Yes (OpenAI Realtime)	No	Yes	No	Yes
Offline	Yes (whisper.cpp)	No	No	Yes	Yes
Open source	Yes (MIT)	Yes	No	Yes (GPL)	No
Price	Free	Free	Free tier / $12/mo Pro	Free	$8.49/mo or $250 lifetime

Also worth knowing about:

Speech Note — Linux desktop app (Flatpak) with offline STT, TTS, and translation. Supports Vosk, whisper.cpp, Faster Whisper. GUI-focused, not a CLI tool.
VoiceInk — macOS only, local Whisper, open source (GPL). No Linux.

What whisrs adds over xhisper

	whisrs	xhisper
Keyboard layout	Automatic XKB reverse lookup — works natively on any layout	Hardcoded QWERTY keycodes — non-QWERTY requires an input-switch workaround (e.g. `--rightalt` to toggle OS layout to QWERTY)
Window tracking	Captures focused window on record start, restores focus before typing	None — text goes to whatever window is focused
Typing	Bulk text processing in one pass through uinput	Character-by-character dispatch from shell to daemon over socket
Audio capture	Direct PCM via cpal (no temp files, no subprocess)	Shells out to `pw-record`
Audio backends	PipeWire, PulseAudio, ALSA (auto-detected)	PipeWire only
Clipboard	Save/restore around paste operations	Uses wl-copy/xclip (no restore)
Backends	Groq, OpenAI Realtime, OpenAI REST, local whisper.cpp	Groq only
Streaming	OpenAI Realtime WebSocket (text as you speak)	Not supported
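
To make the "XKB reverse lookup" row concrete, here is a minimal sketch of the idea, not whisrs's actual code. It assumes the layout's forward table (keycode + shift -> character) has already been obtained, for example from xkbcommon; here it is a hand-written AZERTY fragment using evdev keycodes. The names `KeyStroke`, `Action`, `build_reverse_map`, and `plan_typing` are hypothetical.

```rust
use std::collections::HashMap;

/// One physical key press: an evdev-style keycode plus whether Shift is held.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct KeyStroke {
    keycode: u16,
    shift: bool,
}

/// How a single character will be delivered to the focused window.
#[derive(Debug, PartialEq, Eq)]
enum Action {
    /// The active layout has a key for this character.
    Key(KeyStroke),
    /// No key produces this character; fall back to clipboard paste (assumption).
    Paste(char),
}

/// Invert a forward table of (keycode, shift, char) into char -> keystroke.
/// When two entries produce the same character, the unshifted one is kept.
fn build_reverse_map(forward: &[(u16, bool, char)]) -> HashMap<char, KeyStroke> {
    let mut map: HashMap<char, KeyStroke> = HashMap::new();
    for &(keycode, shift, ch) in forward {
        let candidate = KeyStroke { keycode, shift };
        map.entry(ch)
            .and_modify(|existing: &mut KeyStroke| {
                if existing.shift && !shift {
                    *existing = candidate;
                }
            })
            .or_insert(candidate);
    }
    map
}

/// Turn a whole transcription into a list of actions in a single pass.
fn plan_typing(text: &str, reverse: &HashMap<char, KeyStroke>) -> Vec<Action> {
    text.chars()
        .map(|ch| match reverse.get(&ch) {
            Some(&stroke) => Action::Key(stroke),
            None => Action::Paste(ch),
        })
        .collect()
}
```

With a forward table where keycode 16 produces `a`/`A` and keycode 30 produces `q`/`Q` (as on AZERTY), `plan_typing("aQé", &reverse)` yields an unshifted press of key 16, a shifted press of key 30, and a paste action for `é`. The text never has to be typed "as if" the layout were QWERTY.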

Both projects use /dev/uinput for keyboard injection and wl-copy for Unicode clipboard paste. Both have a daemon for the uinput device. The architectural difference is that xhisper's orchestration (recording, API calls, text dispatch) happens in a bash script that chains together pw-record, curl, jq, and ffmpeg, while whisrs does everything in a single async Rust process.

Performance

whisrs is noticeably faster than xhisper at getting transcribed text onto the screen, for these architectural reasons:

Bulk typing — whisrs processes the full transcription in one pass. xhisper's shell iterates character-by-character, dispatching each to the daemon individually over a socket.
Single process — xhisper's bash script spawns pw-record, curl, jq, ffmpeg, and xhispertool as separate processes. whisrs handles audio, HTTP, and typing in one binary.
Direct audio — cpal streams PCM into memory. No subprocess, no temp WAV files on disk.
Async — tokio runtime handles audio capture, API calls, and text typing concurrently.
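
As an illustration of the "no temp WAV files" point, the sketch below builds a complete WAV file in memory from 16-bit mono PCM samples using only the standard library. It is not taken from the whisrs source; the function name `encode_wav_mono_16bit` is hypothetical, and the 16-bit sample format is an assumption (the document only states 16 kHz mono).

```rust
/// Encode 16-bit mono PCM samples as a complete WAV file held in memory.
fn encode_wav_mono_16bit(samples: &[i16], sample_rate: u32) -> Vec<u8> {
    const CHANNELS: u16 = 1;
    const BITS_PER_SAMPLE: u16 = 16;
    let block_align = CHANNELS * BITS_PER_SAMPLE / 8; // bytes per sample frame
    let byte_rate = sample_rate * u32::from(block_align);
    let data_len = (samples.len() * 2) as u32;

    let mut out = Vec::with_capacity(44 + samples.len() * 2);
    // RIFF container header
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes());
    out.extend_from_slice(b"WAVE");
    // "fmt " chunk: 16 bytes describing uncompressed PCM
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes());
    out.extend_from_slice(&1u16.to_le_bytes()); // format tag 1 = PCM
    out.extend_from_slice(&CHANNELS.to_le_bytes());
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&block_align.to_le_bytes());
    out.extend_from_slice(&BITS_PER_SAMPLE.to_le_bytes());
    // "data" chunk: the samples themselves, little-endian
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for sample in samples {
        out.extend_from_slice(&sample.to_le_bytes());
    }
    out
}
```

One second of 16 kHz audio becomes 32,000 bytes of samples plus a 44-byte header, and the resulting buffer can be handed straight to an HTTP client as the upload body.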

Quick Start

One-line install

git clone https://github.com/y0sif/whisrs && cd whisrs && ./install.sh

The install script handles everything:

Installs system dependencies (detects your distro)
Builds the project (all backends included — cloud and local)
Installs whisrs and whisrsd to ~/.cargo/bin/
Runs interactive setup — pick your backend, enter API key or download a local model
Fixes /dev/uinput permissions (asks for sudo)
Installs and enables the systemd service
Adds a keybinding to your compositor (Hyprland/Sway auto-detected)

After install, press your hotkey to start recording, press again to stop. Text appears at your cursor.

Want to switch backends later? Just run whisrs setup again.

Manual install

If you prefer not to use the install script, follow these steps instead.

1. Dependencies

# Arch Linux
sudo pacman -S base-devel alsa-lib libxkbcommon clang cmake

# Debian/Ubuntu
sudo apt install build-essential libasound2-dev libxkbcommon-dev libclang-dev cmake

# Fedora
sudo dnf install gcc-c++ alsa-lib-devel libxkbcommon-devel clang-devel cmake

2. Build

git clone https://github.com/y0sif/whisrs
cd whisrs
cargo install --path .

This builds everything — cloud backends and local whisper.cpp support are all included in a single binary.

3. Setup

whisrs setup

The interactive setup will walk you through backend selection, API keys / model download, microphone test, uinput permissions, systemd service, and keybindings.

4. Manual uinput permissions (if you skipped during setup)

sudo cp contrib/99-whisrs.rules /etc/udev/rules.d/
sudo udevadm control --reload-rules
sudo udevadm trigger
sudo usermod -aG input $USER
# Log out and back in

5. Manual daemon start (if you skipped during setup)

# Foreground
whisrsd

# Background
whisrsd &

# Systemd (recommended)
mkdir -p ~/.config/systemd/user
cp contrib/whisrs.service ~/.config/systemd/user/
systemctl --user enable --now whisrs.service

6. Bind a hotkey

Example for Hyprland (~/.config/hypr/hyprland.conf):

bind = $mainMod, W, exec, whisrs toggle

Example for Sway (~/.config/sway/config):

bindsym $mod+w exec whisrs toggle

Then: press hotkey to start recording, press again to stop and transcribe. Text appears at your cursor.

Transcription Backends

Backend	Type	Streaming	Cost	Best for
Groq	Cloud (HTTP POST)	Batch	Free tier available	Getting started, budget use
OpenAI Realtime	Cloud (WebSocket)	True streaming	Paid	Best UX — text as you speak
OpenAI REST	Cloud (HTTP POST)	Batch	Paid	Simple fallback
Local whisper.cpp	Local (CPU/GPU)	Pseudo (sliding window)	Free	Privacy, offline use
Local Vosk	Local (CPU)	True streaming	Free	Coming soon
Local Parakeet	Local (NVIDIA)	True streaming	Free	Coming soon

Groq is the default — fast, free tier, good accuracy with whisper-large-v3-turbo.

OpenAI Realtime is the premium option — true streaming over WebSocket means text appears at your cursor while you're still speaking.

Local whisper.cpp

Run transcription entirely on your machine — no API key, no internet, no data leaves your device. Local whisper support is included in every build — no special flags needed.

# Run setup — select Local > whisper.cpp, pick a model, download automatically
whisrs setup

Models are downloaded from HuggingFace during setup:

Model	Size	RAM	Speed (CPU)	Accuracy
tiny.en	75 MB	~273 MB	Real-time	Decent
base.en	142 MB	~388 MB	Real-time	Good (recommended)
small.en	466 MB	~852 MB	Borderline	Very good

Streaming works via a sliding window approach: audio is processed in overlapping 8-second windows with prompt conditioning for consistency.
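
The windowing itself can be sketched as follows. This is an illustration under assumptions, not the whisrs implementation: the 8-second window comes from the text above, but the amount of overlap is not stated, so the example uses a hypothetical 2 seconds. The function name `window_ranges` is also hypothetical. Prompt conditioning (feeding the previous window's text to the next transcription call) is not shown.

```rust
use std::ops::Range;

/// Split `total_samples` into overlapping windows and return their sample ranges.
/// Consecutive windows start `window - overlap` samples apart; the final window
/// may be shorter than `window`.
fn window_ranges(total_samples: usize, window: usize, overlap: usize) -> Vec<Range<usize>> {
    assert!(window > 0 && overlap < window, "overlap must be smaller than the window");
    let step = window - overlap;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < total_samples {
        let end = (start + window).min(total_samples);
        ranges.push(start..end);
        if end == total_samples {
            break; // the last window already reaches the end of the audio
        }
        start += step;
    }
    ranges
}
```

For 20 seconds of 16 kHz audio (320,000 samples), `window_ranges(320_000, 8 * 16_000, 2 * 16_000)` returns three windows starting at 0 s, 6 s, and 12 s. Each window shares its first 2 seconds with the end of the previous one.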

Configuration

Config file: ~/.config/whisrs/config.toml

[general]
backend = "groq"            # groq | openai-realtime | openai | local-whisper
language = "en"             # ISO 639-1 or "auto"
silence_timeout_ms = 2000   # auto-stop after silence (streaming only)
notify = true               # desktop notifications

[audio]
device = "default"

[groq]
api_key = "gsk_..."
model = "whisper-large-v3-turbo"

[openai]
api_key = "sk-..."
model = "gpt-4o-mini-transcribe"

[local-whisper]
model_path = "~/.local/share/whisrs/models/ggml-base.en.bin"

Environment variable overrides: WHISRS_GROQ_API_KEY, WHISRS_OPENAI_API_KEY
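
A minimal sketch of how such an override could be resolved, assuming the environment variable takes precedence over the value in config.toml. Treating an empty variable as unset is an additional assumption, and the function names are hypothetical. The environment lookup is passed in as a closure so the logic can be exercised without touching the real process environment.

```rust
/// Pick the API key for a backend: a non-empty environment value wins
/// over the value read from the config file.
fn resolve_api_key(
    env_name: &str,
    config_value: Option<&str>,
    lookup: impl Fn(&str) -> Option<String>,
) -> Option<String> {
    lookup(env_name)
        // Assumption: an empty or whitespace-only variable counts as unset.
        .filter(|value| !value.trim().is_empty())
        .or_else(|| config_value.map(str::to_owned))
}

/// Convenience wrapper that reads the real process environment.
fn groq_api_key(config_value: Option<&str>) -> Option<String> {
    resolve_api_key("WHISRS_GROQ_API_KEY", config_value, |name| {
        std::env::var(name).ok()
    })
}
```

With `WHISRS_GROQ_API_KEY` set, `groq_api_key(Some("gsk_from_file"))` returns the environment value; with it unset, the config file value is used; with neither, the result is `None` and the caller can report a missing key.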

CLI Commands

whisrs setup     # Interactive onboarding
whisrs toggle    # Start/stop recording
whisrs cancel    # Cancel recording, discard audio
whisrs status    # Query daemon state
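
These commands are thin clients: each one sends a request to the whisrsd daemon over a Unix socket. The sketch below shows that request/reply pattern with the standard library only. The wire format (one text line per command, one text line per reply) and the reply strings are assumptions made for illustration; the real whisrs protocol may look different.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

/// Client side: send one command line and wait for a one-line reply.
fn send_command(stream: &mut UnixStream, command: &str) -> std::io::Result<String> {
    writeln!(stream, "{}", command)?;
    let mut reply = String::new();
    BufReader::new(&*stream).read_line(&mut reply)?;
    Ok(reply.trim_end().to_owned())
}

/// Daemon side: read one command line and answer it.
/// The replies are placeholders; a real daemon would act on its state here.
fn serve_one(stream: &mut UnixStream) -> std::io::Result<()> {
    let mut line = String::new();
    BufReader::new(&*stream).read_line(&mut line)?;
    let reply = match line.trim() {
        "toggle" => "ok toggled",
        "cancel" => "ok cancelled",
        "status" => "ok idle",
        _ => "error unknown command",
    };
    writeln!(stream, "{}", reply)
}
```

To try it without a socket file, `UnixStream::pair()` gives two connected ends: run `serve_one` on one end in a thread and call `send_command` on the other. A real client would instead use `UnixStream::connect` with the daemon's socket path.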

Supported Environments

Component	Support
Hyprland	Tested, full support
Sway / i3	Implemented, needs community testing
X11 (any WM)	Implemented, needs community testing
GNOME Wayland	Limited — requires `window-calls` extension for window tracking
KDE Wayland	Implemented via D-Bus, needs community testing
Audio	PipeWire, PulseAudio, ALSA (auto-detected via cpal)
Distros	Any Linux with the system dependencies above

Note: whisrs has been primarily tested on Hyprland (Arch Linux). Testing on other compositors and distros is a valuable contribution; if you run into problems, please open an issue.

How It Works

Hotkey press
    |
    v
whisrs toggle --> Unix socket --> whisrsd (daemon)
                                    |
                                    v
                              State: Idle -> Recording
                                    |
                              cpal captures audio (16kHz mono)
                                    |
Hotkey press again                  |
    |                               v
    v                         State: Recording -> Transcribing
whisrs toggle --> Unix socket       |
                                    v
                              Encode WAV -> Send to API -> Get text
                                    |
                                    v
                              Restore window focus (e.g. Hyprland IPC)
                                    |
                                    v
                              Type text via uinput (XKB layout-aware)
                                    |
                                    v
                              State: Transcribing -> Idle
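
The state transitions in the diagram can be written down as a small pure function. This is a sketch rather than the daemon's real code: the `Event` names are hypothetical, and two behaviors are assumptions because the diagram does not cover them, namely that `cancel` returns from Recording to Idle and that any other event is ignored.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Idle,
    Recording,
    Transcribing,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    /// `whisrs toggle` arrived over the socket.
    Toggle,
    /// `whisrs cancel` arrived over the socket.
    Cancel,
    /// The backend returned text and it has been typed.
    TranscriptionDone,
}

/// Compute the next daemon state for an incoming event.
fn next_state(state: State, event: Event) -> State {
    match (state, event) {
        (State::Idle, Event::Toggle) => State::Recording,
        (State::Recording, Event::Toggle) => State::Transcribing,
        // Assumption: cancelling discards the audio and goes straight back to Idle.
        (State::Recording, Event::Cancel) => State::Idle,
        (State::Transcribing, Event::TranscriptionDone) => State::Idle,
        // Assumption: events that make no sense in the current state are ignored.
        (current, _) => current,
    }
}
```

Keeping the transition table free of I/O makes it easy to test: two toggles followed by `TranscriptionDone` walk Idle -> Recording -> Transcribing -> Idle, and a toggle received while transcribing leaves the state unchanged.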

See CONTRIBUTING.md for development setup and project structure.

Project Status

whisrs is functional and usable for daily dictation on Hyprland. The core features work:

Daemon + CLI architecture
Audio capture and WAV encoding
Groq transcription backend
OpenAI REST transcription backend
OpenAI Realtime WebSocket backend (needs API key testing)
Layout-aware keyboard injection (uinput + XKB)
Wayland/X11 clipboard with save/restore
Window tracking (Hyprland, Sway, X11, GNOME, KDE)
Desktop notifications
Interactive setup
Error UX with actionable messages
Local whisper.cpp backend (sliding window streaming, prompt conditioning, model download)

Planned or still in progress:

Local Vosk backend (true streaming, tiny model)
Local Parakeet backend (NVIDIA, ultra-fast streaming)
OpenAI Realtime end-to-end testing
Multi-compositor testing
Filler word removal
LLM command mode
System tray indicator
Packaging (AUR, Nix, static binaries)

Contributing

The biggest way to help right now:

Test on your compositor: Sway, i3, KDE, GNOME. Report what works and what doesn't.
Test on your distro: Ubuntu, Fedora, NixOS, etc. Report build issues, missing dependencies, and anything else that breaks.
Bug reports: if text goes to the wrong window, characters get dropped, or audio doesn't capture, open an issue.

License

MIT

whisrs 0.1.0