whisrs 0.1.3

            _     _
 __      __| |__ (_)___  _ __ ___
 \ \ /\ / /| '_ \| / __|| '__/ __|
  \ V  V / | | | | \__ \| |  \__ \
   \_/\_/  |_| |_|_|___/|_|  |___/

  speak. type. done.

whisrs

Linux-first voice-to-text dictation, written in Rust.

Press a hotkey, speak, and your words appear at the cursor. Any app, any window manager, any desktop environment. Fast, private, open source.


Why whisrs?

Dictation tools like Wispr Flow and Superwhisper are not available on Linux. xhisper proved the concept works, but I kept running into limitations. whisrs takes that idea and rebuilds it in Rust as a single async process with native keyboard layout support, window tracking, and multiple transcription backends.


Installation

Quick install (any distro)

git clone https://github.com/y0sif/whisrs && cd whisrs && ./install.sh

The install script handles everything: it detects your distro, installs system dependencies, builds the project, and runs the interactive setup.

After install, press your hotkey to start recording, press again to stop. Text appears at your cursor.

Arch Linux (AUR)

yay -S whisrs-git

After install, run whisrs setup to configure your backend, API keys, permissions, and keybindings.

Cargo

cargo install whisrs

Requires system dependencies: alsa-lib, libxkbcommon, clang, cmake.

After install, run whisrs setup.

Nix

nix profile install github:y0sif/whisrs

Or add to your flake inputs:

inputs.whisrs.url = "github:y0sif/whisrs";

Manual install

1. Dependencies

# Arch Linux
sudo pacman -S base-devel alsa-lib libxkbcommon clang cmake

# Debian/Ubuntu
sudo apt install build-essential libasound2-dev libxkbcommon-dev libclang-dev cmake

# Fedora
sudo dnf install gcc-c++ alsa-lib-devel libxkbcommon-devel clang-devel cmake

2. Build

git clone https://github.com/y0sif/whisrs
cd whisrs
cargo install --path .

3. Setup

whisrs setup

The interactive setup will walk you through backend selection, API keys / model download, microphone test, uinput permissions, systemd service, and keybindings.
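Keyboard injection goes through /dev/uinput, so your user needs write access to it. `whisrs setup` offers to configure this for you; if dictation types nothing, you can check by hand. A sketch (the right group name and udev rule are distro-specific, so treat the suggestion in the message as an assumption):

```shell
# Check whether the current user can write to /dev/uinput.
if [ -w /dev/uinput ]; then
    echo "uinput: writable"
else
    echo "uinput: not writable (re-run 'whisrs setup' or add a udev rule)"
fi
```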

4. Bind a hotkey

Example for Hyprland (~/.config/hypr/hyprland.conf):

bind = $mainMod, W, exec, whisrs toggle

Example for Sway (~/.config/sway/config):

bindsym $mod+w exec whisrs toggle
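i3 (X11) uses the same bindsym syntax as Sway; a sketch for ~/.config/i3/config (assumed equivalent, untested here):

```
bindsym $mod+w exec --no-startup-id whisrs toggle
```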

Transcription Backends

Backend             Type                Streaming        Cost                  Best for
Groq                Cloud               Batch            Free tier available   Getting started, budget use
OpenAI Realtime     Cloud (WebSocket)   True streaming   Paid                  Best UX, text as you speak
OpenAI REST         Cloud               Batch            Paid                  Simple fallback
Local whisper.cpp   Local (CPU/GPU)     Sliding window   Free                  Privacy, offline use

Groq is the default. Fast, free tier, good accuracy with whisper-large-v3-turbo.
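For the curious, a batch request against Groq's OpenAI-compatible transcription endpoint looks roughly like the sketch below. This is illustrative only: whisrs builds the request for you, and the exact request it constructs may differ. The file name `recording.wav` is a placeholder.

```shell
# Sketch of a batch transcription request (roughly what the Groq backend
# does with your recorded audio). Requires an API key and a WAV file.
if [ -n "${WHISRS_GROQ_API_KEY:-}" ] && [ -f recording.wav ]; then
    curl -s https://api.groq.com/openai/v1/audio/transcriptions \
        -H "Authorization: Bearer $WHISRS_GROQ_API_KEY" \
        -F model=whisper-large-v3-turbo \
        -F file=@recording.wav
else
    echo "set WHISRS_GROQ_API_KEY and record a recording.wav first"
fi
```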

OpenAI Realtime is the premium option: true streaming over WebSocket means text appears at your cursor while you're still speaking.

Local whisper.cpp

Run transcription entirely on your machine. No API key, no internet, no data leaves your device. Included in every build.

whisrs setup   # select Local > whisper.cpp, pick a model, download automatically

Model      Size     RAM       Speed (CPU)   Accuracy
tiny.en    75 MB    ~273 MB   Real-time     Decent
base.en    142 MB   ~388 MB   Real-time     Good (recommended)
small.en   466 MB   ~852 MB   Borderline    Very good
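If you'd rather fetch a model yourself, the files are standard whisper.cpp ggml weights (hosted in the ggerganov/whisper.cpp repository on Hugging Face), and the default config looks for them under ~/.local/share/whisrs/models. A sketch:

```shell
# Manual model download (`whisrs setup` normally does this for you).
mkdir -p ~/.local/share/whisrs/models
cd ~/.local/share/whisrs/models
[ -f ggml-base.en.bin ] || curl -fLO \
    https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin \
    || echo "download failed; fetch the model manually"
```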

Configuration

Config file: ~/.config/whisrs/config.toml

[general]
backend = "groq"            # groq | openai-realtime | openai | local-whisper
language = "en"             # ISO 639-1 or "auto"
silence_timeout_ms = 2000   # auto-stop after silence (streaming only)
notify = true               # desktop notifications
remove_filler_words = true  # strip "um", "uh", "you know", etc.
filler_words = []           # custom list (empty = use built-in defaults)
audio_feedback = true       # play tones on record start/stop/done
audio_feedback_volume = 0.5 # 0.0 to 1.0

[audio]
device = "default"

[groq]
api_key = "gsk_..."
model = "whisper-large-v3-turbo"

[openai]
api_key = "sk-..."
model = "gpt-4o-mini-transcribe"

[local-whisper]
model_path = "~/.local/share/whisrs/models/ggml-base.en.bin"

Environment variable overrides: WHISRS_GROQ_API_KEY, WHISRS_OPENAI_API_KEY
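The environment variables take precedence over config.toml, which keeps secrets out of dotfiles. A minimal example (the key value is a placeholder):

```shell
# Supply the Groq key via the environment instead of config.toml.
export WHISRS_GROQ_API_KEY="gsk_your_key_here"
echo "Groq key configured: ${WHISRS_GROQ_API_KEY:+yes}"
```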


CLI Commands

whisrs setup     # Interactive onboarding
whisrs toggle    # Start/stop recording
whisrs cancel    # Cancel recording, discard audio
whisrs status    # Query daemon state

Supported Environments

Component       Support
Hyprland        Tested, full support
Sway / i3       Implemented, needs community testing
X11 (any WM)    Implemented, needs community testing
GNOME Wayland   Limited, requires window-calls extension for window tracking
KDE Wayland     Implemented via D-Bus, needs community testing
Audio           PipeWire, PulseAudio, ALSA (auto-detected via cpal)
Distros         Any Linux with the system dependencies above

Note: whisrs has been primarily tested on Hyprland (Arch Linux). Testing on other compositors and distros is a valuable contribution. If you run into issues, please open an issue.


Project Status

whisrs is functional and usable for daily dictation. The core features work:

  • Daemon + CLI architecture
  • Audio capture and WAV encoding
  • Groq, OpenAI REST, and OpenAI Realtime backends
  • Local whisper.cpp backend (sliding window, prompt conditioning, model download)
  • Layout-aware keyboard injection (uinput + XKB)
  • Wayland/X11 clipboard with save/restore
  • Window tracking (Hyprland, Sway, X11, GNOME, KDE)
  • Desktop notifications and audio feedback
  • Interactive setup
  • Filler word removal
  • Packaging (AUR, Nix flake, crates.io)

Planned:

  • Local Vosk backend
  • Local Parakeet backend (NVIDIA)
  • LLM command mode
  • System tray indicator

Troubleshooting

See docs/troubleshooting.md.


Contributing

The biggest ways to help right now:

  1. Test on your compositor — Sway, i3, KDE, GNOME. Report what works and what doesn't.
  2. Test on your distro — Ubuntu, Fedora, NixOS, and others. Report build issues and missing dependencies.
  3. Bug reports — if text goes to the wrong window, characters get dropped, or audio doesn't capture, open an issue.

See CONTRIBUTING.md for development setup and project structure.


License

MIT