whisp-rs 0.1.2

Lightweight voice-to-text dictation for Linux (Wayland/COSMIC). Hold a hotkey, speak, release — text appears at your cursor.

Single binary. No local models. No bloat.


How it works

  1. Hold Ctrl+Space
  2. Speak into your microphone
  3. Release — your words are typed at the cursor

Audio is captured via ALSA, sent to Deepgram for transcription, and injected into the focused window using wtype or clipboard paste.
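
The hold, speak, release cycle above can be pictured as a three-state machine. The sketch below is only an illustration: the state names mirror the tray states listed under Features (idle / recording / processing), while the `Event` type and the `next_state` function are hypothetical names, not taken from the whisp-rs source.

```rust
/// States mirroring the tray indicator: idle / recording / processing.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Idle,
    Recording,
    Processing,
}

/// Hypothetical events; whisp-rs may name or structure these differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    HotkeyPressed,
    HotkeyReleased,
    TranscriptDelivered,
}

/// Returns the next state. Events that do not apply to the current state
/// (for example, a release while idle) leave the state unchanged.
pub fn next_state(state: State, event: Event) -> State {
    match (state, event) {
        (State::Idle, Event::HotkeyPressed) => State::Recording,
        (State::Recording, Event::HotkeyReleased) => State::Processing,
        (State::Processing, Event::TranscriptDelivered) => State::Idle,
        (unchanged, _) => unchanged,
    }
}
```

Ignoring out-of-order events keeps a stray key release or a repeated key press from disturbing a recording that is already in progress.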

Features

  • Global hotkey via evdev — works in any app, any workspace
  • System tray icon shows state: idle / recording / processing
  • Multiple injection methods: wtype → clipboard+paste → ydotool (auto-fallback)
  • First-run setup wizard — no manual config editing needed
  • Raw PCM streaming — no WAV overhead, minimal latency
  • Configurable: model, language, hotkey, sample rate
  • Tiny binary (~2MB release build with LTO)
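
The auto-fallback between injection methods can be sketched as "try each method in order, stop at the first success". The code below is a minimal illustration under that assumption. `Injector` and `inject_with_fallback` are hypothetical names, and the real injectors would shell out to wtype, wl-copy, or ydotool rather than use the plain functions shown here.

```rust
/// One injection strategy: a label plus a function that tries to deliver text.
/// In a real tool the function would spawn an external command.
pub struct Injector {
    pub name: &'static str,
    pub inject: fn(&str) -> Result<(), String>,
}

/// Tries each injector in order and returns the name of the first one that
/// succeeds. If every injector fails, returns all error messages in order.
pub fn inject_with_fallback(
    injectors: &[Injector],
    text: &str,
) -> Result<&'static str, Vec<String>> {
    let mut errors = Vec::new();
    for injector in injectors {
        match (injector.inject)(text) {
            Ok(()) => return Ok(injector.name),
            Err(e) => errors.push(format!("{}: {}", injector.name, e)),
        }
    }
    Err(errors)
}
```

Collecting the per-method errors instead of discarding them makes a total failure easier to diagnose, since the message can say why each method was skipped.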

Requirements

Dependency              Purpose                Install
arecord                 Audio capture (ALSA)   sudo apt install alsa-utils
wtype or wl-clipboard   Text injection         sudo apt install wtype  (or: sudo apt install wl-clipboard)
ydotool                 Fallback injection     sudo apt install ydotool
input group             Hotkey access          sudo usermod -aG input $USER && reboot

OS: Linux with Wayland (tested on Pop!_OS COSMIC, should work on GNOME, KDE, Sway, Hyprland)

Install

One-shot installer (Debian/Ubuntu/Pop!_OS)

No Rust required. Downloads the pre-built binary and installs all system deps:

curl -sSL https://raw.githubusercontent.com/Anes201/whisp-rs/main/install.sh | bash

Pre-built binary

Download from GitHub Releases, extract, and run:

tar xzf whisp-rs-x86_64-linux.tar.gz
sudo mv whisp-rs /usr/local/bin/
whisp-rs --setup

From crates.io (requires Rust)

cargo install whisp-rs

From source

git clone https://github.com/anes201/whisp-rs.git
cd whisp-rs
cargo build --release
# Binary at: target/release/whisp-rs

Setup

First run (automatic wizard)

whisp-rs

If no config exists, the setup wizard launches automatically. It asks for:

  • Hotkey (default: Ctrl+Space)
  • Deepgram API key (get one free — $200 credit)
  • STT model (default: nova-2-general)
  • Language

Manual setup

# Set API key only
whisp-rs --set-api-key YOUR_DEEPGRAM_API_KEY

# Re-run full wizard
whisp-rs --setup

Environment variable

export DEEPGRAM_API_KEY="your-key-here"
whisp-rs

Configuration

Config file: ~/.config/whisp-rs/config.toml

[hotkey]
modifiers = ["ctrl"]        # ctrl | super | alt | shift (or combo: ["ctrl", "shift"])
key = "space"

[stt]
api_key = "your-deepgram-api-key"
model = "nova-2-general"    # nova-2-general | nova-2 | base
language = "en"             # en, fr, es, de, ar, zh, ja, ko, auto

[audio]
sample_rate = 16000
channels = 1
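
As an illustration of how the `modifiers` list might be validated, the sketch below maps the four documented names (ctrl, super, alt, shift) to an enum and rejects anything else. The `Modifier` type and `parse_modifiers` function are hypothetical, and the trimming and case-insensitive matching are assumptions rather than documented whisp-rs behavior.

```rust
use std::str::FromStr;

/// The four modifier names listed in the config comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Modifier {
    Ctrl,
    Super,
    Alt,
    Shift,
}

impl FromStr for Modifier {
    type Err = String;

    /// Assumption: surrounding whitespace and letter case are ignored.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "ctrl" => Ok(Modifier::Ctrl),
            "super" => Ok(Modifier::Super),
            "alt" => Ok(Modifier::Alt),
            "shift" => Ok(Modifier::Shift),
            other => Err(format!("unknown modifier: {other:?}")),
        }
    }
}

/// Parses a list such as ["ctrl", "shift"]; fails on the first unknown name.
pub fn parse_modifiers(names: &[&str]) -> Result<Vec<Modifier>, String> {
    names.iter().map(|name| name.parse()).collect()
}
```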

Models

Model            Speed     Accuracy   Best for
nova-2-general   Fastest   Good       Dictation (default)
nova-2           Fast      Best       Accuracy-critical
base             Fastest   Lower      Quick notes

CLI flags

whisp-rs                    # Start dictation daemon
whisp-rs --set-api-key KEY  # Save API key to config
whisp-rs --setup            # Re-run setup wizard
whisp-rs --help             # Show help

Logging

# Default: info level
whisp-rs

# Debug: see API calls, hotkey events, injection attempts
RUST_LOG=whisp_rs=debug whisp-rs

# Trace: everything
RUST_LOG=whisp_rs=trace whisp-rs

Troubleshooting

"No keyboard input devices found"

sudo usermod -aG input $USER
# Log out and back in (or reboot)

"All injection methods failed"

# Install wtype (recommended for Wayland)
sudo apt install wtype

# Or install wl-clipboard + ydotool
sudo apt install wl-clipboard ydotool
ydotoold &

"Missing dependencies: arecord"

sudo apt install alsa-utils

Empty transcriptions

  • Check your microphone is connected and not muted
  • Run with RUST_LOG=whisp_rs=debug to see audio capture sizes
  • Test mic: arecord -f S16_LE -r 16000 -c 1 -t raw -d 5 /tmp/test.raw && aplay -f S16_LE -r 16000 -c 1 /tmp/test.raw
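
When checking capture sizes, it helps to know what to expect. The mic test above records 5 seconds of 16 kHz, 16-bit, mono audio, so /tmp/test.raw should be close to 16000 × 2 × 1 × 5 = 160,000 bytes. The sketch below works that arithmetic out and adds a simple peak check for spotting a muted microphone. Both function names are hypothetical helpers, not part of whisp-rs.

```rust
/// Expected size in bytes of headerless PCM audio.
pub fn expected_pcm_bytes(
    sample_rate_hz: u64,
    channels: u64,
    bits_per_sample: u64,
    seconds: u64,
) -> u64 {
    sample_rate_hz * channels * (bits_per_sample / 8) * seconds
}

/// Largest absolute sample value in a buffer of S16_LE audio.
/// A value at or near zero suggests the microphone is muted or disconnected.
/// A trailing odd byte, if any, is ignored.
pub fn peak_amplitude(raw: &[u8]) -> u16 {
    raw.chunks_exact(2)
        .map(|pair| i16::from_le_bytes([pair[0], pair[1]]).unsigned_abs())
        .max()
        .unwrap_or(0)
}
```

A file of the expected size whose peak amplitude stays near zero points to a muted or wrong input device rather than a capture failure.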

Architecture

┌─────────────┐     ┌──────────┐     ┌─────────────┐
│  evdev      │────▶│  arecord │────▶│  Deepgram   │
│  hotkey     │     │  (ALSA)  │     │  API        │
└─────────────┘     └──────────┘     └──────┬──────┘
                                            │
┌─────────────┐     ┌──────────┐            │
│  ksni       │     │  wtype / │◀───────────┘
│  tray icon  │     │  inject  │
└─────────────┘     └──────────┘

  • Hotkey: evdev reads raw keyboard events from /dev/input/
  • Audio: arecord subprocess captures raw PCM (16kHz, 16-bit, mono)
  • STT: Deepgram REST API with raw PCM streaming (no WAV overhead)
  • Injection: wtype → wl-copy + ydotool → ydotool type (auto-fallback)
  • Tray: ksni system tray with state indicators
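
To make the Audio and STT steps concrete, the sketch below builds the arecord arguments and a Deepgram request URL from the [audio] and [stt] settings. It is an illustration under assumptions, not the whisp-rs implementation: the arecord flags are the ones used in the mic test under Troubleshooting, the URL uses Deepgram's pre-recorded `/v1/listen` endpoint with its `model`, `language`, `encoding`, `sample_rate`, and `channels` query parameters, and the struct and function names are hypothetical. How whisp-rs maps `language = "auto"` is not covered here.

```rust
/// Mirrors the [audio] section of config.toml.
pub struct AudioConfig {
    pub sample_rate: u32,
    pub channels: u16,
}

/// Arguments for an arecord subprocess producing raw 16-bit PCM.
/// With no output file name, arecord writes the audio to stdout.
pub fn arecord_args(audio: &AudioConfig) -> Vec<String> {
    vec![
        "-f".to_string(),
        "S16_LE".to_string(),
        "-r".to_string(),
        audio.sample_rate.to_string(),
        "-c".to_string(),
        audio.channels.to_string(),
        "-t".to_string(),
        "raw".to_string(),
    ]
}

/// Request URL for headerless PCM. Raw audio carries no format header,
/// so the encoding and sample rate are passed as query parameters.
/// Assumes `model` and `language` need no URL escaping.
pub fn deepgram_listen_url(model: &str, language: &str, audio: &AudioConfig) -> String {
    format!(
        "https://api.deepgram.com/v1/listen?model={model}&language={language}\
         &encoding=linear16&sample_rate={}&channels={}",
        audio.sample_rate, audio.channels
    )
}
```

Deriving both the capture arguments and the request parameters from the same `AudioConfig` keeps the two sides in agreement, which matters because a sample-rate mismatch would distort the transcription without raising an error.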

Support

If this project saves you time, consider buying me a coffee:

ko-fi

License

MIT