vnrit — Lightweight X11 WebRTC Streaming Server
vnrit streams an X11 desktop to one or more browsers over WebRTC with low-latency keyboard and mouse input forwarding. Designed for ARM Linux environments (Termux, Raspberry Pi, etc.) where hardware resources are constrained.
┌──────────────────────────────────────────────────────────────────┐
│ X11 Server ──→ ximagesrc ──→ videoconvert ──→ encoder │
│ (Xvnc/Xvfb) GStreamer pipeline │
│ │
│ ┌── WebSocket (signaling + input) │
│ Browser ←──WebRTC──┤ │
│ └── ICE/STUN/TURN (p2p media) │
└──────────────────────────────────────────────────────────────────┘
Features
- WebRTC streaming — low-latency video via
webrtcbinwith adaptive quality - Multiple codecs — openh264 (default), Android MediaCodec H.264, VP8, VP9
- Audio support — Opus audio via PulseAudio (auto-detected)
- Input forwarding — keyboard + mouse injected directly via X11 XTest extension (no
xdotool) - Touch-to-mouse — trackpad mode: tap, long-press, drag, scroll — all on a touchscreen
- Browser cursor overlay — synced cursor position without encoding the cursor into video
- Optional token-auth — passwordless access control with cookie-based sessions
Quick Start
# Recommended: hardware H.264, 720p, 500 kbps
# Open the printed URL (default http://0.0.0.0:8080) in a browser.
# Tap/click to send mouse and keyboard events back.
Installation
Prerequisites
- Rust 1.70+ (
rustupor system package) - GStreamer 1.22+ with plugins:
gstreamer,gst-plugins-base,gst-plugins-good,gst-plugins-badgst-plugins-ugly(for x264enc, optional)- openh264 (
gst-openh264or system package) - VP8/VP9 support via
gst-plugins-good(libvpx) - Android MediaCodec encoder (
mcencplugin, Termux only)
- X11 server (Xvnc, Xvfb, or real X display)
- pkg-config (for GStreamer build linkage)
Build
Termux (Android)
On Termux, install dependencies via apt:
The X11 display connection uses the Unix socket at
/data/data/com.termux/files/usr/tmp/.X11-unix/X<display>. vnrit
auto-detects this path.
Usage
Usage: vnrit [OPTIONS]
Options:
--display <DISPLAY> X11 display to capture [default: :1]
-p, --port <PORT> HTTP/WebSocket port [default: 8080]
--codec <CODEC> Video encoder: openh264, h264, vp8, vp9 [default: openh264]
--framerate <FPS> Capture framerate [default: 24]
--bitrate <KBPS> Target bitrate in kbps [default: 1000]
--height <PX> Downscale height (0 = native) [default: 0]
--token <TOKEN> Authentication token (optional, for access control)
-h, --help Print detailed help
-V, --version Print version
Examples
# Default: openh264 at desktop resolution, 1 Mbps
# Recommended: hardware H.264, 720p, 500 kbps
# Low bandwidth: VP9, 480p, 300 kbps
# High quality: no scaling, 2 Mbps
# Custom display and port
# With token authentication
# Full setup with auth + recommended codec
Token Authentication
When --token <TOKEN> is specified, all HTTP and WebSocket connections must present the token.
How it works:
Browser → http://host:8080/?token=xxx # initial visit with token
↓
Server: validates ?token=xxx, sets HttpOnly cookie
↓
Browser → ws://host:8080/ws?token=xxx # WebSocket upgrade (or via cookie)
↓
Server: validates token → WebRTC streaming begins
- First visit: append
?token=<value>to the URL - Subsequent visits: the browser's cookie handles authentication automatically
- No token: server returns HTTP 401 Unauthorized
- No
--tokenspecified: server operates in open-access mode (no auth)
Codec Comparison
Measured on Snapdragon 835 (Adreno 540) at 720p 500 kbps with a connected client:
| Codec | Element | RSS | Type | Notes |
|---|---|---|---|---|
| openh264 | openh264enc |
~50 MB | Software H.264 (Cisco) | Default, good balance |
| h264 | mcenc |
~48 MB | Hardware H.264 | Lowest CPU/memory |
| vp8 | vp8enc |
~64 MB | Software VP8 (libvpx) | Higher memory |
| vp9 | vp9enc |
~64 MB | Software VP9 (libvpx) | Better compression |
The hardware H.264 encoder (mcenc) uses the GPU's dedicated video encoding
block, consuming the least CPU and memory.
Bitrate Guidelines (720p @ 24 fps)
| Bitrate | Quality | Use Case |
|---|---|---|
| 300 kbps | Low | Text terminals, SSH-like |
| 500 kbps | Good | GUI desktops (recommended) |
| 1000 kbps | High | Default, smooth desktop |
| 2000+ kbps | Near-lossless | Static content, reading |
Architecture
Pipeline
The GStreamer pipeline is constructed dynamically per WebRTC connection:
Video:
ximagesrc → videoconvert → queue → capsfilter
↓ (optional)
videoscale → capsfilter (--height)
↓
encoder → payloader → webrtcbin
Audio (if PulseAudio detected):
pulsesrc → audio/x-raw (mono/48kHz) → opusenc → rtpopuspay → webrtcbin
Input Protocol
Keyboard and mouse events are sent from the browser to the server over the same WebSocket used for WebRTC signaling. Messages are CSV lines for minimal overhead:
| Command | Format | Description |
|---|---|---|
| Mouse move (relative) | mr,dx,dy |
Relative cursor movement |
| Mouse move (absolute) | ma,x,y |
Absolute cursor position |
| Mouse down | md,button |
Button press (1=left, 2=middle, 3=right) |
| Mouse up | mu,button |
Button release |
| Scroll | ms,deltaY |
Scroll wheel (positive=down, negative=up) |
| Key down | kd,code |
KeyboardEvent.code press |
| Key up | ku,code |
KeyboardEvent.code release |
Input is injected directly into X11 via the XTest extension — no xdotool,
no subprocess, no string parsing overhead.
Frontend
The built-in web UI provides:
- WebRTC video rendering via
RTCPeerConnection - Browser cursor overlay — a CSS-rendered cursor synced with the server position, so the system cursor (and its latency) is never encoded in the video
- Input throttling — relative mouse movements are accumulated and flushed at ~50fps to avoid flooding X11
- Touch-to-mouse translation:
- One-finger slide → relative cursor move
- Tap (<300ms) → left click
- Long-press (>700ms) → right click
- Long-press + vertical move → scroll
- Double-tap (<400ms) + hold + move → drag selection
- Keyboard forwarding — all keyboard events mapped to X11 keysyms
- Auto-reconnection — exponential backoff (1s → 30s max)
- Negotiation watchdog — 15s timeout on WebRTC connection
WebSocket Signaling
Client → Server: {"type":"ready"} ready to connect
Server → Client: {"type":"offer","sdp":"..."} SDP offer
Client → Server: {"type":"answer","sdp":"..."} SDP answer
Both: {"type":"ice","candidate":"...","sdp_mline_index":0}
After signaling completes, the WebSocket switches to carrying input CSV lines
and cursor position updates ({"type":"cursor","x":<x>,"y":<y>}).
Security
Token Authentication
--token <TOKEN> provides passwordless access control suitable for
internal/VPN networks:
| Measure | Detail |
|---|---|
| Cookie | HttpOnly + SameSite=Lax, 24h expiry |
| No server-side session | Stateless, no session leakage |
| Token in query param | Auto-converted to cookie on first successful auth |
| No token set | Server operates in open-access mode |
Limitations:
- Token is transmitted in plaintext on HTTP — use behind VPN or HTTPS for production
- Token is static — rotate by restarting with a new
--tokenvalue - No rate-limiting on auth attempts — use a long random token (>16 chars)
- Cookie
Secureflag not set (HTTP-only environments)
Recommendation: Pair with Tailscale/WireGuard VPN, or put behind nginx HTTPS reverse proxy for public-facing deployments.
Notes
- Each browser tab creates a separate WebRTC pipeline (no multi-viewer sharing yet). Multiple viewers work simultaneously.
- Requires a running X11 server (Xvnc, Xvfb, or real X).
- On Termux, the X socket path is auto-detected.
- Audio requires PulseAudio running on the system.
- Stale wineserver processes with ESYNC on Termux can cause
virtual_setup_exceptioncrashes — clear them withkill -9if needed. - Termux linker namespace restrictions require
LD_PRELOADtricks for GPU-accelerated encoding — see the proton11 guide for details.
License
MIT