vnrit 0.1.2

Lightweight X11 desktop WebRTC streaming server

vnrit — Lightweight X11 WebRTC Streaming Server

License: MIT

vnrit streams an X11 desktop to one or more browsers over WebRTC with low-latency keyboard and mouse input forwarding. It is designed for ARM Linux environments (Termux, Raspberry Pi, etc.) where hardware resources are constrained.

┌──────────────────────────────────────────────────────────────────┐
│  X11 Server  ──→  ximagesrc  ──→  videoconvert  ──→  encoder   │
│  (Xvnc/Xvfb)              GStreamer pipeline                    │
│                                                                │
│                     ┌── WebSocket (signaling + input)          │
│  Browser  ←──WebRTC──┤                                          │
│                     └── ICE/STUN/TURN (p2p media)              │
└──────────────────────────────────────────────────────────────────┘

Features

  • WebRTC streaming — low-latency video via webrtcbin with adaptive quality
  • Multiple codecs — openh264 (default), Android MediaCodec H.264, VP8, VP9
  • Audio support — Opus audio via PulseAudio (auto-detected)
  • Input forwarding — keyboard + mouse injected directly via X11 XTest extension (no xdotool)
  • Touch-to-mouse — trackpad mode: tap, long-press, drag, scroll — all on a touchscreen
  • Browser cursor overlay — synced cursor position without encoding the cursor into video
  • Optional token-auth — passwordless access control with cookie-based sessions

Quick Start

# Recommended where the mcenc hardware encoder is available (Termux): H.264, 720p, 500 kbps
vnrit --codec h264 --height 720 --bitrate 500

# Open the printed URL in a browser (default http://0.0.0.0:8080; from another
# device, use the host's address instead of 0.0.0.0).
# Tap/click to send mouse and keyboard events back.

Installation

Prerequisites

  • Rust 1.70+ (rustup or system package)
  • GStreamer 1.22+ with plugins:
    • gstreamer, gst-plugins-base, gst-plugins-good, gst-plugins-bad
    • gst-plugins-ugly (for x264enc, optional)
    • openh264 (gst-openh264 or system package)
    • VP8/VP9 support via gst-plugins-good (libvpx)
    • Android MediaCodec encoder (mcenc plugin, Termux only)
  • X11 server (Xvnc, Xvfb, or real X display)
  • pkg-config (for GStreamer build linkage)

Build

git clone https://github.com/nlsidf/vnrit.git
cd vnrit
cargo build --release
./target/release/vnrit --help

Termux (Android)

On Termux, install dependencies via pkg:

pkg install rust gstreamer gst-plugins-base gst-plugins-good \
  gst-plugins-bad gst-plugins-ugly openh264 mcenc x11-repo \
  tur-repo pulseaudio

The X11 display connection uses the Unix socket at /data/data/com.termux/files/usr/tmp/.X11-unix/X<display>. vnrit auto-detects this path.
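
The README does not describe how the auto-detection works. As a rough illustration only, the sketch below maps a local display string to the socket paths a client could probe. The Termux directory comes from the text above; the /tmp/.X11-unix fallback and both function names are assumptions, not vnrit's actual logic.

```python
import re

# Socket directories to try, in order. The Termux path is quoted from the
# README; /tmp/.X11-unix is the conventional location on other Linux systems
# and is an assumption here.
SOCKET_DIRS = [
    "/data/data/com.termux/files/usr/tmp/.X11-unix",
    "/tmp/.X11-unix",
]


def display_number(display: str) -> int:
    """Extract N from a local display string such as ':1' or ':1.0'."""
    match = re.fullmatch(r":(\d+)(?:\.\d+)?", display)
    if match is None:
        raise ValueError(f"not a local X11 display: {display!r}")
    return int(match.group(1))


def candidate_sockets(display: str) -> list:
    """Return the socket paths that could back this display, in probe order."""
    number = display_number(display)
    return [f"{directory}/X{number}" for directory in SOCKET_DIRS]


print(candidate_sockets(":1"))
```

A caller would pick the first candidate that exists on disk, for example with os.path.exists.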

Usage

Usage: vnrit [OPTIONS]

Options:
      --display <DISPLAY>    X11 display to capture [default: :1]
  -p, --port <PORT>          HTTP/WebSocket port [default: 8080]
      --codec <CODEC>        Video encoder: openh264, h264, vp8, vp9 [default: openh264]
      --framerate <FPS>      Capture framerate [default: 24]
      --bitrate <KBPS>       Target bitrate in kbps [default: 1000]
      --height <PX>          Downscale height (0 = native) [default: 0]
      --token <TOKEN>        Authentication token (optional, for access control)
  -h, --help                 Print detailed help
  -V, --version              Print version

Examples

# Default: openh264 at desktop resolution, 1 Mbps
vnrit

# Recommended where the mcenc hardware encoder is available (Termux): H.264, 720p, 500 kbps
vnrit --codec h264 --height 720 --bitrate 500

# Low bandwidth: VP9, 480p, 300 kbps
vnrit --codec vp9 --height 480 --bitrate 300

# High quality: no scaling, 2 Mbps
vnrit --bitrate 2000

# Custom display and port
vnrit --display :0 -p 9090

# With token authentication
vnrit --token mysecret

# Full setup with auth + recommended codec
vnrit --token abc123 --codec h264 --height 720 --bitrate 500

Token Authentication

When --token <TOKEN> is specified, all HTTP and WebSocket connections must present the token.

How it works:

Browser → http://host:8080/?token=xxx    # initial visit with token
   ↓
Server: validates ?token=xxx, sets HttpOnly cookie
   ↓
Browser → ws://host:8080/ws?token=xxx    # WebSocket upgrade (or via cookie)
   ↓
Server: validates token → WebRTC streaming begins

  • First visit: append ?token=<value> to the URL
  • Subsequent visits: the browser's cookie handles authentication automatically
  • No token: server returns HTTP 401 Unauthorized
  • No --token specified: server operates in open-access mode (no auth)
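
The flow above can be sketched as a single authorization check. This is a minimal illustration in Python, not vnrit's code: the cookie name vnrit_token, the choice to store the token itself in the cookie, and both function names are assumptions. The cookie attributes follow the Security section (HttpOnly, SameSite=Lax, 24h expiry).

```python
import hmac
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit

COOKIE_NAME = "vnrit_token"  # hypothetical; the real cookie name is not documented


def is_authorized(expected, url, cookie_header=""):
    """Return True if a request may proceed.

    expected is the value passed to --token, or None for open-access mode.
    The token may arrive as ?token=... in the URL or in the session cookie.
    """
    if expected is None:
        return True  # no --token specified: open access
    presented = list(parse_qs(urlsplit(url).query).get("token", []))
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if COOKIE_NAME in cookie:
        presented.append(cookie[COOKIE_NAME].value)
    # compare_digest avoids leaking the match position through timing.
    return any(
        hmac.compare_digest(value.encode(), expected.encode())
        for value in presented
    )


def session_cookie(token):
    """Build a Set-Cookie value with the documented attributes (24h = 86400 s)."""
    return f"{COOKIE_NAME}={token}; HttpOnly; SameSite=Lax; Max-Age=86400"


print(is_authorized("abc123", "/?token=abc123"))
```

A request that fails this check would receive the HTTP 401 response described above.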

Codec Comparison

Measured on Snapdragon 835 (Adreno 540) at 720p 500 kbps with a connected client:

Codec     Element      RSS     Type                    Notes
openh264  openh264enc  ~50 MB  Software H.264 (Cisco)  Default, good balance
h264      mcenc        ~48 MB  Hardware H.264          Lowest CPU/memory
vp8       vp8enc       ~64 MB  Software VP8 (libvpx)   Higher memory
vp9       vp9enc       ~64 MB  Software VP9 (libvpx)   Better compression

The hardware H.264 encoder (mcenc) offloads encoding to the SoC's dedicated video encoding block, so it uses the least CPU and memory of the four.

Bitrate Guidelines (720p @ 24 fps)

Bitrate     Quality        Use Case
300 kbps    Low            Text terminals, SSH-like
500 kbps    Good           GUI desktops (recommended)
1000 kbps   High           Default, smooth desktop
2000+ kbps  Near-lossless  Static content, reading
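
To put these figures in perspective, the sketch below converts a bitrate into an average bit budget per pixel per frame. It assumes a 16:9 frame (1280x720) and 1 kbps = 1000 bits per second; neither is stated in the table, and the function name is made up for illustration.

```python
def bits_per_pixel(bitrate_kbps, width, height, fps):
    """Average encoded bits available per pixel per frame."""
    return bitrate_kbps * 1000 / (width * height * fps)


# 1280x720 is an assumed 16:9 width for the 720p column heading.
for kbps in (300, 500, 1000, 2000):
    bpp = bits_per_pixel(kbps, 1280, 720, 24)
    print(f"{kbps:>5} kbps -> {bpp:.3f} bits/pixel")
```

At 500 kbps this works out to roughly 0.023 bits per pixel, which is workable for desktop content only because most of the screen does not change between frames.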

Architecture

Pipeline

The GStreamer pipeline is constructed dynamically per WebRTC connection:

Video:
ximagesrc → videoconvert → queue → capsfilter
                                        ↓ (optional)
                              videoscale → capsfilter (--height)
                                        ↓
                              encoder → payloader → webrtcbin

Audio (if PulseAudio detected):
pulsesrc → audio/x-raw (mono/48kHz) → opusenc → rtpopuspay → webrtcbin
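
The video branch above can be written down as an ordered list of stages. The following sketch only assembles names; it does not call GStreamer. The encoder element names come from the Codec Comparison table, while the payloader names (rtph264pay, rtpvp8pay, rtpvp9pay), the caps strings, and the function name are assumptions about what vnrit might use.

```python
# codec option -> (encoder element, RTP payloader element)
# Encoder names are from the README; payloader names are assumed.
ENCODERS = {
    "openh264": ("openh264enc", "rtph264pay"),
    "h264": ("mcenc", "rtph264pay"),
    "vp8": ("vp8enc", "rtpvp8pay"),
    "vp9": ("vp9enc", "rtpvp9pay"),
}


def video_stages(codec="openh264", framerate=24, height=0):
    """Return the video branch as an ordered list of stage names."""
    encoder, payloader = ENCODERS[codec]
    stages = [
        "ximagesrc",
        "videoconvert",
        "queue",
        f"video/x-raw,framerate={framerate}/1",  # capsfilter (assumed caps)
    ]
    if height > 0:
        # The optional downscale step selected by --height.
        stages += ["videoscale", f"video/x-raw,height={height}"]
    stages += [encoder, payloader, "webrtcbin"]
    return stages


# Joined for display only; this is not a tested gst-launch command line.
print(" ! ".join(video_stages("h264", height=720)))
```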

Input Protocol

Keyboard and mouse events are sent from the browser to the server over the same WebSocket used for WebRTC signaling. Messages are CSV lines for minimal overhead:

Command                Format     Description
Mouse move (relative)  mr,dx,dy   Relative cursor movement
Mouse move (absolute)  ma,x,y     Absolute cursor position
Mouse down             md,button  Button press (1=left, 2=middle, 3=right)
Mouse up               mu,button  Button release
Scroll                 ms,deltaY  Scroll wheel (positive=down, negative=up)
Key down               kd,code    KeyboardEvent.code press
Key up                 ku,code    KeyboardEvent.code release

Input is injected directly into X11 via the XTest extension — no xdotool, no subprocess, no string parsing overhead.
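
A parser for these lines can stay very small. The sketch below is illustrative Python, not vnrit's Rust implementation: it validates the command and field count from the table above and treats mouse fields as integers, which is an assumption (the README does not say whether fractional values are allowed). The InputEvent type and parse_input name are made up for this example.

```python
from dataclasses import dataclass

# command -> number of fields expected after the command
ARITY = {"mr": 2, "ma": 2, "md": 1, "mu": 1, "ms": 1, "kd": 1, "ku": 1}
KEY_COMMANDS = {"kd", "ku"}


@dataclass(frozen=True)
class InputEvent:
    command: str
    args: tuple


def parse_input(line):
    """Parse one CSV input line such as 'mr,5,-3' or 'kd,KeyA'."""
    command, *fields = line.strip().split(",")
    if command not in ARITY:
        raise ValueError(f"unknown command: {command!r}")
    if len(fields) != ARITY[command]:
        raise ValueError(f"{command} expects {ARITY[command]} field(s)")
    if command in KEY_COMMANDS:
        # Key events carry a KeyboardEvent.code string, e.g. 'KeyA'.
        return InputEvent(command, (fields[0],))
    # Mouse events: integers assumed; int() raises ValueError otherwise.
    return InputEvent(command, tuple(int(field) for field in fields))


print(parse_input("mr,5,-3"))
```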

Frontend

The built-in web UI provides:

  • WebRTC video rendering via RTCPeerConnection
  • Browser cursor overlay — a CSS-rendered cursor synced with the server position, so the system cursor (and its latency) is never encoded in the video
  • Input throttling — relative mouse movements are accumulated and flushed at ~50fps to avoid flooding X11
  • Touch-to-mouse translation:
    • One-finger slide → relative cursor move
    • Tap (<300ms) → left click
    • Long-press (>700ms) → right click
    • Long-press + vertical move → scroll
    • Double-tap (<400ms) + hold + move → drag selection
  • Keyboard forwarding — all keyboard events mapped to X11 keysyms
  • Auto-reconnection — exponential backoff (1s → 30s max)
  • Negotiation watchdog — 15s timeout on WebRTC connection
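
Two of the behaviours listed above, input throttling and reconnection backoff, are easy to show in isolation. The sketch below is in Python for brevity (the real frontend runs in the browser), and the class and function names are made up. The doubling factor is an assumption; the README only gives the 1s start and 30s cap.

```python
from itertools import islice


class MoveAccumulator:
    """Coalesce relative mouse moves so one 'mr' line is sent per flush."""

    def __init__(self):
        self.dx = 0
        self.dy = 0

    def add(self, dx, dy):
        self.dx += dx
        self.dy += dy

    def flush(self):
        """Return the pending 'mr,dx,dy' line, or None if nothing moved."""
        if self.dx == 0 and self.dy == 0:
            return None
        line = f"mr,{self.dx},{self.dy}"
        self.dx = self.dy = 0
        return line


def backoff_delays(initial=1.0, maximum=30.0, factor=2.0):
    """Yield reconnect delays in seconds, growing until capped at maximum."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * factor, maximum)


acc = MoveAccumulator()
acc.add(2, -1)
acc.add(3, 4)
print(acc.flush())
print(list(islice(backoff_delays(), 7)))
```

A timer firing about every 20 ms would call flush() to reach the ~50 flushes per second mentioned above.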

WebSocket Signaling

Client → Server:  {"type":"ready"}                          ready to connect
Server → Client:  {"type":"offer","sdp":"..."}              SDP offer
Client → Server:  {"type":"answer","sdp":"..."}             SDP answer
Both:             {"type":"ice","candidate":"...","sdp_mline_index":0}

After signaling completes, the WebSocket switches to carrying input CSV lines and cursor position updates ({"type":"cursor","x":<x>,"y":<y>}).
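
Because JSON messages and CSV input lines share one WebSocket, a receiver has to tell them apart. The README does not say how vnrit does this; one plausible approach, sketched below with a made-up function name, is to treat frames starting with "{" as JSON and everything else as CSV.

```python
import json

# Message types that appear in this README.
KNOWN_TYPES = {"ready", "offer", "answer", "ice", "cursor"}


def route_frame(text):
    """Classify a WebSocket text frame.

    Returns ('json', dict) for signaling/cursor messages and
    ('input', list_of_fields) for CSV input lines.
    """
    text = text.strip()
    if text.startswith("{"):
        message = json.loads(text)
        if message.get("type") not in KNOWN_TYPES:
            raise ValueError(f"unknown message type: {message.get('type')!r}")
        return "json", message
    return "input", text.split(",")


print(route_frame('{"type":"ice","candidate":"...","sdp_mline_index":0}'))
print(route_frame("mr,3,4"))
```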

Security

Token Authentication

--token <TOKEN> provides passwordless access control suitable for internal/VPN networks:

Measure                 Detail
Cookie                  HttpOnly + SameSite=Lax, 24h expiry
No server-side session  Stateless, no session leakage
Token in query param    Auto-converted to cookie on first successful auth
No token set            Server operates in open-access mode

Limitations:

  • Token is transmitted in plaintext on HTTP — use behind VPN or HTTPS for production
  • Token is static — rotate by restarting with a new --token value
  • No rate-limiting on auth attempts — use a long random token (>16 chars)
  • The cookie's Secure flag is not set, so the cookie is also sent over plain HTTP

Recommendation: Pair with a Tailscale/WireGuard VPN, or place behind an nginx HTTPS reverse proxy for public-facing deployments.
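
Since there is no rate limiting, the token should come from a cryptographic random source rather than being typed by hand. A minimal sketch using Python's standard secrets module follows; the helper name is made up, and any equivalent generator works just as well.

```python
import secrets


def generate_token(num_bytes=24):
    """Return a URL-safe random token; 24 bytes encode to 32 characters."""
    return secrets.token_urlsafe(num_bytes)


# Print a ready-to-run command line with a fresh token.
print(f"vnrit --token {generate_token()}")
```

A URL-safe alphabet matters here because the token travels in the ?token= query parameter on the first visit.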

Notes

  • Each browser tab gets its own WebRTC pipeline, so multiple viewers can connect simultaneously, but the encoded stream is not shared between them yet.
  • Requires a running X11 server (Xvnc, Xvfb, or real X).
  • On Termux, the X socket path is auto-detected.
  • Audio requires PulseAudio running on the system.
  • Stale wineserver processes with ESYNC on Termux can cause virtual_setup_exception crashes — clear them with kill -9 if needed.
  • Termux linker namespace restrictions require LD_PRELOAD tricks for GPU-accelerated encoding — see the proton11 guide for details.

License

MIT