vnrit — Lightweight X11 WebRTC Streaming Server
vnrit streams an X11 desktop to one or more browsers over WebRTC, with low-latency keyboard and mouse input forwarding. Designed for ARM Linux environments (Termux, Raspberry Pi, etc.) where hardware resources are constrained.
┌───────────────────────────────────────────────────────────────┐
│ X11 Server ──→ ximagesrc ──→ videoconvert ──→ encoder │
│ (Xvnc/Xvfb) GStreamer pipeline │
│ │
│ ┌── WebSocket (signaling + input) │
│ Browser ←──WebRTC──┤ │
│ └── ICE/STUN/TURN (p2p media) │
└───────────────────────────────────────────────────────────────┘
Quick Start
# Recommended: hardware H.264, 720p, 500 kbps
# Open the printed URL (default http://0.0.0.0:8080) in a browser.
# Tap/click to send mouse and keyboard events back.
Installation
Prerequisites
- Rust 1.70+ (
rustupor system package) - GStreamer 1.22+ with plugins:
gstreamer,gst-plugins-base,gst-plugins-good,gst-plugins-badgst-plugins-ugly(for x264enc, optional)- openh264 (
gst-openh264or system package) - VP8/VP9 support via
gst-plugins-good(libvpx) - Android MediaCodec encoder (
mcencplugin, Termux only)
- X11 server (Xvnc, Xvfb, or real X display)
- pkg-config (for GStreamer build linkage)
Build
&&
Termux (Android)
On Termux, install dependencies via apt:
The X11 display connection uses the Unix socket at
/data/data/com.termux/files/usr/tmp/.X11-unix/X<display>. vnrit
auto-detects this path.
Usage
Usage: vnrit [OPTIONS]
Options:
--display <DISPLAY> X11 display to capture [default: :1]
-p, --port <PORT> HTTP/WebSocket port [default: 8080]
--codec <CODEC> Video encoder: openh264, h264, vp8, vp9
[default: openh264]
--framerate <FPS> Capture framerate [default: 24]
--bitrate <KBPS> Target bitrate in kbps [default: 1000]
--height <PX> Downscale height (0 = native) [default: 0]
-h, --help Print detailed help
-V, --version Print version
Examples
# Default: openh264 at desktop resolution, 1 Mbps
# Recommended: hardware H.264, 720p, 500 kbps
# Low bandwidth: VP9, 480p, 300 kbps
# High quality: no scaling, 2 Mbps
# Custom display and port
Codec Comparison
Measured on Snapdragon 835 (Adreno 540) at 720p 500 kbps with a connected client:
| Codec | Element | RSS | Type |
|---|---|---|---|
| openh264 | openh264enc |
~50 MB | Software H.264 (Cisco) |
| h264 | mcenc |
~48 MB | Hardware H.264 (MediaCodec) |
| vp8 | vp8enc |
~64 MB | Software VP8 (libvpx) |
| vp9 | vp9enc |
~64 MB | Software VP9 (libvpx) |
The hardware H.264 encoder (mcenc) uses the GPU's dedicated video encoding
block, consuming the least CPU and memory. It is the recommended option on
devices that support it.
Bitrate Guidelines (720p @ 24 fps)
| Bitrate | Quality | Use Case |
|---|---|---|
| 300 kbps | Low | Text terminals, SSH-like |
| 500 kbps | Good | GUI desktops (recommended) |
| 1000 kbps | High | Default, smooth desktop |
| 2000+ kbps | Near-lossless | Static content, reading |
Architecture
Pipeline
The GStreamer pipeline is constructed dynamically per WebRTC connection:
ximagesrc → videoconvert → queue → capsfilter
↓ (optional)
videoscale → capsfilter (--height)
↓
encoder → payloader → webrtcbin
If PulseAudio is detected, an audio pipeline is also added:
pulsesrc → audio/x-raw → opusenc → rtpopuspay → webrtcbin
Input Protocol
Keyboard and mouse events are sent from the browser to the server over the same WebSocket used for WebRTC signaling. Messages are CSV lines:
mouse,<x>,<y>,<button>,<pressed>
x/y = absolute pixel coordinates on the display
button = 1 (left), 2 (middle), 3 (right)
pressed = 1 (down), 0 (up)
key,<keycode>,<pressed>
keycode = X11 keysym (see /usr/include/X11/keysymdef.h)
pressed = 1 (down), 0 (up)
Frontend
The built-in web UI (src/index.html) provides:
- WebRTC video rendering via
RTCPeerConnection - Touch-to-mouse translation: one-finger move, two-finger scroll, tap = click, long-press = right-click
- Keyboard input via
onkeydown/onkeyup - Auto-reconnection on disconnect
WebSocket Signaling
The signaling protocol uses JSON messages over the same WebSocket:
Client → Server: {"type":"ready"} # ready to connect
Server → Client: {"type":"offer","sdp":"..."} # SDP offer
Client → Server: {"type":"answer","sdp":"..."} # SDP answer
Both: {"type":"ice","candidate":"...","sdp_mline_index":0} # ICE candidates
After signaling completes, the WebSocket continues to carry input CSV lines.
Notes
- Each browser tab creates a separate WebRTC pipeline (no multi-viewer sharing yet). Multiple viewers work simultaneously.
- Requires a running X11 server (Xvnc, Xvfb, or real X).
- On Termux, the X socket path is auto-detected.
- Audio requires PulseAudio running on the system.
- Stale wineserver processes with ESYNC on Termux can cause
virtual_setup_exceptioncrashes — clear them withkill -9if needed.
License
MIT