vnrit 0.1.1

Lightweight X11 desktop WebRTC streaming server
vnrit-0.1.1 is not a library.

vnrit — Lightweight X11 WebRTC Streaming Server

vnrit streams an X11 desktop to one or more browsers over WebRTC, with low-latency keyboard and mouse input forwarding. Designed for ARM Linux environments (Termux, Raspberry Pi, etc.) where hardware resources are constrained.

┌───────────────────────────────────────────────────────────────┐
│  X11 Server  ──→  ximagesrc  ──→  videoconvert  ──→  encoder  │
│  (Xvnc/Xvfb)             GStreamer pipeline              │
│                                                            │
│                     ┌── WebSocket (signaling + input)      │
│  Browser  ←──WebRTC──┤                                     │
│                     └── ICE/STUN/TURN (p2p media)          │
└───────────────────────────────────────────────────────────────┘

Quick Start

# Recommended: hardware H.264, 720p, 500 kbps
vnrit --codec h264 --height 720 --bitrate 500

# Open the printed URL (default http://0.0.0.0:8080) in a browser.
# Tap/click to send mouse and keyboard events back.

Installation

Prerequisites

  • Rust 1.70+ (rustup or system package)
  • GStreamer 1.22+ with plugins:
    • gstreamer, gst-plugins-base, gst-plugins-good, gst-plugins-bad
    • gst-plugins-ugly (for x264enc, optional)
    • openh264 (gst-openh264 or system package)
    • VP8/VP9 support via gst-plugins-good (libvpx)
    • Android MediaCodec encoder (mcenc plugin, Termux only)
  • X11 server (Xvnc, Xvfb, or real X display)
  • pkg-config (for GStreamer build linkage)

Build

git clone <repo-url> && cd vnrit
cargo build --release
./target/release/vnrit --help

Termux (Android)

On Termux, install dependencies via apt:

pkg install rust gstreamer gst-plugins-base gst-plugins-good \
  gst-plugins-bad gst-plugins-ugly openh264 mcenc x11-repo \
  tur-repo pulseaudio

The X11 display connection uses the Unix socket at /data/data/com.termux/files/usr/tmp/.X11-unix/X<display>. vnrit auto-detects this path.

Usage

Usage: vnrit [OPTIONS]

Options:
      --display <DISPLAY>    X11 display to capture [default: :1]
  -p, --port <PORT>          HTTP/WebSocket port [default: 8080]
      --codec <CODEC>        Video encoder: openh264, h264, vp8, vp9
                              [default: openh264]
      --framerate <FPS>      Capture framerate [default: 24]
      --bitrate <KBPS>       Target bitrate in kbps [default: 1000]
      --height <PX>          Downscale height (0 = native) [default: 0]
  -h, --help                 Print detailed help
  -V, --version              Print version

Examples

# Default: openh264 at desktop resolution, 1 Mbps
vnrit

# Recommended: hardware H.264, 720p, 500 kbps
vnrit --codec h264 --height 720 --bitrate 500

# Low bandwidth: VP9, 480p, 300 kbps
vnrit --codec vp9 --height 480 --bitrate 300

# High quality: no scaling, 2 Mbps
vnrit --bitrate 2000

# Custom display and port
vnrit --display :0 -p 9090

Codec Comparison

Measured on Snapdragon 835 (Adreno 540) at 720p 500 kbps with a connected client:

Codec Element RSS Type
openh264 openh264enc ~50 MB Software H.264 (Cisco)
h264 mcenc ~48 MB Hardware H.264 (MediaCodec)
vp8 vp8enc ~64 MB Software VP8 (libvpx)
vp9 vp9enc ~64 MB Software VP9 (libvpx)

The hardware H.264 encoder (mcenc) uses the GPU's dedicated video encoding block, consuming the least CPU and memory. It is the recommended option on devices that support it.

Bitrate Guidelines (720p @ 24 fps)

Bitrate Quality Use Case
300 kbps Low Text terminals, SSH-like
500 kbps Good GUI desktops (recommended)
1000 kbps High Default, smooth desktop
2000+ kbps Near-lossless Static content, reading

Architecture

Pipeline

The GStreamer pipeline is constructed dynamically per WebRTC connection:

ximagesrc → videoconvert → queue → capsfilter
                                        ↓ (optional)
                              videoscale → capsfilter (--height)
                                        ↓
                              encoder → payloader → webrtcbin

If PulseAudio is detected, an audio pipeline is also added:

pulsesrc → audio/x-raw → opusenc → rtpopuspay → webrtcbin

Input Protocol

Keyboard and mouse events are sent from the browser to the server over the same WebSocket used for WebRTC signaling. Messages are CSV lines:

mouse,<x>,<y>,<button>,<pressed>
  x/y     = absolute pixel coordinates on the display
  button  = 1 (left), 2 (middle), 3 (right)
  pressed = 1 (down), 0 (up)

key,<keycode>,<pressed>
  keycode = X11 keysym (see /usr/include/X11/keysymdef.h)
  pressed = 1 (down), 0 (up)

Frontend

The built-in web UI (src/index.html) provides:

  • WebRTC video rendering via RTCPeerConnection
  • Touch-to-mouse translation: one-finger move, two-finger scroll, tap = click, long-press = right-click
  • Keyboard input via onkeydown/onkeyup
  • Auto-reconnection on disconnect

WebSocket Signaling

The signaling protocol uses JSON messages over the same WebSocket:

Client → Server:  {"type":"ready"}                        # ready to connect
Server → Client:  {"type":"offer","sdp":"..."}             # SDP offer
Client → Server:  {"type":"answer","sdp":"..."}            # SDP answer
Both:             {"type":"ice","candidate":"...","sdp_mline_index":0}  # ICE candidates

After signaling completes, the WebSocket continues to carry input CSV lines.

Notes

  • Each browser tab creates a separate WebRTC pipeline (no multi-viewer sharing yet). Multiple viewers work simultaneously.
  • Requires a running X11 server (Xvnc, Xvfb, or real X).
  • On Termux, the X socket path is auto-detected.
  • Audio requires PulseAudio running on the system.
  • Stale wineserver processes with ESYNC on Termux can cause virtual_setup_exception crashes — clear them with kill -9 if needed.

License

MIT