vnrit 0.2.0

Lightweight X11 desktop WebRTC streaming server (pure Rust, no GStreamer)

vnrit — Pure Rust X11 WebRTC Streaming Server

License: MIT

vnrit streams an X11 desktop to browsers over WebRTC with low-latency keyboard/mouse input forwarding. Built entirely in Rust — no GStreamer, no FFmpeg, no system codec dependencies.

┌──────────────────────────────────────────────────────────────────┐
│                        vnrit Server                              │
│                                                                  │
│  X11 Server ──→ SHM Capture ──→ libyuv I420 ──→ openh264 H.264  │
│  (Xvnc/Xvfb)     (MIT-SHM)        (SIMD ARM NEON)  (Screen RT)  │
│                                                                  │
│  PulseAudio ──→ Opus Encoder ───→ WebRTC                         │
│  (monitor src)     (libopus)       (webrtc-rs)                   │
│                                                                  │
│                        ↓                                         │
│  Browser ←──WebRTC─── WebSocket (signaling + input) → XTest      │
└──────────────────────────────────────────────────────────────────┘

Features

  • Pure Rust — zero GStreamer/FFmpeg dependency, ~6.6 MB release binary
  • WebRTC H.264 — openh264 encoder with screen content optimization
  • SIMD color conversion — Google libyuv via ARM NEON
  • 4-stage pipeline — capture → convert → encode → send, fully parallel
  • Audio support — PulseAudio → Opus (48kHz stereo, 20ms frames)
  • X11 input injection — keyboard + mouse via XTest extension
  • MIT-SHM capture — zero-copy shared memory screen capture
  • Browser cursor overlay — CSS cursor synced separately, never encoded in video
  • Dual X11 connections — separate sockets for capture and input (no lock contention)
  • Memory pool reuse — zero per-frame allocations in steady state
  • Token authentication — optional passwordless access control
  • Auto-reconnection — exponential backoff (1s → 30s)
  • Touch-to-mouse — tap, long-press, drag, scroll on touchscreens
  • Virtual keyboard — on-screen keyboard for mobile/touch devices

Quick Start

# Build
./build.sh --release

# Run (default: X11 :1, port 8080, 1000 kbps)
target/release/vnrit --display :1

# Recommended settings for remote access
target/release/vnrit --display :1 --height 720 --bitrate 500

# Open http://<host>:8080 in a browser

Prerequisites

Dependency    Purpose                   Install
Rust 1.82+    Compiler                  rustup or system package
cmake 3.20+   libyuv build              apt install cmake / pkg install cmake
X11 server    Display to capture        Xvnc, Xvfb, or real X server
PulseAudio    Audio capture (optional)  pulseaudio server running

Termux (Android)

pkg install rust cmake x11-repo tur-repo pulseaudio

The X11 socket path at /data/data/com.termux/files/usr/tmp/.X11-unix/X<display> is auto-detected.
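As a rough illustration of what such auto-detection could look like, the sketch below maps a display string to a socket path and probes the Termux directory before the conventional /tmp/.X11-unix. The function names and the probing order are assumptions made for this example, not vnrit's actual code.

```rust
use std::path::{Path, PathBuf};

// Socket directories to probe. The Termux path comes from this README;
// /tmp/.X11-unix is the conventional location on regular Linux systems.
const TERMUX_SOCKET_DIR: &str = "/data/data/com.termux/files/usr/tmp/.X11-unix";
const DEFAULT_SOCKET_DIR: &str = "/tmp/.X11-unix";

/// Extracts the display number from strings such as ":1" or ":1.0".
/// Returns None when there is no ':' or no numeric display part.
fn display_number(display: &str) -> Option<u32> {
    let (_host, rest) = display.rsplit_once(':')?;
    // Drop an optional ".<screen>" suffix.
    let number = rest.split('.').next()?;
    number.parse().ok()
}

/// Builds "<socket_dir>/X<display>" for a display string.
fn socket_path(socket_dir: &Path, display: &str) -> Option<PathBuf> {
    let n = display_number(display)?;
    Some(socket_dir.join(format!("X{n}")))
}

/// Hypothetical detection order: Termux first, then the default directory.
fn detect_socket(display: &str) -> Option<PathBuf> {
    [TERMUX_SOCKET_DIR, DEFAULT_SOCKET_DIR]
        .iter()
        .filter_map(|dir| socket_path(Path::new(dir), display))
        .find(|path| path.exists())
}
```

With these definitions, detect_socket(":1") returns the first candidate that exists on disk, or None if neither directory contains an X1 socket.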

Build

# Using build script
./build.sh --release

# Manual
CMAKE=$(which cmake) cargo build --release

The CMAKE environment variable is required: shiguredo_libyuv's build script needs it to find cmake on Android. Without it, build.rs attempts to download a prebuilt cmake binary, which fails on Termux.

Git mirror for libyuv source

shiguredo_libyuv clones libyuv from chromium.googlesource.com during build. If that's blocked on your network, configure a mirror:

git config --global url."https://gitee.com/zhang_wang_wu/libyuv".insteadOf \
  "https://chromium.googlesource.com/libyuv/libyuv"

Usage

Usage: vnrit [OPTIONS]

Options:
      --display <DISPLAY>    X11 display to capture [default: :1]
  -p, --port <PORT>          HTTP/WebSocket listen port [default: 8080]
      --framerate <FPS>      Capture framerate [default: 24]
      --bitrate <KBPS>       Target bitrate in kbps [default: 1000]
      --height <PX>          Downscale height (0 = native) [default: 0]
      --stun <URL>           STUN server URL (empty to disable) [default: stun:stun.cloudflare.com:3478]
      --token <TOKEN>        Authentication token (optional)
      --log-level <LEVEL>    Log level: off, error, warn, info, debug, trace [default: warn]
  -h, --help                 Print help

Examples

# Basic: stream :1 at native resolution
vnrit

# Stream at 720p, 500 kbps (recommended for remote access)
vnrit --height 720 --bitrate 500

# Higher quality for LAN
vnrit --bitrate 2000

# Custom display and port
vnrit --display :0 -p 9090

# With authentication
vnrit --token mysecret

# Debug logging
vnrit --log-level debug

# Disable STUN (LAN-only)
vnrit --stun ""

Bitrate Guidelines (720p @ 24 fps)

Bitrate       Quality         Use Case
300 kbps      Low             Text terminals, SSH-like
500 kbps      Good            GUI desktops (recommended)
1000 kbps     High            Default, smooth desktop
2000+ kbps    Near-lossless   Static content, reading
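To put these figures in perspective, the average budget per encoded frame can be worked out directly. The sketch below assumes 1 kbps = 1000 bit/s and a 1280x720 frame for "720p"; both helper functions are illustrative and not part of vnrit.

```rust
/// Average encoded bytes available per frame (integer division, rounds down).
fn bytes_per_frame(bitrate_kbps: u32, fps: u32) -> u32 {
    bitrate_kbps * 1000 / 8 / fps
}

/// Average encoded bits spent per pixel of each frame.
fn bits_per_pixel(bitrate_kbps: u32, fps: u32, width: u32, height: u32) -> f64 {
    (bitrate_kbps as f64 * 1000.0) / (fps as f64 * width as f64 * height as f64)
}
```

At 500 kbps and 24 fps this gives about 2.6 KB per frame on average, or roughly 0.023 bits per pixel at 1280x720. Individual frames vary widely, since keyframes are far larger than frames of a static desktop.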

Architecture

Video Pipeline

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│ Capture  │───→│ Convert  │───→│ Encode   │───→│ Send     │
│ SHM/X11  │    │ libyuv   │    │ openh264 │    │ WebRTC   │
│ BGRA     │    │ I420     │    │ H.264    │    │ track    │
└──────────┘    └──────────┘    └──────────┘    └──────────┘
 spawn_blocking  spawn_blocking  spawn_blocking  async

  • Capture: the X11 MIT-SHM extension reads screen pixels into shared memory (zero-copy). Falls back to get_image if SHM is unavailable.
  • Convert: libyuv SIMD converts BGRA → I420 and scales in the I420 domain when --height requests a downscale.
  • Encode: openh264 H.264 encoder with the ScreenContentRealTime usage type and a configurable bitrate.
  • Send: asynchronously writes encoded frames to the webrtc-rs TrackLocalStaticSample.

All four stages run in parallel, connected by bounded channels (capacity 4). Each stage holds its own pre-allocated buffer pool.
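The staged layout can be sketched with the standard library alone. In the example below, plain threads stand in for spawn_blocking tasks, the frame payloads are placeholder values, and every send blocks when a channel is full; the real pipeline may differ on all three points (the optimization table mentions try_send, for instance).

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Pushes `frames` dummy frames through capture -> convert -> encode -> send
/// and returns what the final stage received, in order.
fn run_pipeline(frames: u32) -> Vec<String> {
    const CAPACITY: usize = 4; // bounded channel capacity from the text above

    let (cap_tx, cap_rx) = sync_channel::<u32>(CAPACITY);
    let (conv_tx, conv_rx) = sync_channel::<String>(CAPACITY);
    let (enc_tx, enc_rx) = sync_channel::<String>(CAPACITY);

    // Stage 1: "capture" produces frame ids. send() blocks while the
    // channel is full, so a slow consumer throttles the producer.
    let capture = thread::spawn(move || {
        for id in 0..frames {
            if cap_tx.send(id).is_err() {
                break; // downstream stage has gone away
            }
        }
        // cap_tx is dropped here, which ends the convert loop.
    });

    // Stage 2: "convert" (placeholder for BGRA -> I420).
    let convert = thread::spawn(move || {
        for id in cap_rx {
            if conv_tx.send(format!("i420#{id}")).is_err() {
                break;
            }
        }
    });

    // Stage 3: "encode" (placeholder for H.264 encoding).
    let encode = thread::spawn(move || {
        for frame in conv_rx {
            if enc_tx.send(format!("h264({frame})")).is_err() {
                break;
            }
        }
    });

    // Stage 4: "send" runs on the calling thread and drains the last channel.
    let sent: Vec<String> = enc_rx.into_iter().collect();

    for handle in [capture, convert, encode] {
        handle.join().expect("pipeline stage panicked");
    }
    sent
}
```

Because each stage owns the receiving end of the previous channel, dropping a sender is enough to shut down everything after it.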

Audio Pipeline

┌──────────────┐    ┌──────────┐    ┌──────────┐
│ PulseAudio   │───→│ Opus     │───→│ WebRTC   │
│ Simple API   │    │ Encoder  │    │ track    │
│ PCM S16LE    │    │ 48kHz    │    │ 20ms     │
│ 3840 B/frame │    │ stereo   │    │ frames   │
└──────────────┘    └──────────┘    └──────────┘

Audio is captured from the monitor of the default PulseAudio sink (system audio output). If monitor detection fails, capture falls back to the default source (microphone).
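The 3840 B/frame figure in the diagram follows from the stream parameters: 48 kHz × 20 ms = 960 samples per channel, × 2 channels × 2 bytes per S16LE sample. A small sketch of that arithmetic, with a hypothetical helper that splits one captured frame into interleaved samples:

```rust
const SAMPLE_RATE_HZ: usize = 48_000;
const CHANNELS: usize = 2; // stereo
const BYTES_PER_SAMPLE: usize = 2; // signed 16-bit little-endian
const FRAME_MS: usize = 20;

/// Samples per channel in one 20 ms frame (960).
const SAMPLES_PER_CHANNEL: usize = SAMPLE_RATE_HZ * FRAME_MS / 1000;
/// Bytes in one interleaved stereo frame (3840).
const FRAME_BYTES: usize = SAMPLES_PER_CHANNEL * CHANNELS * BYTES_PER_SAMPLE;

/// Decodes one raw PCM frame into interleaved i16 samples (L, R, L, R, ...).
/// Returns None if the buffer is not exactly one frame long.
fn pcm_frame_to_samples(frame: &[u8]) -> Option<Vec<i16>> {
    if frame.len() != FRAME_BYTES {
        return None;
    }
    Some(
        frame
            .chunks_exact(BYTES_PER_SAMPLE)
            .map(|b| i16::from_le_bytes([b[0], b[1]]))
            .collect(),
    )
}
```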

Input Protocol

Commands are sent from the browser over the WebSocket as CSV:

Command                 Format       Description
Mouse move (relative)   mr,dx,dy     Relative cursor movement
Mouse move (absolute)   ma,x,y       Absolute cursor position
Mouse down              md,button    Button press (1=left, 2=middle, 3=right)
Mouse up                mu,button    Button release
Scroll                  ms,deltaY    Scroll wheel
Key down                kd,code      KeyboardEvent.code (e.g. KeyA, Digit2)
Key up                  ku,code      Key release

Keycodes are physical (not character-based), so keyboard layout handling is left to the X server. For example, Shift+2 produces @ when the X server uses a US layout, regardless of the browser's locale.
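A parser for this CSV format might look like the sketch below. The enum, the function name, and the integer types are assumptions for illustration (in particular, deltaY is treated as an integer here); the server's actual parsing code may differ.

```rust
#[derive(Debug, PartialEq)]
enum InputCommand {
    MouseMoveRelative { dx: i32, dy: i32 },
    MouseMoveAbsolute { x: i32, y: i32 },
    MouseDown { button: u8 },
    MouseUp { button: u8 },
    Scroll { delta_y: i32 },
    KeyDown { code: String },
    KeyUp { code: String },
}

/// Parses one command line such as "mr,5,-3" or "kd,KeyA".
/// Returns None for unknown commands, wrong field counts, or bad numbers.
fn parse_command(line: &str) -> Option<InputCommand> {
    let fields: Vec<&str> = line.trim().split(',').collect();
    match fields.as_slice() {
        ["mr", dx, dy] => Some(InputCommand::MouseMoveRelative {
            dx: dx.parse().ok()?,
            dy: dy.parse().ok()?,
        }),
        ["ma", x, y] => Some(InputCommand::MouseMoveAbsolute {
            x: x.parse().ok()?,
            y: y.parse().ok()?,
        }),
        ["md", button] => Some(InputCommand::MouseDown { button: button.parse().ok()? }),
        ["mu", button] => Some(InputCommand::MouseUp { button: button.parse().ok()? }),
        ["ms", delta_y] => Some(InputCommand::Scroll { delta_y: delta_y.parse().ok()? }),
        // Key codes are passed through as KeyboardEvent.code strings.
        ["kd", code] if !code.is_empty() => Some(InputCommand::KeyDown { code: (*code).to_string() }),
        ["ku", code] if !code.is_empty() => Some(InputCommand::KeyUp { code: (*code).to_string() }),
        _ => None,
    }
}
```

Rejecting malformed lines with None, rather than panicking, keeps a misbehaving client from taking down the input task.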

WebSocket Signaling

Client → Server:  {"type":"ready"}
Server → Client:  {"type":"offer","sdp":"..."}
Client → Server:  {"type":"answer","sdp":"..."}
Both:             {"type":"ice","candidate":"...", "sdp_mline_index":0}
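The exchange above implies an ordering: ready, then offer, then answer, with ICE candidates trickling in alongside. One way to model the server side is a small state machine keyed on the message "type" field. The states, actions, and the choice to ignore out-of-order messages are hypothetical; the README does not say how vnrit handles them.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    AwaitingReady, // nothing received yet
    OfferSent,     // server has sent its SDP offer
    Connected,     // client's SDP answer has been applied
}

#[derive(Debug, PartialEq)]
enum Action {
    SendOffer,
    ApplyAnswer,
    AddIceCandidate,
    Ignore,
}

/// Decides what to do with a client message of the given "type".
fn on_client_message(state: State, msg_type: &str) -> (State, Action) {
    match (state, msg_type) {
        (State::AwaitingReady, "ready") => (State::OfferSent, Action::SendOffer),
        (State::OfferSent, "answer") => (State::Connected, Action::ApplyAnswer),
        // Candidates are accepted any time after the offer has gone out.
        (s, "ice") if s != State::AwaitingReady => (s, Action::AddIceCandidate),
        // Anything unexpected leaves the state untouched (assumption).
        (s, _) => (s, Action::Ignore),
    }
}
```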

Frontend

The built-in web UI (src/index.html) provides:

  • WebRTC video via RTCPeerConnection with H.264/Opus
  • CSS cursor overlay — synced to server cursor position, never encoded in video
  • Relative input throttling — accumulated via requestAnimationFrame, not setInterval
  • ResizeObserver — real-time container resize tracking (no layout thrash)
  • Touch-to-mouse: tap, long-press, drag, scroll
  • Virtual keyboard: main/func/num layers with modifier latching
  • Auto-reconnection with exponential backoff
  • Negotiation watchdog: 15s timeout
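The reconnection schedule (1s → 30s) can be illustrated as follows. The frontend itself is JavaScript in src/index.html; this Rust sketch only shows the shape of the schedule, and the doubling factor is an assumption, since only the endpoints are documented.

```rust
use std::time::Duration;

/// Delay before reconnect attempt `attempt` (0-based):
/// 1s, 2s, 4s, 8s, 16s, then capped at 30s.
fn reconnect_delay(attempt: u32) -> Duration {
    const BASE_SECS: u64 = 1;
    const MAX_SECS: u64 = 30;
    // 2^attempt, saturating instead of overflowing for large attempt counts.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    Duration::from_secs(BASE_SECS.saturating_mul(factor).min(MAX_SECS))
}
```

A client would typically reset the attempt counter to 0 after a successful connection so that the next outage starts again from the short delay.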

Cancellation & Cleanup

All pipeline tasks share a CancellationToken. On disconnect:

  1. cancel.cancel() signals all tasks
  2. Channel senders are dropped, waking blocked receivers
  3. Each task checks cancellation and exits cleanly
  4. ice_forward task exits naturally when its sender is dropped (no abort())
  5. All spawn_blocking handles are awaited
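The shutdown sequence can be sketched with standard-library primitives. Here an AtomicBool stands in for the CancellationToken and a plain thread for a spawn_blocking task; both are substitutions made to keep the example dependency-free, and the function name is hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{sync_channel, RecvTimeoutError};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Starts one worker, feeds it three frames, then shuts it down.
/// Returns how many frames the worker handled before it exited (0 to 3).
fn shutdown_demo() -> usize {
    let cancel = Arc::new(AtomicBool::new(false));
    let (tx, rx) = sync_channel::<u32>(4);

    let worker_cancel = Arc::clone(&cancel);
    let worker = thread::spawn(move || {
        let mut processed = 0usize;
        loop {
            // Step 3: each task checks the flag and exits cleanly.
            if worker_cancel.load(Ordering::Acquire) {
                break;
            }
            match rx.recv_timeout(Duration::from_millis(50)) {
                Ok(_frame) => processed += 1,
                Err(RecvTimeoutError::Timeout) => continue,
                // Step 2: a dropped sender wakes the blocked receiver.
                Err(RecvTimeoutError::Disconnected) => break,
            }
        }
        processed
    });

    for frame in 0..3 {
        tx.send(frame).expect("worker is still receiving");
    }

    cancel.store(true, Ordering::Release); // Step 1: signal cancellation
    drop(tx); // Step 2: drop the sender
    worker.join().expect("worker panicked") // Step 5: wait for the task
}
```

The worker exits through whichever condition it observes first, so the number of frames it processes before shutdown is not deterministic.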

Performance Optimizations

Technique                           Detail
Memory pool reuse                   Pre-allocated Vecs per pipeline stage, no per-frame allocation
Zero-copy capture                   MIT-SHM shared memory (no X11 socket transfer for pixels)
SIMD color conversion               libyuv ARGBToI420 + I420Scale via ARM NEON
Dual X11 connections                Separate sockets for capture and input (no mutex)
I420-domain scaling                 Scale in YUV space (1.5 B/px vs 4 B/px for ARGB)
with_resize_uninit                  Skip zero-initialization for buffers immediately overwritten
SyncSender::send()                  Condition-variable-based blocking (no busy-wait)
CancellationToken                   Unified cancellation for blocking + async tasks
Atomic memory ordering              Release/Acquire ordering, correct on ARM's weak memory model (phone) as well as x86
Repeat frame on error               If capture fails, repeat last frame (prevents decoder crash)
Force keyframe on error             If encode fails, reset encoder state immediately
try_send instead of blocking_send   No thread pool deadlock risk
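The 1.5 B/px vs 4 B/px comparison for I420-domain scaling comes from the plane layout: I420 stores a full-resolution Y plane plus U and V planes subsampled by two in each dimension. A short sketch of the buffer sizes (the helper names are illustrative):

```rust
/// Bytes in a packed BGRA/ARGB frame: 4 bytes per pixel.
fn bgra_bytes(width: usize, height: usize) -> usize {
    width * height * 4
}

/// Bytes in an I420 frame: one full-size Y plane plus two chroma planes
/// at half resolution in each dimension (rounded up for odd sizes).
fn i420_bytes(width: usize, height: usize) -> usize {
    let chroma_w = (width + 1) / 2;
    let chroma_h = (height + 1) / 2;
    width * height + 2 * chroma_w * chroma_h
}
```

For a 1920x1080 source, the scaler reads about 3.1 MB per frame in I420 instead of about 8.3 MB in BGRA, which is where the saving comes from.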

Resource Usage

Measured on a Snapdragon 835 (Adreno 540) at 720p, 24 fps, 500 kbps:

Metric                     Value
Binary size                ~6.6 MB (release, stripped)
Memory (steady)            ~50-80 MB RSS
CPU (video, 24 fps)        2-3 cores at ~1.5 GHz
CPU (audio, 20ms frames)   <5% of one core
Network bandwidth          ~500 kbps video + ~40 kbps audio

Troubleshooting

Connection fails

  1. Check that the X11 server is running and that DISPLAY points at it: echo $DISPLAY (xdpyinfo fails if the server is not reachable)
  2. Confirm XTest extension: xdpyinfo | grep XTest
  3. Try direct connection: vnrit --display :0 --stun ""

No audio

  1. Verify PulseAudio is running: pactl info
  2. Check default sink has a monitor: pactl list sinks short
  3. Set default source to monitor: pactl set-default-source <sink>.monitor

Build errors

Error                               Fix
cmake not found                     apt install cmake / pkg install cmake
libclang not found                  apt install libclang-dev / pkg install libclang
audiopus_sys build.rs               Already patched in vendor/, no action needed
chromium.googlesource.com timeout   Configure git mirror (see Build section)

License

MIT