sockudo-ws
Ultra-low latency WebSocket library for Rust, designed for high-frequency trading (HFT) applications and real-time systems. Fully compatible with Tokio and Axum.
Will be used in Sockudo, a high-performance Pusher-compatible WebSocket server.
Coming soon: N-API bindings for Node.js
Performance
Rust WebSocket Libraries Benchmark
Benchmarked using web-socket-benchmark (100,000 iterations of "Hello, World!" message):
| Library | Send | Echo | Recv | Total |
|---|---|---|---|---|
| sockudo-ws | 1.2ms | 5.0ms | 3.1ms | 10.2ms |
| fastwebsockets | 3.3ms | 5.7ms | 3.0ms | 12.0ms |
| web-socket | 2.1ms | 6.8ms | 3.3ms | 12.2ms |
| soketto | 5.8ms | 17.6ms | 9.7ms | 33.1ms |
| tokio-tungstenite | 6.4ms | 18.2ms | 10.2ms | 34.8ms |
By total time, sockudo-ws is ~17% faster than the next fastest Rust WebSocket library (10.2ms vs. 12.0ms for fastwebsockets).
# Clone the benchmark repository
# Run benchmarks
The benchmark measures:
- Send: Time to send 100,000 "Hello, World!" messages from client to server
- Echo: Time to send and receive 100,000 messages (round-trip)
- Recv: Time to receive 100,000 messages from server to client
Environment: AMD Ryzen 9 7950X, 32GB RAM, Linux 6.18, Rust 1.82
vs uWebSockets (C++)
Benchmarked against uWebSockets, the industry standard for high-performance WebSockets:
| Test Case | sockudo-ws | uWebSockets | Ratio |
|---|---|---|---|
| 512 bytes, 100 connections | 232,712 msg/s | 227,973 msg/s | 1.02x |
| 1024 bytes, 100 connections | 232,072 msg/s | 224,498 msg/s | 1.03x |
| 512 bytes, 500 connections | 231,135 msg/s | 222,493 msg/s | 1.03x |
| 1024 bytes, 500 connections | 222,578 msg/s | 216,833 msg/s | 1.02x |
sockudo-ws benchmark:
&
# Using websocket-bench (https://github.com/anycable/websocket-bench)
uWebSockets benchmark:
# Build uWebSockets echo server
&
# Run same benchmark
Environment: AMD Ryzen 9 7950X, 32GB RAM, Linux 6.18, Rust 1.82, uWebSockets v20.64
sockudo-ws matches or exceeds uWebSockets performance while providing a safe, ergonomic Rust API.
Features
- SIMD Acceleration: AVX2/AVX-512/SSE2/NEON/AltiVec/LSX for frame masking and UTF-8 validation
- Zero-Copy Parsing: Direct buffer access without intermediate allocations
- Write Batching (Corking): Minimizes syscalls via vectored I/O
- permessage-deflate: Full compression support with shared/dedicated compressors
- Split Streams: Concurrent read/write from separate tasks
- HTTP/2 WebSocket: RFC 8441 Extended CONNECT protocol support
- HTTP/3 WebSocket: RFC 9220 WebSocket over QUIC support
- io_uring: Linux high-performance async I/O (combinable with HTTP/2 and HTTP/3)
- Autobahn Compliant: Passes all 517 Autobahn test suite cases
- Fuzz Tested: Comprehensive fuzzing with libFuzzer
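
For orientation, the frame masking that the SIMD paths accelerate is the byte-wise XOR defined in RFC 6455. The sketch below is a portable scalar version that processes eight bytes per step. It is not the library's SIMD code, and `apply_mask` is a hypothetical name used only for illustration.

```rust
/// Applies the RFC 6455 masking transform in place: payload[i] ^= key[i % 4].
/// XOR is its own inverse, so the same call also unmasks.
/// Hypothetical helper for illustration; not part of the sockudo-ws API.
pub fn apply_mask(payload: &mut [u8], key: [u8; 4]) {
    // Repeat the 4-byte key twice so the bulk of the buffer can be
    // processed one u64 at a time.
    let key64 = u64::from_ne_bytes([
        key[0], key[1], key[2], key[3], key[0], key[1], key[2], key[3],
    ]);

    let mut chunks = payload.chunks_exact_mut(8);
    for chunk in &mut chunks {
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        let masked = u64::from_ne_bytes(word) ^ key64;
        chunk.copy_from_slice(&masked.to_ne_bytes());
    }

    // Fewer than 8 bytes remain. The tail starts at a multiple of 8,
    // which is also a multiple of 4, so the key index restarts at 0.
    for (i, byte) in chunks.into_remainder().iter_mut().enumerate() {
        *byte ^= key[i % 4];
    }
}
```

Masking the bytes of "Hello" with the key 37 fa 21 3d yields 7f 9f 4d 51 58, which matches the masked text frame example in RFC 6455. A SIMD implementation applies the same idea with wider registers.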
Installation
Add to your Cargo.toml:
[dependencies]
sockudo-ws = { git = "https://github.com/RustNSparks/sockudo-ws" }
# With compression
sockudo-ws = { git = "https://github.com/RustNSparks/sockudo-ws", features = ["permessage-deflate"] }
# With HTTP/2 support
sockudo-ws = { git = "https://github.com/RustNSparks/sockudo-ws", features = ["http2"] }
# With HTTP/3 support
sockudo-ws = { git = "https://github.com/RustNSparks/sockudo-ws", features = ["http3"] }
# With io_uring (Linux only)
sockudo-ws = { git = "https://github.com/RustNSparks/sockudo-ws", features = ["io-uring"] }
# With TLS (rustls)
sockudo-ws = { git = "https://github.com/RustNSparks/sockudo-ws", features = ["rustls-webpki-roots"] }
# With TLS (native-tls)
sockudo-ws = { git = "https://github.com/RustNSparks/sockudo-ws", features = ["native-tls"] }
# All transports
sockudo-ws = { git = "https://github.com/RustNSparks/sockudo-ws", features = ["all-transports"] }
# Everything
sockudo-ws = { git = "https://github.com/RustNSparks/sockudo-ws", features = ["full"] }
# With mimalloc allocator (recommended for production)
sockudo-ws = { git = "https://github.com/RustNSparks/sockudo-ws", features = ["mimalloc"] }
Quick Start
Simple Echo Server
use ;
use ;
use TcpStream;
async
Split Streams (Concurrent Read/Write)
use ;
use mpsc;
async
Axum Integration
use ;
use ;
use TokioIo;
use ;
async
HTTP/2 WebSocket (RFC 8441)
HTTP/2 WebSocket uses the Extended CONNECT protocol for multiplexed WebSocket streams over a single TCP connection.
use ;
use ;
async
HTTP/2 Client
use ;
let client = new;
let mut ws = client.connect.await?;
ws.send.await?;
HTTP/2 Multiplexed Connections
Open multiple WebSocket streams over a single HTTP/2 connection:
use ;
let client = new;
let mut conn = client.connect_multiplexed.await?;
// Open multiple WebSocket streams on the same connection
let mut ws1 = conn.open_websocket.await?;
let mut ws2 = conn.open_websocket.await?;
HTTP/3 WebSocket (RFC 9220)
HTTP/3 WebSocket runs over QUIC, providing benefits like 0-RTT, no head-of-line blocking, and better mobile performance.
use ;
use ;
async
HTTP/3 Benefits
| Feature | Benefit |
|---|---|
| No head-of-line blocking | One slow stream doesn't block others |
| 0-RTT connection resumption | Faster reconnections |
| Better mobile performance | Handles network changes gracefully |
| Multiple streams per connection | Efficient multiplexing |
io_uring Support (Linux)
io_uring provides kernel-level async I/O with zero-copy operations. It's a transport layer that can be combined with any protocol.
io_uring with HTTP/1.1
use UringStream;
use ;
async
io_uring with HTTP/2
Combine io_uring transport with HTTP/2 protocol for maximum performance:
use UringStream;
use ;
async
The io_uring + HTTP/2 Stack
┌─────────────────────────────┐
│ WebSocket Messages │ ← Your application code
├─────────────────────────────┤
│ WebSocketStream<H2Stream> │ ← sockudo-ws
├─────────────────────────────┤
│ HTTP/2 (h2 crate) │ ← Extended CONNECT framing
├─────────────────────────────┤
│ TLS (rustls/openssl) │ ← Required for HTTP/2
├─────────────────────────────┤
│ UringStream │ ← io_uring async I/O
├─────────────────────────────┤
│ TCP (kernel) │ ← io_uring submission queue
└─────────────────────────────┘
Unified API
All transports use the same WebSocketStream<S> API:
// HTTP/1.1 (default)
let ws = server;
// HTTP/2
let ws = server;
// HTTP/3
let ws = server;
// io_uring
let ws = server;
// Same message loop for all!
while let Some = ws.next.await
Configuration
Basic Configuration
use ;
let config = builder
.compression // SHARED_COMPRESSOR
.max_payload_length // 16KB max message
.idle_timeout // 10 second timeout
.max_backpressure // 1MB backpressure limit
.build;
// Or use uWebSockets-style defaults
let config = uws_defaults;
HTTP/2 Configuration
let config = builder
.http2_window_size // 1MB stream window
.http2_connection_window_size // 2MB connection window
.http2_max_streams // Max concurrent streams
.build;
HTTP/3 Configuration
let config = builder
.http3_idle_timeout // 30 second idle timeout
.build;
Configuration Options
| Option | Default | Description |
|---|---|---|
| compression | Disabled | Compression mode |
| max_message_size | 64MB | Maximum message size |
| max_frame_size | 16MB | Maximum single frame size |
| idle_timeout | 120s | Close connection after inactivity (0 = disabled) |
| max_backpressure | 1MB | Max write buffer before dropping connection |
| auto_ping | true | Automatic ping/pong keepalive |
| ping_interval | 30s | Seconds between pings |
| write_buffer_size | 16KB | Cork buffer size |
Compression Modes
| Mode | Description |
|---|---|
| Compression::Disabled | No compression |
| Compression::Dedicated | Per-connection compressor (best ratio, more memory) |
| Compression::Shared | Shared compressor (good for many connections) |
| Compression::Shared4KB | Shared with 4KB sliding window |
| Compression::Shared8KB | Shared with 8KB sliding window |
| Compression::Shared16KB | Shared with 16KB sliding window |
Feature Flags
Core Features
| Feature | Default | Description |
|---|---|---|
| simd | ✅ | SIMD acceleration for masking and UTF-8 |
| tokio-runtime | ✅ | Tokio async runtime support |
| permessage-deflate | ✅ | Compression support (RFC 7692) |
| fastrand | ✅ | Fast PRNG for client mask generation |
SIMD Features
| Feature | Description |
|---|---|
| avx2 | Enable AVX2 (256-bit SIMD) |
| avx512 | Enable AVX-512 (512-bit SIMD) |
| neon | Enable ARM NEON |
| nightly | Enable additional SIMD on arm, loongarch64, powerpc, s390x |
TLS Features
| Feature | Description |
|---|---|
| native-tls | TLS via tokio-native-tls |
| rustls-webpki-roots | TLS via tokio-rustls with webpki-roots |
| rustls-native-roots | TLS via tokio-rustls with native root certificates |
| rustls-platform-verifier | TLS via tokio-rustls with platform verifier |
SHA-1 Implementations
At least one SHA-1 implementation is required for the WebSocket handshake:
| Feature | Description |
|---|---|
| ring | SHA-1 via ring (recommended with rustls) |
| aws_lc_rs | SHA-1 via AWS LC |
| openssl | SHA-1 via OpenSSL (recommended with native-tls) |
| sha1_smol | Pure Rust SHA-1 fallback |
Random Number Generators
For client mask generation:
| Feature | Description |
|---|---|
| fastrand | Fast PRNG (default) |
| getrandom | Cryptographically secure RNG |
| rand_rng | Use rand crate |
Transport Features
| Feature | Description |
|---|---|
| http2 | HTTP/2 WebSocket (RFC 8441) |
| http3 | HTTP/3 WebSocket (RFC 9220) |
| io-uring | Linux io_uring support |
| all-transports | All transport features |
Allocator Features
| Feature | Description |
|---|---|
| mimalloc | Use mimalloc as global allocator (10-30% throughput improvement) |
Integration Features
| Feature | Description |
|---|---|
| axum-integration | Axum web framework support |
| full | All features enabled |
SIMD Architecture Support
sockudo-ws uses SIMD acceleration for frame masking and UTF-8 validation:
| Architecture | Instructions | Masking | UTF-8 | Stable | Nightly |
|---|---|---|---|---|---|
| x86_64 | SSE2 | ✅ | ❌ | ✅ | ✅ |
| x86_64 | SSE4.2 | ✅ | ✅ | ✅ | ✅ |
| x86_64 | AVX2 | ✅ | ✅ | ✅ | ✅ |
| x86_64 | AVX-512 | ✅ | ✅ | ✅ | ✅ |
| aarch64 | NEON | ✅ | ✅ | ✅ | ✅ |
| arm | NEON | ✅ | ✅ | ❌ | ✅ |
| loongarch64 | LSX | ✅ | ✅* | ❌ | ✅ |
| loongarch64 | LASX | ✅ | ✅* | ❌ | ✅ |
| powerpc | AltiVec | ✅ | ✅* | ❌ | ✅ |
| powerpc64 | AltiVec | ✅ | ✅* | ❌ | ✅ |
| s390x | z13 vectors | ✅ | ✅* | ❌ | ✅ |
*Custom SIMD UTF-8 validation with ASCII fast-path (requires nightly feature).
UTF-8 validation uses:
- simdutf8 for x86_64 (SSE4.2, AVX2, AVX-512), aarch64 (NEON), arm (NEON), wasm32
- Custom SIMD implementations for LoongArch64, PowerPC, and s390x (with the nightly feature)
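
The ASCII fast-path idea mentioned above can be illustrated without any SIMD intrinsics. The following is a minimal sketch, assuming a hypothetical `validate_utf8` helper: it skips leading 8-byte words that contain only ASCII (no byte has its high bit set) and hands the rest to the standard library validator. It is not the crate's implementation, which relies on simdutf8 or custom vector code.

```rust
/// Returns true if `input` is valid UTF-8.
/// Hypothetical helper for illustration; not part of the sockudo-ws API.
pub fn validate_utf8(input: &[u8]) -> bool {
    // A byte is ASCII exactly when its high bit is clear, so one AND
    // against this constant checks eight bytes at once.
    const HIGH_BITS: u64 = 0x8080_8080_8080_8080;

    let mut offset = 0;
    while offset + 8 <= input.len() {
        let mut word = [0u8; 8];
        word.copy_from_slice(&input[offset..offset + 8]);
        if u64::from_ne_bytes(word) & HIGH_BITS != 0 {
            break; // first word containing a non-ASCII byte: stop skipping
        }
        offset += 8;
    }

    // Everything before `offset` is ASCII, so `offset` is a character
    // boundary and the remainder can be validated independently.
    std::str::from_utf8(&input[offset..]).is_ok()
}
```

Comparing the result against `std::str::from_utf8` on arbitrary inputs is the same consistency property that a fuzz target for UTF-8 validation would check.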
API Reference
WebSocketStream
The main WebSocket type implementing Stream + Sink:
// Create server-side stream
let ws = server;
// Create client-side stream
let ws = client;
// Send messages
ws.send.await?;
ws.send.await?;
// Receive messages
while let Some = ws.next.await
// Close connection
ws.close.await?;
// Backpressure handling
if ws.is_backpressured
Split Streams
For concurrent read/write operations:
let = ws.split;
// SplitReader
reader.next.await // Receive message
// SplitWriter
writer.send.await?;
writer.send_text.await?;
writer.send_binary.await?;
writer.close.await?;
writer.is_closed.await;
// Reunite
let ws = reunite?;
Message Types
// Create messages
let text_msg = text; // From &str
let text_msg = text; // From String
let binary_msg = binary; // From Vec<u8>
// Access text content
if let Text = msg
Running Tests
Unit Tests
With Features
Autobahn Test Suite
# Build and run server + tests
# Or manually:
# Then run Autobahn client in another terminal
Fuzzing
sockudo-ws includes fuzz targets for security testing:
# Install cargo-fuzz
# Run fuzzing (requires nightly)
Fuzz Targets
| Target | Description |
|---|---|
| parse_frame | WebSocket frame parsing with arbitrary bytes |
| unmask | SIMD masking/unmasking operations |
| utf8_validation | UTF-8 validation consistency with std |
| protocol | Frame encoding/decoding round-trip |
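
To make the frame parsing and round-trip targets more concrete, here is a minimal sketch of RFC 6455 frame header encoding and decoding, the kind of round trip such targets typically exercise. `Frame`, `encode_frame`, and `decode_frame` are hypothetical names, not the crate's API. The decoder borrows the payload from the input buffer instead of copying it, and the sketch ignores fragmentation, RSV bits, and control-frame rules.

```rust
use std::convert::TryFrom;

/// A decoded frame whose payload borrows from the input buffer.
#[derive(Debug, PartialEq)]
pub struct Frame<'a> {
    pub fin: bool,
    pub opcode: u8,
    pub mask: Option<[u8; 4]>,
    pub payload: &'a [u8], // still masked when `mask` is Some
}

/// Encodes one unmasked frame, as a server would send it.
pub fn encode_frame(fin: bool, opcode: u8, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 10);
    out.push(((fin as u8) << 7) | (opcode & 0x0f));
    match payload.len() {
        len if len <= 125 => out.push(len as u8),
        len if len <= u16::MAX as usize => {
            out.push(126); // 16-bit extended length follows
            out.extend_from_slice(&(len as u16).to_be_bytes());
        }
        len => {
            out.push(127); // 64-bit extended length follows
            out.extend_from_slice(&(len as u64).to_be_bytes());
        }
    }
    out.extend_from_slice(payload);
    out
}

/// Decodes one frame from the front of `buf`. Returns the frame and the
/// number of bytes consumed, or None if the buffer is incomplete.
pub fn decode_frame(buf: &[u8]) -> Option<(Frame<'_>, usize)> {
    let b0 = *buf.first()?;
    let b1 = *buf.get(1)?;
    let mut pos = 2;
    let len = match b1 & 0x7f {
        126 => {
            let bytes = buf.get(pos..pos + 2)?;
            pos += 2;
            u16::from_be_bytes([bytes[0], bytes[1]]) as usize
        }
        127 => {
            let mut raw = [0u8; 8];
            raw.copy_from_slice(buf.get(pos..pos + 8)?);
            pos += 8;
            usize::try_from(u64::from_be_bytes(raw)).ok()?
        }
        short => short as usize,
    };
    let mask = if b1 & 0x80 != 0 {
        let k = buf.get(pos..pos + 4)?;
        pos += 4;
        Some([k[0], k[1], k[2], k[3]])
    } else {
        None
    };
    let end = pos.checked_add(len)?;
    let payload = buf.get(pos..end)?;
    let frame = Frame { fin: b0 & 0x80 != 0, opcode: b0 & 0x0f, mask, payload };
    Some((frame, end))
}
```

A round-trip check then asserts that decoding the output of `encode_frame` returns the original fields and consumes exactly the encoded length, and that any truncated prefix yields None rather than a panic.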
Examples
Run the examples:
# Basic echo server
# Split streams (concurrent read/write)
# Axum integration
# HTTP/2 WebSocket server
# HTTP/3 WebSocket server
Architecture
sockudo-ws/
├── src/
│ ├── lib.rs # Public API, Config
│ ├── stream/ # WebSocket stream types
│ │ ├── mod.rs
│ │ ├── websocket.rs # WebSocketStream, Split types
│ │ └── transport_stream.rs
│ ├── protocol.rs # WebSocket protocol state machine
│ ├── frame.rs # Frame encoding/decoding
│ ├── handshake.rs # HTTP upgrade handshake
│ ├── simd.rs # SIMD masking (AVX/SSE/NEON/AltiVec/LSX)
│ ├── utf8.rs # SIMD UTF-8 validation
│ ├── cork.rs # Write batching buffer
│ ├── deflate.rs # permessage-deflate compression
│ ├── error.rs # Error types with categorization
│ ├── transport.rs # Transport trait (Http1, Http2, Http3)
│ ├── server.rs # WebSocketServer<T: Transport>
│ ├── client.rs # WebSocketClient<T: Transport>
│ ├── multiplex.rs # MultiplexedConnection
│ ├── extended_connect.rs # Shared Extended CONNECT logic
│ ├── http2/ # HTTP/2 WebSocket (RFC 8441)
│ │ ├── mod.rs
│ │ └── stream.rs # Http2Stream wrapper
│ ├── http3/ # HTTP/3 WebSocket (RFC 9220)
│ │ ├── mod.rs
│ │ └── stream.rs # Http3Stream wrapper
│ └── io_uring/ # Linux io_uring transport
│ ├── mod.rs
│ ├── stream.rs # UringStream wrapper
│ └── buffer.rs # Registered buffer pool
├── fuzz/ # Fuzzing targets
│ └── fuzz_targets/
│ ├── parse_frame.rs
│ ├── unmask.rs
│ ├── utf8_validation.rs
│ └── protocol.rs
├── examples/
│ ├── simple_echo.rs # Basic echo server
│ ├── split_echo.rs # Concurrent read/write
│ ├── axum_echo.rs # Axum integration
│ ├── http2_echo.rs # HTTP/2 WebSocket server
│ └── http3_echo.rs # HTTP/3 WebSocket server
├── autobahn/
│ ├── server.rs # Autobahn test server
│ └── Makefile # Build and test automation
└── benches/
└── throughput.rs # Criterion benchmarks
Performance Optimizations
- SIMD Masking: Uses AVX2/AVX-512/SSE2/NEON/AltiVec/LSX to XOR mask frames at 16-64 bytes per cycle
- SIMD UTF-8: Validates UTF-8 text at memory bandwidth speeds via simdutf8
- Zero-Copy: Parses frames directly from receive buffer without copying
- Cork Buffer: Batches small writes into 16KB chunks for fewer syscalls
- Vectored I/O: Uses writev() to send multiple buffers in single syscall
- io_uring: Kernel-level async I/O with submission queue batching
- Alignment-Aware SIMD: Handles unaligned prefix/suffix for optimal memory access
- Optional mimalloc: High-performance allocator for reduced allocation latency
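
The cork buffer and vectored I/O items above can be sketched with the standard library alone. The example below is an illustrative, synchronous approximation under stated assumptions: `Cork` is a hypothetical type, not the crate's implementation, and it relies on `std::io::Write::write_vectored`, which standard socket types on Unix implement with writev(). Small frames are queued until a byte limit is reached and then written together, with partial writes handled by trimming the queue.

```rust
use std::collections::VecDeque;
use std::io::{self, IoSlice, Write};

/// Queues small writes and flushes them with vectored writes.
/// Hypothetical type for illustration; not part of the sockudo-ws API.
pub struct Cork<W: Write> {
    inner: W,
    pending: VecDeque<Vec<u8>>,
    pending_bytes: usize,
    limit: usize,
}

impl<W: Write> Cork<W> {
    pub fn new(inner: W, limit: usize) -> Self {
        Cork { inner, pending: VecDeque::new(), pending_bytes: 0, limit }
    }

    pub fn get_ref(&self) -> &W {
        &self.inner
    }

    /// Queues a frame and flushes once the queued bytes reach the limit.
    pub fn push(&mut self, frame: Vec<u8>) -> io::Result<()> {
        if frame.is_empty() {
            return Ok(());
        }
        self.pending_bytes += frame.len();
        self.pending.push_back(frame);
        if self.pending_bytes >= self.limit {
            self.flush()?;
        }
        Ok(())
    }

    /// Writes all queued frames, ideally in a single vectored call.
    pub fn flush(&mut self) -> io::Result<()> {
        while !self.pending.is_empty() {
            let slices: Vec<IoSlice<'_>> =
                self.pending.iter().map(|f| IoSlice::new(f)).collect();
            let written = self.inner.write_vectored(&slices)?;
            drop(slices);
            if written == 0 {
                return Err(io::ErrorKind::WriteZero.into());
            }
            self.consume(written);
        }
        self.inner.flush()
    }

    /// Drops `written` bytes from the front of the queue after a
    /// (possibly partial) write.
    fn consume(&mut self, mut written: usize) {
        self.pending_bytes -= written;
        while written > 0 {
            let front_len = self.pending[0].len();
            if written >= front_len {
                written -= front_len;
                self.pending.pop_front();
            } else {
                self.pending[0].drain(..written);
                written = 0;
            }
        }
    }
}
```

With a limit of 16 * 1024, this mirrors the 16KB batching described above: many small frames cost one write call instead of one call each. An async implementation would apply the same queue-and-trim logic inside its poll-based write path.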
License
MIT
Credits
sockudo-ws incorporates ideas and techniques from several excellent WebSocket libraries:
- uWebSockets - The industry standard for high-performance WebSockets. Inspired the cork/batch writing strategy and overall performance-first design philosophy.
- tokio-websockets - A well-designed Tokio-native WebSocket library. Borrowed several optimizations including:
  - Masked frame fast path for small client frames
  - Alignment-aware SIMD implementations
  - Multi-architecture SIMD support (LoongArch64 LSX/LASX, PowerPC AltiVec, s390x z13, ARM NEON)
  - Feature flag organization (TLS variants, SHA-1 options, RNG options)
  - Fuzzing infrastructure
- fastwebsockets - Deno's high-performance WebSocket library. Referenced for fuzzing patterns and frame parsing optimizations.
- h2 - HTTP/2 implementation used for RFC 8441 support
- quinn and h3 - QUIC and HTTP/3 implementations used for RFC 9220 support
- tokio-uring - io_uring integration for Linux
- simdutf8 - Battle-tested SIMD UTF-8 validation (used by simd-json, polars, arrow)