nexus-net
Low-latency network protocol primitives. Sans-IO. Zero-copy where possible. Framework-agnostic — works with mio, io_uring, tokio, or raw syscalls.
Performance
vs tungstenite (in-memory parse, pinned to cores 0,2)
| Payload | Type | nexus-net | tungstenite | Speedup |
|---|---|---|---|---|
| 40B | binary parse | 19ns (52M/s) | 61ns (16M/s) | 3.2x |
| 128B | binary parse | 24ns (42M/s) | 75ns (13M/s) | 3.1x |
| 512B | binary parse | 49ns (20M/s) | 105ns (10M/s) | 2.1x |
| 77B | JSON quote parse+deser | 146ns (6.9M/s) | 205ns (4.9M/s) | 1.4x |
| 148B | JSON order parse+deser | 331ns (3.0M/s) | 382ns (2.6M/s) | 1.2x |
| 40B | binary TCP loopback | 30ns (33M/s) | 66ns (15M/s) | 2.2x |
| 77B | JSON TLS+parse+deser | 165ns (6.1M/s) | 221ns (4.5M/s) | 1.3x |
JSON deserialization uses sonic-rs. At the quote tick hot path (77B), WS framing is 18% of nexus-net's total vs 43% of tungstenite's.
rdtsc cycle distribution (pinned to core 0, batch=64)
| Path | p50 | p90 | p99 | p99.9 |
|---|---|---|---|---|
| text unmasked 128B | 39 | 39 | 43 | 65 |
| binary unmasked 128B | 35 | 36 | 44 | 129 |
| text masked 128B | 52 | 53 | 58 | 124 |
| apply_mask 128B | 12 | 12 | 16 | 31 |
| encode_text 128B server | 10 | 11 | 22 | 39 |
| throughput, 100×128B (cycles per msg) | 28 | 28 | 44 | 91 |
At 3GHz: 39 cycles = 13ns. In-memory throughput: 107M msg/sec (28 cycles/msg). TCP loopback throughput: 33M msg/sec (30ns/msg, 40B binary, pinned cores 0,2). The gap is kernel TCP overhead — protocol parsing is ~13ns of the 30ns round-trip.
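The conversions quoted above can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are illustrative and not part of nexus-net, and the 3GHz clock is the figure assumed in the text.

```rust
/// Convert a cycle count to nanoseconds at a clock rate given in GHz.
/// (Illustrative helper, not part of the crate.)
fn cycles_to_ns(cycles: u64, ghz: f64) -> f64 {
    cycles as f64 / ghz
}

/// Messages per second sustained when each message costs `cycles_per_msg`.
fn msgs_per_sec_from_cycles(cycles_per_msg: u64, ghz: f64) -> f64 {
    ghz * 1e9 / cycles_per_msg as f64
}

/// Messages per second sustained when each message costs `ns_per_msg`.
fn msgs_per_sec_from_ns(ns_per_msg: f64) -> f64 {
    1e9 / ns_per_msg
}
```

With these, 39 cycles at 3GHz is 13ns, 28 cycles per message is roughly 107M msg/sec, and 30ns per message is roughly 33M msg/sec, matching the figures in the paragraph.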
TLS loopback (all three: async, blocking, tokio-tungstenite)
| Payload | nexus-async-net | nexus-net (blocking) | tokio-tungstenite |
|---|---|---|---|
| 40B | 32ns (31M/s) | 34ns (29M/s) | 112ns (9.0M/s) |
| 128B | 80ns (13M/s) | 78ns (13M/s) | 183ns (5.5M/s) |
Up to 3.5x faster than tokio-tungstenite over TLS (3.5x at 40B, 2.3x at 128B). No meaningful async overhead: the async path is within 2ns of blocking at both payload sizes. See nexus-async-net for the tokio adapter.
517/517 Autobahn conformance tests passed (0 failed, 216 unimplemented compression — intentionally unsupported).
vs reqwest (REST HTTP/1.1 client)
| Benchmark | nexus-net | reqwest | Speedup |
|---|---|---|---|
| POST build+write+parse (mock) p50 | 494 cycles (165ns) | 1,549 cycles (516ns) build-only | 3.1x |
| POST build+write+parse (mock) p99 | 763 cycles (254ns) | 2,445 cycles (815ns) build-only | 3.2x |
| Protocol throughput (mock, single-threaded) | 5.3M req/sec | N/A | — |
| TCP loopback round-trip p50 | 22,924 cycles (7.6μs) | 62,802 cycles (20.9μs) | 2.7x |
| TCP loopback round-trip p99.9 | 59,860 cycles (20.0μs) | 198,120 cycles (66.0μs) | 3.3x |
| TCP loopback throughput | 114K req/sec | 39K req/sec | 2.9x |
All measurements pinned to physical P-cores (taskset -c 0 for mock, -c 0,2 for loopback). Workload: Binance-style order entry (POST + 4 headers + JSON body of ~100B, answered by a 200 OK JSON response). Zero per-request allocation. In the mock rows, nexus-net is timed over the full build+write+parse path, while reqwest is timed on request build only (no write/parse).
16/16 httpbin.org conformance tests passed (GET, POST, PUT, DELETE, PATCH, query params, custom headers, keep-alive, status codes, chunked transfer encoding).
Architecture
Application
    |
    ├── WebSocket                      REST HTTP/1.1
    |     ^ Message<'a>                  ^ Request<'a> / RestResponse<'a>
    |   FrameReader / FrameWriter      RequestWriter / ResponseReader (sans-IO)
    |     ^ plaintext bytes              ^ plaintext bytes
    |     └──────────────┬───────────────┘
    |       TlsCodec (optional, feature-gated)
    |                    ^ encrypted bytes
    └────────────────────┘
                 I/O (your choice)
Each layer is a pure state machine. No syscalls, no sockets, no async.
Bytes in, messages out. The I/O layer is yours — mio, io_uring, tokio,
raw libc::read, kernel bypass.
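To make the "bytes in, messages out" shape concrete, here is a minimal sketch of a sans-IO decoder for a toy protocol (one length byte followed by the payload). `ToyDecoder`, `feed`, and `poll_message` are hypothetical names chosen for illustration; this is not the nexus-net API and the real parsers are considerably more involved.

```rust
/// Hypothetical sans-IO decoder for a toy protocol: 1-byte length, then payload.
/// It never touches a socket; the caller supplies bytes from any source.
struct ToyDecoder {
    buf: Vec<u8>,
    pos: usize, // index of the first unparsed byte
}

impl ToyDecoder {
    fn new() -> Self {
        Self { buf: Vec::new(), pos: 0 }
    }

    /// Feed bytes obtained from any I/O layer. No syscalls happen here.
    fn feed(&mut self, bytes: &[u8]) {
        // Reset the buffer once everything in it has been consumed.
        if self.pos == self.buf.len() {
            self.buf.clear();
            self.pos = 0;
        }
        self.buf.extend_from_slice(bytes);
    }

    /// Return the next complete payload, borrowed from the internal buffer,
    /// or None if the frame is still incomplete.
    fn poll_message(&mut self) -> Option<&[u8]> {
        let avail = &self.buf[self.pos..];
        let len = *avail.first()? as usize;
        if avail.len() < 1 + len {
            return None; // partial frame, wait for more bytes
        }
        let start = self.pos + 1;
        self.pos = start + len;
        Some(&self.buf[start..start + len])
    }
}
```

Because the returned slice borrows from the decoder, the caller has to drop it before feeding more bytes, which is the same discipline the zero-copy message types impose.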
Quick Start
[]
# WebSocket + HTTP, no TLS
= "0.2"
# With TLS (rustls + aws-lc-rs)
= { = "0.2", = ["tls"] }
# Everything (TLS + socket options + bytes)
= { = "0.2", = ["full"] }
WebSocket Client (ws://)
use ;
let mut ws = connect?;
ws.send_text?;
loop
WebSocket Client (wss://)
use WsStream;
// TLS detected from wss:// scheme — automatic with system root certs
let mut ws = connect?;
// Same API — recv(), send_text(), send_binary(), etc.
Or with custom TLS config:
use WsStream;
use TlsConfig;
let tls = builder.tls13_only.build?;
let mut ws = builder
.tls
.disable_nagle
.connect?;
REST Client (HTTP/1.1, blocking)
use ;
use ResponseReader;
// Protocol (sans-IO) — configured once at startup
let mut writer = new?;
writer.default_header?;
// Response reader — caller-owned, reused across requests
let mut reader = new.max_body_size;
// Transport — just a socket (TLS auto-detected from URL scheme)
let mut conn = connect?;
// GET with query parameters
let req = writer.get
.query
.query
.finish?;
let resp = conn.send?;
println!;
println!;
drop; // release reader borrow before next request
// POST with JSON body
let json = br#"{"symbol":"BTC-USD","side":"buy"}"#;
let req = writer.post
.header
.body
.finish?;
let resp = conn.send?;
println!;
// POST with body_writer (serialize directly into wire buffer)
let req = writer.post
.header
.body_writer
.finish?;
let resp = conn.send?;
// POST with body_fixed (known-size binary, zero-copy write)
let req = writer.post
.body_fixed
.finish?;
let resp = conn.send?;
Three objects, clear ownership:
- writer: protocol encoder (sans-IO). Build request, get wire bytes.
- reader: protocol decoder (sans-IO). Feed bytes, parse response.
- conn: transport. Send bytes, receive bytes. No protocol knowledge.
The same writer and reader work with both sync and async transports.
Sans-IO (decoupled from sockets)
use ;
let = pair;
// You own the I/O — feed bytes however you want
reader.read_from?;
// Drain messages with poll limit
for _ in 0..8
Send Path (borrow, don't own)
let order = serialize_order; // you own this
ws.send_text?; // we borrow
archive.write?; // still yours — archive after send
Modules
buf — Buffer Primitives
- ReadBuf: flat byte slab for inbound parsing. Pre/post padding. Pointer advancement, auto-reset when empty.
- WriteBuf: headroom buffer for outbound framing. Payload appended, protocol headers prepended. One contiguous slice for the syscall.
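The headroom idea can be sketched in a few lines: reserve space at the front, append the payload behind it, then write the header backwards into the reserved space so header and payload end up contiguous. `HeadroomBuf` and its methods are hypothetical and only illustrate the technique; the actual WriteBuf API may differ.

```rust
/// Hypothetical headroom buffer (illustration only, not the WriteBuf API).
struct HeadroomBuf {
    data: Vec<u8>,
    start: usize, // first byte of the outbound message
}

impl HeadroomBuf {
    /// Reserve `headroom` bytes at the front for headers added later.
    fn with_headroom(headroom: usize) -> Self {
        Self { data: vec![0; headroom], start: headroom }
    }

    /// Append payload bytes after whatever is already buffered.
    fn append(&mut self, payload: &[u8]) {
        self.data.extend_from_slice(payload);
    }

    /// Write a header immediately before the current message start.
    /// Returns false if the remaining headroom is too small.
    fn prepend(&mut self, header: &[u8]) -> bool {
        if header.len() > self.start {
            return false;
        }
        self.start -= header.len();
        self.data[self.start..self.start + header.len()].copy_from_slice(header);
        true
    }

    /// Header and payload as one contiguous slice, ready for a single write.
    fn as_slice(&self) -> &[u8] {
        &self.data[self.start..]
    }
}
```

The payload is never moved to make room for the header, so framing costs one small copy regardless of payload size.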
ws — WebSocket (RFC 6455)
- FrameReader: sans-IO inbound parser. Handles frame parsing, fragment assembly, control frame interleaving, SIMD masking, UTF-8 validation. Returns Message<'a> (zero-copy borrowed) or OwnedMessage.
- FrameWriter: sans-IO outbound encoder. Encodes into &mut [u8] or WriteBuf.
- WsStream<S>: convenience I/O wrapper over any Read + Write. HTTP upgrade handshake built in.
- WsStream<S> with TLS: wss:// URLs enable TLS transparently. Requires tls feature.
- Message<'a>: Text(&str), Binary(&[u8]), Ping(&[u8]), Pong(&[u8]), Close(CloseFrame). Text is validated UTF-8. Close codes are parsed into CloseCode enum.
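For readers unfamiliar with RFC 6455 framing, the following sketch shows how an unmasked (server-side) header for a final frame is laid out. `encode_header` is a hypothetical helper written for this example; it is not FrameWriter's API, and it leaves out masking and fragmentation.

```rust
/// Encode the header of an unmasked, final WebSocket frame (RFC 6455 section 5.2).
/// Returns a fixed scratch array and the number of header bytes actually used.
/// (Hypothetical helper for illustration.)
fn encode_header(opcode: u8, payload_len: usize) -> ([u8; 10], usize) {
    let mut h = [0u8; 10];
    h[0] = 0x80 | (opcode & 0x0F); // FIN = 1, RSV1-3 = 0
    if payload_len <= 125 {
        // Length fits in the 7-bit field.
        h[1] = payload_len as u8;
        (h, 2)
    } else if payload_len <= 0xFFFF {
        // 126 signals a 16-bit big-endian extended length.
        h[1] = 126;
        h[2..4].copy_from_slice(&(payload_len as u16).to_be_bytes());
        (h, 4)
    } else {
        // 127 signals a 64-bit big-endian extended length.
        h[1] = 127;
        h[2..10].copy_from_slice(&(payload_len as u64).to_be_bytes());
        (h, 10)
    }
}
```

A 5-byte text frame therefore needs only the two bytes 0x81 0x05 in front of the payload, which is why a headroom buffer of around 14 bytes (10 for the header plus 4 for a client mask key) covers every case.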
rest — HTTP/1.1 REST Client
- RequestWriter: sans-IO request encoder with typestate builder. Produces Request<'a> (zero-copy borrow of wire bytes). Supports query params (percent-encoded), per-request headers, body() (slice), body_writer() (serialize directly via std::io::Write), body_fixed() (known-size direct write), base path.
- HttpConnection<S>: pure transport. 3 fields: stream, TLS, poisoned. send(req, &mut reader) is the whole API (request moved on send).
- RestResponse<'a>: borrows from ResponseReader. Status, headers, body. Supports Content-Length and chunked transfer encoding.
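As an illustration of what percent-encoding a query value involves, here is a small sketch that escapes everything outside the RFC 3986 unreserved set. `percent_encode` is a hypothetical function; the exact character set RequestWriter escapes is an assumption, and this version allocates a String for clarity, whereas an encoder on the hot path would write straight into the wire buffer.

```rust
/// Percent-encode every byte outside the RFC 3986 unreserved set
/// (ALPHA / DIGIT / "-" / "." / "_" / "~"). Illustrative only.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for &b in input.as_bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char);
            }
            // Everything else becomes %XX with uppercase hex digits.
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}
```

A symbol such as BTC-USD passes through unchanged, while spaces, ampersands, and non-ASCII bytes are escaped.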
http — HTTP/1.1 Primitives
- RequestReader / ResponseReader: sans-IO HTTP parsers backed by httparse (SIMD-accelerated). Zero-copy header access. Cached Content-Length and Transfer-Encoding from parse.
- ChunkedDecoder: sans-IO chunked transfer encoding decoder.
- write_request / write_response: zero-alloc HTTP construction.
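Chunked transfer encoding is simple enough to show in full. The sketch below decodes a complete chunked body in one call; `decode_chunked` is a hypothetical function, not ChunkedDecoder's API. Unlike a sans-IO decoder it is not incremental, it allocates the output, and it ignores chunk extensions and trailer fields.

```rust
/// One-shot decoder for a complete chunked body (RFC 9112 section 7.1).
/// Returns None on malformed or truncated input. Illustrative only.
fn decode_chunked(mut input: &[u8]) -> Option<Vec<u8>> {
    let mut body = Vec::new();
    loop {
        // Chunk-size line: hex digits, optional ";extension", then CRLF.
        let line_end = input.windows(2).position(|w| w == b"\r\n")?;
        let line = std::str::from_utf8(&input[..line_end]).ok()?;
        let hex = line.split(';').next()?.trim();
        let size = usize::from_str_radix(hex, 16).ok()?;
        input = &input[line_end + 2..];

        if size == 0 {
            return Some(body); // last chunk; any trailers are skipped
        }

        // Chunk data must be followed by CRLF.
        let end = size.checked_add(2)?;
        if input.len() < end || &input[size..end] != b"\r\n" {
            return None;
        }
        body.extend_from_slice(&input[..size]);
        input = &input[end..];
    }
}
```

An incremental decoder keeps the same two states (reading a size line, reading chunk data) but stores the remaining byte count between calls instead of requiring the whole body up front.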
tls — TLS (feature: tls)
- TlsConfig: shared config (Arc<ClientConfig>). System root certs, custom certs, danger_no_verify(), TLS 1.3 only.
- TlsCodec: sans-IO decrypt/encrypt wrapping rustls. process_into(&mut FrameReader) feeds decrypted plaintext directly into the WS parser.
Features
| Feature | Default | Description |
|---|---|---|
| tls | No | TLS support via rustls + aws-lc-rs |
| socket-opts | No | Socket options (SO_RCVBUF, SO_SNDBUF) via socket2 |
| bytes | No | bytes::Bytes conversion on OwnedMessage and RestResponse |
| full | No | All features enabled |
With no features enabled, no TLS code is compiled, so TLS adds nothing
to build time. The ws and http modules work standalone.
Design Decisions
Zero-copy inbound. Message::Text(&str) borrows from the reader's
internal buffer. No heap allocation per message. Drop the message,
call recv() again.
Borrow, don't own. Send APIs take &str / &[u8]. You keep
ownership for archival after send. Works across .await points.
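The ownership pattern looks like this in practice. `Sender` and `send_then_archive` are hypothetical stand-ins written for this sketch; only the borrowing shape (a send method taking &str) is taken from the text above.

```rust
/// Hypothetical sender that only borrows the payload (not the nexus-net API).
struct Sender {
    wire: Vec<u8>,
}

impl Sender {
    /// Copies the text onto the wire buffer; the caller keeps the String.
    fn send_text(&mut self, text: &str) {
        self.wire.extend_from_slice(text.as_bytes());
    }
}

/// Send an order, then move the same String into an archive without cloning.
fn send_then_archive(tx: &mut Sender, archive: &mut Vec<String>, order: String) {
    tx.send_text(&order); // borrowed only for the duration of the call
    archive.push(order); // still owned by the caller, so it can be moved here
}
```

Had `send_text` taken a String by value, the archive step would need a clone made before the send.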
Sans-IO. Protocol logic is a pure state machine. The same
FrameReader works with blocking sockets, mio, io_uring, tokio, or
kernel bypass. No runtime coupling.
SIMD-accelerated. XOR masking uses SSE2/AVX2. UTF-8 validation
uses simdutf8. HTTP header parsing uses httparse (SIMD vectorized).
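For reference, this is what the masking operation computes, written as a plain scalar loop. It is a sketch of the RFC 6455 definition, not the crate's SSE2/AVX2 implementation, and the signature shown here is an assumption.

```rust
/// Scalar RFC 6455 masking (section 5.3): XOR each payload byte with key[i % 4].
/// Masking is its own inverse, so the same call also unmasks.
fn apply_mask(payload: &mut [u8], key: [u8; 4]) {
    for (i, byte) in payload.iter_mut().enumerate() {
        *byte ^= key[i % 4];
    }
}
```

A SIMD version performs the same XOR on 16 or 32 bytes at a time by repeating the 4-byte key across a vector register, then falls back to a loop like this one for the tail.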
No permessage-deflate. WebSocket compression adds latency and no crypto exchange uses it. Exchanges that compress use application-level gzip (e.g., OKX sends gzipped binary frames).
Layered, not coupled. ReadBuf → FrameReader → WsStream are
independent layers. Use any combination. TlsCodec slots between
socket and FrameReader without changing either.
Testing
# WebSocket: Autobahn conformance (requires Podman)
# REST: httpbin.org conformance (requires network)
# wss:// echo test (requires network)
# Fuzzing (requires nightly)
# Benchmarks