# crossbar

Zero-copy pub/sub over shared memory. URI-addressed. O(1) transfer at any payload size.

crossbar transfers an 8-byte descriptor through a lock-free ring — O(1) regardless of payload size. Subscribers read the payload directly from shared memory. No copy, no serialization, no service-discovery layer.
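For intuition, 8 bytes are enough to name a pool block. A minimal packing sketch — the field widths and names here are assumptions for illustration, not crossbar's actual descriptor layout:

```rust
// Hypothetical descriptor: 32-bit block index, 16-bit generation tag,
// 16-bit size class. Widths are illustrative assumptions only.
fn pack(block: u32, generation: u16, size_class: u16) -> u64 {
    ((block as u64) << 32) | ((generation as u64) << 16) | size_class as u64
}

fn unpack(d: u64) -> (u32, u16, u16) {
    ((d >> 32) as u32, ((d >> 16) & 0xFFFF) as u16, (d & 0xFFFF) as u16)
}

fn main() {
    let d = pack(7, 3, 1);
    assert_eq!(unpack(d), (7, 3, 1));
    assert_eq!(std::mem::size_of_val(&d), 8); // the whole handle is 8 bytes
}
```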
## When to use crossbar
- High-frequency small messages: market data ticks, sensor readings, telemetry, game state
- Rust-native multi-process pipelines where latency compounds (10 000+ msg/s)
- Topics that need to be discovered at runtime by URI, not wired at compile time
- You want one crate with no heavy dependencies
## When not to use crossbar
- Payload > 64 KB and you're copying into the block — both frameworks are memcpy-bound at that point and latency is equal
## Installation

```toml
[dependencies]
crossbar = "0.4"
```
## Quick start

### Byte-oriented
Publisher — write any bytes into shared memory:

```rust
use crossbar::*;

// URI, topic name, and sizes below are illustrative.
let mut pub_ = Publisher::create("crossbar://demo")?;
let topic = pub_.register("ticks")?;
let mut loan = pub_.loan(&topic, 64).unwrap();
loan.set_data(b"hello").unwrap();
loan.publish(); // O(1) — writes 8 bytes to the ring
```
Subscriber — read in-place, zero copies, no unsafe:

```rust
use crossbar::*;

// URI and topic name are illustrative.
let sub = Subscriber::connect("crossbar://demo")?;
let stream = sub.subscribe("ticks")?;
if let Some(sample) = stream.try_recv() {
    // zero-copy view into shared memory;
    // guard drops → block freed back to the pool
    handle(&sample); // handle() is a placeholder for your processing
}
```
### Typed

Any `Copy + 'static` struct where every bit pattern is valid can implement `Pod` for direct zero-copy reads:

```rust
use crossbar::*;

// Field layout is illustrative.
#[repr(C)]
#[derive(Clone, Copy)]
struct Tick {
    price: f64,
    qty: u64,
}

// SAFETY: Tick is repr(C), Copy, and valid for every bit pattern.
unsafe impl Pod for Tick {}

// Publisher (URIs, topic names, and exact signatures are illustrative)
let mut pub_ = Publisher::create("crossbar://demo")?;
let topic = pub_.register::<Tick>("ticks")?;
let mut loan = pub_.loan_typed::<Tick>(&topic).unwrap();
*loan.as_mut() = Tick { price: 101.25, qty: 10 };
loan.publish();

// Subscriber
let sub = Subscriber::connect("crossbar://demo")?;
let stream = sub.subscribe::<Tick>("ticks")?;
if let Some(sample) = stream.try_recv() {
    let tick: &Tick = &sample; // zero-copy typed view
}
```
### Blocking receive

```rust
// Default: three-phase wait — spin → yield → futex/WFE
let guard = stream.recv()?;

// Or pick a strategy explicitly
let guard = stream.recv_with(WaitStrategy::BusySpin)?;
```
### Multi-publisher

Multiple publishers can share the same region — one creates, others join:

```rust
use crossbar::*;

// URIs and topic names are illustrative.
// Process A — creates the region
let mut pub_a = Publisher::create("crossbar://shared")?;
let topic_a = pub_a.register("alpha")?;

// Process B — joins the existing region
let mut pub_b = Publisher::open("crossbar://shared")?;
let topic_b = pub_b.register("beta")?;

// Or publish to the same topic as pub_a:
let topic_b2 = pub_b.register("alpha")?;
```
Sequence numbers are claimed atomically via fetch_add. CAS-based ring slot locking prevents corruption when two publishers write to the same slot. Subscribers scan the ring window to handle out-of-order commits.
### Bidirectional channel

`Channel` wraps two pub/sub regions into a TCP-like pair — one side listens, the other connects:

```rust
use crossbar::*;
use std::time::Duration;

// URIs, payloads, and the timeout argument are illustrative.
// Process A (server)
let mut srv = Channel::listen("crossbar://ctrl")?;

// Process B (client)
let mut cli = Channel::connect("crossbar://ctrl")?;
cli.send(b"ping")?;

let msg = srv.recv(Duration::from_secs(1))?;
// ... process and respond
srv.send(b"pong")?;
let reply = cli.recv(Duration::from_secs(1))?;
```
### Born-in-SHM (zero-copy publish)

Write directly into the pool block — no intermediate buffer, no copy at any payload size:

```rust
// Sizes and the encode_frame helper are illustrative.
let mut loan = pub_.loan(&topic, 4096).unwrap();
let buf = loan.as_mut_slice();
// write directly into shared memory
let n = encode_frame(buf);
loan.set_len(n).unwrap();
loan.publish();
```
## Performance (v0.6.0)

All measurements: Criterion, same-process publisher + subscriber, `try_recv` (no futex). Same-process benchmarks; cross-process latency is typically 2–5× higher.

### Intel i7-10700KF · Linux 6.8 · rustc 1.87
| payload | crossbar | iceoryx2 | speedup |
|---|---|---|---|
| 8 B (transport overhead) | 55 ns | 230 ns | 4.2× |
| 1 KB | 67 ns | 239 ns | 3.6× |
| 64 KB | 1.47 µs | 1.32 µs | 0.9× |
| 1 MB | 30.7 µs | 29.8 µs | ~1× |
### Apple M1 Pro · macOS · rustc 1.92 (v0.3.0, pre-optimization)
| payload | crossbar | iceoryx2 | speedup |
|---|---|---|---|
| 8 B (transport overhead) | 52 ns | 189 ns | 3.6× |
| 1 KB | 77 ns | 210 ns | 2.7× |
| 64 KB | 1.27 µs | 1.35 µs | 1.1× |
| 1 MB | 23.9 µs | 23.5 µs | ~1× |
The win is in the overhead. At small payloads crossbar's lighter path (no service discovery, no POSIX config layer) is 4.2x faster. At 64 KB+ both frameworks are memcpy-bound and converge — iceoryx2 is slightly faster at large payloads. The 8-byte descriptor is always O(1) — payload latency scales with how long you take to write into the block.
### Pinned mode (latest-value, same buffer every iteration)
| payload | crossbar | iceoryx2 | speedup |
|---|---|---|---|
| 8 B | 35 ns | 229 ns | 6.5× |
| 1 KB | 45 ns | 237 ns | 5.3× |
| 64 KB | 1.10 µs | 1.32 µs | 1.2× |
| 1 MB | 18.4 µs | 18.3 µs | ~1× |
Pinned mode (`loan_pinned` / `try_recv_pinned`) reuses the same block every iteration — no allocation, no refcount, no ring. It is a safe API with CAS-based reader/writer exclusion. Best for market data, sensors, telemetry, game state.
Reproduce: `cargo bench -- head_to_head` (requires the iceoryx2 dev-dependency, Unix only).

See BENCHMARKS.md for full numbers, PodBus results, and methodology caveats.
## C / C++ FFI

Build the shared library and link against it from C or C++:

```sh
# produces target/release/libcrossbar.so (Linux), .dylib (macOS), .dll (Windows)
cargo build --release --features ffi
```

Include `include/crossbar.h`:

```c
// Function names and arguments below are illustrative —
// see include/crossbar.h for the exact API.

// Publisher
crossbar_publisher_t* pub = crossbar_publisher_create("crossbar://demo");
crossbar_topic_t topic = crossbar_register(pub, "ticks");
crossbar_publish(pub, topic, data, len);

// Subscriber
crossbar_subscriber_t* sub = crossbar_subscriber_connect("crossbar://demo");
crossbar_subscription_t* stream = crossbar_subscribe(sub, "ticks");
crossbar_sample_t* sample = crossbar_try_recv(stream); // allocates
if (sample) { /* read, then free */ }

// Or zero-allocation hot path:
crossbar_sample_t sample2;
if (crossbar_try_recv_into(stream, &sample2)) { /* read in place */ }

// Bidirectional channel
crossbar_channel_t* ch = crossbar_channel_connect("crossbar://ctrl");
crossbar_channel_send(ch, data, len);
crossbar_sample_t* reply = crossbar_channel_recv(ch, timeout_ms);
```
## Configuration

Region geometry is set via `Config`.

The pool is a Treiber stack — lock-free allocation at any payload size. Blocks are refcounted; a subscriber holding a `Sample` keeps the block alive.
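An index-based Treiber free list can be sketched as below. This is a standalone illustration of the technique, not crossbar's `region.rs`: the head word packs a generation tag with the top index so a single 64-bit CAS is ABA-safe (matching the `target_has_atomic = "64"` requirement noted later):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const NIL: u32 = u32::MAX; // end-of-list sentinel

struct FreeList {
    head: AtomicU64,       // (tag << 32) | top_index
    next: Vec<AtomicU64>,  // next[i] = index of the block below i
}

fn pack(tag: u32, idx: u32) -> u64 { ((tag as u64) << 32) | idx as u64 }
fn unpack(h: u64) -> (u32, u32) { ((h >> 32) as u32, h as u32) }

impl FreeList {
    fn new(n: u32) -> Self {
        // Initially all blocks are free: 0 -> 1 -> ... -> n-1 -> NIL
        let next: Vec<AtomicU64> = (0..n)
            .map(|i| AtomicU64::new(if i + 1 < n { (i + 1) as u64 } else { NIL as u64 }))
            .collect();
        FreeList { head: AtomicU64::new(pack(0, 0)), next }
    }

    fn pop(&self) -> Option<u32> {
        loop {
            let h = self.head.load(Ordering::Acquire);
            let (tag, top) = unpack(h);
            if top == NIL { return None; }
            let below = self.next[top as usize].load(Ordering::Relaxed) as u32;
            // Bumping the tag on every successful CAS defeats ABA reuse of `top`.
            if self.head
                .compare_exchange(h, pack(tag.wrapping_add(1), below),
                                  Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
            { return Some(top); }
        }
    }

    fn push(&self, idx: u32) {
        loop {
            let h = self.head.load(Ordering::Acquire);
            let (tag, top) = unpack(h);
            self.next[idx as usize].store(top as u64, Ordering::Relaxed);
            if self.head
                .compare_exchange(h, pack(tag.wrapping_add(1), idx),
                                  Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
            { return; }
        }
    }
}

fn main() {
    let fl = FreeList::new(3);
    assert_eq!(fl.pop(), Some(0)); // allocate block 0
    assert_eq!(fl.pop(), Some(1)); // allocate block 1
    fl.push(0);                    // refcount hit zero: block 0 returns
    assert_eq!(fl.pop(), Some(0)); // and is handed out again
}
```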
## Project layout

```text
src/
  lib.rs              Crate root (#![no_std], feature gates)
  pod.rs              Pod trait — marker for safe zero-copy SHM reads
  error.rs            Error
  wait.rs             WaitStrategy (BusySpin / YieldSpin / BackoffSpin / Adaptive)
  ffi.rs              C FFI bindings (behind "ffi" feature)
  protocol/           no_std core — pure atomics, no OS calls
    layout.rs         SHM layout constants and offset helpers
    config.rs         Config
    region.rs         Region — Treiber stack, seqlock, refcount
  platform/           std only — mmap, futex, file I/O
    mmap.rs           RawMmap (MADV_HUGEPAGE on Linux)
    notify.rs         futex (Linux) / WaitOnAddress (Windows) / WFE (aarch64)
    shm.rs            Publisher, Subscriber
    subscription.rs   Stream, Sample, TypedSample
    loan.rs           Loan, TypedLoan, Topic
  channel.rs          Channel — bidirectional channel
include/
  crossbar.h          C/C++ header for FFI consumers
tests/
  pubsub.rs           Integration tests
  typed_pubsub.rs     Typed pub/sub integration tests
  channel.rs          Bidirectional channel tests
  multi_publisher.rs  Multi-publisher tests
benches/pubsub.rs     Criterion benchmarks (+ iceoryx2 head-to-head, Unix)
examples/
  publisher.rs        Cross-process latency benchmark — publisher side
  subscriber.rs       Cross-process latency benchmark — subscriber side
```
## no_std

The protocol core (`src/protocol/`, `src/pod.rs`, `src/wait.rs`, `src/error.rs`) is `no_std` + `alloc`. The platform layer (mmap, futex, file I/O) requires `std` and is gated behind `features = ["std"]` (the default).

Requirement: `target_has_atomic = "64"` — the ABA-safe Treiber stack uses 64-bit CAS.

```toml
# no_std + alloc only (protocol core, no Publisher/Subscriber)
crossbar = { version = "0.4", default-features = false }

# std (default — includes everything)
crossbar = "0.4"
```
## License

Apache-2.0.