# crossbar
Zero-copy pub/sub over shared memory. URI-addressed. O(1) transfer at any payload size.
Transfers an 8-byte descriptor through a lock-free ring — O(1) regardless of payload. Subscribers read directly from shared memory. No copy, no serialization, no service discovery layer.
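Conceptually, that descriptor is just one packed 64-bit word. A minimal sketch — the field widths here are illustrative, not crossbar's actual layout:

```rust
/// Illustrative descriptor: a 32-bit block index in the high half,
/// a 32-bit payload length in the low half. Publishing pushes this
/// single u64 through the ring; the payload itself never moves.
fn pack(block: u32, len: u32) -> u64 {
    ((block as u64) << 32) | len as u64
}

fn unpack(d: u64) -> (u32, u32) {
    ((d >> 32) as u32, d as u32)
}

fn main() {
    let d = pack(7, 1024);
    assert_eq!(unpack(d), (7, 1024));
    // 8 bytes cross the ring, regardless of payload size
    println!("descriptor = {:#018x}", d);
}
```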
## When to use crossbar
- High-frequency small messages: market data ticks, sensor readings, telemetry, game state
- Rust-native multi-process pipelines where latency compounds (10 000+ msg/s)
- Topics that need to be discovered at runtime by URI, not wired at compile time
- You want one crate with no heavy dependencies
## When not to use crossbar
- Payload > 64 KB and you're copying into the block — both frameworks are memcpy-bound at that point and latency is equal
## Installation

```toml
[dependencies]
crossbar = "0.4"
```
## Quick start

### Byte-oriented
Publisher — write any bytes into shared memory:

```rust
use crossbar::*;

// URIs and argument shapes below are illustrative.
let mut pub_ = ShmPublisher::create("shm://bench")?;
let topic = pub_.register("shm://bench/ticks")?;

let mut loan = pub_.loan(topic, 1024).unwrap();
loan.set_data(b"payload").unwrap();
loan.publish(); // O(1) — writes 8 bytes to ring
```
Subscriber — read in-place, zero copies, no unsafe:

```rust
use crossbar::*;

// URIs are illustrative.
let sub = ShmSubscriber::connect("shm://bench")?;
let stream = sub.subscribe("shm://bench/ticks")?;

if let Some(guard) = stream.try_recv() {
    let bytes: &[u8] = &guard; // read directly from shared memory
} // guard drops → block freed back to pool
```
### Typed

Any `Copy + 'static` struct where every bit pattern is valid can implement `Pod` for direct zero-copy reads:

```rust
use crossbar::*;

// Field layout is illustrative.
#[repr(C)]
#[derive(Clone, Copy)]
struct Tick {
    price: f64,
    qty: u64,
}

// SAFETY: Tick is repr(C), Copy, and every bit pattern is valid.
unsafe impl Pod for Tick {}

// Publisher — typed method names are a sketch (cf. TypedShmLoan / TypedSampleGuard)
let mut pub_ = ShmPublisher::create("shm://md")?;
let topic = pub_.register("shm://md/ticks")?;
let mut loan = pub_.loan_typed::<Tick>(topic).unwrap();
*loan.as_mut() = Tick { price: 101.25, qty: 10 };
loan.publish();

// Subscriber
let sub = ShmSubscriber::connect("shm://md")?;
let stream = sub.subscribe("shm://md/ticks")?;
if let Some(guard) = stream.try_recv_typed::<Tick>() {
    let tick: &Tick = &guard; // zero-copy view into shared memory
}
```
### Blocking receive

```rust
// Default: three-phase spin → yield → futex/WFE
let guard = stream.recv()?;

// Or pick a strategy (variant choice is illustrative)
let guard = stream.recv_with(WaitStrategy::BusySpin)?;
```
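The three-phase strategy (busy-spin, then yield, then a blocking OS wait) can be sketched with std primitives — here a `Condvar` stands in for the futex/WFE wait, and the phase budgets are illustrative:

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Illustrative three-phase wait: spin, then yield, then block.
/// crossbar's real implementation parks on a futex (Linux) /
/// WaitOnAddress (Windows) / WFE (aarch64) instead of a Condvar.
fn wait_for(ready: &Mutex<bool>, cv: &Condvar) {
    // Phase 1: busy-spin — lowest wakeup latency, burns a core.
    for _ in 0..1_000 {
        if *ready.lock().unwrap() { return; }
        std::hint::spin_loop();
    }
    // Phase 2: yield to the scheduler between checks.
    for _ in 0..100 {
        if *ready.lock().unwrap() { return; }
        thread::yield_now();
    }
    // Phase 3: block until notified — zero CPU while idle.
    let mut guard = ready.lock().unwrap();
    while !*guard {
        guard = cv.wait(guard).unwrap();
    }
}

fn main() {
    let state = Arc::new((Mutex::new(false), Condvar::new()));
    let s2 = Arc::clone(&state);
    let t = thread::spawn(move || {
        thread::sleep(std::time::Duration::from_millis(10));
        *s2.0.lock().unwrap() = true;
        s2.1.notify_all();
    });
    wait_for(&state.0, &state.1);
    println!("woke up");
    t.join().unwrap();
}
```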
## Multi-publisher
Multiple publishers can share the same region — one creates, others join:
```rust
use crossbar::*;

// URIs are illustrative.
// Process A — creates the region
let mut pub_a = ShmPublisher::create("shm://quotes")?;
let topic_a = pub_a.register("shm://quotes/nyse")?;

// Process B — joins the existing region
let mut pub_b = ShmPublisher::open("shm://quotes")?;
let topic_b = pub_b.register("shm://quotes/nasdaq")?;

// Or publish to the same topic as pub_a:
let topic_b2 = pub_b.register("shm://quotes/nyse")?;
```
Sequence numbers are claimed atomically via fetch_add. CAS-based ring slot locking prevents corruption when two publishers write to the same slot. Subscribers scan the ring window to handle out-of-order commits.
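The claim-then-lock sequence can be sketched with std atomics — a simplified ring (not crossbar's actual internals) where each publisher grabs a unique sequence number with `fetch_add` and then CAS-locks its slot before writing:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const RING: usize = 8;
const FREE: u64 = u64::MAX; // sentinel: slot not being written

/// Illustrative multi-publisher ring: sequence numbers are claimed
/// atomically; the per-slot CAS prevents two publishers that map to
/// the same slot from corrupting each other's writes.
struct Ring {
    next_seq: AtomicU64,
    slots: [AtomicU64; RING], // holds the seq being written, or FREE
}

impl Ring {
    fn publish(&self, _descriptor: u64) -> u64 {
        // Claim a unique sequence number — one atomic add.
        let seq = self.next_seq.fetch_add(1, Ordering::Relaxed);
        let slot = &self.slots[(seq as usize) % RING];
        // Lock the slot: spin until our CAS wins.
        while slot
            .compare_exchange(FREE, seq, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
        // ... write descriptor and commit here ...
        slot.store(FREE, Ordering::Release); // release the slot
        seq
    }
}

fn main() {
    let ring = Ring {
        next_seq: AtomicU64::new(0),
        slots: std::array::from_fn(|_| AtomicU64::new(FREE)),
    };
    assert_eq!(ring.publish(0xABCD), 0);
    assert_eq!(ring.publish(0xABCE), 1); // next seq, claimed atomically
}
```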
## Bidirectional channel

`ShmChannel` wraps two pub/sub regions into a TCP-like pair — one side listens, the other connects:
```rust
use crossbar::*;
use std::time::Duration;

// URI and timeout values are illustrative.
// Process A (server)
let mut srv = ShmChannel::listen("shm://ctrl")?;

// Process B (client)
let mut cli = ShmChannel::connect("shm://ctrl")?;
cli.send(b"ping")?;

let msg = srv.recv(Duration::from_secs(1))?;
// ... process and respond
srv.send(b"pong")?;
let reply = cli.recv(Duration::from_secs(1))?;
```
## Born-in-SHM (zero-copy publish)
Write directly into the pool block — no intermediate buffer, no copy at any payload size:
```rust
// `frame_len` and `encode_frame` are illustrative application code.
let mut loan = pub_.loan(topic, frame_len).unwrap();
let buf = loan.as_mut_slice();

// write directly into shared memory
let n = encode_frame(buf);

loan.set_len(n).unwrap();
loan.publish();
```
## Performance
All measurements: Criterion, same-process publisher + subscriber, try_recv (no futex).
Intel i7-10700KF · Linux 6.8 · rustc 1.87
| payload | crossbar | iceoryx2 | speedup |
|---|---|---|---|
| 8 B (transport overhead) | 54 ns | 232 ns | 4.3× |
| 1 KB | 66 ns | 243 ns | 3.7× |
| 64 KB | 1.35 µs | 1.38 µs | ~1× |
| 1 MB | 31 µs | 30 µs | ~1× |
Apple M1 Pro · macOS · rustc 1.92 (v0.3.0, pre-optimization)
| payload | crossbar | iceoryx2 | speedup |
|---|---|---|---|
| 8 B (transport overhead) | 52 ns | 189 ns | 3.6× |
| 1 KB | 77 ns | 210 ns | 2.7× |
| 64 KB | 1.27 µs | 1.35 µs | 1.1× |
| 1 MB | 23.9 µs | 23.5 µs | ~1× |
The win is in the overhead. At small payloads crossbar's lighter path (no service discovery, no POSIX config layer) is 4× faster. At 64 KB+ both frameworks are memcpy-bound and converge. The 8-byte descriptor is always O(1) — payload latency scales with how long you take to write into the block.
### Pinned mode (latest-value, same buffer every iteration)
| payload | crossbar | iceoryx2 | speedup |
|---|---|---|---|
| 8 B | 35 ns | 229 ns | 6.5× |
| 1 KB | 45 ns | 238 ns | 5.3× |
| 64 KB | 1.07 µs | 1.30 µs | 1.2× |
| 1 MB | 18.1 µs | 18.4 µs | ~1× |
Pinned mode (`loan_pinned` / `try_recv_pinned`) reuses the same block every iteration — no allocation, no refcount, no ring. Safe API with CAS-based reader/writer exclusion. Best for market data, sensors, telemetry, game state. The 8 B cost increased from 26 ns to 35 ns in v0.3.1 due to the CAS sentinel added for safety (prevents data races between publisher writes and subscriber reads).
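The sentinel can be pictured as a one-word state machine. A minimal sketch (not crossbar's actual internals), assuming a single pinned slot holding a `u64` payload:

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicU32, Ordering};

const IDLE: u32 = 0;
const WRITING: u32 = 1;
const READING: u32 = 2;

/// Illustrative CAS sentinel for a pinned (single-buffer) slot:
/// writer and reader each CAS the state word before touching the
/// block, so a read can never observe a half-written payload.
struct Pinned {
    state: AtomicU32,
    value: UnsafeCell<u64>,
}
// SAFETY: all access to `value` is serialized through `state`.
unsafe impl Sync for Pinned {}

impl Pinned {
    fn write(&self, v: u64) {
        // Spin until the slot is free, then mark it WRITING.
        while self
            .state
            .compare_exchange(IDLE, WRITING, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
        unsafe { *self.value.get() = v };
        self.state.store(IDLE, Ordering::Release);
    }

    fn read(&self) -> Option<u64> {
        // Fail fast if a writer holds the slot.
        self.state
            .compare_exchange(IDLE, READING, Ordering::Acquire, Ordering::Relaxed)
            .ok()?;
        let v = unsafe { *self.value.get() };
        self.state.store(IDLE, Ordering::Release);
        Some(v)
    }
}

fn main() {
    let p = Pinned { state: AtomicU32::new(IDLE), value: UnsafeCell::new(0) };
    p.write(42);
    assert_eq!(p.read(), Some(42));
}
```

The extra CAS pair on every operation is the 9 ns mentioned above — the price of making the latest-value slot safe against torn reads.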
Reproduce: `cargo bench -- head_to_head` (requires the iceoryx2 dev-dependency; Unix only).
## C / C++ FFI

Build the shared library and link against it from C or C++:

```sh
# "ffi" feature per src/ffi.rs; exact invocation may differ
cargo build --release --features ffi
# produces target/release/libcrossbar.so (Linux), .dylib (macOS), .dll (Windows)
```
Include `include/crossbar.h` (identifiers below are a sketch — the header is authoritative for exact names and signatures):

```c
// Publisher
crossbar_publisher_t* pub = crossbar_publisher_create("shm://demo");
crossbar_topic_t topic = crossbar_publisher_register(pub, "shm://demo/ticks");
crossbar_publish(pub, topic, data, len);

// Subscriber
crossbar_subscriber_t* sub = crossbar_subscriber_connect("shm://demo");
crossbar_subscription_t* stream = crossbar_subscribe(sub, "shm://demo/ticks");
crossbar_sample_t* sample = crossbar_try_recv(stream); // allocates
if (sample) { /* read sample, then free it */ }

// Or zero-allocation hot path:
crossbar_sample_t sample;
if (crossbar_try_recv_into(stream, &sample)) { /* read in place */ }

// Bidirectional channel
crossbar_channel_t* ch = crossbar_channel_connect("shm://ctrl");
crossbar_channel_send(ch, msg, len);
crossbar_sample_t* reply = crossbar_channel_recv(ch);
```
## Configuration

### PubSubConfig
The pool is a Treiber stack — lock-free allocation at any payload size. Blocks are refcounted; a subscriber holding a SampleGuard keeps the block alive.
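The pool mechanics can be sketched with std atomics — a simplified Treiber stack over a fixed block table, with the refcounting left out (an illustration, not crossbar's actual `Region` code):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

const NIL: usize = usize::MAX; // end-of-list marker

/// Illustrative Treiber stack: `head` holds the index of the top
/// free block; each free block stores the index of the next one.
/// Alloc and free are each a single CAS — lock-free, O(1).
struct Pool {
    head: AtomicUsize,
    next: Vec<AtomicUsize>,
}

impl Pool {
    fn new(blocks: usize) -> Self {
        // Chain every block onto the free list: 0 → 1 → … → NIL.
        let next = (0..blocks)
            .map(|i| AtomicUsize::new(if i + 1 < blocks { i + 1 } else { NIL }))
            .collect();
        Pool { head: AtomicUsize::new(0), next }
    }

    fn alloc(&self) -> Option<usize> {
        loop {
            let h = self.head.load(Ordering::Acquire);
            if h == NIL { return None; } // pool exhausted
            let n = self.next[h].load(Ordering::Relaxed);
            if self.head
                .compare_exchange(h, n, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
            {
                return Some(h);
            } // lost the race — retry
        }
    }

    fn free(&self, block: usize) {
        loop {
            let h = self.head.load(Ordering::Acquire);
            self.next[block].store(h, Ordering::Relaxed);
            if self.head
                .compare_exchange(h, block, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
            {
                return;
            }
        }
    }
}

fn main() {
    let pool = Pool::new(2);
    let a = pool.alloc().unwrap();
    let b = pool.alloc().unwrap();
    assert_eq!((a, b), (0, 1));
    assert_eq!(pool.alloc(), None); // exhausted
    pool.free(a);
    assert_eq!(pool.alloc(), Some(0)); // block recycled
}
```

In crossbar, dropping the last `SampleGuard` for a block is what triggers the `free` step.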
## Project layout

```
src/
  lib.rs               Crate root (#![no_std], feature gates)
  pod.rs               Pod trait — marker for safe zero-copy SHM reads
  error.rs             IpcError
  wait.rs              WaitStrategy (BusySpin / YieldSpin / BackoffSpin / Adaptive)
  ffi.rs               C FFI bindings (behind "ffi" feature)
  protocol/            no_std core — pure atomics, no OS calls
    layout.rs          SHM layout constants and offset helpers
    config.rs          PubSubConfig
    region.rs          Region — Treiber stack, seqlock, refcount
  platform/            std only — mmap, futex, file I/O
    mmap.rs            RawMmap (MADV_HUGEPAGE on Linux)
    notify.rs          futex (Linux) / WaitOnAddress (Windows) / WFE (aarch64)
  shm.rs               ShmPublisher, ShmSubscriber
  subscription.rs      Subscription, SampleGuard, TypedSampleGuard
  loan.rs              ShmLoan, TypedShmLoan, TopicHandle
  channel.rs           ShmChannel — bidirectional channel
include/
  crossbar.h           C/C++ header for FFI consumers
tests/
  pubsub.rs            Integration tests
  typed_pubsub.rs      Typed pub/sub integration tests
  channel.rs           Bidirectional channel tests
  multi_publisher.rs   Multi-publisher tests
benches/
  pubsub.rs            Criterion benchmarks (+ iceoryx2 head-to-head, Unix)
examples/
  publisher.rs         Cross-process latency benchmark — publisher side
  subscriber.rs        Cross-process latency benchmark — subscriber side
```
## no_std

The protocol core (`src/protocol/`, `src/pod.rs`, `src/wait.rs`, `src/error.rs`) is `no_std` + `alloc`. The platform layer (mmap, futex, file I/O) requires `std` and is gated behind `features = ["std"]` (the default).

Requirement: `target_has_atomic = "64"` — the ABA-safe Treiber stack uses 64-bit CAS.
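Why a 64-bit CAS: the stack head packs a block index with a generation tag, so a block that is popped and re-pushed no longer matches a stale snapshot. A simplified illustration (field widths are not crossbar's actual layout):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Illustrative ABA guard: the Treiber-stack head is one u64 holding
/// a 32-bit block index plus a 32-bit generation tag. Every update
/// bumps the tag, so "same index, different lifetime" fails the CAS.
fn pack(index: u32, tag: u32) -> u64 {
    ((tag as u64) << 32) | index as u64
}

fn main() {
    let head = AtomicU64::new(pack(5, 0));
    let stale = head.load(Ordering::Acquire);

    // Another thread pops block 5 and pushes it back: same index, new tag.
    head.store(pack(5, 1), Ordering::Release);

    // The stale CAS fails even though the index matches — ABA averted.
    let r = head.compare_exchange(stale, pack(9, 2), Ordering::AcqRel, Ordering::Acquire);
    assert!(r.is_err());
}
```

Without the tag in the same word, the CAS would succeed and corrupt the free list — which is why a plain 32-bit head is not enough.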
```toml
# no_std + alloc only (protocol core, no ShmPublisher/ShmSubscriber)
crossbar = { version = "0.4", default-features = false }

# std (default — includes everything)
crossbar = "0.4"
```
## License
Apache-2.0.