Sliding Ring
`sliding-ring` provides a cache-friendly, allocation-free ring buffer that keeps its slots indexed relative to a moving anchor coordinate. It was extracted from our high-performance orderbook engine and generalized so any `Sized` slot type can participate in consistent sliding-window semantics.
Installation
Add the crate to your `Cargo.toml`:

```toml
[dependencies]
sliding-ring = "*" # pin to the release you need

# Enable the branchless clearing routines if you need predictably low latency clears.
# sliding-ring = { version = "*", features = ["branchless"] }
```
The crate ships with a `README.md`-driven `cargo doc` experience, so `cargo doc --open -p sliding-ring` gives you rendered API docs locally.
Why this exists
Traditional ring buffers optimize for FIFO semantics, but exchange and telemetry workloads often care about the best coordinate (highest bid, current time bucket, etc.) and need random access around that best. `SlidingRing` keeps a 256-slot ring that re-anchors in O(1) time, clearing the newly entered slots while preserving the rest. This enables predictable performance, avoids per-slot allocations, and keeps the CPU cache warm even under heavy churn.
When to reach for it
- Aggregated orderbooks keyed by discrete price ticks.
- Sliding time buckets for counters, rate-limiters, or observability windows.
- Any moving reference window where only ±255 offsets from the anchor matter.
When it is the wrong tool
- You need more than 256 simultaneous offsets (use a wider structure or a map).
- The anchor jumps sparsely across a huge domain, making the clearing work wasteful.
- Slots own large heap allocations that would be expensive to recreate on every clear (store handles or IDs instead).
Model
- `anchor` denotes the absolute coordinate represented by the pointer slot.
- Moving the anchor by `k` steps rewires indices via `(pointer + k) mod 256`.
- Small moves clear only the slots that just became visible; moves with `|k| >= 256` clear the entire ring.
- `shift_to_keep_new_base` preserves the new base slot when scanning for the next best coordinate.
Performance guardrails
- No dynamic dispatch: `SlidingRing` is generic over `T: Slot`, so every slot type is monomorphized at compile time and the compiler inlines your `Slot` methods. Do not box `dyn Slot`; the struct stores `[T; 256]`, meaning `T` must stay `Sized`.
- Anchor slices for SIMD: `slices_from_anchor*` return the two contiguous spans starting at the pointer so you can run vectorized scans or clears without modulo math. This is the primary primitive for ultra-low-latency users.
- Stable SIMD workflow: pair `slices_from_anchor*` with the `wide` crate (see the `simd_scan` example below) to stay on stable Rust while still using vector instructions.
- Raw-pointer iterators: `iter()`/`iter_mut()` walk storage order, while `iter_from_anchor*` emit `(price, slot)` in buy/sell order. Both rely on pointer arithmetic, so they stay zero-cost abstractions.
- Inline hot paths: ring methods and bundled `Slot` implementations are annotated with `#[inline(always)]` so LLVM can flatten the tiny helpers.
- Memory over CPU: we willingly dedicate 256 slots per ring to keep operations branch-predictable. If memory becomes tighter than latency, this crate is not the right fit.
Ultra-low-latency workflow
- Call `slices_from_anchor*()` to obtain the tail/head spans.
- Run SIMD loads/stores or hand-unrolled loops directly over those spans (tail first, then head) to scan for non-zero slots, update counters, etc.
- If you also need absolute coordinates, pair the SIMD scan with `iter_from_anchor*()`; they share the same traversal order, so you can map SIMD findings back to prices with minimal bookkeeping.
Slot trait
Every slot type must implement the lightweight Slot trait, which defines how to create and clear a value. The crate ships ready-made implementations for:
- All integer primitives (`i*`, `u*`, and `usize`/`isize`).
- `Option<T>` (clears back to `None`).
- `Cell<T>` where `T: Slot + Copy`, so interior-mutable slots can be hosted.
Custom data can implement Slot to retain domain-specific state or perform branchless clears (e.g., resetting SIMD lanes, bitsets, etc.).
Example
```rust
use sliding_ring::{Slot, SlidingRing};

// `Counter` and the `Slot` method names below are assumed for illustration;
// any type implementing `Slot` works the same way.
#[derive(Clone, Copy, Default)]
struct Counter { hits: u64 }

impl Slot for Counter {
    fn new() -> Self { Counter::default() }
    fn clear(&mut self) { self.hits = 0; }
}

let mut ring: SlidingRing<Counter> = SlidingRing::new(100);
ring.slot_mut(100).hits += 1;
ring.shift_to(101); // small move: only the newly visible slot is cleared
assert_eq!(ring.slot_mut(100).hits, 1);
```
Telemetry counter
```rust
use sliding_ring::{Slot, SlidingRing};

// `Counter` implements `Slot` as in the example above.
let mut ring: SlidingRing<Counter> = SlidingRing::new(0);
ring.slot_mut(0).hits += 1;
ring.slot_mut(1).hits += 3;

// `iter_from_anchor` yields `(coordinate, slot)` pairs in anchor order.
let sum: u64 = ring
    .iter_from_anchor()
    .map(|(_coord, slot)| slot.hits)
    .sum();
assert_eq!(sum, 4);
```
SIMD scanning
```rust
use sliding_ring::SlidingRing;
use wide::u64x4;

let mut ring: SlidingRing<u64> = SlidingRing::new(0);
for i in 0..256 {
    *ring.slot_mut(i) = i as u64;
}

// Two contiguous spans starting at the anchor: no per-element modulo math.
let (tail, head) = ring.slices_from_anchor();
let needle = u64x4::splat(42);
for chunk in tail.chunks_exact(4).chain(head.chunks_exact(4)) {
    let lanes = u64x4::new([chunk[0], chunk[1], chunk[2], chunk[3]]);
    let hits = lanes.cmp_eq(needle); // lane-wise compare against the needle
    let _ = hits;
}
```
Run `cargo run --example telemetry` or `cargo run --example simd_scan` for the full samples. The SIMD example relies on the `wide` crate, so it stays on stable Rust while still demonstrating contiguous-slice scanning.
Documentation
- API reference on docs.rs.
- Run `cargo doc --open -p sliding-ring` for local browsing.
Feature flags
- `branchless` – enables branchless variants of the re-anchoring routines. They cost a few extra instructions per slot but avoid unpredictable branches when clearing. Enable with `--features branchless` when compiling/testing.
Status
This crate is extracted from production systems but is still evolving. Expect additions such as configurable ring widths, additional iterators, and more slot utilities as we work toward a polished open-source release.
Examples
- `cargo run --example telemetry` – basic per-slot counter with anchor-ordered reduction.
- `cargo run --example simd_scan` – demonstrates splitting the ring into contiguous spans and scanning with SIMD on stable via the `wide` crate.
Benchmarks
- `cargo bench -p sliding-ring` runs the Criterion suite measuring anchor shifts and slot updates.
- Bazel users can run `bazel run //lib/collections/sliding-ring:sliding_ring_bench` (tagged `benchmark`) to launch the same binary.
License
Licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in sliding-ring by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.