Sliding Ring

Cache-friendly sliding ring buffer keyed to an anchor coordinate for ultra-low-latency workloads.

sliding-ring provides a cache-friendly, allocation-free ring buffer that keeps its slots indexed relative to a moving anchor coordinate. It was extracted from our high-performance orderbook engine and generalized so any Sized slot type can participate in consistent sliding-window semantics.

Installation

cargo add sliding-ring
# Enable the branchless clearing routines if you need predictably low latency clears.
cargo add sliding-ring --features branchless

The README doubles as the crate-level documentation, so cargo doc --open -p sliding-ring gives you rendered API docs locally.

Why this exists

Traditional ring buffers optimize for FIFO semantics, but exchange and telemetry workloads often care about the best coordinate (highest bid, current time bucket, etc.) and need random access around that best. SlidingRing keeps a 256-slot ring that re-anchors in O(1) time, clearing the newly entered slots while preserving the rest. This enables predictable performance, avoids per-slot allocations, and keeps the CPU cache warm even under heavy churn.

When to reach for it

  • Aggregated orderbooks keyed by discrete price ticks.
  • Sliding time buckets for counters, rate-limiters, or observability windows (sketched after this list).
  • Any moving reference window where only ±255 offsets from the anchor matter.
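
To make the time-bucket case concrete, here is a minimal sketch of a per-second request counter. The now_secs helper and the u32 slot type are illustrative assumptions; shift_to and slot_mut are the accessors used in the examples further down.

use std::time::{SystemTime, UNIX_EPOCH};
use sliding_ring::SlidingRing;

fn now_secs() -> i128 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_secs() as i128
}

// One slot per second; the anchor tracks the current second.
let mut ring = SlidingRing::<u32>::new(now_secs());

// On each request: re-anchor (clearing expired buckets), then bump
// the bucket for the current second.
let t = now_secs();
ring.shift_to(t);
*ring.slot_mut(t) += 1;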

When it is the wrong tool

  • You need more than 256 simultaneous offsets (use a wider structure or a map).
  • The anchor jumps sparsely across a huge domain, making the clearing work wasteful.
  • Slots own large heap allocations that would be expensive to recreate on every clear (store handles or IDs instead).

Model

  • anchor denotes the absolute coordinate represented by the pointer slot.
  • Moving the anchor by k steps rewires indices via (pointer + k) mod 256.
  • Small moves clear only the slots that just became visible; moves with |k| >= 256 clear the entire ring (see the sketch after this list).
  • shift_to_keep_new_base preserves the new base slot when scanning for the next best coordinate.
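
A minimal sketch of those clearing rules, using the built-in integer Slot impl (assumed to clear to zero, like the Bucket examples below) and the accessors from those examples:

use sliding_ring::SlidingRing;

let mut ring = SlidingRing::<u64>::new(0);
*ring.slot_mut(10) = 7;

// Small move: slot 10 was already visible, so its value survives.
ring.shift_to(5);
assert_eq!(*ring.slot_mut(10), 7);

// A jump of >= 256 steps clears every slot in the ring.
ring.shift_to(5 + 256);
assert_eq!(*ring.slot_mut(10 + 256), 0);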

Performance guardrails

  • No dynamic dispatch: SlidingRing is generic over T: Slot, so every slot type is monomorphized at compile time and the compiler inlines your Slot methods. Do not box dyn Slot; the struct stores [T; 256], meaning T must stay Sized.
  • Anchor slices for SIMD: slices_from_anchor* return the two contiguous spans starting at the pointer so you can run vectorized scans or clears without modulo math. This is the primary primitive for ultra-low-latency users (a scalar sketch follows this list).
  • Stable SIMD workflow: Pair slices_from_anchor* with the wide crate (see the simd_scan example below) to stay on stable Rust while still using vector instructions.
  • Raw-pointer iterators: iter() / iter_mut() walk storage order, while iter_from_anchor* emit (price, slot) in buy/sell order. Both rely on pointer arithmetic, so they stay zero-cost abstractions.
  • Inline hot paths: ring methods and bundled Slot implementations are annotated with #[inline(always)] so LLVM can flatten the tiny helpers.
  • Memory over CPU: we willingly dedicate 256 slots per ring to keep operations branch-predictable. If memory becomes tighter than latency, this crate is not the right fit.
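
For instance, a plain scalar pass over the two spans needs no modulo arithmetic at all; this sketch assumes only the (tail, head) return shape used in the SIMD example below:

use sliding_ring::SlidingRing;

let ring = SlidingRing::<u64>::new(1_000);
let (tail, head) = ring.slices_from_anchor();

// Tail first, then head, matches anchor order with no `% 256` anywhere.
let occupied = tail.iter().chain(head.iter()).filter(|&&v| v != 0).count();
assert_eq!(occupied, 0); // a freshly created ring starts cleared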

Ultra-low-latency workflow

  1. Call slices_from_anchor*() to obtain the tail/head spans.
  2. Run SIMD loads/stores or hand-unrolled loops directly over those spans (tail first, then head) to scan for non-zero slots, update counters, etc.
  3. If you also need absolute coordinates, pair the SIMD scan with iter_from_anchor*(); they share the same traversal order, so you can map SIMD findings back to prices with minimal bookkeeping (sketched below).
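
A sketch of step 3's bookkeeping, reusing the Bucket slot and the (coordinate, slot) iteration order from the telemetry example below:

use sliding_ring::{Direction, SlidingRing, Slot};

#[derive(Clone, Copy)]
struct Bucket { hits: u64 }

impl Slot for Bucket {
    fn init() -> Self { Self { hits: 0 } }
    fn clear(&mut self) { self.hits = 0; }
}

let mut ring = SlidingRing::<Bucket>::new(1_000);
ring.slot_mut(1_042).hits = 7;

// iter_from_anchor walks the same tail-then-head order as
// slices_from_anchor, so the first hit maps straight to a coordinate.
let first_occupied = ring
    .iter_from_anchor(Direction::Forward)
    .find(|(_, bucket)| bucket.hits != 0)
    .map(|(coord, _)| coord);
assert_eq!(first_occupied, Some(1_042));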

Slot trait

Every slot type must implement the lightweight Slot trait, which defines how to create and clear a value. The crate ships ready-made implementations for:

  • All integer primitives (i*, u*, and usize/isize).
  • Option<T> (clears back to None).
  • Cell<T> where T: Slot + Copy so interior-mutable slots can be hosted.

Custom data can implement Slot to retain domain-specific state or perform branchless clears (e.g., resetting SIMD lanes, bitsets, etc.).

Example

use sliding_ring::{SlidingRing, Slot};

#[derive(Default)]
struct Bucket {
    hits: u64,
}

impl Slot for Bucket {
    fn init() -> Self {
        Self { hits: 0 }
    }

    fn clear(&mut self) {
        self.hits = 0;
    }
}

let mut ring = SlidingRing::<Bucket>::new(10_000);
ring.slot_mut(10_005).hits += 1; // write five ticks above the anchor
ring.shift_to(10_005);           // small move: the populated slot survives
assert_eq!(ring.slot_at_pointer().hits, 1);

Telemetry counter

use sliding_ring::{Direction, SlidingRing, Slot};

#[derive(Default, Clone, Copy)]
struct Bucket { hits: u64 }

impl Slot for Bucket {
    fn init() -> Self { Self { hits: 0 } }
    fn clear(&mut self) { self.hits = 0; }
}

let mut ring = SlidingRing::<Bucket>::new(0);
ring.slot_mut(5).hits += 1;
ring.slot_mut(6).hits += 3;
let sum: u64 = ring
    .iter_from_anchor(Direction::Forward)
    .map(|(_, bucket)| bucket.hits)
    .sum();
assert_eq!(sum, 4);

SIMD scanning

use sliding_ring::SlidingRing;
use wide::u64x4;

let mut ring = SlidingRing::<u64>::new(1_000);
for i in 0..256 {
    *ring.slot_mut(1_000 + i as i128) = i as u64;
}
let (tail, head) = ring.slices_from_anchor();
let needle = u64x4::splat(42);

// chunks_exact skips any sub-4 remainder; fine for this demo scan.
for chunk in tail.chunks_exact(4).chain(head.chunks_exact(4)) {
    let vec = u64x4::new([chunk[0], chunk[1], chunk[2], chunk[3]]);
    // wide's cmp_eq yields a lane mask: u64::MAX where lanes matched.
    if vec
        .cmp_eq(needle)
        .to_array()
        .iter()
        .any(|&lane| lane == u64::MAX)
    {
        println!("found 42");
        break;
    }
}

Run cargo run --example telemetry or cargo run --example simd_scan for the full samples. The SIMD example relies on the wide crate, so it stays on stable Rust while still demonstrating contiguous-slice scanning.

Documentation

  • API reference on docs.rs.
  • Run cargo doc --open -p sliding-ring for local browsing.

Feature flags

  • branchless – enables branchless variants of the re-anchoring routines. They cost a few extra instructions per slot but avoid unpredictable branches when clearing. Enable with --features branchless on the command line, or via Cargo.toml as shown below.
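
Equivalently, in Cargo.toml (the version is assumed to match the current release):

[dependencies]
sliding-ring = { version = "0.1.0", features = ["branchless"] }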

Status

This crate is extracted from production systems but is still evolving. Expect additions such as configurable ring widths, additional iterator adapters, and more slot utilities as we work toward a polished open-source release.

Examples

  • cargo run --example telemetry – basic per-slot counter with anchor-ordered reduction.
  • cargo run --example simd_scan – demonstrates splitting the ring into contiguous spans and scanning with SIMD on stable via the wide crate.

Benchmarks

  • cargo bench -p sliding-ring runs the Criterion suite measuring anchor shifts and slot updates.
  • Bazel users can run bazel run //lib/collections/sliding-ring:sliding_ring_bench (tagged benchmark) to launch the same binary.

License

Licensed under either of

  • Apache License, Version 2.0
  • MIT License

at your option.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in sliding-ring by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.