logvec 1.0.0

A cache-friendly Bentley-Saxe logarithmic array data structure.
Documentation
  • Coverage
  • 12.5%
    1 out of 8 items documented0 out of 7 items with examples
  • Size
  • Source code size: 30.61 kB This is the summed size of all the files inside the crates.io package for this release.
  • Documentation size: 306.09 kB This is the summed size of all files generated by rustdoc for all configured targets
  • Ø build duration
  • this release: 13s Average build duration of successful builds.
  • all releases: 31s Average build duration of successful builds in releases after 2024-10-23.
  • Links
  • crates.io
  • Dependencies
  • Versions
  • Owners
  • khoda81

LogArray

A blazingly fast, cache-friendly implementation of the Bentley-Saxe logarithmic data structure in Rust.

LogArray provides a dynamic, array-based collection that achieves exceptional performance for write-heavy workloads. By maintaining elements in a single contiguous Vec and leveraging the mathematical properties of binary representations, LogArray drastically outperforms traditional tree-based structures like BTreeSet in both insertion speed and cache locality.

Features

  • Zero-Allocation Merges: Achieves amortized $O(\log N)$ insertion time. By exploiting Rust's slice::sort_unstable (pdqsort), internal restructuring requires zero heap allocations.
  • Insanely Fast Insertions: Up to 8.3x faster than standard BTreeSet for primitive types.
  • Contiguous Memory: Elements live in a single Vec, virtually eliminating pointer-chasing and maximizing CPU cache hits.

🚀 Benchmarks

Hardware: Standard modern CPU. Workload: 10,000 elements.

Operation (10k u64) LogArray std::collections::BTreeSet Naive Sorted Vec
Sequential Insert 67 µs 562 µs 201 µs
Random Search 182 µs 331 µs -

Even for larger 256-byte structs, LogArray breaks the 1-millisecond barrier (~900 µs vs BTreeSet's ~1.11 ms), proving that its cache mechanical sympathy beats pointer-based nodes even when moving data.

How the Magic Works

The Bentley-Saxe method acts as an in-memory precursor to the LSM-trees used in massive databases (like RocksDB).

The underlying storage is just a single Vec<T>. However, conceptually, the array is partitioned into contiguous, exponentially sized sorted blocks ($20, 21, 2^2 \dots$). The sizes of these blocks perfectly mirror the binary representation of the total number of elements, $N$.

  1. Insertion: We simply push the new element to the end of the Vec. Then, we count the number of trailing zeros ($Z$) in the new length $N$. We take the last $2^Z$ elements and call sort_unstable() on them. Because pdqsort is highly optimized for partially sorted patterns, it glues the smaller blocks together into one large sorted block with minimal comparisons and zero allocations.
  2. Search: To find an element, we binary search the blocks. We start with the largest block first, meaning we have a 50% chance of finding our element on the very first block search, achieving an extremely fast $O(\log N)$ fast-path.

Quick Start

Add logvec to your Cargo.toml:

cargo add logvec

Basic Usage

use logvec::LogArray;

fn main() {
    let mut arr = LogArray::new();

    // Blazing fast insertions
    arr.insert(5u64);
    arr.insert(3u64);
    arr.insert(8u64);
    arr.insert(1u64);

    // Fast $O(\log^2 N)$ search
    assert!(arr.contains(&5u64));
    assert!(!arr.contains(&42u64));

    // (Optional) Iteration yields elements, though not globally sorted
    for item in arr.iter() {
        println!("{}", item);
    }
}

🛠️ Roadmap & Future Optimizations

While already highly optimized, we are exploring the "dark arts" of CPU-level optimizations:

  • Min/Max Bounds Rejection: $O(1)$ pruning by checking the boundaries of a block before executing a binary search.
  • Branchless Binary Search: Rewriting the internal block search to eliminate CPU pipeline stalls caused by branch mispredictions.
  • Explicit SIMD (std::arch): Vectorized instructions (AVX2/AVX-512) to compare multiple elements in a single clock cycle.
  • Fractional Cascading: Implementing bridge pointers between blocks to theoretically drop search time to pure $O(\log N)$.

Contributing

Contributions, issues, and feature requests are welcome! If you want to help squeeze even more nanoseconds out of this data structure, feel free to open a PR.