LogArray
A blazingly fast, cache-friendly implementation of the Bentley-Saxe logarithmic data structure in Rust.
LogArray provides a dynamic, array-based collection that achieves exceptional performance for write-heavy workloads. By maintaining elements in a single contiguous Vec and leveraging the mathematical properties of binary representations, LogArray drastically outperforms traditional tree-based structures like BTreeSet in both insertion speed and cache locality.
Features
- Zero-Allocation Merges: Achieves amortized $O(\log N)$ insertion time. By exploiting Rust's
slice::sort_unstable(pdqsort), internal restructuring requires zero heap allocations. - Insanely Fast Insertions: Up to 8.3x faster than standard
BTreeSetfor primitive types. - Contiguous Memory: Elements live in a single
Vec, virtually eliminating pointer-chasing and maximizing CPU cache hits.
🚀 Benchmarks
Hardware: Standard modern CPU. Workload: 10,000 elements.
Operation (10k u64) |
LogArray |
std::collections::BTreeSet |
Naive Sorted Vec |
|---|---|---|---|
| Sequential Insert | 67 µs | 562 µs | 201 µs |
| Random Search | 182 µs | 331 µs | - |
Even for larger 256-byte structs, LogArray breaks the 1-millisecond barrier (~900 µs vs BTreeSet's ~1.11 ms), proving that its cache mechanical sympathy beats pointer-based nodes even when moving data.
How the Magic Works
The Bentley-Saxe method acts as an in-memory precursor to the LSM-trees used in massive databases (like RocksDB).
The underlying storage is just a single Vec<T>. However, conceptually, the array is partitioned into contiguous, exponentially sized sorted blocks ($20, 21, 2^2 \dots$). The sizes of these blocks perfectly mirror the binary representation of the total number of elements, $N$.
- Insertion: We simply push the new element to the end of the
Vec. Then, we count the number of trailing zeros ($Z$) in the new length $N$. We take the last $2^Z$ elements and callsort_unstable()on them. Because pdqsort is highly optimized for partially sorted patterns, it glues the smaller blocks together into one large sorted block with minimal comparisons and zero allocations. - Search: To find an element, we binary search the blocks. We start with the largest block first, meaning we have a 50% chance of finding our element on the very first block search, achieving an extremely fast $O(\log N)$ fast-path.
Quick Start
Add logvec to your Cargo.toml:
cargo add logvec
Basic Usage
use LogArray;
🛠️ Roadmap & Future Optimizations
While already highly optimized, we are exploring the "dark arts" of CPU-level optimizations:
- Min/Max Bounds Rejection: $O(1)$ pruning by checking the boundaries of a block before executing a binary search.
- Branchless Binary Search: Rewriting the internal block search to eliminate CPU pipeline stalls caused by branch mispredictions.
- Explicit SIMD (
std::arch): Vectorized instructions (AVX2/AVX-512) to compare multiple elements in a single clock cycle. - Fractional Cascading: Implementing bridge pointers between blocks to theoretically drop search time to pure $O(\log N)$.
Contributing
Contributions, issues, and feature requests are welcome! If you want to help squeeze even more nanoseconds out of this data structure, feel free to open a PR.