# LogVec

A cache-friendly implementation of the Bentley-Saxe logarithmic data structure in Rust.
LogVec provides a dynamic, array-based collection tuned for primitive-key workloads and contiguous-memory access patterns. By maintaining elements in one Vec and leveraging binary-sized sorted runs, it often beats tree-based structures on insertion/search for small values, while exposing clear trade-offs for large inline payloads.
## Features

- **Zero-Allocation Merges:** Achieves amortized $O(\log N)$ insertion time. By exploiting Rust's `slice::sort_unstable` (pdqsort), internal restructuring requires zero heap allocations.
- **Fast Primitive Workloads:** Strong insertion/search performance for `u64` workloads.
- **Contiguous Memory:** Elements live in a single `Vec`, virtually eliminating pointer-chasing and maximizing CPU cache hits.
## 🚀 Benchmarks

Hardware: local development machine, Criterion snapshot. Workload: 10,000 elements.

Baseline comparisons (`cargo bench --bench baselines_bench`):
| Operation | LogVec | Baseline | Relative |
|---|---|---|---|
| `insert_u64_10k` (random order) | 602 µs | `BTreeSet`: 699 µs | 1.16x faster (-13.8%) |
| `insert_u64_10k` (random order) | 602 µs | naive sorted `Vec`: 2420 µs | 4.02x faster (-75.1%) |
| `search_u64_10k` (hits) | 249 µs | `BTreeSet`: 365 µs | 1.47x faster (-31.8%) |
| `insert_large_struct_10k` (random order) | 3368 µs | `BTreeSet`: 1751 µs | 0.52x (+92.3%, slower) |
**Recent optimization impact** (relative to previous local baseline):

- `core/insert/u64_sequential_10k/logvec`: about 0.85x time (-15.3%)
- `core/contains/u64_miss_seq_10k/logvec`: about 0.81x time (-19.2%)
- `core/contains/u64_hit_seq_10k/logvec`: effectively unchanged (noise-level)
All numbers are workload-, compiler-, and CPU-dependent; benchmark on your target machine before relying on absolute values.
## How the Magic Works
The Bentley-Saxe method acts as an in-memory precursor to the LSM-trees used in massive databases (like RocksDB).
The underlying storage is just a single `Vec<T>`. However, conceptually, the array is partitioned into contiguous, exponentially sized sorted blocks ($2^0, 2^1, 2^2, \dots$). The sizes of these blocks perfectly mirror the binary representation of the total number of elements, $N$.
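The block layout can be read straight off the binary representation of $N$. A tiny illustration (`block_sizes` is a hypothetical helper for exposition, not part of LogVec's API):

```rust
/// Sizes of the conceptual sorted blocks when the structure holds `n`
/// elements: exactly the powers of two for the set bits of `n`, largest first.
/// For example, n = 13 = 0b1101 gives blocks of 8, 4, and 1 elements.
fn block_sizes(n: usize) -> Vec<usize> {
    (0..usize::BITS)
        .rev()
        .map(|b| 1usize << b)
        .filter(|&size| n & size != 0)
        .collect()
}

fn main() {
    assert_eq!(block_sizes(13), vec![8, 4, 1]);
    // The 10,000-element benchmark workload decomposes into five sorted blocks.
    assert_eq!(block_sizes(10_000), vec![8192, 1024, 512, 256, 16]);
}
```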
- **Insertion:** We simply push the new element to the end of the `Vec`. Then we count the number of trailing zeros ($Z$) in the new length $N$. We take the last $2^Z$ elements and call `sort_unstable()` on them. Because pdqsort is highly optimized for partially sorted patterns, it glues the smaller blocks together into one large sorted block with minimal comparisons and zero allocations.
- **Search:** To find an element, we binary-search the blocks, starting with the largest block first. Since the largest block always holds at least half the elements, there is roughly a 50% chance of finding the element on the very first block search, giving an extremely fast $O(\log N)$ fast path.
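The two operations above fit in a few lines. A simplified sketch over `u64` keys (illustrative only; function names and signatures here are not LogVec's actual internals or API):

```rust
/// Bentley-Saxe insert: push, then re-sort the suffix of 2^Z elements,
/// where Z is the number of trailing zeros in the new length.
fn insert(data: &mut Vec<u64>, value: u64) {
    data.push(value);
    let n = data.len();
    let run = 1usize << n.trailing_zeros(); // size of the newly merged block
    // pdqsort recognizes the pre-sorted runs in this suffix and merges
    // them cheaply, with zero heap allocations.
    data[n - run..].sort_unstable();
}

/// Search each conceptual block (largest first) with a binary search.
fn contains(data: &[u64], value: u64) -> bool {
    let n = data.len();
    let mut start = 0;
    // Walk the set bits of N from most significant to least: each one
    // marks a contiguous sorted block.
    for bit in (0..usize::BITS).rev() {
        let size = 1usize << bit;
        if n & size != 0 {
            if data[start..start + size].binary_search(&value).is_ok() {
                return true;
            }
            start += size;
        }
    }
    false
}

fn main() {
    let mut v = Vec::new();
    for x in [5u64, 3, 9, 1, 7, 2] {
        insert(&mut v, x);
    }
    // N = 6 = 0b110: one sorted block of 4, then one of 2.
    assert_eq!(v, vec![1, 3, 5, 9, 2, 7]);
    assert!(contains(&v, 7));
    assert!(!contains(&v, 4));
}
```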
## Quick Start

Add `logvec` to your `Cargo.toml`:

```sh
cargo add logvec
```

### Basic Usage
```rust
use logvec::LogVec;
```
## 🛠️ Roadmap & Future Optimizations
While already highly optimized, we are exploring the "dark arts" of CPU-level optimizations:
- **Min/Max Bounds Rejection:** $O(1)$ pruning by checking block boundaries before binary search.
- **Branchless Binary Search:** Rewriting the internal block search to eliminate CPU pipeline stalls caused by branch mispredictions.
- **Explicit SIMD (`std::arch`):** Vectorized instructions (AVX2/AVX-512) to compare multiple elements in a single clock cycle.
- **Fractional Cascading:** Implementing bridge pointers between blocks to theoretically drop search time to pure $O(\log N)$.
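For a taste of the branchless item, here is the classic half-interval formulation of `lower_bound`, whose loop body the compiler can lower to conditional moves instead of unpredictable jumps (a sketch of the general technique, not LogVec code):

```rust
/// Index of the first element >= `target` in a sorted slice.
/// The loop has a fixed shape: each iteration halves `len` and
/// conditionally advances `base`, a pattern that maps well to cmov.
fn lower_bound(data: &[u64], target: u64) -> usize {
    let mut base = 0usize;
    let mut len = data.len();
    while len > 1 {
        let half = len / 2;
        // The only data-dependent action is whether `base` advances;
        // there is no early exit for the branch predictor to miss.
        if data[base + half - 1] < target {
            base += half;
        }
        len -= half;
    }
    base + (len == 1 && data[base] < target) as usize
}

fn main() {
    let sorted = [1u64, 3, 5, 9];
    assert_eq!(lower_bound(&sorted, 5), 2);  // exact hit
    assert_eq!(lower_bound(&sorted, 4), 2);  // between 3 and 5
    assert_eq!(lower_bound(&sorted, 10), 4); // past the end
    assert_eq!(lower_bound(&sorted, 0), 0);  // before the start
}
```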
## Contributing
Contributions, issues, and feature requests are welcome! If you want to help squeeze even more nanoseconds out of this data structure, feel free to open a PR.