logvec 1.1.1

# LogVec

**A cache-friendly implementation of the Bentley-Saxe logarithmic data structure in Rust.**

`LogVec` provides a dynamic, array-based collection tuned for primitive-key workloads and contiguous-memory access patterns. By maintaining elements in one `Vec` and leveraging binary-sized sorted runs, it often beats tree-based structures on insertion/search for small values, while exposing clear trade-offs for large inline payloads.

## Features

* **Zero-Allocation Merges:** Amortized $O(\log N)$ insertion time. Because internal restructuring relies on Rust's in-place `slice::sort_unstable` (a pattern-defeating sort that handles pre-sorted runs cheaply), it requires zero heap allocations.
* **Fast Primitive Workloads:** Strong insertion/search performance for `u64` workloads.
* **Contiguous Memory:** Elements live in a single `Vec`, virtually eliminating pointer-chasing and maximizing CPU cache hits.
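A minimal, standalone illustration (not `logvec`'s own code) of why an in-place unstable sort makes merges allocation-free:

```rust
fn main() {
    // Two sorted runs laid out back-to-back in one Vec,
    // mimicking two adjacent Bentley-Saxe blocks.
    let mut v: Vec<u64> = vec![2, 5, 9, 1, 4, 8, 10];

    // sort_unstable works in place: no heap allocation, and its
    // pattern-defeating design exploits the pre-sorted runs.
    v.sort_unstable();

    assert_eq!(v, vec![1, 2, 4, 5, 8, 9, 10]);
    println!("{:?}", v);
}
```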

## 🚀 Benchmarks

*Hardware: local development machine, Criterion snapshot. Workload: 10,000 elements.*

### Baseline comparisons (`cargo bench --bench baselines_bench`)

| Operation | `LogVec` | Baseline | Relative |
|-----------|----------|----------|----------|
| `insert_u64_10k` (random order) | 602 µs | `BTreeSet`: 699 µs | **1.16x faster** (`-13.8%`) |
| `insert_u64_10k` (random order) | 602 µs | naive sorted `Vec`: 2420 µs | **4.02x faster** (`-75.1%`) |
| `search_u64_10k` (hits) | 249 µs | `BTreeSet`: 365 µs | **1.47x faster** (`-31.8%`) |
| `insert_large_struct_10k` (random order) | 3368 µs | `BTreeSet`: 1751 µs | **0.52x** (`+92.3%`, slower) |

### Recent optimization impact (relative to previous local baseline)

* `core/insert/u64_sequential_10k/logvec`: about **0.85x time** (`-15.3%`)
* `core/contains/u64_miss_seq_10k/logvec`: about **0.81x time** (`-19.2%`)
* `core/contains/u64_hit_seq_10k/logvec`: effectively unchanged (noise-level)

All numbers are workload-, compiler-, and CPU-dependent; benchmark on your target machine before relying on absolute values.

## How the Magic Works

The Bentley-Saxe method acts as an in-memory precursor to the LSM-trees used in massive databases (like RocksDB).

The underlying storage is just a single `Vec<T>`. However, conceptually, the array is partitioned into contiguous, exponentially sized sorted blocks ($2^0, 2^1, 2^2 \dots$). The sizes of these blocks perfectly mirror the binary representation of the total number of elements, $N$.
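For example, with $N = 13$ (binary `1101`) the array conceptually holds three sorted blocks of sizes 8, 4, and 1. A small sketch of that correspondence (`block_sizes` is an illustrative helper, not part of `logvec`'s API):

```rust
/// Sizes of the conceptual Bentley-Saxe blocks for `n` elements:
/// one block of size 2^i for every set bit i of `n`, largest first.
fn block_sizes(n: usize) -> Vec<usize> {
    (0..usize::BITS)
        .rev()
        .filter(|&i| n & (1usize << i) != 0)
        .map(|i| 1usize << i)
        .collect()
}

fn main() {
    assert_eq!(block_sizes(13), vec![8, 4, 1]); // 13 = 0b1101
    println!("{:?}", block_sizes(13));
}
```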

1. **Insertion:** We simply push the new element to the end of the `Vec`. We then count the number of trailing zeros ($Z$) in the binary representation of the new length $N$ and call `sort_unstable()` on the last $2^Z$ elements. Because Rust's unstable sort is highly optimized for partially sorted input, it glues the smaller sorted blocks together into one large sorted block with few comparisons and **zero** heap allocations.
2. **Search:** To find an element, we binary search the blocks from largest to smallest. Since the largest block holds at least half of all elements, a uniformly distributed hit is found on the very first block search more than half the time, an $O(\log N)$ fast path; searching every block in the worst case costs $O(\log^2 N)$.
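The two steps above can be condensed into a toy implementation. `ToyLogVec` is a hypothetical sketch for illustration only; `logvec`'s real internals may differ:

```rust
/// Toy Bentley-Saxe array: one Vec, implicit power-of-two sorted blocks.
/// Hypothetical sketch -- not logvec's actual implementation.
struct ToyLogVec<T: Ord>(Vec<T>);

impl<T: Ord> ToyLogVec<T> {
    fn new() -> Self {
        Self(Vec::new())
    }

    /// Push, then re-sort the last 2^Z elements, where Z is the
    /// number of trailing zero bits in the new length.
    fn insert(&mut self, x: T) {
        self.0.push(x);
        let n = self.0.len();
        let run = 1usize << n.trailing_zeros();
        self.0[n - run..].sort_unstable();
    }

    /// Binary search each block, largest (front-most) first.
    fn contains(&self, x: &T) -> bool {
        let mut remaining = self.0.len();
        let mut start = 0;
        while remaining > 0 {
            // Block size = highest power of two <= remaining elements.
            let size = 1usize << (usize::BITS - 1 - remaining.leading_zeros());
            if self.0[start..start + size].binary_search(x).is_ok() {
                return true;
            }
            start += size;
            remaining -= size;
        }
        false
    }
}

fn main() {
    let mut a = ToyLogVec::new();
    for x in [5u64, 3, 8, 1, 9, 2] {
        a.insert(x);
    }
    assert!(a.contains(&8));
    assert!(!a.contains(&7));
}
```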

## Quick Start

Add `logvec` to your `Cargo.toml`:

```shell
cargo add logvec
```

### Basic Usage

```rust
use logvec::LogVec;

fn main() {
    let mut arr = LogVec::new();

    // Insertions
    arr.insert(5u64);
    arr.insert(3u64);
    arr.insert(8u64);
    arr.insert(1u64);

    // Fast O(log^2 N) worst-case search
    assert!(arr.contains(&5u64));
    assert!(!arr.contains(&42u64));
}
```

## 🛠️ Roadmap & Future Optimizations

While already highly optimized, we are exploring the "dark arts" of CPU-level optimizations:

* [x] **Min/Max Bounds Rejection:** $O(1)$ pruning by checking block boundaries before binary search.
* [ ] **Branchless Binary Search:** Rewriting the internal block search to eliminate CPU pipeline stalls caused by branch mispredictions.
* [ ] **Explicit SIMD (`std::arch`):** Vectorized instructions (AVX2/AVX-512) to compare multiple elements in a single clock cycle.
* [ ] **Fractional Cascading:** Implementing bridge pointers between blocks to theoretically drop search time to pure $O(\log N)$.
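For a flavor of the branchless-search item, one common formulation keeps the loop body free of data-dependent branches so the compiler can emit a conditional move instead of a jump. `lower_bound` below is a hypothetical sketch, not `logvec`'s code:

```rust
/// Branchless-style lower bound over a sorted slice: the loop halves a
/// probe length and conditionally advances the base index. Compilers
/// typically lower the `if` to a conditional move, avoiding the
/// pipeline stalls of a mispredicted branch.
fn lower_bound(haystack: &[u64], needle: u64) -> usize {
    let mut base = 0usize;
    let mut len = haystack.len();
    while len > 1 {
        let half = len / 2;
        if haystack[base + half - 1] < needle {
            base += half;
        }
        len -= half;
    }
    // Final adjustment: step past the last element if it is still < needle.
    base + haystack.get(base).map_or(false, |&v| v < needle) as usize
}

fn main() {
    let v = [1u64, 3, 5, 8, 9];
    assert_eq!(lower_bound(&v, 5), 2);  // exact hit
    assert_eq!(lower_bound(&v, 6), 3);  // first element >= 6 is 8
    assert_eq!(lower_bound(&v, 10), 5); // past the end
}
```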

## Contributing

Contributions, issues, and feature requests are welcome!
If you want to help squeeze even more nanoseconds out of this data structure, feel free to open a PR.