logvec 1.0.1 - Docs.rs

# LogArray

**A blazingly fast, cache-friendly implementation of the Bentley-Saxe logarithmic data structure in Rust.**

`LogArray` provides a dynamic, array-based collection that achieves exceptional performance for write-heavy workloads. By maintaining elements in a single contiguous `Vec` and leveraging the mathematical properties of binary representations, `LogArray` drastically outperforms traditional tree-based structures like `BTreeSet` in both insertion speed and cache locality.

## Features

* **Zero-Allocation Merges:** Achieves amortized $O(\log N)$ insertion time. By exploiting Rust's `slice::sort_unstable` (pdqsort), internal restructuring requires zero heap allocations.
* **Insanely Fast Insertions:** Up to **8.3x faster** than standard `BTreeSet` for primitive types.
* **Contiguous Memory:** Elements live in a single `Vec`, virtually eliminating pointer-chasing and maximizing CPU cache hits.

## 🚀 Benchmarks

*Hardware: Standard modern CPU. Workload: 10,000 elements.*

| Operation (10k `u64`) | `LogArray` | `std::collections::BTreeSet` | Naive Sorted `Vec` |
|-----------------------|-------------|------------------------------|--------------------|
| **Sequential Insert** | **67 µs**   | 562 µs                       | 201 µs             |
| **Random Search**     | **182 µs**  | 331 µs                       | -                  |

Even for larger 256-byte structs, `LogArray` breaks the 1-millisecond barrier (~900 µs vs `BTreeSet`'s ~1.11 ms), proving that its cache mechanical sympathy beats pointer-based nodes even when moving data.

## How the Magic Works

The Bentley-Saxe method acts as an in-memory precursor to the LSM-trees used in massive databases (like RocksDB).

The underlying storage is just a single `Vec<T>`. However, conceptually, the array is partitioned into contiguous, exponentially sized sorted blocks ($2^0, 2^1, 2^2 \dots$). The sizes of these blocks perfectly mirror the binary representation of the total number of elements, $N$.

1. **Insertion:** We simply push the new element to the end of the `Vec`. Then, we count the number of trailing zeros ($Z$) in the new length $N$. We take the last $2^Z$ elements and call `sort_unstable()` on them. Because pdqsort is highly optimized for partially sorted patterns, it glues the smaller blocks together into one large sorted block with minimal comparisons and **zero** allocations.
2. **Search:** To find an element, we binary search the blocks. We start with the largest block first, meaning we have a 50% chance of finding our element on the very first block search, achieving an extremely fast $O(\log N)$ fast-path.

## Quick Start

Add `logvec` to your `Cargo.toml`:

```shell
cargo add logvec
```

### Basic Usage

```rust
use logvec::LogArray;

fn main() {
    let mut arr = LogArray::new();

    // Blazing fast insertions
    arr.insert(5u64);
    arr.insert(3u64);
    arr.insert(8u64);
    arr.insert(1u64);

    // Fast $O(\log^2 N)$ search
    assert!(arr.contains(&5u64));
    assert!(!arr.contains(&42u64));

    // (Optional) Iteration yields elements, though not globally sorted
    for item in arr.iter() {
        println!("{}", item);
    }
}

```

## 🛠️ Roadmap & Future Optimizations

While already highly optimized, we are exploring the "dark arts" of CPU-level optimizations:

* [ ] **Min/Max Bounds Rejection:** $O(1)$ pruning by checking the boundaries of a block before executing a binary search.
* [ ] **Branchless Binary Search:** Rewriting the internal block search to eliminate CPU pipeline stalls caused by branch mispredictions.
* [ ] **Explicit SIMD (`std::arch`):** Vectorized instructions (AVX2/AVX-512) to compare multiple elements in a single clock cycle.
* [ ] **Fractional Cascading:** Implementing bridge pointers between blocks to theoretically drop search time to pure $O(\log N)$.

## Contributing

Contributions, issues, and feature requests are welcome!
If you want to help squeeze even more nanoseconds out of this data structure, feel free to open a PR.