# LogVec
**A cache-friendly implementation of the Bentley-Saxe logarithmic data structure in Rust.**
`LogVec` provides a dynamic, array-based collection tuned for primitive-key workloads and contiguous-memory access patterns. By maintaining elements in one `Vec` and leveraging binary-sized sorted runs, it often beats tree-based structures on insertion/search for small values, while exposing clear trade-offs for large inline payloads.
## Features
* **Zero-Allocation Merges:** Internal restructuring uses Rust's in-place `slice::sort_unstable` (pdqsort), so merges perform zero heap allocations while insertion stays amortized $O(\log N)$.
* **Fast Primitive Workloads:** Strong insertion/search performance for `u64` workloads.
* **Contiguous Memory:** Elements live in a single `Vec`, virtually eliminating pointer-chasing and maximizing CPU cache hits.
## 🚀 Benchmarks
*Hardware: local development machine, Criterion snapshot. Workload: 10,000 elements.*
### Baseline comparisons (`cargo bench --bench baselines_bench`)
| Benchmark | `LogVec` | Baseline | Relative |
| --- | --- | --- | --- |
| `insert_u64_10k` (random order) | 602 µs | `BTreeSet`: 699 µs | **1.16x faster** (`-13.8%`) |
| `insert_u64_10k` (random order) | 602 µs | naive sorted `Vec`: 2420 µs | **4.02x faster** (`-75.1%`) |
| `search_u64_10k` (hits) | 249 µs | `BTreeSet`: 365 µs | **1.47x faster** (`-31.8%`) |
| `insert_large_struct_10k` (random order) | 3368 µs | `BTreeSet`: 1751 µs | **0.52x** (`+92.3%`, slower) |
### Recent optimization impact (relative to previous local baseline)
* `core/insert/u64_sequential_10k/logvec`: about **0.85x time** (`-15.3%`)
* `core/contains/u64_miss_seq_10k/logvec`: about **0.81x time** (`-19.2%`)
* `core/contains/u64_hit_seq_10k/logvec`: effectively unchanged (noise-level)
All numbers are workload-, compiler-, and CPU-dependent; benchmark on your target machine before relying on absolute values.
## How the Magic Works
The Bentley-Saxe method is the in-memory ancestor of the LSM-trees used by large databases (such as RocksDB).
The underlying storage is just a single `Vec<T>`. However, conceptually, the array is partitioned into contiguous, exponentially sized sorted blocks ($2^0, 2^1, 2^2 \dots$). The sizes of these blocks perfectly mirror the binary representation of the total number of elements, $N$.
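The correspondence between block sizes and the binary representation of $N$ can be sketched with a small standalone helper (illustrative only, not part of the `LogVec` API):

```rust
// Hypothetical helper: derive the conceptual block sizes from the element
// count N by reading its set bits, largest block first (matching the
// physical layout described above).
fn block_sizes(n: usize) -> Vec<usize> {
    (0..usize::BITS)
        .rev()
        .filter(|&i| n & (1 << i) != 0)
        .map(|i| 1usize << i)
        .collect()
}

fn main() {
    // N = 13 = 0b1101 -> one block of 8, one of 4, one of 1.
    assert_eq!(block_sizes(13), vec![8, 4, 1]);
}
```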
1. **Insertion:** We simply push the new element to the end of the `Vec`. Then, we count the number of trailing zeros ($Z$) in the new length $N$. We take the last $2^Z$ elements and call `sort_unstable()` on them. Because pdqsort is highly optimized for partially sorted patterns, it glues the smaller blocks together into one large sorted block with minimal comparisons and **zero** allocations.
2. **Search:** To find an element, we binary-search each block, largest block first. Since the largest block holds at least half of the elements, a random hit resolves in the first block's search at least half the time, giving a fast $O(\log N)$ common case; the worst case across all blocks is $O(\log^2 N)$.
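The two steps above can be sketched over a plain `Vec<u64>`. This is a minimal illustration of the mechanism, assuming the simplest possible layout; the names (`Sketch`, `insert`, `contains`) are ours, not the actual `LogVec` API:

```rust
// Minimal sketch of the Bentley-Saxe mechanism over a single Vec<u64>.
struct Sketch {
    data: Vec<u64>,
}

impl Sketch {
    fn new() -> Self {
        Sketch { data: Vec::new() }
    }

    fn insert(&mut self, x: u64) {
        self.data.push(x);
        let n = self.data.len();
        // The number of trailing zero bits (Z) in the new length N tells us
        // how many of the smallest blocks just collided: 2^Z elements total.
        let z = n.trailing_zeros();
        let run = 1usize << z;
        let start = n - run;
        // pdqsort handles this "few sorted runs" pattern cheaply, in place.
        self.data[start..].sort_unstable();
    }

    fn contains(&self, x: &u64) -> bool {
        // Walk the blocks largest-first; each block is sorted, so one
        // binary search per block suffices.
        let mut remaining = self.data.len();
        let mut start = 0;
        while remaining > 0 {
            let i = usize::BITS - 1 - remaining.leading_zeros(); // top set bit
            let len = 1usize << i;
            if self.data[start..start + len].binary_search(x).is_ok() {
                return true;
            }
            start += len;
            remaining -= len;
        }
        false
    }
}

fn main() {
    let mut s = Sketch::new();
    for x in [5u64, 3, 8, 1, 9, 2] {
        s.insert(x);
    }
    assert!(s.contains(&5));
    assert!(!s.contains(&42));
}
```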
## Quick Start
Add `logvec` to your `Cargo.toml`:
```shell
cargo add logvec
```
### Basic Usage
```rust
use logvec::LogVec;

fn main() {
    let mut arr = LogVec::new();

    // Insertions
    arr.insert(5u64);
    arr.insert(3u64);
    arr.insert(8u64);
    arr.insert(1u64);

    // Worst-case O(log^2 N) search
    assert!(arr.contains(&5u64));
    assert!(!arr.contains(&42u64));
}
```
## 🛠️ Roadmap & Future Optimizations
The current implementation is already fast, but we are exploring the "dark arts" of CPU-level optimization:
* [x] **Min/Max Bounds Rejection:** $O(1)$ pruning by checking block boundaries before binary search.
* [ ] **Branchless Binary Search:** Rewriting the internal block search to eliminate CPU pipeline stalls caused by branch mispredictions.
* [ ] **Explicit SIMD (`std::arch`):** Vectorized instructions (AVX2/AVX-512) to compare multiple elements in a single clock cycle.
* [ ] **Fractional Cascading:** Implementing bridge pointers between blocks to theoretically drop search time to pure $O(\log N)$.
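As a taste of the branchless idea on the roadmap, here is a hedged sketch (not yet in the crate; the function name is ours): the data-dependent branch of a classic binary search is replaced with an arithmetic update that compilers typically lower to a conditional move instead of a jump.

```rust
// Sketch of a branchless binary search over one sorted block.
// The `base += if .. { half } else { 0 }` pattern usually compiles to a
// cmov on x86, sidestepping branch-misprediction stalls.
fn branchless_search(block: &[u64], target: u64) -> bool {
    let mut base = 0usize;
    let mut len = block.len();
    while len > 1 {
        let half = len / 2;
        base += if block[base + half - 1] < target { half } else { 0 };
        len -= half;
    }
    !block.is_empty() && block[base] == target
}

fn main() {
    assert!(branchless_search(&[1, 3, 5, 8], 5));
    assert!(!branchless_search(&[1, 3, 5, 8], 4));
}
```

Whether this beats `slice::binary_search` in practice depends on the block size and how predictable the workload's comparisons are, which is why it remains an open roadmap item.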
## Contributing
Contributions, issues, and feature requests are welcome!
If you want to help squeeze even more nanoseconds out of this data structure, feel free to open a PR.