hattrie 0.9.1

A high-performance, cache-conscious, ordered map for string keys, based on the HAT-trie paper.
Documentation
# API Reference: hat-trie

## 1. Introduction & Core Concepts

The `hat-trie` library provides a high-performance, cache-conscious ordered map implementation optimized for string (or byte-sequence) keys.

### Core Types
*   **`HatTrie<V>`**: The primary entry point. It is a key-value collection that behaves similarly to `std::collections::BTreeMap` but uses a Hashed Array Tree Trie structure for improved cache locality and reduced memory usage with string keys.
*   **Keys**: All keys in the structure are treated as byte sequences (`Vec<u8>`). Methods accept any type that implements `AsRef<[u8]>` (e.g., `String`, `&str`, `&[u8]`).
*   **Values**: The structure is generic over the value type `V`.

### Architectural Principles
*   **Sorted Order**: Iteration and range queries always yield items in strict lexicographical (byte-wise) order.
*   **Bursting**: The structure uses a "burst threshold" to determine when to split leaf buckets into new trie nodes. This is the primary configuration knob for tuning performance vs. memory usage.

---

## 2. Configuration & Constants

Configuration is handled primarily through the constructor.

### Public Constants

```rust
pub const DEFAULT_BURST_THRESHOLD: usize = 4096;
```
*   **Type**: `usize`
*   **Value**: `4096`
*   **Description**: The default size limit (in bytes) for a bucket before it "bursts" into a new internal trie node. This fits within a standard 4KB OS page.

---

## 3. Main Types

### `HatTrie<V>`

The primary container struct.

#### Constructors

```rust
pub fn new() -> Self
```
Creates a new, empty `HatTrie` with the `DEFAULT_BURST_THRESHOLD`.

```rust
pub fn with_threshold(bytes: usize) -> Self
```
Creates a new, empty `HatTrie` with a custom burst threshold.
*   `bytes`: The size limit in bytes for leaf buckets. Smaller values (e.g., 256) reduce memory usage for sparse data; larger values (e.g., 16384) improve linear scan speeds and cache locality.

#### Basic Operations (CRUD)

```rust
pub fn insert<K: AsRef<[u8]>>(&mut self, key: K, value: V) -> Option<V>
```
Inserts a key-value pair into the trie.
*   `key`: The key to insert.
*   `value`: The value to associate with the key.
*   **Returns**: `Some(old_value)` if the key was already present, otherwise `None`.

```rust
pub fn get<K: AsRef<[u8]>>(&self, key: K) -> Option<&V>
```
Returns a reference to the value corresponding to the key.
*   `key`: The key to look up.
*   **Returns**: `Some(&value)` if found, `None` otherwise.

```rust
pub fn get_mut<K: AsRef<[u8]>>(&mut self, key: K) -> Option<&mut V>
```
Returns a mutable reference to the value corresponding to the key.
*   `key`: The key to look up.
*   **Returns**: `Some(&mut value)` if found, `None` otherwise.

```rust
pub fn remove<K: AsRef<[u8]>>(&mut self, key: K) -> Option<V>
```
Removes a key from the trie, returning the value if the key was previously in the map.
*   `key`: The key to remove.
*   **Returns**: `Some(value)` if the key existed, `None` otherwise.

#### Status & Attributes

```rust
pub fn len(&self) -> usize
```
Returns the number of elements in the trie.

```rust
pub fn is_empty(&self) -> usize
```
Returns `true` if the trie contains no elements.

#### Iteration

```rust
pub fn iter(&self) -> HatTrieIter<V>
```
Returns an iterator over the entries of the trie, sorted by key.
*   **Yields**: `(Vec<u8>, &V)`

```rust
pub fn streaming_iter(&self) -> HatTrieStreamingIter<V>
```
Returns a high-performance streaming iterator that avoids heap allocations.
*   **Description**: This is significantly faster for full-table scans than `iter()`. It reuses an internal buffer for key reconstruction. The returned `StreamItem`'s key slice is only valid for the duration of one loop iteration and should not be stored.
*   **Yields**: `StreamItem`, which provides `.key()` and `.value()` methods.

```rust
pub fn iter_segments(&self) -> SegmentIter<V>
```
Returns the fastest possible iterator, which is a zero-allocation "segment" iterator.
*   **Description**: Instead of reconstructing a full key, this iterator yields a `SegmentItem` which provides access to the key's constituent parts: the prefix path from the trie's directory and the suffix stored in a leaf node. The yielded item and its slices are only valid until the next call to `next()`.
*   **Yields**: `SegmentItem`, which provides `.segments()` and `.value()` methods.

#### Advanced Queries

```rust
pub fn range<'a, R, K>(&'a self, range: R) -> impl Iterator<Item = (Vec<u8>, &'a V)>
where
    R: RangeBounds<K>,
    K: AsRef<[u8]> + ?Sized
```
Returns an iterator over a sub-range of elements in the trie. This operation efficiently "seeks" to the start of the range.
*   `range`: A `RangeBounds` object (e.g., `start..end`, `start..`, `..end`).
*   **Yields**: `(Vec<u8>, &V)`

```rust
pub fn lower_bound<'a, K: AsRef<[u8]>>(&'a self, key: K) -> Option<(Vec<u8>, &'a V)>
```
Finds the first key-value pair where the key is greater than or equal to the provided key.
*   `key`: The probe key.
*   **Returns**: `Some((key, value))` of the lower bound, or `None` if all keys in the trie are smaller.

```rust
pub fn upper_bound<'a, K: AsRef<[u8]>>(&'a self, key: K) -> Option<(Vec<u8>, &'a V)>
```
Finds the first key-value pair where the key is strictly greater than the provided key.
*   `key`: The probe key.
*   **Returns**: `Some((key, value))` of the upper bound, or `None`.

```rust
pub fn scan_prefix<'a, K: AsRef<[u8]>>(&'a self, prefix: K) -> impl Iterator<Item = (Vec<u8>, &'a V)>
```
Returns an iterator over all entries whose keys start with the given prefix. This is more efficient than a range query as it traverses directly to the relevant subtree.
*   `prefix`: The byte sequence to match.
*   **Yields**: `(Vec<u8>, &V)`

```rust
pub fn fuzzy_search(&self, key: &[u8], max_dist: usize) -> Vec<(Vec<u8>, &V)>
```
Performs an approximate search using Levenshtein distance.
*   `key`: The target pattern to match against.
*   `max_dist`: The maximum allowed edits (insertions, deletions, substitutions).
*   **Returns**: A `Vec` containing all matching key-value pairs.

```rust
pub fn longest_common_prefix<K: AsRef<[u8]>>(&self, key: K) -> (usize, Option<&V>)
```
Walks the trie using the provided key and finds the longest prefix of that key that exists as a stored key in the trie.
*   `key`: The input key to match against the trie's contents.
*   **Returns**: A tuple `(length, Option<&value>)`. `length` is the length of the matching prefix in bytes.

#### Diagnostics

```rust
pub fn stats(&self) -> HatTrieStats
```
Returns statistics about the internal structure of the trie.

---

### `HatTrieStats`

A struct containing diagnostic information about the trie's memory layout.

#### Public Fields

*   `pub internal_nodes: usize`: The count of pointer-based trie nodes.
*   `pub leaf_nodes: usize`: The count of bucket nodes (flat arrays).
*   `pub total_items: usize`: The total number of items stored.
*   `pub max_bucket_size: usize`: The number of items in the most populated bucket.

---

## 4. Iterators

### `HatTrieIter<'a, V>`

The iterator produced by `HatTrie::iter()`. It yields borrowed values.

*   **Implements**: `Iterator<Item = (Vec<u8>, &'a V)>`

### `HatTrieIntoIter<V>`

The consuming iterator produced by `HatTrie::into_iter()`. It yields owned values.

*   **Implements**: `Iterator<Item = (Vec<u8>, V)>`

### `HatTrieStreamingIter<'a, V>`

A high-performance iterator produced by `HatTrie::streaming_iter()`. This iterator should be consumed with a `while let` loop.

```rust
// Usage:
let mut iter = trie.streaming_iter();
while let Some(item) = iter.next() {
    let key_slice = item.key();
    let value_ref = item.value();
    // ...
}
```
*   **`next(&mut self) -> Option<StreamItem<'_, 'a, V>>`**: Advances the iterator and returns a view to the next item.

### `StreamItem<'iter, 'a: 'iter, V>`

A temporary view of an item yielded by the `HatTrieStreamingIter`.
*   **`key(&self) -> &'iter [u8]`**: Returns the key for the current item as a byte slice.
*   **`value(&self) -> &'a V`**: Returns a reference to the value for the current item.

### `SegmentIter<'a, V>`

A zero-allocation iterator produced by `HatTrie::iter_segments()`. This iterator should be consumed with a `while let` loop.

```rust
// Usage:
let mut iter = trie.iter_segments();
while let Some(item) = iter.next() {
    let segments = item.segments();
    // segments.path and segments.suffix are &[u8]
    // ...
}
```
*   **`next(&mut self) -> Option<SegmentItem<'_, 'a, V>>`**: Advances the iterator and returns a view to the next item's segments.

### `SegmentItem<'iter, 'a: 'iter, V>`

A temporary view of a key's components and its value, yielded by `SegmentIter`.
*   **`segments(&self) -> KeySegments<'_>`**: Returns the key's components for the current item.
*   **`value(&self) -> &'a V`**: Returns a reference to the value for the current item.

### `KeySegments<'a>`

A view of a key's constituent parts.
*   **`pub path: &'a [u8]`**: A slice representing the prefix of the key from the trie's directory path.
*   **`pub suffix: &'a [u8]`**: A slice representing the suffix of the key from the leaf node.

---

## 5. Trait Implementations

The `HatTrie` struct implements the following standard Rust traits:

### `std::default::Default`
```rust
fn default() -> Self
```
Creates a `HatTrie` with the default burst threshold.

### `std::clone::Clone`
```rust
fn clone(&self) -> Self
```
Creates a deep copy of the trie. Available only if `V` implements `Clone`.

### `std::cmp::PartialEq` / `std::cmp::Eq`
```rust
fn eq(&self, other: &Self) -> bool
```
Checks for logical equality. Two tries are considered equal if they contain the exact same key-value pairs in the same order, regardless of internal structural differences (e.g., different burst thresholds).

### `std::iter::IntoIterator`
```rust
fn into_iter(self) -> HatTrieIntoIter<V>
```
Consumes the trie and returns an iterator over owned key-value pairs.

### `std::iter::FromIterator`
```rust
fn from_iter<T>(iter: T) -> Self
```
Creates a `HatTrie` from an iterator of `(String, V)` tuples. Uses default configuration.

### `std::iter::Extend`
```rust
fn extend<T>(&mut self, iter: T)
```
Inserts all items from an iterator of `(String, V)` tuples into the existing trie.

### `std::ops::Index`
```rust
fn index(&self, index: &str) -> &V
```
Allows accessing values using the bracket syntax `trie["key"]`. Panics if the key is not found.

---

## 6. Error Handling

The API primarily uses the `Option` type for error handling:
*   **Lookups**: `get`, `get_mut`, `lower_bound`, etc., return `None` if the key is not found.
*   **Insertion**: `insert` returns `None` if the key was new, or `Some(old_value)` if a value was overwritten.
*   **Removal**: `remove` returns `None` if the key did not exist.

There are no custom `Result` types or public Error enums defined in the library's runtime API. Panics may occur only if using the `Index` trait (`trie["missing"]`).