# Overclocked Sort (v0.2.0)


A hyper-optimized, auto-adaptive, and zero-dependency Parallel Hybrid Counting/Radix Sort for Rust. 
Built explicitly for **High-Performance Computing (HPC)**, **Big Data (1B+ records)**, and **L2 Cache efficiency**.

It dynamically chooses between a highly concurrent Parallel Counting Sort (for dense ranges) and `std::slice::sort_unstable` (for sparse ranges) at runtime, avoiding cache thrashing and Out-of-Memory (OOM) panics.
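The dense/sparse dispatch can be sketched as a density heuristic: counting sort only pays off when the key range is small relative to the input, so the count buckets stay cache-resident and cannot exhaust memory. The function name and the 4x threshold below are illustrative assumptions, not the crate's actual API:

```rust
/// Illustrative density check (not the crate's real heuristic): counting
/// sort is chosen only when the key range is small relative to N.
fn is_dense(data: &[u32]) -> bool {
    let Some((&first, rest)) = data.split_first() else { return true };
    let (mut min, mut max) = (first, first);
    // O(N) scan for the key range.
    for &x in rest {
        min = min.min(x);
        max = max.max(x);
    }
    // Buckets needed = key range; hypothetical 4x budget vs. input size.
    (max - min) as u64 + 1 <= 4 * data.len() as u64
}
```

A caller would route dense inputs to the counting path and everything else to `sort_unstable`.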

## 🚀 Features


- **Generic API (New in v0.2.0)**: Supports `u32`, `i32`, `i64`, and custom objects (like `KeyPtr`) out of the box via the `SortableKey` trait.
- **Zero-Dependency Parallelism**: Leverages pure `std::thread::scope` for Thread-pool-less parallel data processing. No `rayon` overhead.
- **Data-Aware Hybrid Partitioning**: Performs a dynamic $O(N)$ scan and splits massive inputs (e.g. $10^9$ uniformly distributed elements) into a *Dense Stream* (resolved by Overclocked Counting in milliseconds) and a *Sparse Stream* (resolved by in-place quicksort).
- **Parallel Histogram & Scatter**: Uses parallel prefix-sum arrays and concurrent Memory Scatter for extreme memory bandwidth saturation.
- **Cache-Oblivious**: Avoids the L3 cache misses inherent in standard Radix Sort variants by allocating localized count buckets.
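The histogram/prefix-sum/scatter pipeline named above can be sketched sequentially (the parallel version splits the histogram and scatter across threads; this shows the data flow only, and is not the crate's implementation):

```rust
/// Sequential sketch of the counting-sort core: histogram, exclusive
/// prefix sum, then a scatter pass. Assumes all keys are <= max_key.
fn counting_sort(data: &mut [u32], max_key: usize) {
    // 1. Histogram: count occurrences of each key.
    let mut counts = vec![0usize; max_key + 1];
    for &x in data.iter() {
        counts[x as usize] += 1;
    }
    // 2. Exclusive prefix sum: counts[k] becomes the output offset of key k.
    let mut sum = 0;
    for c in counts.iter_mut() {
        let occurrences = *c;
        *c = sum;
        sum += occurrences;
    }
    // 3. Scatter: place each element at its computed offset.
    let input = data.to_vec();
    for x in input {
        data[counts[x as usize]] = x;
        counts[x as usize] += 1;
    }
}
```

In the parallel variant, each thread builds a private histogram over its chunk, the prefix sum is combined across threads, and each thread scatters into disjoint output regions.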

## 📦 Installation


Add this to your `Cargo.toml`:

```toml
[dependencies]
overclocked_sort = "0.2.0"
```

## 🛠 Usage


### 1. Sorting Primitives (In-place)


```rust
use overclocked_sort::overclocked_sort;

fn main() {
    let mut arr: Vec<i32> = vec![999, 1, 54, 2, 7000, 10];
    
    // Sort directly in-place!
    overclocked_sort(&mut arr);
    
    assert_eq!(arr, vec![1, 2, 10, 54, 999, 7000]);
}
```

### 2. Sorting Objects with Payloads (`KeyPtr`)


When you need to sort indices, database records, or pointer addresses (scatter-gathering):

```rust
use overclocked_sort::{overclocked_sort, KeyPtr};

fn main() {
    let mut db_records = vec![
        KeyPtr { key: 5, ptr: 0xFFFF_0001 },
        KeyPtr { key: 1, ptr: 0xFFFF_0002 },
    ];
    
    overclocked_sort(&mut db_records);
    
    assert_eq!(db_records[0].key, 1);
    assert_eq!(db_records[1].key, 5);
}
```
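Custom types plug in through the `SortableKey` trait mentioned above. The crate's actual trait definition is not shown here; the shape below is a plausible assumption for how a key-extraction trait like it could look, with a hypothetical `Record` type standing in for your own data:

```rust
/// Hypothetical shape of a key-extraction trait like `SortableKey`;
/// the crate's real definition may differ.
trait SortableKey: Copy {
    fn sort_key(&self) -> u64;
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Record {
    key: u32,
    payload: u64,
}

impl SortableKey for Record {
    fn sort_key(&self) -> u64 {
        self.key as u64
    }
}

// A generic sort can then order by the extracted key alone:
fn sort_by_key<T: SortableKey>(items: &mut [T]) {
    items.sort_unstable_by_key(|t| t.sort_key());
}
```

Any type that can map itself to an integer key this way can ride the counting-sort fast path.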

## 📊 Benchmarks vs `std::sort_unstable`


| Test Case (Dataset) | Overclocked Sort (v0.2.0) | `std::sort_unstable` | Diff |
| :--- | :--- | :--- | :--- |
| **Dense Data (50M)** | **100.2 ms** | 510.1 ms | **~5.1x Faster** |
| **Sparse Data (10M)** | 161.0 ms | **144.3 ms** | Hybrid Fallback Safe |
| **Already Sorted (50M)** | 55.2 ms | **12.8 ms** | Fast $O(N)$ check |
| **Structures (20M KeyPtrs)** | **328.3 ms** | 351.9 ms | Memory Bandwidth Limit |
| **Massive Mixed (100M)** | **5.5 s** | > 8.0 s | In-place Partitioning |

*Test Spec: CPU Core i7/i9 equivalent, Rust 1.70+, Release mode*

## ⚙️ Architecture: Dual-Stream Coordinator


If the system detects that allocating count buckets would exceed safe memory limits, it runs a Two-Pointer In-place Partition:
1. `Dense Stream`: keys below the limit are moved to the left and resolved concurrently by **Parallel Counting Sort**.
2. `Sparse Stream`: keys at or above the limit are moved to the right and delegated safely to **PDQSort / Unstable Sort**.
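The two-pointer partition above can be sketched as follows. How the coordinator derives the limit is not covered here; the function name is illustrative and the limit is taken as a parameter:

```rust
/// Two-pointer in-place partition: keys below `limit` end up on the left
/// (dense stream), the rest on the right (sparse stream). Returns the
/// split index; order within each half is not preserved.
fn partition_streams(data: &mut [u32], limit: u32) -> usize {
    let (mut lo, mut hi) = (0usize, data.len());
    while lo < hi {
        if data[lo] < limit {
            lo += 1; // already in the dense stream
        } else {
            hi -= 1;
            data.swap(lo, hi); // move to the sparse stream on the right
        }
    }
    lo // [0, lo) is dense, [lo, len) is sparse
}
```

Each half can then be sorted independently: counting sort on the dense left half, `sort_unstable` on the sparse right half, with no extra allocation for the split itself.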

## 📄 License


MIT License. See `LICENSE` for details.