overclocked_sort 0.2.0

A hyper-optimized parallel counting sort built on L2-cache-aware block sizing, SIMD auto-vectorization, parallel prefix sums, and runtime-free dynamic work stealing.

Overclocked Sort (v0.2.0)

A hyper-optimized, auto-adaptive, zero-dependency parallel hybrid counting/radix sort for Rust, built for high-performance computing (HPC), big-data workloads (1B+ records), and L2 cache efficiency.

At runtime it chooses between a highly concurrent parallel counting sort (for dense key ranges) and std::slice::sort_unstable (for sparse ranges), avoiding cache thrashing and out-of-memory (OOM) panics.
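The dense/sparse decision can be sketched as a density check: counting sort needs one bucket per distinct key value, so it only pays off when the key range is small relative to the input. A minimal sketch (the function name and threshold are illustrative, not the crate's actual internals):

```rust
/// Hypothetical density heuristic: returns true when a counting sort's
/// bucket array would fit within `max_buckets` slots.
fn is_dense(data: &[u32], max_buckets: usize) -> bool {
    if data.is_empty() {
        return false;
    }
    // One O(N) scan to find the key range.
    let min = *data.iter().min().unwrap();
    let max = *data.iter().max().unwrap();
    // The count array would need (max - min + 1) buckets.
    ((max - min) as usize) < max_buckets
}

fn main() {
    // Tiny key range: counting sort wins.
    assert!(is_dense(&[3, 1, 4, 1, 5, 9], 1 << 20));
    // Huge, sparse range: fall back to sort_unstable.
    assert!(!is_dense(&[0, u32::MAX], 1 << 20));
}
```

In practice the threshold would also account for available memory and element count, but the shape of the decision is the same.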

🚀 Features

  • Generic API (new in v0.2.0): Supports u32, i32, i64, and custom objects (like KeyPtr) out of the box via the SortableKey trait.
  • Zero-Dependency Parallelism: Uses pure std::thread::scope for thread-pool-free parallel data processing. No rayon overhead.
  • Data-Aware Hybrid Partitioning: Scans the data in $O(N)$ and splits massive inputs (e.g. $10^9$ uniformly distributed elements) into a dense stream (resolved by the overclocked counting sort in milliseconds) and a sparse stream (resolved by in-place quicksort).
  • Parallel Histogram & Scatter: Uses parallel prefix-sum arrays and concurrent memory scatter to saturate memory bandwidth.
  • Cache-Aware: Avoids the L3 cache misses typical of standard radix-sort variants by allocating localized count buckets.
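The histogram → prefix-sum → scatter pipeline named above can be illustrated with a single-threaded sketch (the crate parallelizes each phase with std::thread::scope, but the data flow is the same; the function name is illustrative):

```rust
/// Single-threaded sketch of a counting sort via histogram,
/// exclusive prefix sum, and scatter. Assumes keys are in 0..=max_key.
fn counting_sort_sketch(data: &mut [u32], max_key: usize) {
    // Phase 1: histogram — one count bucket per key value.
    let mut counts = vec![0usize; max_key + 1];
    for &x in data.iter() {
        counts[x as usize] += 1;
    }
    // Phase 2: exclusive prefix sum turns counts into start offsets.
    let mut offset = 0;
    for c in counts.iter_mut() {
        let n = *c;
        *c = offset;
        offset += n;
    }
    // Phase 3: scatter each element directly to its final slot.
    let mut out = vec![0u32; data.len()];
    for &x in data.iter() {
        out[counts[x as usize]] = x;
        counts[x as usize] += 1;
    }
    data.copy_from_slice(&out);
}

fn main() {
    let mut v = vec![3, 0, 2, 2, 1];
    counting_sort_sketch(&mut v, 3);
    assert_eq!(v, vec![0, 1, 2, 2, 3]);
}
```

In a parallel version, each thread builds a local histogram over its chunk, the prefix sum is computed across all local histograms, and each thread then scatters its chunk using its own offsets, which is what keeps the count buckets localized.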

📦 Installation

Add this to your Cargo.toml:

[dependencies]
overclocked_sort = "0.2.0"

🛠 Usage

1. Sorting Primitives (In-place)

use overclocked_sort::overclocked_sort;

fn main() {
    let mut arr: Vec<i32> = vec![999, 1, 54, 2, 7000, 10];
    
    // Sort directly in-place!
    overclocked_sort(&mut arr);
    
    assert_eq!(arr, vec![1, 2, 10, 54, 999, 7000]);
}

2. Sorting Objects with Payloads (KeyPtr)

When you need to sort indices, database records, or pointer addresses (scatter-gathering):

use overclocked_sort::{overclocked_sort, KeyPtr};

fn main() {
    let mut db_records = vec![
        KeyPtr { key: 5, ptr: 0xFFFF_0001 },
        KeyPtr { key: 1, ptr: 0xFFFF_0002 },
    ];
    
    overclocked_sort(&mut db_records);
    
    assert_eq!(db_records[0].key, 1);
    assert_eq!(db_records[1].key, 5);
}

📊 Benchmarks vs std::sort_unstable

| Test Case (Dataset)      | Overclocked Sort (v0.2.0) | std::sort_unstable | Notes                    |
|--------------------------|---------------------------|--------------------|--------------------------|
| Dense Data (50M)         | 100.2 ms                  | 510.1 ms           | ~5.1x faster             |
| Sparse Data (10M)        | 161.0 ms                  | 144.3 ms           | Hybrid fallback, safe    |
| Already Sorted (50M)     | 55.2 ms                   | 12.8 ms            | Fast $O(N)$ check        |
| Structures (20M KeyPtrs) | 328.3 ms                  | 351.9 ms           | Memory-bandwidth limited |
| Massive Mixed (100M)     | 5.5 s                     | > 8.0 s            | In-place partitioning    |

Test spec: Intel Core i7/i9-class CPU, Rust 1.70+, release mode.

⚙️ Architecture: Dual-Stream Coordinator

If allocating count buckets for the full key range would exceed available memory, the sorter first runs a two-pointer in-place partition:

  1. Dense Stream: Keys below the range limit are moved to the left and sorted concurrently by the parallel counting sort.
  2. Sparse Stream: Keys above the limit are moved to the right and delegated to pdqsort (std's unstable sort).
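The two-pointer partition step can be sketched as follows (a minimal illustration, assuming a hypothetical `partition_by_limit` helper; the crate's actual signature may differ). It returns the split index separating the dense half from the sparse half:

```rust
/// Two-pointer in-place partition: keys below `limit` end up on the
/// left (dense stream), the rest on the right (sparse stream).
/// Returns the index of the first sparse element.
fn partition_by_limit(data: &mut [u32], limit: u32) -> usize {
    let (mut lo, mut hi) = (0, data.len());
    while lo < hi {
        if data[lo] < limit {
            lo += 1; // already in the dense half
        } else {
            hi -= 1;
            data.swap(lo, hi); // push sparse key to the right
        }
    }
    lo
}

fn main() {
    let mut v = vec![5, 100, 2, 7, 99, 1];
    let split = partition_by_limit(&mut v, 10);
    assert_eq!(split, 4);
    assert!(v[..split].iter().all(|&x| x < 10));
    assert!(v[split..].iter().all(|&x| x >= 10));
}
```

After the split, the two halves can be sorted independently, so neither path ever needs buckets for the full key range.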

📄 License

MIT License. See LICENSE for details.