Overclocked Sort (v0.2.1)
A hyper-optimized, auto-adaptive, zero-dependency parallel hybrid counting/radix sort for Rust, built for high-performance computing (HPC), big-data workloads (1B+ records), and L2 cache efficiency.
It dynamically chooses between parallel counting sort (for dense ranges) and comparison-based fallback paths (for sparse or pattern-heavy ranges).
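The dense-vs-sparse routing decision can be sketched as below. The function and threshold names are illustrative (not the crate's internal API); the `max_limit` cap of 1M or `len / 4` mirrors the pseudocode later in this README.

```rust
// Illustrative routing heuristic; names are not the crate's real API.
fn choose_path(len: usize, min_key: u64, max_key: u64) -> &'static str {
    let range = max_key - min_key + 1;
    // Counting sort only pays off when the key range is dense relative to len.
    let max_limit = 1_000_000u64.min((len as u64 / 4).max(1));
    if range <= max_limit { "counting" } else { "radix_or_std" }
}

fn main() {
    // 2M dense keys in [0, 100_000): routed to the counting-sort path.
    println!("{}", choose_path(2_000_000, 0, 99_999));
    // 2M keys spanning the full u32 range: routed to the fallback path.
    println!("{}", choose_path(2_000_000, 0, u32::MAX as u64));
}
```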
🚀 Features
- Generic API: Supports `u32`, `i32`, and custom objects (like `KeyPtr`) via the `SortableKey` trait.
- Zero-Dependency Parallelism: Leverages pure `std::thread::scope` for thread-pool-less parallel data processing. No `rayon` overhead.
- Pattern-Aware Routing: Uses lightweight probes for monotonic/near-monotonic inputs and routes them to the most suitable path.
- Parallel Histogram & Scatter: Uses parallel prefix-sum arrays and concurrent memory scatter to saturate memory bandwidth.
- Edge-Case Guardrails: Handles already-sorted, reverse-sorted, almost-sorted, and all-equal inputs at near parity with `std::sort_unstable`.
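The zero-dependency parallelism pattern can be sketched with a per-thread histogram merged at the end. This is an illustration of the technique on `std::thread::scope`, not the crate's actual source:

```rust
use std::thread;

// Sketch of a zero-dependency parallel histogram over u32 keys in [0, range).
fn parallel_histogram(data: &[u32], range: usize, threads: usize) -> Vec<usize> {
    let chunk = data.len().div_ceil(threads).max(1);
    let mut locals: Vec<Vec<usize>> = Vec::new();
    thread::scope(|s| {
        // Each thread counts its own chunk into a private histogram: no locks.
        let handles: Vec<_> = data
            .chunks(chunk)
            .map(|part| {
                s.spawn(move || {
                    let mut h = vec![0usize; range];
                    for &k in part {
                        h[k as usize] += 1;
                    }
                    h
                })
            })
            .collect();
        for handle in handles {
            locals.push(handle.join().unwrap());
        }
    });
    // Merge the per-thread histograms into one.
    let mut total = vec![0usize; range];
    for h in &locals {
        for (t, c) in total.iter_mut().zip(h) {
            *t += c;
        }
    }
    total
}

fn main() {
    let data = vec![2u32, 0, 1, 2, 2, 1];
    let hist = parallel_histogram(&data, 3, 2);
    println!("{:?}", hist); // [1, 2, 3]
}
```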
📦 Installation
Add this to your Cargo.toml:
```toml
[dependencies]
overclocked_sort = "0.2.1"
```
🛠 Usage
1. Sorting Primitives (In-place)
```rust
use overclocked_sort;
```
2. Sorting Objects with Payloads (KeyPtr)
When you need to sort indices, database records, or pointer addresses (scatter-gathering):
```rust
// Import path assumed; the crate exposes `KeyPtr` per the feature list.
use overclocked_sort::KeyPtr;
```
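The key/payload idea can be shown with a self-contained sketch. The `KeyPtr` layout below (a `u64` key paired with a payload index) is an assumption about its shape, and `sort_unstable_by_key` stands in for the crate's entry point:

```rust
// Hypothetical KeyPtr layout: a sort key plus an index into a payload table.
#[derive(Clone, Copy, Debug, PartialEq)]
struct KeyPtr {
    key: u64,
    payload_idx: usize,
}

fn main() {
    let records = ["banana", "apple", "cherry"];
    // Build (key, index) pairs; the keys here are just demo priorities.
    let mut keys = vec![
        KeyPtr { key: 2, payload_idx: 0 },
        KeyPtr { key: 0, payload_idx: 1 },
        KeyPtr { key: 1, payload_idx: 2 },
    ];
    // std stands in for overclocked_sort here.
    keys.sort_unstable_by_key(|k| k.key);
    // Gather the payloads in sorted-key order.
    let ordered: Vec<&str> = keys.iter().map(|k| records[k.payload_idx]).collect();
    println!("{:?}", ordered); // ["apple", "cherry", "banana"]
}
```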
📊 Benchmarks vs std::sort_unstable
| Test Case (Dataset) | Overclocked Sort (v0.2.1) | `std::sort_unstable` | Speedup (std / overclocked) |
|---|---|---|---|
| random_dense (2M) | 24.242 ms | 40.806 ms | 1.683x |
| random_sparse (2M) | 34.193 ms | 53.466 ms | 1.564x |
| already_sorted (2M) | 0.908 ms | 0.866 ms | 0.954x |
| reverse_sorted (2M) | 1.458 ms | 1.414 ms | 0.970x |
| all_equal (2M) | 0.882 ms | 0.888 ms | 1.007x |
| sawtooth_1024 (2M) | 9.708 ms | 22.109 ms | 2.277x |
A detailed multi-case benchmark table is available in `benchmark_results_matrix.md`.
⚙️ Pseudocode (v0.2.1)
```text
function overclocked_sort(arr):
    if len(arr) <= 1:
        return

    # 1) Lightweight probe to quickly detect monotonic-like inputs.
    probe_desc_breaks = sampled_desc_breaks(arr)
    if probe_desc_breaks == 0 or probe_desc_breaks >= probe_points - 2:
        std_sort_unstable(arr)
        return

    # 2) Single pass: monotonic flags + min/max + desc_breaks.
    stats = scan_keys(arr)
    if stats.non_decreasing:
        return
    if stats.non_increasing:
        reverse(arr)
        return

    range = stats.max_key - stats.min_key + 1
    max_limit = min(1_000_000, max(1, len(arr) / 4))

    # 3) Near-sorted, sparse-ish patterns are delegated to the std PDQ path.
    if range > max_limit and is_near_monotonic(stats.desc_breaks, len(arr)):
        std_sort_unstable(arr)
        return

    # 4) Dense range uses parallel counting, otherwise radix fallback.
    if range <= max_limit:
        if primitive_type(arr):
            parallel_counting_fill(arr, stats.min_key, range)
        else:
            parallel_counting_scatter(arr, stats.min_key, range)
    else:
        radix_sort_by_key(arr)
```
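The dense path of step 4 (min/max scan, histogram, fill) can be condensed into a sequential Rust sketch; the real crate parallelizes the histogram and fill phases:

```rust
// Sequential sketch of the dense counting-fill path for primitive keys.
fn counting_fill(arr: &mut [u32]) {
    if arr.len() <= 1 {
        return;
    }
    let min = *arr.iter().min().unwrap();
    let max = *arr.iter().max().unwrap();
    let range = (max - min) as usize + 1;
    // Histogram of keys shifted by min.
    let mut counts = vec![0usize; range];
    for &v in arr.iter() {
        counts[(v - min) as usize] += 1;
    }
    // Fill back in order: counting sort needs no comparisons at all.
    let mut i = 0;
    for (offset, &c) in counts.iter().enumerate() {
        for _ in 0..c {
            arr[i] = min + offset as u32;
            i += 1;
        }
    }
}

fn main() {
    let mut data = vec![5u32, 3, 9, 3, 7];
    counting_fill(&mut data);
    println!("{:?}", data); // [3, 3, 5, 7, 9]
}
```

This only works when `range` fits in memory, which is exactly why the pseudocode caps it with `max_limit` before taking this path.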
📄 License
MIT License. See LICENSE for details.