turbosort
SIMD-accelerated radix sort for primitive types in pure Rust.
No FFI, no nightly, no unsafe trait implementations. Just fast sorting.
Performance
Sorting 10M random u32 values:
| Algorithm | Time | vs std |
|---|---|---|
std::sort_unstable |
301 ms | 1.0x |
turbosort::sort |
182 ms | 1.7x |
At 1M elements, turbosort is 3.1x faster than std. The gap widens with array size as O(n) radix sort dominates O(n log n) comparison sort.
Small arrays (n ≤ 16) use AVX2 sorting networks: 1.7x faster than std.
Usage
// Sort any primitive numeric type
let mut data = vec!;
sort;
assert_eq!;
// Signed integers
let mut v = vec!;
sort;
assert_eq!;
// Floats (total order: -inf < -0.0 < +0.0 < +inf < NaN)
let mut v = vec!;
sort;
assert_eq!;
assert!;
// Zero-allocation sorting with caller-provided buffer
let mut data = ;
let mut buf = ;
sort_with_buffer;
Supported Types
All 10 primitive numeric types: u8, u16, u32, u64, i8, i16, i32, i64, f32, f64.
Algorithm Dispatch
| Input Size | Algorithm | Complexity |
|---|---|---|
| 0–1 | no-op | — |
| 2–16 | SIMD sorting network (AVX2, NEON ≤8) | O(n) |
| 17–512 | Quicksort with SIMD leaf nodes | O(n log n) |
| 513+ | LSD radix sort | O(n) |
| 131K+ | Parallel radix sort (rayon) | O(n/p) |
Features
[]
= "0.1"
# For parallel sorting:
= { = "0.1", = ["parallel"] }
| Feature | Default | Description |
|---|---|---|
std |
yes | Heap allocation + CPUID detection |
alloc |
yes | Radix sort buffer allocation |
parallel |
no | sort_parallel() via rayon |
no_std is supported — disable default features and use sort_with_buffer().
Architecture Support
| Arch | SIMD | Status |
|---|---|---|
| x86_64 | AVX2 | Sorting networks, quicksort leaf acceleration |
| x86_64 | SSE4.2 | Planned |
| aarch64 | NEON | Sorting networks (4-lane) |
| Other | — | Scalar fallback (still uses radix sort) |
Runtime CPUID dispatch on x86_64 — same binary works on all CPUs.
License
Dual-licensed under MIT or Apache 2.0.