radix256Sort
日本語 (Japanese) | Technical Details
A high-performance, stable Radix Sort implementation for u32 integers, written in Rust with Python bindings.
Optimized for CPU cache efficiency and zero-allocation (internal loop) strategy.
Features
- Fast: Outperforms standard library sorts (
std::slice::sort) and Python'slist.sort/numpy.sortfor large datasets. - Stable: Preserves the relative order of equal elements.
- Safe: Pure Rust implementation without
unsafeblocks. - Simple: Fixed 256-base, 4-pass algorithm optimized for 32-bit integers.
Getting Started
Prerequisites
- Rust (latest stable)
- Python 3.7+ (for Python bindings)
1. Clone the Repository
2. Rust: Run Tests & Benchmarks
Run unit tests:
Run micro-benchmarks (Criterion):
Results will be generated at target/criterion/report/index.html.
Run macro-benchmarks (100M items):
3. Python: Build & Run Benchmarks
It is recommended to use a virtual environment.
# Create and activate virtual environment
# Install build tools
# Build and install the library
# Run benchmarks
Usage
Rust
use radix256_sort_vec;
let mut data = vec!;
let sorted = radix256_sort_vec;
assert_eq!;
Or in-place:
use radix256_sort_inplace;
let mut data = vec!;
radix256_sort_inplace;
assert_eq!;
Python
=
=
# [1, 2, 5, 5, 9]
Benchmarks
Performance measured on 100,000,000 (100M) random u32 integers.
[!NOTE] The following figures are reference values from a development environment. Performance may vary depending on the system.
Legend
radix256_sort_vec: This library (Buffer version) - Fastestradix256_sort_inplace: This library (In-place version)std_sort: Rust standard stable sort (Comparison target)std_sort_unstable: Rust standard unstable sort (Reference)
Rust
| Algorithm | Time (s) | Speedup |
|---|---|---|
std::slice::sort |
2.99s | 1.0x |
radix256_sort_vec |
0.84s | 3.56x |
Python
| Algorithm | Time (s) | Speedup (vs list) |
|---|---|---|
list.sort() |
76.89s | 1.0x |
radix256_sort |
7.61s | 10.1x |
numpy.sort() |
5.27s | 14.6x |
Analysis
The benchmark results demonstrate that radix256_sort significantly outperforms standard library implementations in both Rust and Python for large datasets.
- Rust: The 3.5x speedup over the highly optimized
std::slice::sort(pdqsort) confirms the efficiency of the cache-friendly, fixed-pass approach compared to generic comparison-based sorts. - Python: The 10x speedup over
list.sortmakes it a powerful alternative for heavy number crunching in pure Python environments. Whilenumpy.sortis faster (5.27s), it requires the NumPy dependency.radix256_sortprovides near-NumPy performance (7.61s) for standard lists, with the overhead largely due to the O(N) cost of converting Python lists to Rust vectors.
For detailed technical explanation of why this is so fast, see Technical Details.
License
Apache License 2.0
Copyright (c) 2025 Tane Channel Technology