Skip to main content

orasort/
lib.rs

1//! # Orasort
2//!
3//! `orasort` is a high-performance, cache-efficient sorting library designed specifically for
4//! strings, byte arrays, and other data types that share common prefixes.
5//!
6//! It implements the [**Orasort**](https://patents.google.com/patent/US7680791B2) algorithm, which combines the strengths of **Quicksort**
7//! and **Radix Sort** while optimizing for modern CPU architectures by minimizing memory accesses
8//! and maximizing cache locality.
9//!
10//! ## Key Features
11//!
12//! - **Cache Locality**: Stores an 8-byte prefix of the key directly in the sort pointer, allowing
13//!   many comparisons to be resolved using only CPU registers without fetching the full data from memory.
14//! - **Adaptive Strategy**: Dynamically switches between Quicksort (for small partitions) and
15//!   Radix Sort (for large partitions) to maintain optimal performance across various distributions.
16//! - **Zero-Copy abstractions**: The [`KeyAccessor`] trait allows sorting arbitrary data structures
17//!   (e.g., Arrow arrays, `Vec<Vec<u8>>`) without copying the underlying data.
18//! - **In-Place Mutation**: Provides [`orasort_mut`] for sorting `Vec`s in-place with minimal allocation.
19//!
20//! ## Usage
21//!
22//! ### Basic Usage
23//!
24//! For standard collections like `Vec<String>` or `Vec<Vec<u8>>`, you can use [`orasort`] (index-based)
25//! or [`orasort_mut`] (in-place).
26//!
27//! ```rust
28//! use orasort::orasort_mut;
29//!
30//! let mut data = vec!["banana", "apple", "cherry", "date"];
31//! orasort_mut(&mut data);
32//!
33//! assert_eq!(data, vec!["apple", "banana", "cherry", "date"]);
34//! ```
35//!
36//! ### Custom Types
37//!
38//! To sort custom types or complex data structures without creating intermediate strings,
39//! implement the [`KeyAccessor`] trait.
40//!
41//! ```rust
42//! use orasort::{orasort, KeyAccessor};
43//!
44//! struct User {
45//!     username: String,
46//! }
47//!
48//! // Wrapper struct to avoid orphan rule violation (impl foreign trait on foreign type).
49//! struct Users(Vec<User>);
50//!
51//! impl KeyAccessor for Users {
52//!     fn get_key(&self, index: usize) -> &[u8] {
53//!         self.0[index].username.as_bytes()
54//!     }
55//!
56//!     fn len(&self) -> usize {
57//!         self.0.len()
58//!     }
59//! }
60//!
61//! let users = Users(vec![
62//!     User { username: "Alice".to_string() },
63//!     User { username: "Bob".to_string() },
64//! ]);
65//!
66//! // Returns indices: [0, 1] (Alice, Bob)
67//! let indices = orasort(&users);
68//! ```
69//!
70//! ## Performance Characteristics
71//!
72//! - **Best Case**: O(N) when keys are distinct and distinguishable by their prefixes.
73//! - **Worst Case**: O(N log N) similar to Quicksort, but heavily optimized for common prefix handling.
74//! - **Memory Overhead**: Creates a temporary vector of pointers (`16 bytes` per item) to perform the sort.
75//!
76//! This library is particularly effective for datasets where cache misses are the primary bottleneck,
77//! such as sorting large arrays of data.
78
79pub mod algo;
80pub mod core;
81pub use algo::{orasort, orasort_from_indices, orasort_mut, orasort_slice};
82pub use core::KeyAccessor;
83pub use core::SPLICE_PREFIX_SIZE;
84
85pub mod prelude {
86    //! Prelude for Orasort.
87
88    pub use crate::algo::{orasort, orasort_from_indices, orasort_mut};
89    pub use crate::core::{KeyAccessor, SPLICE_PREFIX_SIZE};
90}