# suffixsort
A high-performance Rust library for inverse lexicographic (suffix) sorting, providing both high-level processing utilities and low-level comparison functions.
## Features
- **Inverse Lexicographic Sorting**: Compare strings from the last character towards the first
- **Flexible Configuration**: Multiple sorting modes including dictionary order, case insensitivity, and reverse sorting
- **Unicode NFC Normalization**: Optional Unicode normalization for consistent sorting of equivalent sequences
- **High Performance**: Parallel processing using Rayon for handling large datasets efficiently
- **Dual API**: Both high-level line processing and low-level comparator functions
- **Zero-Cost Abstractions**: Minimal performance overhead through Rust's zero-cost abstractions
- **Optimized Defaults**: Fastest performance with all options disabled by default
## Performance Characteristics
suffixsort is designed for maximum performance with all options disabled (default configuration). The library is extremely fast by default, processing millions of lines per second on modern hardware.
### Performance-Neutral Options
These options have minimal impact on performance:
- `reverse`: Simply inverts comparison results
- `right_align`: Adds padding during output formatting only
- `exclude_no_word`: Simple filtering during processing
- `word_only`: Affects output formatting only
### Performance-Impactful Options
These options may reduce performance when enabled:
- `normalize`: Unicode NFC normalization adds processing overhead during key extraction
- `stable`: Stable sorting algorithms are generally slower than unstable variants
- `ignore_case`: Case folding during key extraction adds minor overhead
- `dictionary_order`: More complex key extraction logic
- `use_entire_line`: Simpler key extraction but may use more memory
For maximum throughput with large datasets, use the default configuration (all options disabled).
## Project Structure
This library is part of a workspace that includes:
- `core/`: The suffixsort library crate (this crate)
- `cli/`: The ssort command-line interface that uses this library
## Installation
Add to your `Cargo.toml`:
```toml
[dependencies]
suffixsort = ">=0.1"
```
## Usage
### High-Level API
The main entry point is the `SortConfig` struct which allows you to configure and execute the sorting process:
```rust
use suffixsort::{SortConfig, ProcessedLine};
let config = SortConfig {
ignore_case: true,
reverse: false,
dictionary_order: true,
// ... other configuration options
..Default::default()
};
let lines = vec![
"Apple".to_string(),
"banana".to_string(),
"Cherry".to_string(),
];
let (processed, padding_info) = config.process_lines(lines);
for line in processed {
println!("{}", line.original);
}
```
### Low-Level API
For advanced use cases, you can use the comparator function directly:
```rust
use suffixsort::SortConfig;
use std::cmp::Ordering;
let config = SortConfig {
ignore_case: true, // Note: this does not affect the low-level comparator!
reverse: false,
..Default::default()
};
// The low-level comparator does not apply ignore_case or normalization.
// Users must pre-process strings if needed.
let comparer = config.get_comparer();
let mut words = vec!["Banana", "apple", "Cherry"];
// Use with standard sort
words.sort_by(|a, b| comparer(a, b));
// Or with parallel sort (requires Rayon)
use rayon::prelude::*;
## Configuration Options
The `SortConfig` struct provides these options:
- `ignore_case`: Case-insensitive comparison (minimal performance impact, applied during key extraction)
- `use_entire_line`: Use entire line instead of first word for sorting (simpler but may use more memory)
- `dictionary_order`: Ignore non-alphabetic characters when finding first word (performance impact)
- `reverse`: Reverse the sort order (performance-neutral)
- `stable`: Use stable sorting algorithm (performance impact)
- `right_align`: Right-align output with padding (performance-neutral)
- `exclude_no_word`: Exclude lines without words (performance-neutral)
- `word_only`: Output only the word used for sorting (performance-neutral)
- `normalize`: Normalize Unicode to NFC form (performance impact)
## Performance
The library is designed for high performance with large datasets:
- Parallel processing using Rayon's work-stealing scheduler
- Zero-copy operations where possible
- Efficient character-by-character comparison
- Minimal memory allocation
- Optimized defaults for maximum throughput
## Examples
See the `cli` directory for a complete command-line implementation using this library.
## License
MIT OR Apache-2.0