smallperm
High-performance pseudo-random permutations using Feistel networks.
Overview
smallperm generates pseudo-random permutations with O(1) memory and O(1) time per element. Unlike Fisher-Yates shuffle which requires O(n) memory to store the entire permutation, this library computes each element on-demand using a Feistel network cipher.
This makes it ideal for:
- Machine learning: Shuffling billions of training samples without memory overhead
- Distributed systems: Each worker can independently compute any slice of the permutation
- Streaming: Get shuffled elements immediately without waiting for full materialization
Features
- O(1) memory: No need to store the entire permutation
- O(1) random access: Get any element instantly via index
- Invertible: Bidirectional mapping between indices and values
- Deterministic: Same seed always produces the same permutation
- Large scale: Supports up to 2^128 elements
Installation
Add to your Cargo.toml:
[]
= "0.1"
Quick Start
use Permutation;
// Create a permutation of 1 million elements
let perm = new;
// Iterate through shuffled indices
for value in perm.iter.take
// O(1) random access - get the 500,000th element
let value = perm.get.unwrap;
// O(1) inverse lookup - find where a value appears
let index = perm.index_of.unwrap;
assert_eq!;
API
Permutation
The main type for working with permutations.
use Permutation;
// Create with a u64 seed
let perm = new;
// Or with a 256-bit key for more control
let key: = ;
let perm = new_with_key;
// Get length
assert_eq!;
// Random access (returns Option<u128>)
let value = perm.get.unwrap;
// Inverse lookup (returns Option<u128>)
let index = perm.index_of.unwrap;
// Iterate
for value in perm.iter
Helper Functions
Convenience functions for common operations:
use ;
// Sample k unique random indices from [0, n)
let indices = sample_indices;
assert_eq!;
// Sample k unique elements from a slice
let data = vec!;
let sampled = sample;
// Shuffle a slice (returns new Vec)
let shuffled = shuffle;
// Shuffle in place
let mut data = vec!;
shuffle_in_place;
Low-Level Access
For advanced use cases, the underlying types are also exposed:
use ;
// Permutor is the core iterator type
let permutor = new_with_u64_key;
for value in permutor
// FeistelNetwork is the raw cipher
// (outputs may exceed max, use Permutor for bounded results)
Performance
The library uses a Feistel network with FxHash (a fast, non-cryptographic hash function) as the round function. Each lookup requires 8-32 hash operations depending on the permutation size.
| Operation | Time Complexity | Space Complexity |
|---|---|---|
| Create permutation | O(1) | O(1) |
| Get element by index | O(1)* | O(1) |
| Inverse lookup | O(1)* | O(1) |
| Iterate n elements | O(n) | O(1) |
* Amortized O(1) due to rejection sampling when the Feistel domain exceeds the permutation size.
Comparison with Fisher-Yates
| Approach | Memory | Time to first element | Random access |
|---|---|---|---|
| Fisher-Yates | O(n) | O(n) | O(1) after shuffle |
| smallperm | O(1) | O(1) | O(1) |
For a dataset of 1 billion elements:
- Fisher-Yates: ~8 GB memory, must shuffle everything first
- smallperm: ~100 bytes, instant access to any element
Use Cases
ML Data Loading
use Permutation;
// Stream shuffled indices without loading everything into memory
for sample_idx in load_epoch.take
Distributed Training
use Permutation;
Reproducible Sampling
use sample;
let data: = .collect;
// Same seed = same sample, every time
let sample1 = sample;
let sample2 = sample;
assert_eq!;
Algorithm
The library implements a Feistel network, a symmetric cipher structure that creates a bijection (one-to-one mapping) over the input domain. Key properties:
- Bijective: Every input maps to exactly one output, and vice versa
- Invertible: Given the output, we can compute the original input
- Pseudo-random: Output appears random but is deterministic given the key
For permutations of size n, the Feistel network operates on a domain of size 2k where k is the smallest even integer such that 2k >= n. Values outside [0, n) are handled via rejection sampling (recursively applying the permutation until the result is in range).
License
Licensed under the Apache License, Version 2.0. See LICENSE for details.
Based on permutation-iterator by Asim Ihsan.
Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.