# biski64: Fast and Robust Pseudo-Random Number Generator
This repository contains `biski64`, an extremely fast pseudo-random number generator (PRNG) with a guaranteed minimum period of 2^64. It is designed for non-cryptographic applications where speed and statistical quality are important.
The library is available on [crates.io](https://crates.io/crates/biski64) and the documentation can be found on [docs.rs](https://docs.rs/biski64).
## Features
* **High Performance:** - Significantly faster than standard library generators and modern high-speed PRNGs like `xoroshiro128++` and `xoshiro256++`.
* **Exceptional Statistical Quality:** - Easily passes BigCrush and terabytes of PractRand. [Scaled down versions](#scaled-down-testing) show even better mixing efficiency than well respected PRNGs like JSF.
* **Guaranteed Minimum Period:** - Incorporates a 64-bit Weyl sequence to ensure a minimum period of 2^64.
* **Proven Injectivity:** - Invertible algorithm with proven injectivity via Z3 Prover.
* **Rust Ecosystem Integration:** - The library is `no_std` compatible and implements the standard `RngCore` and `SeedableRng` traits from `rand_core` for easy use.
* **C Integration:** - Only requires stdint.h.
## Rust Installation
Add `biski64` and `rand` to your `Cargo.toml` dependencies:
```toml
[dependencies]
biski64 = "0.3.2"
rand = "0.9"
```
### Basic Usage
```rust
use rand::{RngCore, SeedableRng};
use biski64::Biski64Rng;
let mut rng = Biski64Rng::seed_from_u64(12345);
let num = rng.next_u64();
```
## Performance
* **Rust:**
```
biski64 0.401 ns/call
xoshiro256++ 0.610 ns/call
xoroshiro128++ 0.883 ns/call
```
* **C:**
```
biski64 0.368 ns/call
wyrand 0.368 ns/call
sfc64 0.368 ns/call
xoshiro256++ 0.552 ns/call
xoroshiro128++ 0.732 ns/call
PCG128_XSL_RR_64 1.197 ns/call
```
* **Java:**
```
biski64 0.491 ns/call
xoshiro256++ 0.739 ns/call
xoroshiro128++ 0.790 ns/call
ThreadLocalRandom 0.846 ns/call
Java.util.Random 5.315 ns/call
```
*Tested on an AMD Ryzen 9 7950X3D*
## Rust Algorithm
```rust
// In the actual implementation, these are fields of the Biski64Rng struct.
let (mut fast_loop, mut mix, mut loop_mix) = (0u64, 0u64, 0u64);
#[inline(always)]
pub fn next_u64(&mut self) -> u64 {
// The output is calculated from the current state.
let output = self.mix.wrapping_add(self.loop_mix);
(self.fast_loop, self.mix, self.loop_mix) = (
self.fast_loop.wrapping_add(0x9999999999999999),
self.mix.rotate_left(16).wrapping_add(self.loop_mix.rotate_left(40)),
self.fast_loop ^ self.mix,
);
output
}
```
## C Algorithm
```c
// Helper for rotation
static inline uint64_t rotateLeft(const uint64_t x, int k) {
return (x << k) | (x >> (64 - k));
}
uint64_t biski64() {
uint64_t output = mix + loopMix;
uint64_t oldLoopMix = loopMix;
loopMix = fast_loop ^ mix;
mix = rotateLeft(mix, 16) + rotateLeft(oldLoopMix, 40);
fast_loop += 0x9999999999999999;
return output;
}
```
## Java Algorithm
```java
public long nextLong() {
final long output = mix + loopMix;
final long oldLoopMix = loopMix;
loopMix = fastLoop ^ mix;
mix = Long.rotateLeft( mix, 16 ) + Long.rotateLeft( oldLoopMix, 40 );
fastLoop += 0x9999999999999999L;
return output;
}
```
*(Note: See test files for full seeding and usage examples.)*
## Parallel Streams
`biski64` is well-suited for parallel applications, and parallel streams can be implemented as follows:
* Randomly seed `mix`, and `loop_mix` for each stream as normal.
* To ensure maximal separation between sequences, space starting values for each streams' `fast_loop`:
```
uint64_t cycles_per_stream = (2^64 - 1 ) / num_streams;
fast_loop = i * cycles_per_stream * 0x9999999999999999ULL;
```
*where i is the stream index (0, 1, 2, ...) as demonstrated in the C, Rust and Java example code*
## Scaled Down Testing
A key test for any random number generator is to see how it performs when its internal state is drastically reduced. This allows for practical testing of the core mixing algorithm. `biski64` performs exceptionally well in this regard.
| 8-bit | 24 bits | [2^22 bytes (4 MB)](https://github.com/danielcota/biski64/blob/main/tests/practrand_8bit_out.txt)|
| 16-bit | 48 bits | [2^40 bytes (1 TB)](https://github.com/danielcota/biski64/blob/main/tests/practrand_16bit_out.txt) |
| 32-bit | 96 bits | [did not fail, tested to 2^47 bytes (128 TB)](https://github.com/danielcota/biski64/blob/main/tests/practrand_32bit_out.txt) |
| 64-bit | 192 bits | [did not fail, tested to 2^46 bytes (64 TB)](https://github.com/danielcota/biski64/blob/main/tests/practrand_64bit_out.txt) |
The results for the 8-bit and 16-bit scaled down versions show that `biski64` exceeds the mixing efficiency (in terms of PractRand bytes passed per total state size) of even the [well respected and tested JSF PRNG](https://www.pcg-random.org/posts/bob-jenkins-small-prng-passes-practrand.html).
For comparison:
| biski(8-bit) | 24 bits | 2^22 bytes| **91.7%** |
| jsf8 | 32 bits | 2^28 bytes | 87.5% |
| biski(16-bit) | 48 bits | 2^40 bytes| **83.3%** |
| jsf16 | 64 bits | 2^47 bytes | 73.4% |
The scaled down test results for `biski64` indicate strong performance of its core mixing algorithm (suggesting that the strong performance at full 64-bit size is not a result simply of a large state, but of its robust mixing performance).
## BigCrush Comparison
BigCrush was run 100 times on `biski64` (and multiple reference PRNGs).
Assuming test failure with a p-value below 0.001 or above 0.999 implies a 0.2% probability of a single test yielding a "false positive" purely by chance with an ideal generator. Running BigCrush 100 times (for a total of 25400 sub-tests), around 50.8 such chance "errors" would be anticipated.
```
biski64, 51 failed subtests (out of 25400 total)
5 subtests failed twice
wyrand, 55 failed subtests (out of 25400 total)
1 subtest failed THREE times
5 subtests failed twice
sfc64, 70 failed subtests (out of 25400 total)
1 subtest failed THREE times
12 subtests failed twice
xoroshiro128++, 54 failed subtests (out of 25400 total)
1 subtest failed FOUR times
4 subtests failed twice
xoroshiro256++, 60 failed subtests (out of 25400 total)
1 subtest failed THREE times
5 subtests failed twice
pcg128_xsl_rr_64, 47 failed subtests (out of 25400 total)
1 subtest failed FIVE times
4 subtests failed twice
```
*(Note: For an ideal random generator, seeing three or more failures for a specific subtest would not be expected.)*
## Design
Motivated by M.E. O'Neill's post, [Does It Beat the Minimal Standard](https://www.pcg-random.org/posts/does-it-beat-the-minimal-standard.html) - the initial design for `biski64` used a scaled down version with 8-bit state variables. This allowed for fast iteration using PractRand.
Once the algorithm was settled on, parameters were selected as follows:
1. **Additive Constant** - Multiple additive constants were tested using a 16-bit state variable version. For each constant all possible rotation constants r1 and r2 (r1 < r2) were iterated over and PractRand was run to determine how far each (additiveConstant, r1, r2) variant could get. The highest performing constant was 0x9999 - passing the most rotational pairs at the highest level. This constant was then scaled to 0x9999999999999999 in the full 64-bit version.
2. **Rotation Constants** - It was observed that bitsPerStateVar / 4 showed up frequently at the highest level for r1, as well as integers near bitsPerStateVar * (ɸ - 1) for r2. This was then used as a heuristic for rotation constants in the 32-bit and 64-bit versions.
3. **BigCrush Confirmation** - BigCrush was run 100 times on five selected full size 64-bit variants (using the above heuristics). The best observed r2 values were 39 and 40 - supporting the theory that r1=bits/4 and r2=bits * (ɸ - 1) were solid heuristics. r2=40 was then deemed the best (as it exhibited no subtests failing three times) - and then used for the full 64-bit version.
```
(r1=16, r2=37) 64 subtests failed (out of 25400 total)
2 subtests failed THREE times
7 subtests failed twice
(r1=16, r2=39) 46 subtests failed (out of 25400 total)
1 subtest failed THREE times
4 subtests failed twice
(r1=16, r2=40) 51 subtests failed (out of 25400 total)
5 subtests failed twice
(r1=16, r2=41) 60 subtests failed (out of 25400 total)
1 subtest failed THREE times
6 subtests failed twice
(r1=16, r2=43) 46 subtests failed (out of 25400 total)
1 subtest failed FOUR times
1 subtest failed THREE times
1 subtest failed twice
```
## Notes
Created by Daniel Cota and named after his cat Biscuit - a small and fast Egyptian Mau.