# `tiny_id`
[](https://github.com/paulgb/tiny_id)
[](https://github.com/paulgb/tiny-id/actions/workflows/rust.yml)
[](https://crates.io/crates/tiny-id)
[](https://docs.rs/tiny-id/)
[](https://deps.rs/repo/github/paulgb/tiny_id)
`tiny_id` is a Rust library for generating non-sequential, *tightly-packed* short IDs.
Most other short ID generators just string together random digits. Due to the
[birthday problem](https://en.wikipedia.org/wiki/Birthday_problem), that approach
is prone to collisions. For example, a four-digit alphabetic code has a ~50% chance of
collision after 800 codes.
`tiny_id` uses a [linear congruential generator](https://en.wikipedia.org/wiki/Linear_congruential_generator)
to generate codes which do not collide while retaining only a small, constant-sized piece
of state. For the same four-digit alphabetic code, `tiny_id` has a 0% chance of collision until all 456,976 possible codes have been generated.
These codes are indended for use-cases where it's desirable to have short, human-readable
codes such that two codes generated in a row are no more likely to resemble each other than
codes that are not. It should *not* be used in cases where the codes need to be non-guessable.
They also do not guard against a [German tank problem](https://en.wikipedia.org/wiki/German_tank_problem)-type analysis by someone sufficiently motivated.
A sample of random short codes, generated by the [generate.rs](examples/generate.rs) example,
demonstrates what a sequence of these codes looks like:
```bash
paul:~/tiny_id$ target/debug/examples/generate 36 4 10
9chl
yqpv
z1rk
c18m
1tc5
2t22
ftgv
4yih
5nag
ioa0
```
## How to use it
### Basic use
```rust
use tiny_id::ShortCodeGenerator;
fn main() {
// The length of generated codes
let length: usize = 6;
// Create a generator. The generator must be mutable, because each
// code generated updates its state.
let mut generator = ShortCodeGenerator::new_alphanumeric(length);
// Generate the next short code, and update the internal generator state.
let code = generator.next_string();
assert_eq!(length, code.len());
}
```
### Alphabets
The set of characters (or arbitrary values) used to construct a short code is referred
to as an *alphabet*. `tiny_id` has several built-in alphabets, or you can provide
your own.
If you provide an alphabet rather than use one of the built-in alphabets, that
alphabet must not contain any repeated entries. This is not enforced by the library,
but failure to abide will result in collisions.
```rust
use tiny_id::ShortCodeGenerator;
fn main() {
let length: usize = 6;
// There are several built-in alphabets with convenience constructors.
// Numeral digits (0-9), like "769458".
ShortCodeGenerator::new_numeric(length);
// Numeral digits and lowercase letters (a-z), like "l2sx2b".
ShortCodeGenerator::new_lowercase_alphanumeric(length);
// Numeral digits, lowercase, and uppercase letters, like "qAh6Gg".
ShortCodeGenerator::new_alphanumeric(length);
// Uppercase letters only, like "MEPQOD".
ShortCodeGenerator::new_uppercase(length);
// You can also provide an alphabet with any unicode characters.
// I hope you don't use it for emoji, but you could:
ShortCodeGenerator::with_alphabet(
"😛🐵😎".chars().collect(),
length
);
// The generator can also be used with non-char types, as long
// as they implement Copy.
let mut gen = ShortCodeGenerator::with_alphabet(
vec![true, false],
length
);
// next_string() is only implemented on ShortCodeGenerator<char>,
// but gen is a ShortCodeGenerator<bool>, so we need to call
// next_vec() instead, which returns a Vec<bool>.
let result: Vec<bool> = gen.next_vec();
assert_eq!(length, result.len());
}
```
### Exhaustion Strategies
Eventually, all short code generators reach a point where they run out of codes of
a given length. There are three options for what to do when this happens:
- **Increment the length**. This corresponds to `ExhaustionStrategy::IncreaseLength`,
which is the default.
- **Cycle**. This repeats the cycle of codes from the beginning. The order of codes
is the same in every cycle. Corresponds to `ExhaustionStrategy::Cycle`.
- **Panic**. In the spirit of [fail-fast](https://en.wikipedia.org/wiki/Fail-fast),
this panics when all codes have been used, for cases where exhaustion is unexpected
and assumed by the rest of the program not to happen. Corresponds to
`ExhaustionStrategy::Panic`.
```rust
use tiny_id::{ShortCodeGenerator, ExhaustionStrategy};
fn main() {
// Increase length (default).
let mut gen = ShortCodeGenerator::new_uppercase(2);
for _ in 0..(26*26) {
let result = gen.next_vec();
assert_eq!(2, result.len());
}
// We've exhausted all options, so our next code will be longer.
let result = gen.next_vec();
assert_eq!(3, result.len());
// Cycle.
let mut gen = ShortCodeGenerator::new_uppercase(2)
.exhaustion_strategy(ExhaustionStrategy::Cycle);
let first = gen.next_vec();
for _ in 0..(26*26-1) {
let result = gen.next_vec();
assert_eq!(2, result.len())
}
// We've exhausted all options, so our next code will be a
// repeat of the first.
let result = gen.next_vec();
assert_eq!(first, result);
}
```
## How it works
A [linear congruential generator](https://en.wikipedia.org/wiki/Linear_congruential_generator)
(LCG) is a simple, non-cryptographic pseudorandom number generator.
LCGs are interesting because they generate numbers in a cycle, and the length of that cycle
as a function of the parameters to the LCG is well-studied. In particular, one thing that's
well understood is how to make an LCG that generates the numbers 1..m with a cycle size of m,
i.e., to generate a permutation of the numbers 1..m. This is called the Hull-Dobell Theorem.
For an alphabet of size `N` and an ID length of `L`, there are `N ^ L` possible codes. We can
convert back and forth between the numbers `1 .. N^L` and those codes by treating the codes
as `base-N` representations of the numbers.
Combining these two facts, our approach is:
- Using the Hull-Dobell Theorem, construct an LCG such that it will “visit” every number
from `1` to `N^L` in some random(ish) order.
- Using the base conversion method, turn each of these numbers into a short ID.
## Notes
Note that the `ShortCodeGenerator` object itself contains a small amount of
state, which is updated every time a short code is generated. Short codes must
be generated by the same `ShortCodeGenerator` object in order to avoid collisions.
With the `serde` crate option, enabled by default, `ShortCodeGenerator` objects
can be serialized to any format supported by [serde](https://serde.rs/). This
can be used to persist the state of a generator for later use. (If you are using
a custom alphabet, the type of that alphabet must also be serializable.)
The total number of possible codes (alphabet size to the power of length) must
fit in a [`u64`](https://doc.rust-lang.org/std/primitive.u64.html). If you're working
with large enough codes and alphabets that that's a problem, you probably don't need
this library anyway (as random collisions will be more rare).
Randomness is only used during the construction of `ShortCodeGenerator`.
Code generation itself is entirely deterministic based on the current generator
state.
All operations use constant time and space, except for `ShortCodeGenerator`
construction. Construction technically has time complexity superlinear to the
cardinality of the alphabet provided. For reasonable alphabet sizes (say, <1000),
this should be negligible.
## Partitioning
If you need to generate short codes from multiple generators, you can use
the built-in partitioning feature to generate multiple short code generators
that generate codes from non-overlapping partitions of the code space.
Internally, this works by creating `num_partitions` clones of the generator,
“burning” a unique number from `0..num_partitions` codes on each one, and then
setting each one to skip past `num_partitions - 1` codes for every code it
generates.
Once a `ShortCodeGenerator` is partitioned, it can't be partitioned again or
repartitioned with a different number of partitions. Once partitioned, generators
may be serialized to be sent to a different machine or stored for later.
```rust
use tiny_id::{ShortCodeGenerator, ExhaustionStrategy};
fn main() {
// This returns a vector of two short code generators.
let mut generators: Vec<ShortCodeGenerator<_>> = ShortCodeGenerator::new_uppercase(5)
.into_partitioned_generators(2);
// Codes generated by the two generators will never overlap, because
// they come from separate code spaces.
assert_ne!(generators[0].next_string(), generators[1].next_string());
}
```
## Alternatives
- If you just want unique identifiers and don't care about how long they are,
use [uuid](https://github.com/uuid-rs/uuid). It has the advantage that it
does not need any state.
- If you are already assigning unique integers (e.g. an `AUTO INCREMENT` in SQL)
and don't care if they are sequential, you could just convert those to and from
strings using base conversion.