Expand description
Id30 is an encoding scheme for 30 bit identifiers that look like the
following: bpv3uq
, zvaec2
, rfmbyz
, jwygvk
, 000000
, zzzzzz
. It is
designed for use as opaque identifiers in URLs, that can be read and written
comfortably, and is a good choice for user-facing IDs when you don’t need a
key space of more than 30 bits, giving you just over 10⁹ different IDs. For
comparison, YouTube surpassed this only in 2019, and is estimated to be
hosting about 4*10⁹ videos in 2024. User-facing IDs should generally be
chosen such that they do not reveal an underlying sequence, and, indeed,
Id30 looks best for randomly generated IDs.
In machine readable form, Id30 is represented as a 32 bit integer, of which
the two most significant bits are always zero. This makes u32
and i32
equally well suited.
This crate defines the type Id30
which implements this encoding scheme
through various traits. To avoid introducing excessive dependencies, several
of these trait implementations are opt-in via feature selection, see
Features and Id30
for details.
The crate also provides a command line utility, installable with cargo
(cargo install id30 --features=rand_std
), for converting between
integral and Id30 representations, and for generating new IDs on the command
line.
§Id30 Encoding
The Id30 encoding is a case-insensitive base 32 encoding that handles some
confusable characters to compensate for some common misreadings and
mishearings. The canonical encoding alphabet is 0
, 1
, 2
, 3
, 4
,
5
, 6
, 7
, 8
, 9
, a
, b
, c
, d
, e
, f
, g
, h
, j
, k
,
m
, n
, p
, q
, r
, t
, u
, v
, w
, x
, y
, z
, i.e. the digits
and then most, but not all, of the characters of the English alphabet. Id30
decoding handles case-insensitivity and confusable characters by mapping
multiple input characters to the same values.
The confusable characters handled by Id30 are:
0
ando
/O
, for visual similarity1
,i
/I
andl
/L
, for visual similaritys
/S
andf
/F
, for verbal similarity
§Encoding
The algorithm for encoding a 30 bit integer i
is:
- Group the integer in blocks of five bits, i.e.
let blocks = [(i >> 25) & 0x1f, (i >> 20) & 0x1f, (i >> 15) & 0x1f, (i >> 10) & 0x1f, (i >> 5) & 0x1f, i & 0x1f]
- Starting with the most significant group of bits, map each group
according to the canonical encoding alphabet, i.e.
let id30: String = blocks.iter().map(|x| ENCODING[x]).collect()
This yields a string of six characters. An Id30 string is always six characters long, it is not valid to strip leading zeros as is normally done for regular base 10 numbers.
This encoding scheme ensures that the numerical ordering of the 30 bit integers is the same as the lexicographical ordering of the encoded strings.
§Decoding
Decoding is more involved because of error handling and resolution of alternative encodings.
The algorithm for decoding an Id30 string to a 30 bit integer is:
-
The given string is six characters long. Map each character according to the reverse mapping of the encoding. Additionally, map upper-case letters like their lower-case equivalents, and map the following confusable characters:
o
,O
→ 0 (like0
)i
,I
,l
,L
→ 1 (like1
)s
,S
→ 15 (likef
)
Characters that match the canonical encoding are mapped directly to the corresponding value, eg
'1' → 1
. For alternative encodings, either upper case or one of the confusables, add an additional bit (1 << 5 == 0x20
) to signify that the input was non-canonical, eg'I' → 1 + 0x20
. Map all other characters to a sentinel value (1 << 6 == 0x40
) to signify decoding error, eg'!' → 0x40
.This can be encoded in a lookup table, so, given the input
id30: [u8; 6]
, we can implement this step aslet decoded: Vec<_> = id30.iter().map(|&c| DECODING[c as usize]).collect();
-
If any of the mapped values are the error sentinel, terminate decoding and yield an error result.
-
If any of the mapped values have the alternative encoding bit set, let
is_canonical = false
. Otherwise,is_canonical = true
. -
Compose the identifier value from the low five bits of all the mapped values:
let masked: Vec<_> = decoded.iter().map(|x| x & 0x1f).collect(); let value = (masked[0] << 25) + (masked[1] << 20) + (masked[2] << 15) + (masked[3] << 10) + (masked[4] << 5) + masked[5];
Yield the composed value along with
is_canonical
.
For use in URLs, a request to a non-canonical Id30 should be redirected to
the canonical encoding of the same Id30, to avoid exposing the same resource
at multiple different URLs. This is valuable for caching purposes and search
engine ranking. The redirection logic only needs to consider is_canonical
and does not need any costly operations such as querying a database, since
redirection can be done regardless of whether or not the target URL resolves
to anything sensible.
§Features
This crate uses features for selection of integrations with other crates. Keeping the feature selection minimal serves to minimize the dependency tree.
To be able to support different major versions of third party crates, the
exposed features are named for the dependency version. For example, to
integrate with the rand
crate in version 0.8.z, use the feature rand08
.
This scheme allows future versions of id30
to offer integrations with
multiple versions of rand
simultaneously.
As a convenience, there are unversioned aliases of each integration that
point to the latest supported major version of the given crate. For example,
using the feature rand
enables integration with rand
at the currently
supported latest version. id30
does not consider it a breaking change to
update these aliases to point to newer versions, so you may want to consider
using the versioned feature names instead.
There is one default feature, rand
.
The available integration features are:
rand08
(aliasrand
), for integration withrand
0.8.zserde1
(aliasserde
), for integration withserde
1.y.zdiesel2
(aliasdiesel
), for integration withdiesel
2.y.z
See Id30
for details about each integration.
Structs§
- Id30
- An implementation of the Id30 encoding scheme as documented at the crate root.
- Id30
Parse Id30Parse
represents the successful result of parsing an id30 string:- OutOf
Range Error - The given value was out of range for an
Id30
. The valid range is[0, 1 << 30)
.
Enums§
- Parse
Error - The given string was not a valid
id30