1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
// Copyright 2024 Magnus Hovland Hoff.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// https://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or https://opensource.org/license/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

//! Id30 is an encoding scheme for 30 bit identifiers that look like the
//! following: `bpv3uq`, `zvaec2`, `rfmbyz`, `jwygvk`, `000000`, `zzzzzz`. It is
//! designed for use as opaque identifiers in URLs, that can be read and written
//! comfortably, and is a good choice for user-facing IDs when you don't need a
//! key space of more than 30 bits, giving you just over 10⁹ different IDs. For
//! comparison, YouTube surpassed this only in 2019, and is estimated to be
//! hosting about 4*10⁹ videos in 2024. User-facing IDs should generally be
//! chosen such that they do not reveal an underlying sequence, and, indeed,
//! Id30 looks best for randomly generated IDs.
//!
//! In machine readable form, Id30 is represented as a 32 bit integer, of which
//! the two most significant bits are always zero. This makes `u32` and `i32`
//! equally well suited.
//!
//! This crate defines the type [`Id30`] which implements this encoding scheme
//! through various traits. To avoid introducing excessive dependencies, several
//! of these trait implementations are opt-in via feature selection, see
//! [Features](#features) and [`Id30`] for details.
//!
//! The crate also provides a command line utility, installable with cargo
//! (`cargo install id30 --features=rand_std`), for converting between
//! integral and Id30 representations, and for generating new IDs on the command
//! line.
//!
//! # Id30 Encoding
//! The Id30 encoding is a case-insensitive base 32 encoding that handles some
//! confusable characters to compensate for some common misreadings and
//! mishearings. The canonical encoding alphabet is `0`, `1`, `2`, `3`, `4`,
//! `5`, `6`, `7`, `8`, `9`, `a`, `b`, `c`, `d`, `e`, `f`, `g`, `h`, `j`, `k`,
//! `m`, `n`, `p`, `q`, `r`, `t`, `u`, `v`, `w`, `x`, `y`, `z`, i.e. the digits
//! and then most, but not all, of the characters of the English alphabet. Id30
//! decoding handles case-insensitivity and confusable characters by mapping
//! multiple input characters to the same values.
//!
//! The confusable characters handled by Id30 are:
//!  - `0` and `o`/`O`, for visual similarity
//!  - `1`, `i`/`I` and `l`/`L`, for visual similarity
//!  - `s`/`S` and `f`/`F`, for verbal similarity
//!
//! ## Encoding
//! The algorithm for encoding a 30 bit integer `i` is:
//!  1. Group the integer in blocks of five bits, i.e. `let blocks = [(i >> 25)
//!     & 0x1f, (i >> 20) & 0x1f, (i >> 15) & 0x1f, (i >> 10) & 0x1f, (i >> 5)
//!     & 0x1f, i & 0x1f]`
//!  2. Starting with the most significant group of bits, map each group
//!     according to the canonical encoding alphabet, i.e. `let id30: String =
//!     blocks.iter().map(|x| ENCODING[x]).collect()`
//!
//! This yields a string of six characters. An Id30 string is always six
//! characters long, it is not valid to strip leading zeros as is normally done
//! for regular base 10 numbers.
//!
//! This encoding scheme ensures that the numerical ordering of the 30 bit
//! integers is the same as the lexicographical ordering of the encoded strings.
//!
//! ## Decoding
//! Decoding is more involved because of error handling and resolution of
//! alternative encodings.
//!
//! The algorithm for decoding an Id30 string to a 30 bit integer is:
//!  1. The given string is six characters long. Map each character according to
//!     the reverse mapping of the encoding. Additionally, map upper-case
//!     letters like their lower-case equivalents, and map the following
//!     confusable characters:
//!      * `o`, `O` → 0 (like `0`)
//!      * `i`, `I`, `l`, `L` → 1 (like `1`)
//!      * `s`, `S` → 15 (like `f`)
//!
//!     Characters that match the canonical encoding are mapped directly to the
//!     corresponding value, eg `'1' → 1`. For alternative encodings, either
//!     upper case or one of the confusables, add an additional bit (`1 << 5 ==
//!     0x20`) to signify that the input was non-canonical, eg `'I' → 1 + 0x20`.
//!     Map all other characters to a sentinel value (`1 << 6 == 0x40`) to
//!     signify decoding error, eg `'!' → 0x40`.
//!
//!     This can be encoded in a lookup table, so, given the input `id30:
//!     [u8; 6]`, we can implement this step as
//!     ```
//!     # const DECODING: [u32; 256] = [0; 256];
//!     # let id30: [u8; 6] = [0; 6];
//!     let decoded: Vec<_> = id30.iter().map(|&c| DECODING[c as usize]).collect();
//!     ```
//!  2. If any of the mapped values are the error sentinel, terminate decoding
//!     and yield an error result.
//!  3. If any of the mapped values have the alternative encoding bit set, let
//!     `is_canonical = false`. Otherwise, `is_canonical = true`.
//!  4. Compose the identifier value from the low five bits of all the mapped
//!     values:
//!     ```
//!     # let decoded = vec![0, 1, 2, 3, 4, 5];
//!     let masked: Vec<_> = decoded.iter().map(|x| x & 0x1f).collect();
//!     let value = (masked[0] << 25) + (masked[1] << 20) + (masked[2] << 15) +
//!         (masked[3] << 10) + (masked[4] << 5) + masked[5];
//!     ```
//!     Yield the composed value along with `is_canonical`.
//!
//! For use in URLs, a request to a non-canonical Id30 should be redirected to
//! the canonical encoding of the same Id30, to avoid exposing the same resource
//! at multiple different URLs. This is valuable for caching purposes and search
//! engine ranking. The redirection logic only needs to consider `is_canonical`
//! and does not need any costly operations such as querying a database, since
//! redirection can be done regardless of whether or not the target URL resolves
//! to anything sensible.
//!
//! ## Features
//! This crate uses features for selection of integrations with other crates.
//! Keeping the feature selection minimal serves to minimize the dependency
//! tree.
//!
//! To be able to support different major versions of third party crates, the
//! exposed features are named for the dependency version. For example, to
//! integrate with the `rand` crate in version 0.8.z, use the feature `rand08`.
//! This scheme allows future versions of `id30` to offer integrations with
//! multiple versions of `rand` simultaneously.
//!
//! As a convenience, there are unversioned aliases of each integration that
//! point to the latest supported major version of the given crate. For example,
//! using the feature `rand` enables integration with `rand` at the currently
//! supported latest version. `id30` does not consider it a breaking change to
//! update these aliases to point to newer versions, so you may want to consider
//! using the versioned feature names instead.
//!
//! There is one default feature, `rand`.
//!
//! The available integration features are:
//!  - `rand08` (alias `rand`), for integration with `rand` 0.8.z
//!  - `serde1` (alias `serde`), for integration with `serde` 1.y.z
//!  - `diesel2` (alias `diesel`), for integration with `diesel` 2.y.z
//!
//! See [`Id30`] for details about each integration.

#![warn(missing_docs)]
#![cfg_attr(
    any(
        feature = "unstable_portable_simd",
        feature = "unstable_stdarch_x86_avx512"
    ),
    feature(portable_simd)
)]
#![cfg_attr(
    any(feature = "unstable_stdarch_x86_avx512"),
    feature(stdarch_x86_avx512)
)]

mod codec_tables;
mod diesel_support;
mod display;
mod from;
mod from_str;
mod id30_parse;
mod rand;
mod serde_support;

use std::fmt::Debug;

pub use from::OutOfRangeError;
pub use id30_parse::{Id30Parse, ParseError};

#[cfg(feature = "diesel2")]
use diesel2 as diesel;
#[cfg(feature = "diesel2")]
use diesel2::sql_types::Integer;

/// An implementation of the Id30 encoding scheme as documented at the [crate]
/// root.
///
/// This type holds a 30 bit integer as a 32 bit integer, of which the two most
/// significant bits are always zero. The type has the same representation as
/// `u32`. Debug-formatting includes both the Id30 encoding and the integer
/// representation.
///
/// Ordering, as implemented for the `PartialOrd` and `Ord` traits, orders by
/// numerical value of the underlying integer. This is the same order as the
/// lexicographical ordering of the string representation.
///
/// There are many ways to create instances of `Id30`:
///  - Via the `TryFrom` trait from either a `u32` or an `i32`:
///     ```rust
///     # use id30::Id30;
///     let id: Id30 = 1234.try_into().expect("1234 is in range");
///     assert_eq!(Id30::try_from(1 << 31), Err(id30::OutOfRangeError));
///     ```
///  - Parsing, via the `FromStr` trait:
///     ```rust
///     # use id30::Id30;
///     let id: id30::Id30 = "hrga2q".parse().unwrap();
///     ```
///     **Tip:** Parse to the [`Id30Parse`] type if you need to know whether the
///     text representation is canonical or alternate.
///  - With feature `rand`, via the [`Distribution`][rand08::distributions::Distribution]
///     trait:
///     ```rust
///     # use id30::Id30;
///     # use rand08 as rand;
///     use rand::{distributions::Standard, prelude::*};
///     let mut rng = rand::thread_rng();
///
///     let id: Id30 = rng.gen();
///     let ids: Vec<Id30> = Standard.sample_iter(rng).take(10).collect();
///     ```
///  - With feature `serde`, via deserialization
///  - With feature `diesel`, as output from queries
///
/// An `Id30` can be converted directly to `u32` and `i32` via the `From` trait:
/// ```rust
/// # use id30::Id30;
/// let id: Id30 = 1234.try_into().unwrap();
/// assert_eq!(u32::from(id), 1234);
/// ```
///
/// To generate the text representation:
///  - format an `Id30` value via the `Display` trait:
///     ```rust
///     # use id30::Id30;
///     let id: id30::Id30 = "j9yceq".parse().unwrap();
///     assert_eq!(&id.to_string(), "j9yceq");
///     assert_eq!(&format!("/path/to/{id}"), "/path/to/j9yceq");
///     ```
///  - or, with feature `serde`, via serialization
///
/// # Integrations with other crates
///  - crate `rand` via feature `rand08` (alias `rand`):
///
///     `Id30` implements [`Distribution`], enabling generation of random `Id30`
///     values.
///
///  - crate `serde` via feature `serde1` (alias `serde`):
///
///     `Id30` implements [`Serialize`] and [`Deserialize`], enabling
///     serialization and deserialization of `Id30` values as text through
///     serde.
///
///  - crate `diesel` via feature `diesel2` (alias `diesel`):
///
///     `Id30` implements [`FromSql`] and [`Queryable`], enabling
///     deserialization of integers from the database as `Id30` values, and
///     [`ToSql`], enabling serialization of `Id30` values as integers in the
///     database. Additionally, [`AsExpression`] is implemented, enabling the
///     usage of `Id30` values in diesel query builder expressions.
///
/// [`Distribution`]: rand08::distributions::Distribution
///
/// [`Serialize`]: serde1::ser::Serialize
/// [`Deserialize`]: serde1::de::Deserialize
///
/// [`FromSql`]: diesel2::deserialize::FromSql
/// [`Queryable`]: diesel2::deserialize::Queryable
/// [`ToSql`]: diesel2::serialize::ToSql
/// [`AsExpression`]: diesel::expression::AsExpression

#[repr(transparent)]
#[derive(PartialEq, Eq, PartialOrd, Ord, Hash, Copy, Clone)]
#[cfg_attr(feature = "diesel2", derive(diesel::AsExpression, diesel::FromSqlRow))]
#[cfg_attr(feature = "diesel2", diesel(sql_type = Integer))]
pub struct Id30(u32);

impl Debug for Id30 {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.debug_struct("Id30")
            .field("id30", &self.to_string())
            .field("u32", &self.0)
            .finish()
    }
}