//! Fast and efficient human-readable data encoding!
//!
//! Bunk encodes binary data as pronounceable gibberish, somewhat resembling Latin. This is useful when
//! binary data such as an encryption key is shown to an end-user who might need to manually transfer it.
//!
//! Using the default [settings](Settings), a string of 32 bytes gets encoded as:
//! ```text
//! atemorni telphocom neideu gepypi forzamar oasal cevanal butthepo aujoate turviy menkais
//! ```
//!
//! Optionally, Bunk can [decorate](Settings::decorate) the encoded string with commas, periods, and sentence
//! casing to improve readability:
//! ```text
//! Atemorni telphocom. Neideu gepypi forzamar oasal cevanal butthepo aujoate turviy, menkais.
//! ```
//!
//!
//! # Overview
//!
//! - It is fast! On my machine, encoding and then decoding a random array of 32 bytes takes an average of
//! ~0.8µs with the default settings --- allocations and all; no hidden fees.
//! - It is small! Bunk stores a table of only 256 syllables, each 1-4 letters long (2.47 on average), plus
//! some data structures needed for fast lookup.
//! - Checksums of variable length can be added to encoded messages to verify data integrity when decoding,
//! which protects against typos.
//! - The [maximum word length](Settings::word_len) (in syllables) can be customized.
//!
//!
//! # How it compares to English dictionary encodings
//!
//! A popular scheme is to encode binary data as actual English words, which yields results that are more
//! readable and easier to remember. See [bip39](https://docs.rs/tiny-bip39/) as an example of this. However,
//! to be efficient in the amount of data a string of words can encode, a _massive_ table of (sometimes
//! quite long) words must be included --- [bip39](https://docs.rs/tiny-bip39/) uses 2048 words. In addition
//! to this, some kind of data structure for lookup is also needed, and will likely have to be constructed at
//! runtime. If this cost is no object for your application, use something like
//! [bip39](https://docs.rs/tiny-bip39/) instead!
//!
//! Bunk takes a different approach, requiring a table of only 256 1-4 letter syllables, each carrying one
//! byte of data. This allows Bunk to:
//! - Take up less memory overall.
//! - Store the data structures needed for fast lookup in static memory instead of having to construct them at
//! runtime.
//!
//!
//! # Serde
//!
//! Enable the `serde` feature and Bunk can be used to serialize/deserialize fields that implement
//! `AsRef<[u8]>` and `From<Vec<u8>>`:
//! ```text
//! #[derive(Serialize, Deserialize)]
//! struct Vault {
//!     #[serde(with = "bunk")]
//!     key: Vec<u8>,
//!     name: String,
//! }
//! ```
//!
//! Note that the [settings](Settings) used when encoding for serde are necessarily hard-coded:
//! ```no_run
//! # use bunk::*;
//! # let _ =
//! Settings {
//!     word_len: Some(3),
//!     checksum: Checksum::Disabled,
//!     decorate: false,
//! }
//! # ;
//! ```
//!
//!
//! # Examples
//!
//! Basic usage with default [settings](Settings):
//! ```
//! let encoded = bunk::encode(b"aftersun");
//! let decoded = bunk::decode(encoded)?;
//!
//! assert_eq!(decoded, b"aftersun");
//! # Ok::<(), bunk::InvalidData>(())
//! ```
//!
//! Disabled [checksum](Checksum):
//! ```
//! use bunk::{Checksum, Settings};
//!
//! let settings = Settings {
//!     checksum: Checksum::Disabled,
//!     ..Default::default()
//! };
//! let encoded = bunk::encode_with_settings(b"it's such a beautiful day", settings);
//! let decoded = bunk::decode_with_settings(encoded, settings.checksum)?;
//!
//! assert_eq!(decoded, b"it's such a beautiful day");
//! # Ok::<(), bunk::InvalidData>(())
//! ```
//!
//! Custom [checksum length](Checksum):
//! ```
//! use bunk::{Checksum, Settings};
//!
//! let settings = Settings {
//!     checksum: Checksum::Length4,
//!     ..Default::default()
//! };
//! let encoded = bunk::encode_with_settings([33, 14, 224, 134], settings);
//! let decoded = bunk::decode_with_settings(encoded, settings.checksum)?;
//!
//! assert_eq!(decoded, [33, 14, 224, 134]);
//! # Ok::<(), bunk::InvalidData>(())
//! ```
//!
//! Custom [word length limit](Settings::word_len):
//! ```
//! use bunk::{Checksum, Settings};
//!
//! let settings = Settings {
//!     word_len: Some(5),
//!     ..Default::default()
//! };
//! let encoded = bunk::encode_with_settings([231, 6, 39, 34], settings);
//! let decoded = bunk::decode(encoded)?; // word_len doesn't affect the decoder
//!
//! assert_eq!(decoded, [231, 6, 39, 34]);
//! # Ok::<(), bunk::InvalidData>(())
//! ```
//!
//!
//! # How it works
//!
//! To explain the algorithm, we'll iteratively build upon it and solve issues as we go.
//!
//! The fundamental idea is to encode a byte as a syllable by using it to index into a table of 256 unique
//! syllables, the result of which is then appended to the encoded string --- as one would expect. The
//! decoder can then use a [trie](https://en.wikipedia.org/wiki/Trie) to find the index of the longest
//! syllable at the beginning of the string, which corresponds to the encoded byte.
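//! As a rough sketch of that fundamental idea (with a hypothetical 4-entry table standing in for the real
//! 256-entry one, indexed modulo its size just for this toy example):
//! ```
//! // Toy syllable table; the real table has one syllable per possible byte value.
//! const SYLLABLES: [&str; 4] = ["ba", "ce", "di", "fo"];
//!
//! // Encode a single byte by using it as an index into the table.
//! fn encode_byte(byte: u8) -> &'static str {
//!     SYLLABLES[byte as usize % SYLLABLES.len()]
//! }
//!
//! assert_eq!(encode_byte(0), "ba");
//! assert_eq!(encode_byte(2), "di");
//! ```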
//!
//! This by itself causes issues of parser ambiguity when one valid syllable is a prefix of another. Take as
//! a basic example the encoded string "ous". Is this the single syllable "ous", or the syllable "o" followed
//! by "us"? Barring some cumbersome machinery, there is no way for the decoder to know! The encoder
//! therefore has to detect when such an ambiguity is possible by checking if the first letter of the second
//! syllable is a valid continuation of the first syllable. If so, it inserts a word break between them.
//! (Technically, this is stricter than necessary for breaking the ambiguity but is easy to check and allows
//! the decoder to be written greedily.)
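//! As an illustration of the break check (using the toy syllables from the "ous" example and a hypothetical
//! helper name --- not Bunk's actual internals, which use a trie rather than a linear scan):
//! ```
//! // Toy syllable set; the real table has 256 entries.
//! const SYLLABLES: [&str; 3] = ["o", "us", "ous"];
//!
//! /// True if the first letter of `next` could extend `prev` into another
//! /// valid syllable, meaning a word break is needed between them.
//! fn needs_break(prev: &str, next: &str) -> bool {
//!     let joined = format!("{prev}{}", &next[..1]);
//!     SYLLABLES.iter().any(|s| s.starts_with(&joined))
//! }
//!
//! // "o" followed by "us" is ambiguous with the single syllable "ous":
//! assert!(needs_break("o", "us"));
//! // "us" followed by "o" is unambiguous, so no break is needed:
//! assert!(!needs_break("us", "o"));
//! ```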
//!
//! To support these two required operations --- finding the longest syllable prefixed to a string, and
//! checking whether a letter is a valid continuation of a syllable --- Bunk uses a trie. There are then two
//! issues presenting themselves:
//! - Tries are _slow_ to construct.
//! - There are (somehow) no efficient trie libraries for Rust that allow these operations through their API.
//!
//! As a solution to both of these, a precomputed trie (as created by [crawdad](https://docs.rs/crawdad/)) is
//! stored in static memory, on top of which Bunk implements a basic traversal --- the only API needed for
//! the two operations. All in all, the trie API comes out to only about 60 lines of code --- much less than
//! having to add [crawdad](https://docs.rs/crawdad/) (or such) as a dependency.
//!
//! So far, the algorithm we've described is a perfectly functional encoder. However, to be more
//! user-friendly, we'd ideally also like _all_ inputs to yield equally pronounceable text. Without any
//! further measures, inputs such as `[0, 0, 0, 0]` yield repeated syllables, in this case "uuu u". To avoid
//! this, Bunk artificially increases the _apparent_ entropy of encoded bytes by first XORing them with a
//! value dependent on their index. Since XOR undoes itself, the decoder can then do the exact same thing and
//! retrieve the original bytes. With this in place, `[0, 0, 0, 0]` gets nicely encoded as "trirori mulry".
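//! A minimal sketch of such a self-inverse transform (the index-dependent value Bunk actually uses is an
//! internal detail; a simple multiplier is assumed here purely for illustration):
//! ```
//! // XOR each byte with a value derived from its index. Applying the same
//! // function twice restores the original data, since `x ^ m ^ m == x`.
//! fn scramble(data: &mut [u8]) {
//!     for (i, byte) in data.iter_mut().enumerate() {
//!         *byte ^= (i as u8).wrapping_mul(0x9D); // hypothetical mixing constant
//!     }
//! }
//!
//! let mut data = [0u8, 0, 0, 0];
//! scramble(&mut data);
//! assert_ne!(data, [0, 0, 0, 0]); // bytes now differ from the input
//! scramble(&mut data);
//! assert_eq!(data, [0, 0, 0, 0]); // the second application undoes the first
//! ```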
pub use *;
pub use *;
pub use *;
/// Specifies the number of checksum bytes used when encoding.
///
/// Default: [`Checksum::Length1`].
/// The FNV-1a hashing algorithm.
///
/// Implementation based on pseudo-code on
/// [Wikipedia](https://en.wikipedia.org/wiki/Fowler-Noll-Vo_hash_function). This is used for the checksum.
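///
/// A sketch of the 64-bit FNV-1a loop as given in that pseudo-code (the constants are the standard 64-bit
/// FNV offset basis and prime; Bunk's exact variant and output width are assumptions here):
/// ```
/// fn fnv1a_64(data: &[u8]) -> u64 {
///     let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
///     for &byte in data {
///         hash ^= byte as u64;
///         hash = hash.wrapping_mul(0x100000001b3); // FNV prime
///     }
///     hash
/// }
///
/// // The empty input hashes to the offset basis itself.
/// assert_eq!(fnv1a_64(b""), 0xcbf29ce484222325);
/// ```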
/// Increases _apparent_ entropy in input data.
///
/// Before getting the syllable corresponding to a byte, the byte and its index are run through this function
/// to reduce visible patterns in the input data. This ensures that e.g. `[0, 0, 0, 0]` gets encoded as
/// `trirori mul` and not `uuu u`.
///
/// Some notes:
/// - This neither increases nor decreases security; it is completely transparent, and used only to make the
/// output look nicer.
/// - The transformation applied to bytes repeats every 256 indices.
/// - This function undoes itself if the index is the same; i.e., it both encodes and decodes bytes.
///
/// ```ignore
/// let input = 0xC5;
/// let encoded = running_code(input, 0);
/// let decoded = running_code(encoded, 0);
/// assert_eq!(input, decoded);
/// ```