Crate zalgo_codec_common

A crate for converting a string containing only printable ASCII and newlines into a single Unicode grapheme cluster and back. Provides the non-macro functionality of the crate zalgo-codec.

There are two ways of interacting with the codec. The first is to call the encoding and decoding functions directly, and the second is to use the ZalgoString wrapper type.

§Examples

Encode a string to a grapheme cluster with zalgo_encode:

use zalgo_codec_common::zalgo_encode;

let s = "Zalgo";
let encoded = zalgo_encode(s)?;
assert_eq!(encoded, "É̺͇͌͏");

Decode a grapheme cluster back into a string with zalgo_decode:

use zalgo_codec_common::zalgo_decode;

let encoded = "É̺͇͌͏";
let s = zalgo_decode(encoded)?;
assert_eq!(s, "Zalgo");

The ZalgoString type can be used to encode a string and handle the result in various ways:

use zalgo_codec_common::ZalgoString;

let s = "Zalgo";
let zstr = ZalgoString::new(s)?;

// Implements PartialEq with common string types
assert_eq!(zstr, "É̺͇͌͏");

// Utility functions
assert_eq!(zstr.len(), 2 * s.len() + 1);
assert_eq!(zstr.decoded_len(), s.len());

// Iterate over bytes and chars, in both encoded and decoded form
assert_eq!(zstr.bytes().next(), Some(69));
assert_eq!(zstr.decoded_bytes().nth_back(2), Some(b'l'));
assert_eq!(zstr.chars().nth(1), Some('\u{33a}'));
assert_eq!(zstr.decoded_chars().next_back(), Some('o'));

// Decode in place
assert_eq!(zstr.into_decoded_string(), "Zalgo");

§Feature flags

std: enables EncodeError and DecodeError to capture a Backtrace. If this feature is not enabled the library is no_std compatible, but still uses the alloc crate.

serde: derives the serde::Serialize and serde::Deserialize traits from serde for ZalgoString.

rkyv: derives the rkyv::Serialize, rkyv::Deserialize, and rkyv::Archive traits from rkyv for ZalgoString.

§Explanation

Characters U+0300–U+036F are the combining characters for Unicode Latin. The fun thing about combining characters is that you can add as many of them as you like to a base character: they do not create new symbols, they only stack marks on top of that character. They are meant for creating characters such as á, by taking a normal a and adding another character to give it the mark (U+301, in this case). Fun fact: Unicode doesn’t specify any limit on the number of these characters. Conveniently, this gives us 112 different characters we can map to, which nicely covers the ASCII character range 0x20 -> 0x7F, aka all the non-control characters. The only issue is that we can’t have newlines in this system, so to fix that, we can simply map 0x7F (DEL) to 0x0A (LF). This can be represented as (CHARACTER - 11) % 133 - 21, and decoded with (CHARACTER + 22) % 133 + 10. Here % must be the non-negative (Euclidean) remainder, so that the newline wraps around to the end of the range (U+36F in the conversion table).
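The two formulas can be sketched in plain Rust without depending on the crate. The function names encode_offset and decode_offset are hypothetical and are not part of the zalgo_codec_common API; the sketch only assumes that the formula operates on byte values and yields an offset from U+0300.

```rust
/// Hypothetical helper (not the crate's API): maps a printable ASCII byte
/// (0x20..=0x7E) or a newline (0x0A) to its offset from U+0300 using
/// (CHARACTER - 11) % 133 - 21. Any other byte is rejected.
fn encode_offset(byte: u8) -> Option<u8> {
    match byte {
        b'\n' | 0x20..=0x7E => {
            // rem_euclid keeps the remainder non-negative, so the newline
            // (10 - 11 = -1) wraps around to 132 instead of staying at -1.
            let wrapped = (i32::from(byte) - 11).rem_euclid(133);
            Some((wrapped - 21) as u8)
        }
        _ => None,
    }
}

/// Hypothetical helper (not the crate's API): inverts `encode_offset`
/// using (CHARACTER + 22) % 133 + 10.
fn decode_offset(offset: u8) -> u8 {
    ((u32::from(offset) + 22) % 133 + 10) as u8
}
```

With these definitions, encode_offset(b'A') gives Some(0x21), which corresponds to U+321, and encode_offset(b'\n') gives Some(0x6F), which corresponds to U+36F. Applying decode_offset to either result returns the original byte.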

Full conversion table

ASCII character    Encoded
A                  U+321
B                  U+322
C                  U+323
D                  U+324
E                  U+325
F                  U+326
G                  U+327
H                  U+328
I                  U+329
J                  U+32A
K                  U+32B
L                  U+32C
M                  U+32D
N                  U+32E
O                  U+32F
P                  U+330
Q                  U+331
R                  U+332
S                  U+333
T                  U+334
U                  U+335
V                  U+336
W                  U+337
X                  U+338
Y                  U+339
Z                  U+33A
a                  U+341
b                  U+342
c                  U+343
d                  U+344
e                  U+345
f                  U+346
g                  U+347
h                  U+348
i                  U+349
j                  U+34A
k                  U+34B
l                  U+34C
m                  U+34D
n                  U+34E
o                  U+34F
p                  U+350
q                  U+351
r                  U+352
s                  U+353
t                  U+354
u                  U+355
v                  U+356
w                  U+357
x                  U+358
y                  U+359
z                  U+35A
1                  U+311
2                  U+312
3                  U+313
4                  U+314
5                  U+315
6                  U+316
7                  U+317
8                  U+318
9                  U+319
0                  U+310
(space)            U+300
!                  U+301
"                  U+302
#                  U+303
$                  U+304
%                  U+305
&                  U+306
'                  U+307
(                  U+308
)                  U+309
*                  U+30A
+                  U+30B
,                  U+30C
-                  U+30D
\                  U+33C
.                  U+30E
/                  U+30F
:                  U+31A
;                  U+31B
<                  U+31C
=                  U+31D
>                  U+31E
?                  U+31F
@                  U+320
\n                 U+36F
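As a rough illustration of how the table turns a whole string into one grapheme cluster, the sketch below re-implements the mapping by hand. It is not the crate's implementation: sketch_encode and sketch_decode are hypothetical names, and the leading base character 'E' is an assumption taken from the examples above, where the first encoded byte is 69.

```rust
/// Hypothetical re-implementation for illustration only; use `zalgo_encode`
/// from the crate in real code. Returns None for unsupported bytes.
fn sketch_encode(input: &str) -> Option<String> {
    // One base character plus one two-byte combining character per input byte.
    let mut out = String::with_capacity(2 * input.len() + 1);
    out.push('E'); // assumed base character that the marks attach to
    for byte in input.bytes() {
        let offset = match byte {
            b'\n' => 0x6F,                         // newline sits at U+36F
            0x20..=0x7E => u32::from(byte) - 0x20, // printable ASCII
            _ => return None,                      // anything else is rejected
        };
        out.push(char::from_u32(0x300 + offset)?);
    }
    Some(out)
}

/// Hypothetical inverse of `sketch_encode`: skips the base character and
/// maps each combining character back to its ASCII byte.
fn sketch_decode(encoded: &str) -> Option<String> {
    let mut chars = encoded.chars();
    chars.next()?; // drop the base character
    chars
        .map(|c| match u32::from(c).checked_sub(0x300)? {
            0x6F => Some('\n'),
            offset @ 0..=0x5E => char::from_u32(offset + 0x20),
            _ => None,
        })
        .collect()
}
```

Under these assumptions, sketch_encode("Zalgo") produces 'E' followed by U+33A, U+341, U+34C, U+347 and U+34F. That is 11 bytes of UTF-8, which matches the 2 * s.len() + 1 relation shown in the ZalgoString example.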

§Experiment with the codec

There is an executable available for experimenting with the codec on text and files. It can be installed with cargo install zalgo-codec --features binary. You can optionally enable the gui feature during installation to include a rudimentary GUI mode for the program.

Functions§

  • Takes in a string that was encoded by zalgo_encode and decodes it back into an ASCII string.
  • Takes in a string slice that consists of only printable ASCII and newline characters and encodes it into a single grapheme cluster using a reversible encoding scheme.
  • Zalgo-encodes an ASCII string containing Python code and wraps it in a decoder that decodes and executes it. The resulting Python code should retain the functionality of the original.