Crate zalgo_codec


This crate lets you convert an ASCII text string into a single Unicode grapheme cluster and back. It also provides a procedural macro that lets you embed such a grapheme cluster and decode it into source code at compile time.
This lets you reach new lows in the field of self-documenting code.

The encoded string will be about twice as large as the original in terms of bytes: 2n + 1 bytes for an input of n bytes.
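The size estimate can be checked with the standard library alone. The sketch below assumes, as the examples in this documentation suggest, that an encoded string is a one-byte base character `E` followed by one combining character per input character. `predicted_encoded_len` is a hypothetical helper written for illustration and is not part of the crate.

```rust
/// Hypothetical helper (not part of zalgo_codec): the encoded size in bytes
/// that follows from "one base byte plus two bytes per input byte".
fn predicted_encoded_len(input: &str) -> usize {
    2 * input.len() + 1
}

fn main() {
    // Every code point in U+0300..=U+036F takes exactly two bytes in UTF-8.
    for cp in 0x300u32..=0x36F {
        let c = char::from_u32(cp).expect("valid code point");
        assert_eq!(c.len_utf8(), 2);
    }

    // The base character is plain ASCII and takes one byte.
    assert_eq!('E'.len_utf8(), 1);

    // "Zalgo" is 5 bytes long, so the prediction is 11 bytes.
    assert_eq!(predicted_encoded_len("Zalgo"), 11);
}
```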

Additionally, the crate provides a function that encodes Python code and wraps the result in a decoder that decodes and executes the encoded string, retaining the functionality of the original code.

There are two ways of interacting with the codec. The first one is to call the encoding and decoding functions directly, and the second one is to use the ZalgoString wrapper type.

§Examples

Encode a string to a grapheme cluster with zalgo_encode:

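use zalgo_codec::zalgo_encode;

let s = "Zalgo";
let encoded = zalgo_encode(s)?;
// Escapes keep the order of the combining marks unambiguous.
assert_eq!(encoded, "E\u{33a}\u{341}\u{34c}\u{347}\u{34f}");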

Decode the grapheme cluster back into a string with zalgo_decode:

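use zalgo_codec::zalgo_decode;

// Escapes keep the order of the combining marks unambiguous.
let encoded = "E\u{33a}\u{341}\u{34c}\u{347}\u{34f}";
let s = zalgo_decode(encoded)?;
assert_eq!(s, "Zalgo");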

The ZalgoString type can be used to encode a string and handle the result in various ways:

use zalgo_codec::ZalgoString;

let s = "Zalgo";
let zstr = ZalgoString::new(s)?;
// Escapes keep the order of the combining marks unambiguous.
assert_eq!(zstr, "E\u{33a}\u{341}\u{34c}\u{347}\u{34f}");
assert_eq!(zstr.len(), 2 * s.len() + 1);
assert_eq!(zstr.decoded_len(), s.len());
assert_eq!(zstr.bytes().next(), Some(69));
assert_eq!(zstr.decoded_chars().next_back(), Some('o'));

Encode Rust source code and embed it in your program with the zalgo_embed! proc-macro:

// This grapheme cluster was made by encoding "fn add(x: i32, y: i32) -> i32 {x + y}"
zalgo_embed!("E͎͉͙͉̞͉͙͆̀́̈́̈́̈̀̓̒̌̀̀̓̒̉̀̍̀̓̒̀͛̀̋̀͘̚̚͘͝");

// The `add` function is now available
assert_eq!(add(10, 20), 30);

§Feature flags

std (enabled by default): enables Error to capture a Backtrace. If this feature is not enabled the library is no_std compatible, but still uses the alloc crate.

serde: derives the Serialize and Deserialize traits from serde for ZalgoString.

rkyv: derives the Serialize, Deserialize, and Archive traits from rkyv for ZalgoString.

macro (enabled by default): exports the procedural macros zalgo_embed! and zalgofy!.

§Explanation

Characters U+0300–U+036F are the Unicode combining characters for Latin script (the Combining Diacritical Marks block). The fun thing about combining characters is that you can add as many of them as you like to a base character: they do not create any new symbols, they only add marks to that character. They are meant for building characters such as á, by taking a normal a and adding another character that gives it the mark (U+301, in this case). Fun fact: Unicode doesn’t specify any limit on the number of these characters.

Conveniently, this gives us 112 different characters we can map to, which is more than enough for the ASCII range 0x20 -> 0x7E, aka all the non-control characters. The only issue is that this range does not include new lines, so to fix that, 0x0A (LF) gets a slot of its own: the last code point in the block, U+36F. Writing the encoded value as an offset from U+0300, encoding is (CHARACTER - 11) % 133 - 21 and decoding is (CHARACTER + 22) % 133 + 10, where % is the non-negative (Euclidean) remainder.
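The two formulas can be tried out with a short standard-library sketch. The names `encode_offset`, `decode_offset` and `BLOCK_START` are hypothetical and are not taken from the crate. The sketch assumes the formulas yield an offset that is then added to U+0300, which agrees with the conversion table. It uses `rem_euclid` because a plain `%` in Rust would give -1 for the newline case (10 - 11).

```rust
/// First code point of the combining block used by the codec.
const BLOCK_START: u32 = 0x300;

/// Maps a printable ASCII byte (0x20..=0x7E) or b'\n' to an offset in 0..=0x6F.
/// Other inputs are outside the scheme and give meaningless results.
fn encode_offset(byte: u8) -> u32 {
    // rem_euclid keeps the remainder non-negative, which matters for b'\n'.
    ((i32::from(byte) - 11).rem_euclid(133) - 21) as u32
}

/// Inverse of `encode_offset` for offsets produced by it.
fn decode_offset(offset: u32) -> u8 {
    ((offset + 22) % 133 + 10) as u8
}

fn main() {
    // Every printable ASCII byte and the newline survive a round trip.
    for byte in (0x20u8..=0x7E).chain(std::iter::once(b'\n')) {
        let offset = encode_offset(byte);
        assert!(offset <= 0x6F);
        assert_eq!(decode_offset(offset), byte);
    }

    // Spot checks against the conversion table.
    assert_eq!(BLOCK_START + encode_offset(b'A'), 0x321);
    assert_eq!(BLOCK_START + encode_offset(b' '), 0x300);
    assert_eq!(BLOCK_START + encode_offset(b'\n'), 0x36F);
}
```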

Full conversion table
ASCII character   Encoded
A                 U+321
B                 U+322
C                 U+323
D                 U+324
E                 U+325
F                 U+326
G                 U+327
H                 U+328
I                 U+329
J                 U+32A
K                 U+32B
L                 U+32C
M                 U+32D
N                 U+32E
O                 U+32F
P                 U+330
Q                 U+331
R                 U+332
S                 U+333
T                 U+334
U                 U+335
V                 U+336
W                 U+337
X                 U+338
Y                 U+339
Z                 U+33A
a                 U+341
b                 U+342
c                 U+343
d                 U+344
e                 U+345
f                 U+346
g                 U+347
h                 U+348
i                 U+349
j                 U+34A
k                 U+34B
l                 U+34C
m                 U+34D
n                 U+34E
o                 U+34F
p                 U+350
q                 U+351
r                 U+352
s                 U+353
t                 U+354
u                 U+355
v                 U+356
w                 U+357
x                 U+358
y                 U+359
z                 U+35A
1                 U+311
2                 U+312
3                 U+313
4                 U+314
5                 U+315
6                 U+316
7                 U+317
8                 U+318
9                 U+319
0                 U+310
(space)           U+300
!                 U+301
"                 U+302
#                 U+303
$                 U+304
%                 U+305
&                 U+306
'                 U+307
(                 U+308
)                 U+309
*                 U+30A
+                 U+30B
,                 U+30C
-                 U+30D
\                 U+33C
.                 U+30E
/                 U+30F
:                 U+31A
;                 U+31B
<                 U+31C
=                 U+31D
>                 U+31E
?                 U+31F
@                 U+320
\n                U+36F
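To see how the table turns a whole string into one grapheme cluster, here is a minimal sketch that applies it character by character. `sketch_encode` and `sketch_decode` are hypothetical names for an illustrative re-implementation and are not the crate's own functions. The leading `E` is an assumption based on the examples above, where the first byte of an encoded string is 69.

```rust
/// Illustrative encoder: a base character followed by one combining mark
/// per input byte. Returns None for input outside printable ASCII and '\n'.
fn sketch_encode(input: &str) -> Option<String> {
    let mut out = String::with_capacity(2 * input.len() + 1);
    out.push('E'); // base character that the marks attach to
    for byte in input.bytes() {
        let offset = match byte {
            b'\n' => 0x6F,
            0x20..=0x7E => u32::from(byte) - 0x20,
            _ => return None, // not encodable in this scheme
        };
        out.push(char::from_u32(0x300 + offset)?);
    }
    Some(out)
}

/// Illustrative decoder: skips the base character and maps each mark back.
fn sketch_decode(encoded: &str) -> Option<String> {
    let mut chars = encoded.chars();
    chars.next()?; // skip the base character
    chars
        .map(|c| match u32::from(c) {
            0x36F => Some('\n'),
            cp @ 0x300..=0x35E => char::from_u32(cp - 0x300 + 0x20),
            _ => None, // not a character this sketch can decode
        })
        .collect()
}

fn main() {
    let encoded = sketch_encode("Zalgo").expect("printable ASCII");
    assert_eq!(encoded, "E\u{33a}\u{341}\u{34c}\u{347}\u{34f}");
    assert_eq!(encoded.len(), 11);

    let source = "fn main() {\n}";
    let round_trip = sketch_decode(&sketch_encode(source).unwrap());
    assert_eq!(round_trip.as_deref(), Some(source));

    // Tabs and non-ASCII text are rejected.
    assert!(sketch_encode("\t").is_none());
    assert!(sketch_encode("é").is_none());
}
```

Matching on ranges instead of using the modular formula keeps the invalid-input cases explicit, at the cost of treating the newline as a special case.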

§Experiment with the codec

There is an executable available for experimenting with the codec on text and files. It can also be used to generate grapheme clusters from source code for use with zalgo_embed!. It can be installed with cargo install zalgo-codec --features binary. You can optionally enable the gui feature during installation to include a rudimentary GUI mode for the program.

Modules§

Macros§

  • zalgo_embed
    This macro decodes a string that has been encoded with zalgo_encode and passes the result on to the compiler.
  • zalgofy
    At compile time this proc-macro encodes the given string literal as a single grapheme cluster.

Structs§

Functions§

  • zalgo_decode
    Takes in a string that was encoded by zalgo_encode and decodes it back into an ASCII string.
  • zalgo_encode
    Takes in a string slice that consists of only printable ASCII and newline characters and encodes it into a single grapheme cluster using a reversible encoding scheme.
  • zalgo_wrap_python
    Zalgo-encodes an ASCII string containing Python code and wraps it in a decoder that decodes and executes it. The resulting Python code should retain the functionality of the original.