Crate zalgo_codec
This crate lets you convert ASCII strings into single Unicode grapheme clusters and back.
It is based on the encoding and decoding functions
originally written in Python by Scott Conner
and extends them for Rust by providing a procedural macro that lets you embed an encoded string
and decode it into source code at compile time.
This lets you reach new lows in the field of self-documenting code.
The encoded string will be ~2 times larger than the original in terms of bytes,
but if you count the number of grapheme clusters it contains (with e.g. UnicodeSegmentation::graphemes)
you should only get one.
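The size claim can be checked with a small sketch. It is not part of the crate: the helper names estimated_encoded_len and all_marks_are_two_bytes are hypothetical, and the estimate assumes the encoded string consists of one single-byte base character (such as the E in the example on this page) followed by one combining mark per input byte.

```rust
/// Hypothetical helper: estimates the encoded size in bytes, assuming one
/// single-byte base character plus one two-byte combining mark per input byte.
fn estimated_encoded_len(ascii_input: &str) -> usize {
    1 + 2 * ascii_input.len()
}

/// Checks that every combining mark in U+0300..=U+036F occupies exactly
/// two bytes when encoded as UTF-8.
fn all_marks_are_two_bytes() -> bool {
    (0x0300u32..=0x036F)
        .filter_map(char::from_u32)
        .all(|mark| mark.len_utf8() == 2)
}
```

Under these assumptions a 3-byte input such as "abc" would take 7 bytes once encoded, which is where the factor of roughly 2 comes from.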
A small program that lets you use the functions in the crate on text and files is included in the source repository and can be installed with cargo install zalgo-codec --features binary.
Additionally, the crate provides a function to encode Python code and wrap the result in a decoder that decodes and executes the encoded string, retaining the functionality of the original code.
There are two ways of interacting with the codec.
The first one is to call the encoding and decoding functions directly,
and the second one is to use the ZalgoString wrapper type.
Example
The cursed character is the result of using zalgo_encode on the text fn add(x: i32, y: i32) -> i32 {x + y}.
use zalgo_codec::zalgo_embed;

// We can add that text to our code with the macro
zalgo_embed!("E͎͉͙͉̞͉͙͆̀́̈́̈́̈̀̓̒̌̀̀̓̒̉̀̍̀̓̒̀͛̀̋̀͘̚̚͘͝");
// The `add` function is now available
assert_eq!(add(10, 20), 30);
Explanation
Characters U+0300–U+036F are the Unicode combining diacritical marks used with Latin script. The fun thing about combining characters is that you can add as many of them as you like to a base character: they do not create any new symbols, they only stack marks on top of that character. They are meant for building characters such as á by taking a plain a and adding a combining mark to it (U+0301, in this case). Fun fact: Unicode doesn’t specify any limit on the number of these characters. Conveniently, this gives us 112 different characters we can map to, which comfortably covers the 96 values in the ASCII range 0x20 -> 0x7F, aka all the non-control characters. The only issue is that we can’t have newlines in this system, so to fix that, we can simply map 0x7F (DEL) to 0x0A (LF). Encoding can be represented as (CHARACTER - 11) % 133 - 21, and decoding as (CHARACTER + 22) % 133 + 10. For LF, CHARACTER - 11 is negative, so % must be a modulo whose result is never negative, as in Python.
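The two formulas above can be sketched in Rust as follows. This is a minimal illustration, not the crate's actual implementation: the function names encode_offset, decode_offset and offset_to_mark are hypothetical, and the sketch assumes that % in the formulas means a non-negative (floored) modulo, which is why it uses rem_euclid instead of Rust's % operator.

```rust
/// Maps one encodable ASCII byte to an offset below 112 using
/// `(c - 11) % 133 - 21` with a non-negative modulo.
/// Returns `None` for bytes this sketch does not accept.
fn encode_offset(byte: u8) -> Option<u8> {
    // Accept printable ASCII (0x20..=0x7E) and the newline character only.
    if byte != b'\n' && !(0x20..=0x7E).contains(&byte) {
        return None;
    }
    // `rem_euclid` keeps the result non-negative for the newline case (10 - 11 = -1).
    Some(((i16::from(byte) - 11).rem_euclid(133) - 21) as u8)
}

/// Inverse mapping: `(v + 22) % 133 + 10`.
fn decode_offset(offset: u8) -> u8 {
    ((u16::from(offset) + 22) % 133 + 10) as u8
}

/// Turns an offset into the combining character at U+0300 + offset.
fn offset_to_mark(offset: u8) -> Option<char> {
    char::from_u32(0x0300 + u32::from(offset))
}
```

With these definitions a space (0x20) gets offset 0 and therefore the mark U+0300, while a newline gets offset 111, the last mark in the block (U+036F). Passing any encoded offset to decode_offset returns the original byte.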
Modules
- Contains the implementation of ZalgoString as well as related iterators.
Macros
- This macro decodes a string that has been encoded with zalgo_encode and passes the results on to the compiler.
Structs
- A thin wrapper around a String that has been encoded with zalgo_encode. This struct can be decoded in-place and also allows iteration over its characters and bytes, both in decoded and encoded form.
Enums
- The error returned by zalgo_encode, ZalgoString::new, and zalgo_wrap_python if they encounter a byte they cannot encode.
Functions
- Takes in a string that was encoded by zalgo_encode and decodes it back into an ASCII string.
- Takes in an ASCII string without control characters (except newlines) and encodes it into a single grapheme cluster using a reversible encoding scheme.
- zalgo-encodes an ASCII string containing Python code and wraps it in a decoder that decodes and executes it. The resulting Python code should retain the functionality of the original.