Crate zalgo_codec


This crate lets you convert an ASCII text string into a single Unicode grapheme cluster and back. It also provides a procedural macro that lets you embed such a grapheme cluster in your source and have it decoded into source code at compile time.
This lets you reach new lows in the field of self-documenting code.

The encoded string will be ~2 times larger than the original in terms of bytes.
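
The factor of roughly two can be checked with a little arithmetic. The sketch below is not part of the crate: `encoded_len` and `size_ratio` are hypothetical helpers that only restate the relation `zstr.len() == 2 * s.len() + 1` asserted in the ZalgoString example in this documentation, together with the fact that each code point in U+0300–U+036F takes two bytes in UTF-8.

```rust
/// Hypothetical helper (not a crate API): byte length of the encoded form
/// of an `n`-byte input, assuming one single-byte base character followed
/// by one two-byte combining mark per input byte.
fn encoded_len(n: usize) -> usize {
    1 + 2 * n
}

/// Hypothetical helper: ratio of encoded size to original size.
/// Meaningful only for `n > 0`.
fn size_ratio(n: usize) -> f64 {
    encoded_len(n) as f64 / n as f64
}

/// UTF-8 width in bytes of the first and last code points of the
/// U+0300–U+036F block.
fn combining_mark_widths() -> (usize, usize) {
    ('\u{0300}'.len_utf8(), '\u{036F}'.len_utf8())
}
```

For the five-byte input "Zalgo" this gives 11 bytes, a ratio of 2.2, and the ratio approaches 2 as the input grows.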

Additionally, the crate provides a function that encodes Python code and wraps the result in a decoder. The decoder decodes and executes the encoded string, so the result retains the functionality of the original code.

There are two ways of interacting with the codec. The first one is to call the encoding and decoding functions directly, and the second one is to use the ZalgoString wrapper type.

Examples

Encode a string to a grapheme cluster with zalgo_encode:

let s = "Zalgo";
let encoded = zalgo_encode(s)?;
assert_eq!(encoded, "É̺͇͌͏");

Decode the grapheme cluster back into a string with zalgo_decode:

let encoded = "É̺͇͌͏";
let s = zalgo_decode(encoded)?;
assert_eq!(s, "Zalgo");

The ZalgoString type can be used to encode a string and handle the result in various ways:

let s = "Zalgo";
let zstr = ZalgoString::new(s)?;
assert_eq!(zstr, "É̺͇͌͏");
assert_eq!(zstr.len(), 2 * s.len() + 1);
assert_eq!(zstr.decoded_len(), s.len());
assert_eq!(zstr.bytes().next(), Some(69));
assert_eq!(zstr.decoded_chars().next_back(), Some('o'));

Embed encoded Rust source code in your program with the zalgo_embed! proc-macro, which decodes it at compile time:

// This grapheme cluster was made by encoding "fn add(x: i32, y: i32) -> i32 {x + y}"
zalgo_embed!("E͎͉͙͉̞͉͙͆̀́̈́̈́̈̀̓̒̌̀̀̓̒̉̀̍̀̓̒̀͛̀̋̀͘̚̚͘͝");

// The `add` function is now available
assert_eq!(add(10, 20), 30);

Features

std (enabled by default): links the standard library and uses it to implement the std::error::Error trait for the provided Error type. If this feature is not enabled the library is #![no_std], but it still uses the alloc crate.

serde: implements the Serialize and Deserialize traits from serde for ZalgoString.

macro (enabled by default): exports the procedural macros zalgo_embed! and zalgofy!.

Explanation

Characters U+0300–U+036F are the Unicode combining diacritical marks used with Latin letters. The fun thing about combining characters is that you can add as many of them as you like to a base character without creating any new symbols; each one only adds a mark on top of the character. They are meant for building characters such as á, by taking a normal a and adding a combining character that gives it the acute accent (U+0301, in this case). Fun fact: Unicode doesn’t specify any limit on the number of these characters.

Conveniently, this gives us 112 different characters to map to, which is more than enough to cover the ASCII range 0x20 -> 0x7E, i.e. all 95 printable (non-control) characters. The only issue is that this range contains no newline. The next code point, 0x7F (DEL), is a control character that text does not need, so 0x0A (LF) is supported in its place. Encoding can be written as (CHARACTER - 11) % 133 - 21 and decoding as (CHARACTER + 22) % 133 + 10, where % is the non-negative (Euclidean) remainder. These formulas send 0x20–0x7E to 0–94 and LF to 111, and back again.
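
The mapping described above can be sketched with the standard library alone. This is an illustration of the two formulas, not the crate's implementation: the names `encode_byte`, `decode_offset`, `sketch_encode` and `sketch_decode` are hypothetical, the base character 'E' is inferred from the `zstr.bytes().next() == Some(69)` assertion in the examples, and `rem_euclid` is used because Rust's `%` operator can return negative values.

```rust
/// Hypothetical helper: maps a printable ASCII byte or a newline to an
/// offset into U+0300–U+036F using (CHARACTER - 11) % 133 - 21.
fn encode_byte(c: u8) -> Option<u8> {
    match c {
        b'\n' | 0x20..=0x7E => {
            let v = (i32::from(c) - 11).rem_euclid(133) - 21;
            Some(v as u8)
        }
        _ => None, // other control characters and non-ASCII are rejected
    }
}

/// Hypothetical helper: inverse mapping, (CHARACTER + 22) % 133 + 10.
/// Offsets that do not correspond to an encodable byte yield `None`.
fn decode_offset(v: u8) -> Option<u8> {
    if v > 111 {
        return None;
    }
    let c = ((i32::from(v) + 22).rem_euclid(133) + 10) as u8;
    if c == b'\n' || (0x20..=0x7E).contains(&c) {
        Some(c)
    } else {
        None
    }
}

/// Sketch of encoding: a base 'E' (an assumption, see the lead-in)
/// followed by one combining mark per input byte.
fn sketch_encode(s: &str) -> Option<String> {
    let mut out = String::with_capacity(2 * s.len() + 1);
    out.push('E');
    for b in s.bytes() {
        let v = encode_byte(b)?;
        out.push(char::from_u32(0x0300 + u32::from(v))?);
    }
    Some(out)
}

/// Sketch of decoding: drop the base character, then map every
/// combining mark back to its ASCII byte.
fn sketch_decode(encoded: &str) -> Option<String> {
    let mut chars = encoded.chars();
    if chars.next()? != 'E' {
        return None;
    }
    chars
        .map(|ch| {
            let offset = (ch as u32).checked_sub(0x0300)?;
            if offset > 111 {
                return None;
            }
            decode_offset(offset as u8).map(char::from)
        })
        .collect()
}
```

With these helpers, `sketch_encode("Hello,\nworld!")` followed by `sketch_decode` returns the original text, while input containing a tab or any non-ASCII character is rejected with `None`.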

Experiment with the codec

There is an executable available for experimenting with the codec on text and files. It can also be used to generate grapheme clusters from source code for use with zalgo_embed!. It can be installed with cargo install zalgo-codec --features binary. You can optionally enable the gui feature during installation to include a rudimentary GUI mode for the program.

Modules

Macros

  • This macro decodes a string that has been encoded with zalgo_encode and passes the results on to the compiler.
  • At compile time this proc-macro encodes the given string literal as a single grapheme cluster.

Structs

  • A String that has been encoded with zalgo_encode. This struct can be decoded in-place and also allows iteration over its characters and bytes, both in decoded and encoded form.

Enums

Functions

  • Takes in a string that was encoded by zalgo_encode and decodes it back into an ASCII string.
  • Takes in a string slice that consists of only printable ASCII and newline characters and encodes it into a single grapheme cluster using a reversible encoding scheme.
  • zalgo-encodes an ASCII string containing Python code and wraps it in a decoder that decodes and executes it. The resulting Python code should retain the functionality of the original.