This crate lets you convert an ASCII text string into a single unicode grapheme cluster and back.
It also provides a procedural macro that lets you embed such a grapheme cluster and decode it into source code at compile time.
This lets you reach new lows in the field of self-documenting code.
The encoded string will be ~2 times larger than the original in terms of bytes.
Additionally the crate provides a function to encode Python code and wrap the result in a decoder that decodes and executes the encoded string, retaining the functionality of the original code.
There are two ways of interacting with the codec.
The first one is to call the encoding and decoding functions directly,
and the second one is to use the ZalgoString wrapper type.
§Examples
Encode a string to a grapheme cluster with zalgo_encode:
let s = "Zalgo";
let encoded = zalgo_encode(s)?;
assert_eq!(encoded, "É̺͇͌͏");
Decode the grapheme cluster back into a string with zalgo_decode:
let encoded = "É̺͇͌͏";
let s = zalgo_decode(encoded)?;
assert_eq!(s, "Zalgo");
The ZalgoString type can be used to encode a string and handle the result in various ways:
let s = "Zalgo";
let zstr = ZalgoString::new(s)?;
assert_eq!(zstr, "É̺͇͌͏");
assert_eq!(zstr.len(), 2 * s.len() + 1);
assert_eq!(zstr.decoded_len(), s.len());
assert_eq!(zstr.bytes().next(), Some(69));
assert_eq!(zstr.decoded_chars().next_back(), Some('o'));
Encode Rust source code and embed it in your program with the zalgo_embed! proc-macro:
// This grapheme cluster was made by encoding "fn add(x: i32, y: i32) -> i32 {x + y}"
zalgo_embed!("E͎͉͙͉̞͉͙͆̀́̈́̈́̈̀̓̒̌̀̀̓̒̉̀̍̀̓̒̀͛̀̋̀͘̚̚͘͝");
// The `add` function is now available
assert_eq!(add(10, 20), 30);
§Feature flags
- std: enables EncodeError and DecodeError to capture a Backtrace. If this feature is not enabled the library is no_std compatible, but still uses the alloc crate.
- serde: derives the Serialize and Deserialize traits from serde for ZalgoString.
- rkyv: derives the Serialize, Deserialize, and Archive traits from rkyv for ZalgoString.
- macro (enabled by default): exports the procedural macros zalgo_embed! and zalgofy!.
§Explanation
Characters U+0300–U+036F are the combining characters for unicode Latin.
The fun thing about combining characters is that you can add as many of these characters
as you like to the original character and it does not create any new symbols,
it only adds symbols on top of the character. It’s supposed to be used in order to
create characters such as á
by taking a normal a
and adding another character
to give it the mark (U+301, in this case). Fun fact: Unicode doesn’t specify
any limit on the number of these characters.
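As a small illustration of how combining characters stack, the following sketch uses only the Rust standard library (nothing from this crate) to attach any number of combining marks to a base character. The helper name stack_marks is made up for this example.

```rust
/// Appends `count` copies of the combining mark `mark` to `base`.
/// `stack_marks` is a hypothetical helper, not part of this crate or of std.
fn stack_marks(base: char, mark: char, count: usize) -> String {
    let mut s = String::with_capacity(base.len_utf8() + count * mark.len_utf8());
    s.push(base);
    // Every mark is a separate code point, but all of them attach to `base`.
    s.extend(std::iter::repeat(mark).take(count));
    s
}
```

Calling stack_marks('a', '\u{301}', 1) gives two code points (three bytes in UTF-8) that display as á, and a larger count keeps piling marks onto the same base character.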
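Conveniently, this gives us 112 different characters we can map to,
which is more than enough for the ASCII character range 0x20 -> 0x7E, aka all the printable characters.
The only issue is that we can’t have new lines in this system, so to fix that,
we can simply map 0x0A (LF) to the last character in the block, U+36F.
This can be represented as (CHARACTER - 11) % 133 - 21, and decoded with (CHARACTER + 22) % 133 + 10.
The encoding formula takes the ASCII byte and gives the offset from U+300, the decoding formula does the reverse, and % is a modulo that never gives a negative result.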
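The two formulas above can be sketched in plain Rust as follows. This is an illustration under stated assumptions and not the crate's actual implementation: the function names are invented, the results of the formulas are taken to be offsets from U+300 (as the conversion table suggests), and the leading base character 'E' is taken from the examples earlier on this page. Rust's % operator can return negative values, so the sketch uses rem_euclid for the encoding step.

```rust
/// Encodes one printable ASCII byte or newline as a combining character.
/// Hypothetical helper; returns `None` for bytes the scheme can not represent.
fn encode_byte(byte: u8) -> Option<char> {
    if byte != b'\n' && !(0x20..=0x7E).contains(&byte) {
        return None;
    }
    // (CHARACTER - 11) % 133 - 21, using a modulo that is never negative.
    let offset = (i32::from(byte) - 11).rem_euclid(133) - 21;
    char::from_u32(0x300 + offset as u32)
}

/// Decodes one combining character in U+300..=U+36F back into an ASCII byte.
fn decode_char(c: char) -> Option<u8> {
    let offset = (c as u32).checked_sub(0x300)?;
    if offset > 0x6F {
        return None;
    }
    // (CHARACTER + 22) % 133 + 10
    let byte = ((offset + 22) % 133 + 10) as u8;
    // Assumption: slots that do not decode to printable ASCII or newline are rejected.
    (byte == b'\n' || (0x20..=0x7E).contains(&byte)).then_some(byte)
}

/// Encodes a whole string: the base character 'E' followed by one mark per byte.
fn encode_str(text: &str) -> Option<String> {
    let mut out = String::with_capacity(2 * text.len() + 1);
    out.push('E');
    for byte in text.bytes() {
        out.push(encode_byte(byte)?);
    }
    Some(out)
}

/// Decodes a string produced by `encode_str`, skipping the base character.
fn decode_str(encoded: &str) -> Option<String> {
    let mut chars = encoded.chars();
    chars.next()?;
    chars.map(|c| decode_char(c).map(char::from)).collect()
}
```

Every character in U+300..=U+36F takes two bytes in UTF-8, so the output of encode_str is 2 * n + 1 bytes long for an input of n bytes, which is where the "~2 times larger" figure comes from.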
Full conversion table
ASCII character | Encoded |
---|---|
A | U+321 |
B | U+322 |
C | U+323 |
D | U+324 |
E | U+325 |
F | U+326 |
G | U+327 |
H | U+328 |
I | U+329 |
J | U+32A |
K | U+32B |
L | U+32C |
M | U+32D |
N | U+32E |
O | U+32F |
P | U+330 |
Q | U+331 |
R | U+332 |
S | U+333 |
T | U+334 |
U | U+335 |
V | U+336 |
W | U+337 |
X | U+338 |
Y | U+339 |
Z | U+33A |
a | U+341 |
b | U+342 |
c | U+343 |
d | U+344 |
e | U+345 |
f | U+346 |
g | U+347 |
h | U+348 |
i | U+349 |
j | U+34A |
k | U+34B |
l | U+34C |
m | U+34D |
n | U+34E |
o | U+34F |
p | U+350 |
q | U+351 |
r | U+352 |
s | U+353 |
t | U+354 |
u | U+355 |
v | U+356 |
w | U+357 |
x | U+358 |
y | U+359 |
z | U+35A |
1 | U+311 |
2 | U+312 |
3 | U+313 |
4 | U+314 |
5 | U+315 |
6 | U+316 |
7 | U+317 |
8 | U+318 |
9 | U+319 |
0 | U+310 |
(space) | U+300 |
! | U+301 |
" | U+302 |
# | U+303 |
$ | U+304 |
% | U+305 |
& | U+306 |
' | U+307 |
( | U+308 |
) | U+309 |
* | U+30A |
+ | U+30B |
, | U+30C |
- | U+30D |
\ | U+33C |
. | U+30E |
/ | U+30F |
: | U+31A |
; | U+31B |
< | U+31C |
= | U+31D |
> | U+31E |
? | U+31F |
@ | U+320 |
\n | U+36F |
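The table follows a simple pattern: each printable ASCII character lands at its byte value minus 0x20 above U+300, and the newline takes the last slot of the block. As a sketch, rows in the same layout could be generated like this. The function name table_row and the "(space)" label are choices made for this example.

```rust
/// Formats one row of the conversion table, or returns `None` for bytes
/// outside the encodable set. `table_row` is a hypothetical helper.
fn table_row(byte: u8) -> Option<String> {
    let (label, offset) = match byte {
        // The newline is the one special case: it takes the last slot.
        b'\n' => ("\\n".to_string(), 0x6F),
        b' ' => ("(space)".to_string(), 0),
        // Every other printable character is shifted down by 0x20.
        0x21..=0x7E => (char::from(byte).to_string(), u32::from(byte) - 0x20),
        _ => return None,
    };
    Some(format!("{} | U+{:X} |", label, 0x300 + offset))
}
```

Mapping table_row over every byte value and keeping the Some results yields 96 rows: the 95 printable characters plus the newline.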
§Experiment with the codec
There is an executable available for experimenting with the codec on text and files.
It can also be used to generate grapheme clusters from source code for use with zalgo_embed!.
It can be installed with cargo install zalgo-codec --features binary.
You can optionally enable the gui feature during installation to include a rudimentary GUI mode for the program.
Modules§
- Contains the implementation of ZalgoString as well as related iterators.
Macros§
- zalgo_embed (macro): This macro decodes a string that has been encoded with zalgo_encode and passes the results on to the compiler.
- zalgofy (macro): At compile time this proc-macro encodes the given string literal as a single grapheme cluster.
Structs§
- DecodeError: The error returned by zalgo_decode if a string can not be decoded.
- EncodeError: The error returned by zalgo_encode, ZalgoString::new, and zalgo_wrap_python if they encounter a byte they can not encode.
- ZalgoString: A String that has been encoded with zalgo_encode. This struct can be decoded in-place and also allows iteration over its characters and bytes, both in decoded and encoded form.
Functions§
- zalgo_decode: Takes in a string that was encoded by zalgo_encode and decodes it back into an ASCII string.
- zalgo_encode: Takes in a string slice that consists of only printable ASCII and newline characters and encodes it into a single grapheme cluster using a reversible encoding scheme.
- zalgo_wrap_python: zalgo-encodes an ASCII string containing Python code and wraps it in a decoder that decodes and executes it. The resulting Python code should retain the functionality of the original.