A crate for converting a string containing only printable ASCII and newlines
into a single unicode grapheme cluster and back.
Provides the non-macro functionality of the crate zalgo-codec.
There are two ways of interacting with the codec.
The first is to call the encoding and decoding functions directly,
and the second is to use the ZalgoString wrapper type.
§Examples
Encode a string to a grapheme cluster with zalgo_encode:
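use zalgo_codec_common::zalgo_encode;
let s = "Zalgo";
let encoded = zalgo_encode(s)?;
// Written with escapes so the exact code points survive Unicode normalization
assert_eq!(encoded, "E\u{33a}\u{341}\u{34c}\u{347}\u{34f}");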
Decode a grapheme cluster back into a string:
use zalgo_codec_common::zalgo_decode;
let encoded = "E\u{33a}\u{341}\u{34c}\u{347}\u{34f}";
let s = zalgo_decode(encoded)?;
assert_eq!(s, "Zalgo");
The ZalgoString type can be used to encode a string and handle the result in various ways:
use zalgo_codec_common::ZalgoString;
let s = "Zalgo";
let zstr = ZalgoString::new(s)?;
// Implements PartialEq with common string types
assert_eq!(zstr, "E\u{33a}\u{341}\u{34c}\u{347}\u{34f}");
// Utility functions
assert_eq!(zstr.len(), 2 * s.len() + 1);
assert_eq!(zstr.decoded_len(), s.len());
// Iterate over bytes and chars, in both encoded and decoded form
assert_eq!(zstr.bytes().next(), Some(69));
assert_eq!(zstr.decoded_bytes().nth_back(2), Some(b'l'));
assert_eq!(zstr.chars().nth(1), Some('\u{33a}'));
assert_eq!(zstr.decoded_chars().next_back(), Some('o'));
// Decode in place
assert_eq!(zstr.into_decoded_string(), "Zalgo");
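The `2 * s.len() + 1` relationship checked above can be understood from UTF-8 alone: the encoded string is one ASCII base character followed by one combining character per input byte, and every code point in U+0300 to U+036F takes two bytes. The following is a minimal sketch using only the standard library; the helper `expected_encoded_len` is a made-up name for this illustration and not part of the crate.

```rust
/// Byte length one would expect for the encoded form of an input of
/// `decoded_len` bytes: one byte for the base character plus two bytes per
/// combining character. Hypothetical helper, not part of the crate's API.
fn expected_encoded_len(decoded_len: usize) -> usize {
    2 * decoded_len + 1
}

fn main() {
    // Every code point in U+0300..=U+036F occupies two bytes in UTF-8.
    for cp in 0x300u32..=0x36F {
        let c = char::from_u32(cp).expect("valid code point");
        assert_eq!(c.len_utf8(), 2);
    }
    // The base character 'E' (byte value 69) is plain ASCII: one byte.
    assert_eq!('E'.len_utf8(), 1);

    // "Zalgo" has 5 bytes, so its encoded form has 11.
    let encoded = "E\u{33a}\u{341}\u{34c}\u{347}\u{34f}";
    assert_eq!(encoded.len(), expected_encoded_len("Zalgo".len()));
}
```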
§Feature flags
std: enables EncodeError and DecodeError to capture a Backtrace.
If this feature is not enabled the library is no_std compatible, but still uses the alloc crate.
serde: derives the serde::Serialize and serde::Deserialize traits from serde for ZalgoString.
rkyv: derives the rkyv::Serialize, rkyv::Deserialize, and rkyv::Archive traits from rkyv for ZalgoString.
§Explanation
Characters U+0300–U+036F form the Unicode block Combining Diacritical Marks, the combining characters commonly used with Latin letters.
The fun thing about combining characters is that you can add as many of them
as you like to a base character and it does not create any new symbols,
it only stacks marks on top of that character. They are meant to be used to
create characters such as á by taking a normal a and adding another character
to give it the mark (U+301, in this case). Fun fact: Unicode doesn’t specify
any limit on the number of these characters.
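As a small illustration of the stacking described above, the following standard-library-only sketch appends several combining marks to a single base letter. The helper name `stack_marks` is invented for this example and is not part of the crate.

```rust
/// Appends each combining mark in `marks` to `base`.
/// Hypothetical helper for illustration; not part of the crate's API.
fn stack_marks(base: char, marks: &[char]) -> String {
    let mut s = String::from(base);
    s.extend(marks.iter().copied());
    s
}

fn main() {
    // 'a' followed by U+0301 (combining acute accent) renders like "á",
    // but is still stored as two separate code points.
    let a_acute = stack_marks('a', &['\u{301}']);
    assert_eq!(a_acute.chars().count(), 2);

    // More marks can be piled onto the same base character.
    let stacked = stack_marks('a', &['\u{301}', '\u{308}', '\u{323}']);
    assert_eq!(stacked.chars().count(), 4);

    // Every mark lies in the block U+0300..=U+036F.
    assert!(stacked
        .chars()
        .skip(1)
        .all(|c| ('\u{300}'..='\u{36F}').contains(&c)));

    println!("{a_acute} {stacked}");
}
```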
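Conveniently, this gives us 112 different characters we can map to,
which is more than enough for the 95 printable ASCII characters in the range 0x20 to 0x7E, aka all the non-control characters.
The only issue is that newlines fall outside this range, so to fix that,
0x0A (LF) is given a slot of its own at the very end of the block, U+36F.
Taking CHARACTER to be the ASCII byte value, the offset from U+300 can be represented as (CHARACTER - 11) % 133 - 21, and an offset is decoded back into a byte with (CHARACTER + 22) % 133 + 10.
Here % is the non-negative remainder, so for the newline (10 - 11) % 133 is 132, not -1.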
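The two formulas can be sketched in a few lines of standard-library Rust. This is an illustration of the arithmetic only, not the crate's implementation; the function names `encode_offset`, `decode_offset` and `encode_char` are made up for this example. Rust's `%` operator can return negative values, so the sketch uses `rem_euclid` to obtain the non-negative remainder that the newline case relies on.

```rust
/// Offset from U+0300 for an encodable byte: (c - 11) mod 133 - 21.
/// Returns None for bytes other than printable ASCII and '\n'.
/// Hypothetical helper, not part of the crate's API.
fn encode_offset(byte: u8) -> Option<u8> {
    match byte {
        b'\n' | 0x20..=0x7E => {
            Some(((i32::from(byte) - 11).rem_euclid(133) - 21) as u8)
        }
        _ => None,
    }
}

/// Inverse of `encode_offset`: (v + 22) mod 133 + 10.
fn decode_offset(offset: u8) -> u8 {
    ((i32::from(offset) + 22).rem_euclid(133) + 10) as u8
}

/// The combining character that a byte is mapped to, if any.
fn encode_char(byte: u8) -> Option<char> {
    encode_offset(byte).and_then(|v| char::from_u32(0x300 + u32::from(v)))
}

fn main() {
    // Spot checks against the conversion table.
    assert_eq!(encode_char(b'A'), Some('\u{321}'));
    assert_eq!(encode_char(b' '), Some('\u{300}'));
    assert_eq!(encode_char(b'\n'), Some('\u{36F}'));
    // DEL is a control character and is not encodable in this sketch.
    assert_eq!(encode_char(0x7F), None);

    // Every encodable byte survives a round trip.
    for b in (0x20..=0x7Eu8).chain([b'\n']) {
        let v = encode_offset(b).unwrap();
        assert_eq!(decode_offset(v), b);
    }
}
```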
Full conversion table
ASCII character | Encoded |
---|---|
A | U+321 |
B | U+322 |
C | U+323 |
D | U+324 |
E | U+325 |
F | U+326 |
G | U+327 |
H | U+328 |
I | U+329 |
J | U+32A |
K | U+32B |
L | U+32C |
M | U+32D |
N | U+32E |
O | U+32F |
P | U+330 |
Q | U+331 |
R | U+332 |
S | U+333 |
T | U+334 |
U | U+335 |
V | U+336 |
W | U+337 |
X | U+338 |
Y | U+339 |
Z | U+33A |
a | U+341 |
b | U+342 |
c | U+343 |
d | U+344 |
e | U+345 |
f | U+346 |
g | U+347 |
h | U+348 |
i | U+349 |
j | U+34A |
k | U+34B |
l | U+34C |
m | U+34D |
n | U+34E |
o | U+34F |
p | U+350 |
q | U+351 |
r | U+352 |
s | U+353 |
t | U+354 |
u | U+355 |
v | U+356 |
w | U+357 |
x | U+358 |
y | U+359 |
z | U+35A |
1 | U+311 |
2 | U+312 |
3 | U+313 |
4 | U+314 |
5 | U+315 |
6 | U+316 |
7 | U+317 |
8 | U+318 |
9 | U+319 |
0 | U+310 |
(space) | U+300 |
! | U+301 |
" | U+302 |
# | U+303 |
$ | U+304 |
% | U+305 |
& | U+306 |
' | U+307 |
( | U+308 |
) | U+309 |
* | U+30A |
+ | U+30B |
, | U+30C |
- | U+30D |
\ | U+33C |
. | U+30E |
/ | U+30F |
: | U+31A |
; | U+31B |
< | U+31C |
= | U+31D |
> | U+31E |
? | U+31F |
@ | U+320 |
[ | U+33B |
] | U+33D |
^ | U+33E |
_ | U+33F |
` | U+340 |
{ | U+35B |
\| | U+35C |
} | U+35D |
~ | U+35E |
\n | U+36F |
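To show how the table applies to whole strings, here is a rough standard-library-only sketch that prefixes the base character E (the first byte, 69, in the examples) and appends one combining character per input byte. The functions `sketch_encode` and `sketch_decode` are hypothetical names for this illustration; the crate's own functions are zalgo_encode and zalgo_decode, which report failures through EncodeError and DecodeError rather than None and may differ in other details.

```rust
/// Encodes printable ASCII and newlines as 'E' followed by one combining
/// character per input byte. Returns None on any other byte.
/// Illustrative sketch only, not the crate's implementation.
fn sketch_encode(input: &str) -> Option<String> {
    let mut out = String::with_capacity(2 * input.len() + 1);
    out.push('E');
    for b in input.bytes() {
        let offset = match b {
            b'\n' => 0x6F,
            0x20..=0x7E => u32::from(b) - 0x20,
            _ => return None,
        };
        out.push(char::from_u32(0x300 + offset)?);
    }
    Some(out)
}

/// Reverses `sketch_encode`. Returns None if the input does not start with
/// the base character or contains a code point outside the table.
fn sketch_decode(encoded: &str) -> Option<String> {
    let mut chars = encoded.chars();
    if chars.next()? != 'E' {
        return None;
    }
    chars
        .map(|c| match u32::from(c) {
            0x36F => Some('\n'),
            cp @ 0x300..=0x35E => char::from_u32(cp - 0x300 + 0x20),
            _ => None,
        })
        .collect()
}

fn main() {
    let encoded = sketch_encode("Zalgo").expect("input is encodable");
    assert_eq!(encoded, "E\u{33a}\u{341}\u{34c}\u{347}\u{34f}");
    assert_eq!(sketch_decode(&encoded).as_deref(), Some("Zalgo"));
    // A tab is neither printable ASCII nor a newline.
    assert_eq!(sketch_encode("tab\there"), None);
}
```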
§Experiment with the codec
There is an executable available for experimenting with the codec on text and files.
It can be installed with cargo install zalgo-codec --features binary.
You can optionally enable the gui feature during installation to include a rudimentary GUI mode for the program.
Re-exports§
pub use zalgo_string::ZalgoString;
Modules§
- zalgo_string: Contains the implementation of ZalgoString as well as related iterators.
Structs§
- DecodeError: The error returned by zalgo_decode if a string cannot be decoded.
- EncodeError: The error returned by zalgo_encode, ZalgoString::new, and zalgo_wrap_python if they encounter a byte they cannot encode.
Functions§
- zalgo_decode: Takes in a string that was encoded by zalgo_encode and decodes it back into an ASCII string.
- zalgo_encode: Takes in a string slice that consists of only printable ASCII and newline characters and encodes it into a single grapheme cluster using a reversible encoding scheme.
- zalgo_wrap_python: zalgo-encodes an ASCII string containing Python code and wraps it in a decoder that decodes and executes it. The resulting Python code should retain the functionality of the original.