Tick Encoding is a simple encoding scheme that encodes arbitrary binary data into an ASCII string, implemented as a Rust crate. It’s primarily designed for stuffing usually-ASCII data into JSON strings. It’s very similar to percent encoding / URL encoding, but with a few key differences:
- Uses backtick (`) instead of percent (%) as the escape character
- One canonical encoding for any binary data
- One consistent set of characters that require escaping
- Fewer characters need escaping
§Usage
Install the tick-encoding crate as a Rust dependency by running cargo add tick-encoding.
// Encode the input into a tick-encoded ASCII string
let encoded = tick_encoding::encode("hello, world! 🙂".as_bytes());
assert_eq!(encoded, "hello, world! `F0`9F`99`82");
// Decode it back into a UTF-8 string
let decoded = tick_encoding::decode(encoded.as_bytes()).unwrap();
let decoded_str = std::str::from_utf8(&decoded).unwrap();
assert_eq!(decoded_str, "hello, world! 🙂");
§Cargo features
The tick-encoding crate includes the following Cargo features:
- std (default): Enables functionality using Rust’s standard library. Disable to build in #![no_std] mode.
- alloc (default): Enables functionality that depends on the global allocator. Disabling this will greatly limit what functions you can use!
- safe: Avoid unsafe code. By default, a small amount of unsafe code is used (all checked with extensive unit tests, property tests, and Miri checks). Enabling this feature switches to purely safe code, and enables the #![deny(unsafe_code)] lint at the crate level.
§Encoding scheme
The encoding scheme for Tick Encoding is straightforward:
- All printable ASCII bytes except backtick (`) are encoded as-is (0x21 to 0x5F, and 0x61 to 0x7E)
- ASCII tabs, newlines, carriage returns, and space characters are also encoded as-is (0x09, 0x0A, 0x0D, and 0x20)
- Backtick (`) is encoded as two backticks (0x60 becomes 0x60 0x60)
- All other bytes are encoded as backtick followed by two uppercase hexadecimal characters
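The encoding rules above can be sketched in a few lines of plain Rust. This is only an illustration written from the rules as listed, not the crate's actual implementation; the names sketch_requires_escape and sketch_encode are hypothetical and do not exist in tick-encoding.

```rust
/// Hypothetical helper (not part of the crate): true if `byte` cannot be
/// written as-is, i.e. it is a backtick or lies outside printable ASCII,
/// tab, newline, carriage return, and space.
fn sketch_requires_escape(byte: u8) -> bool {
    !matches!(byte, b'\t' | b'\n' | b'\r' | 0x20..=0x5F | 0x61..=0x7E)
}

/// Hypothetical helper (not part of the crate): encode `input` by applying
/// the listed rules one byte at a time.
fn sketch_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity(input.len());
    for &byte in input {
        if byte == b'`' {
            // A backtick is doubled rather than hex-escaped.
            out.push_str("``");
        } else if sketch_requires_escape(byte) {
            // Every other escaped byte: backtick plus two uppercase hex digits.
            out.push_str(&format!("`{byte:02X}"));
        } else {
            // The byte is in the as-is set, so it is plain ASCII.
            out.push(byte as char);
        }
    }
    out
}
```

Under these rules, sketch_encode(b"a`b") yields a``b, and the four UTF-8 bytes of 🙂 yield `F0`9F`99`82, matching the usage example earlier on this page.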
Decoding just reverses the process. To ensure that decoding and re-encoding produces the same output string, the encoded string is validated while decoding:
- The encoded string can only contain printable ASCII characters, tabs, newlines, carriage returns, and spaces
- A backtick must be followed by a backtick or two uppercase hexadecimal characters
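A validating decoder for these rules might look like the sketch below. It is a standalone illustration, not the crate's code: sketch_hex_value and sketch_decode are hypothetical names, and errors are collapsed into None rather than a DecodeError. One check goes beyond the two rules listed above and is an assumption: the sketch also rejects an escape such as `41 for a byte that never needs escaping, since otherwise decoding and re-encoding would not reproduce the same string.

```rust
/// Hypothetical helper (not part of the crate): value of an uppercase hex
/// digit. Lowercase `a`-`f` is rejected so each byte has a single spelling.
fn sketch_hex_value(digit: u8) -> Option<u8> {
    match digit {
        b'0'..=b'9' => Some(digit - b'0'),
        b'A'..=b'F' => Some(digit - b'A' + 10),
        _ => None,
    }
}

/// Hypothetical helper (not part of the crate): decode `input`, returning
/// None as soon as any validation rule is violated.
fn sketch_decode(input: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(input.len());
    let mut bytes = input.iter().copied();
    while let Some(byte) = bytes.next() {
        match byte {
            // A backtick starts an escape: either `` or two hex digits.
            b'`' => match bytes.next()? {
                b'`' => out.push(b'`'),
                high => {
                    let low = bytes.next()?;
                    let value = sketch_hex_value(high)? * 16 + sketch_hex_value(low)?;
                    // Assumption: escapes of as-is bytes (and `60 for a
                    // backtick) are non-canonical, so they are rejected.
                    if matches!(value, b'\t' | b'\n' | b'\r' | 0x20..=0x7E) {
                        return None;
                    }
                    out.push(value);
                }
            },
            // Bytes in the as-is set are copied through unchanged.
            b'\t' | b'\n' | b'\r' | 0x20..=0x5F | 0x61..=0x7E => out.push(byte),
            // Anything else (control bytes, non-ASCII) is invalid input.
            _ => return None,
        }
    }
    Some(out)
}
```

With this sketch, a truncated escape, a lowercase hex digit, or a raw non-ASCII byte all produce None, while a``b decodes to the three bytes a, `, b.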
Modules§
Structs§
- EscapedHex: A two-digit escaped hex sequence, prefixed with a backtick.
Enums§
- DecodeError: An error trying to decode a tick-encoded string.
Functions§
- decode: Decode the given encoded input into a byte array. If no bytes need to be unescaped, then the result will be borrowed from the input.
- decode_in_place: Take a byte slice containing a tick-encoded ASCII string, and decode it in-place, writing back into the same byte slice. Returns a sub-slice containing just the decoded bytes (the bytes past the returned sub-slice are left unchanged).
- decode_iter: Return an iterator that decodes the tick-encoded characters from the input iterator. Returns Some(Err(_)) if the input character sequence is invalid, then returns None after that.
- decode_to_vec: Decode the given tick-encoded ASCII input, and append the result to output. Returns the number of bytes appended. Returns an error if the result isn’t a valid ASCII string, or isn’t a valid canonical tick-encoding.
- encode: Encode the given input as a string, escaping any bytes that require it. If no bytes require escaping, then the result will be borrowed from the input.
- encode_iter: Return an iterator that encodes the bytes from the input iterator.
- encode_to_string: Encode the given input, and append the result to output. Returns the number of bytes / characters appended (only ASCII characters are appended).
- encode_to_vec: Encode the given input, and append the result to output. Returns the number of bytes appended.
- requires_escape: Returns true if the given byte must be escaped with a backtick.
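
The descriptions of encode and decode mention that the result is borrowed from the input when nothing needs escaping. The standalone sketch below shows one way that behavior can be expressed with std::borrow::Cow. It does not call the crate and is not its implementation; sketch_requires_escape and sketch_encode_cow are hypothetical names used only for illustration.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for the escaping rule: true for a backtick or any
/// byte outside printable ASCII, tab, newline, carriage return, and space.
fn sketch_requires_escape(byte: u8) -> bool {
    !matches!(byte, b'\t' | b'\n' | b'\r' | 0x20..=0x5F | 0x61..=0x7E)
}

/// Hypothetical helper: borrow the input when no byte needs escaping,
/// and allocate a new String only when at least one does.
fn sketch_encode_cow(input: &[u8]) -> Cow<'_, str> {
    if !input.iter().copied().any(sketch_requires_escape) {
        // Every byte is in the as-is set, which is pure ASCII, so the
        // UTF-8 conversion cannot fail here.
        return Cow::Borrowed(std::str::from_utf8(input).expect("as-is bytes are ASCII"));
    }
    let mut out = String::with_capacity(input.len());
    for &byte in input {
        match byte {
            b'`' => out.push_str("``"),
            b if sketch_requires_escape(b) => out.push_str(&format!("`{b:02X}")),
            b => out.push(b as char),
        }
    }
    Cow::Owned(out)
}
```

In this sketch, plain ASCII input comes back as Cow::Borrowed with no allocation, while any input containing a backtick or a non-ASCII byte comes back as Cow::Owned.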