Crate tick_encoding


Tick Encoding is a simple encoding scheme that encodes arbitrary binary data into an ASCII string, implemented as a Rust crate. It’s primarily designed for embedding data that is mostly ASCII in JSON strings. It’s very similar to percent encoding (URL encoding), but with a few key differences:

  • Uses backtick (`) instead of percent (%) as the escape character
  • One canonical encoding for any binary data
  • One consistent set of characters that require escaping
  • Fewer characters need escaping

§Usage

Install the tick-encoding crate as a Rust dependency by running cargo add tick-encoding.

// Encode the input into a tick-encoded ASCII string
let encoded = tick_encoding::encode("hello, world! 🙂".as_bytes());
assert_eq!(encoded, "hello, world! `F0`9F`99`82");

// Decode it back into a UTF-8 string
let decoded = tick_encoding::decode(encoded.as_bytes()).unwrap();
let decoded_str = std::str::from_utf8(&decoded).unwrap();
assert_eq!(decoded_str, "hello, world! 🙂");

§Cargo features

The tick-encoding crate includes the following Cargo features:

  • std (default): Enables functionality using Rust’s standard library. Disable to build in #![no_std] mode.
  • alloc (default): Enables functionality that depends on the global allocator. Disabling this will greatly limit what functions you can use!
  • safe: Avoid unsafe code. By default, a small amount of unsafe code is used (all checked with extensive unit tests, property tests, and Miri checks). Enabling this feature switches to purely safe code, and enables the #![deny(unsafe_code)] lint at the crate level.

§Encoding scheme

The encoding scheme for Tick Encoding is straightforward:

  • All printable ASCII bytes except backtick (`) are encoded as-is (0x21 to 0x5F, and 0x61 to 0x7E)
  • ASCII tabs, newlines, carriage returns, and space characters are also encoded as-is (0x09, 0x0A, 0x0D, and 0x20)
  • Backtick (`) is encoded as two backticks (0x60 becomes 0x60 0x60)
  • All other bytes are encoded as backtick followed by two uppercase hexadecimal characters
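
The four rules above can be sketched in a few lines of standalone Rust. This is an illustrative reimplementation that uses only the standard library, not the crate's actual code; the names `needs_escape` and `sketch_encode` are hypothetical and do not appear in the crate.

```rust
/// Hypothetical helper: true if `byte` is NOT one of the bytes the rules
/// above pass through unchanged (tab, newline, carriage return, space,
/// and printable ASCII other than backtick).
fn needs_escape(byte: u8) -> bool {
    !matches!(byte, b'\t' | b'\n' | b'\r' | 0x20..=0x5F | 0x61..=0x7E)
}

/// Hypothetical sketch of the encoding rules, always allocating a new String.
fn sketch_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity(input.len());
    for &byte in input {
        if byte == b'`' {
            // Backtick becomes two backticks.
            out.push_str("``");
        } else if needs_escape(byte) {
            // Any other escaped byte: backtick plus two uppercase hex digits.
            out.push_str(&format!("`{:02X}", byte));
        } else {
            // Pass-through bytes are ASCII, so the cast to char is lossless.
            out.push(byte as char);
        }
    }
    out
}
```

For example, `sketch_encode(&[0x00, 0x7F])` yields "`00`7F", and the emoji input from the Usage section yields the same string shown there.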

Decoding just reverses the process. To ensure that decoding and then re-encoding produce the same output string, the encoded string is validated while decoding:

  • The encoded string can only contain printable ASCII characters, tabs, newlines, carriage returns, and spaces
  • A backtick must be followed by a backtick or two uppercase hexadecimal characters
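
A validating decoder for these rules might look like the sketch below. It is a standalone illustration, not the crate's implementation: `SketchError`, `is_literal`, `hex_value`, and `sketch_decode` are hypothetical names. It also rejects hex escapes for bytes that would have been written literally or as a doubled backtick. That extra check is an assumption inferred from the round-trip requirement above, not a rule stated in the list.

```rust
/// Hypothetical error type for this sketch (not the crate's DecodeError).
#[derive(Debug, PartialEq)]
enum SketchError {
    /// A byte outside the allowed ASCII set appeared in the input.
    InvalidByte(u8),
    /// A backtick was not followed by a backtick or two uppercase hex digits.
    BadEscape,
    /// Assumed rule: a hex escape was used for a byte that has a shorter form.
    NonCanonicalEscape(u8),
}

/// True for bytes that appear as themselves in encoded text.
fn is_literal(byte: u8) -> bool {
    matches!(byte, b'\t' | b'\n' | b'\r' | 0x20..=0x5F | 0x61..=0x7E)
}

/// Value of one uppercase hexadecimal digit, or None for anything else.
fn hex_value(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'9' => Some(c - b'0'),
        b'A'..=b'F' => Some(c - b'A' + 10),
        _ => None,
    }
}

fn sketch_decode(input: &[u8]) -> Result<Vec<u8>, SketchError> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        match input[i] {
            b'`' => {
                if input.get(i + 1) == Some(&b'`') {
                    // Doubled backtick decodes to a single backtick.
                    out.push(b'`');
                    i += 2;
                } else {
                    let hi = input.get(i + 1).copied().and_then(hex_value);
                    let lo = input.get(i + 2).copied().and_then(hex_value);
                    let decoded = match (hi, lo) {
                        (Some(hi), Some(lo)) => (hi << 4) | lo,
                        _ => return Err(SketchError::BadEscape),
                    };
                    // Assumption: escapes for literal bytes or backtick are
                    // rejected so that re-encoding reproduces the input.
                    if is_literal(decoded) || decoded == b'`' {
                        return Err(SketchError::NonCanonicalEscape(decoded));
                    }
                    out.push(decoded);
                    i += 3;
                }
            }
            b if is_literal(b) => {
                out.push(b);
                i += 1;
            }
            other => return Err(SketchError::InvalidByte(other)),
        }
    }
    Ok(out)
}
```

Under these assumptions, lowercase hex such as "`f0" is rejected as a bad escape, and "`41" is rejected because the byte 0x41 (`A`) would have been encoded as-is.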

Modules§

iter

Structs§

EscapedHex
A two-digit escaped hex sequence, prefixed with a backtick.

Enums§

DecodeError
An error trying to decode a tick-encoded string.

Functions§

decode
Decode the given encoded input into a byte array. If no bytes need to be unescaped, then the result will be borrowed from the input.
decode_in_place
Take a byte slice containing a tick-encoded ASCII string, and decode it in-place, writing back into the same byte slice. Returns a sub-slice containing just the decoded bytes (the bytes past the returned sub-slice are left unchanged).
decode_iter
Return an iterator that decodes the tick-encoded characters from the input iterator. If the input character sequence is invalid, the iterator yields Some(Err(_)) once and then returns None after that.
decode_to_vec
Decode the given tick-encoded ASCII input, and append the result to output. Returns the number of bytes appended. Returns an error if the input isn’t a valid ASCII string, or isn’t a valid canonical tick-encoding.
encode
Encode the given input as a string, escaping any bytes that require it. If no bytes require escaping, then the result will be borrowed from the input.
encode_iter
Return an iterator that encodes the bytes from the input iterator.
encode_to_string
Encode the given input, and append the result to output. Returns the number of bytes / characters appended (only ASCII characters are appended).
encode_to_vec
Encode the given input, and append the result to output. Returns the number of bytes appended.
requires_escape
Returns true if the given byte must be escaped with a backtick.
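
The borrowing behaviour described for encode (the result is borrowed from the input when nothing needs escaping) can be illustrated with the standard library's `Cow` type. This is a minimal standalone sketch, assuming a `Cow`-style return value; `passes_through` and `sketch_encode_cow` are hypothetical names, and the crate's actual signatures should be checked in its function documentation.

```rust
use std::borrow::Cow;

/// True for bytes that the encoding scheme leaves unchanged.
fn passes_through(byte: u8) -> bool {
    matches!(byte, b'\t' | b'\n' | b'\r' | 0x20..=0x5F | 0x61..=0x7E)
}

/// Hypothetical sketch: borrow the input when no byte needs escaping,
/// and allocate a new String only when at least one byte does.
fn sketch_encode_cow(input: &[u8]) -> Cow<'_, str> {
    if input.iter().all(|&b| passes_through(b)) {
        // Every byte is ASCII here, so the slice is valid UTF-8.
        return Cow::Borrowed(std::str::from_utf8(input).expect("ASCII is valid UTF-8"));
    }
    let mut out = String::with_capacity(input.len());
    for &byte in input {
        if byte == b'`' {
            out.push_str("``");
        } else if passes_through(byte) {
            out.push(byte as char);
        } else {
            out.push_str(&format!("`{:02X}", byte));
        }
    }
    Cow::Owned(out)
}
```

With this shape, mostly-ASCII inputs that contain no backticks or non-ASCII bytes cost no allocation, which fits the crate's stated goal of handling usually-ASCII data.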