data-encoding 1.1.2

This crate provides generic data encoding functions. It is meant to guarantee mathematical properties, to conform to RFC 4648, to be efficient, and to offer a choice between allocating and in-place functions. It also provides an exhaustive example with functionality similar to the base64 GNU program. It supports common bases (base2, base4, base8, base16, hex, base32, base32hex, base64, and base64url) and custom bases (defined on the command line by their padding and their symbols in value order).
Documentation

This crate provides generic data encoding functions.

Encoding and decoding functions, with and without allocation, are provided for common bases. Those functions are instantiated from generic functions using a base interface described in the base module. The generic encoding and decoding functions are defined in the encode and decode modules respectively.

Examples

use data_encoding::hex;
use data_encoding::base64;
assert_eq!(hex::encode(b"some raw data"), "736F6D65207261772064617461");
assert_eq!(base64::decode(b"c29tZSByYXcgZGF0YQ==").unwrap(), b"some raw data");

A more involved example is available in the examples directory. It is similar to the base64 GNU program, but it works for all common bases and also for custom bases defined at runtime. The make encode command builds this example in target/release/examples/encode.

Conformance

This crate is meant to be conformant. The base16, hex, base32, base32hex, base64, and base64url modules conform to RFC 4648.

Properties

This crate is meant to provide strong properties. The encoding and decoding functions satisfy the following properties:

  • They are deterministic: their output only depends on their input.
  • They have no side-effects: they do not modify a hidden mutable state.
  • They never panic, although the decoding function may return a decoding error on invalid input.
  • They are inverses of each other:
    • For all data: Vec<u8>, we have decode(encode(&data).as_bytes()) == Ok(data).
    • For all repr: String, if there is data: Vec<u8> such that decode(repr.as_bytes()) == Ok(data), then encode(&data) == repr.

This last property, that encode and decode are inverses of each other, is usually not satisfied by common base64 implementations, such as the rustc-serialize crate or the base64 GNU program. This is a matter of choice, and this crate has chosen to guarantee canonical encoding as described in section 3.5 of the RFC.
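To make the canonical-encoding guarantee concrete, here is a minimal sketch of a padded base64 encoder and a strict decoder that uses only the standard library. It is not the implementation used by this crate: the function names, the DecodeError type, and the choice of which error to report in each case are assumptions made for illustration (the variant names are borrowed from the error names that appear in this document). The decoder accepts an input only if the encoder could have produced it.

```rust
// Illustrative sketch only: NOT the data-encoding implementation.
// `encode`, `decode`, `value`, and `DecodeError` are hypothetical names.

const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

#[derive(Debug, PartialEq)]
pub enum DecodeError {
    BadLength,
    BadCharacter(usize),
    BadPadding,
}

/// Encode bytes as padded base64 (RFC 4648, section 4).
pub fn encode(data: &[u8]) -> String {
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);
    for chunk in data.chunks(3) {
        // Pack up to 3 bytes into the top of a 24-bit group.
        let mut n: u32 = 0;
        for (i, &b) in chunk.iter().enumerate() {
            n |= (b as u32) << (16 - 8 * i);
        }
        // A chunk of k bytes yields k + 1 symbols, then padding.
        for i in 0..4 {
            if i <= chunk.len() {
                out.push(ALPHABET[((n >> (18 - 6 * i)) & 63) as usize] as char);
            } else {
                out.push('=');
            }
        }
    }
    out
}

/// Value of a symbol, if it belongs to the alphabet.
fn value(symbol: u8) -> Option<u32> {
    ALPHABET.iter().position(|&s| s == symbol).map(|v| v as u32)
}

/// Strict decoder: rejects anything `encode` could not have produced.
pub fn decode(input: &[u8]) -> Result<Vec<u8>, DecodeError> {
    if input.len() % 4 != 0 {
        return Err(DecodeError::BadLength);
    }
    let blocks = input.len() / 4;
    let mut out = Vec::with_capacity(blocks * 3);
    for (b, block) in input.chunks(4).enumerate() {
        // Padding is only recognized at the end of the last block.
        let pad = if b + 1 == blocks {
            block.iter().rev().take_while(|&&c| c == b'=').count()
        } else {
            0
        };
        if pad > 2 {
            return Err(DecodeError::BadPadding);
        }
        let symbols = 4 - pad;
        let mut n: u32 = 0;
        for (i, &c) in block[..symbols].iter().enumerate() {
            let v = value(c).ok_or(DecodeError::BadCharacter(4 * b + i))?;
            n |= v << (18 - 6 * i);
        }
        // Canonical check: bits that no output byte uses must be zero.
        let bytes = symbols - 1;
        if n & ((1 << (24 - 8 * bytes)) - 1) != 0 {
            return Err(DecodeError::BadPadding);
        }
        for i in 0..bytes {
            out.push((n >> (16 - 8 * i)) as u8);
        }
    }
    Ok(out)
}
```

With these definitions, decode(encode(b"some raw data").as_bytes()) returns Ok with the original bytes, while decode(b"AAB=") returns Err(DecodeError::BadPadding) because the last symbol carries non-zero bits that no output byte uses.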

Since the RFC specifies encode on all inputs and decode on all possible encode outputs, implementations differ only in how permissive their decode function is on other inputs. In this crate, the decode function rejects every input that is not a possible output of the encode function. To be more permissive, the input must be pre-treated before decoding (see the example in the examples directory). Here are some concrete examples of decoding differences between this crate, the rustc-serialize crate, and the base64 GNU program:

| Input    | data-encoding        | rustc-serialize | GNU base64    |
|----------|----------------------|-----------------|---------------|
| AAB=     | Err(BadPadding)      | Ok(vec![0, 0])  | \x00\x00      |
| AA\nB=   | Err(BadLength)       | Ok(vec![0, 0])  | \x00\x00      |
| AAB      | Err(BadLength)       | Ok(vec![0, 0])  | Invalid input |
| A\rA\nB= | Err(BadLength)       | Ok(vec![0, 0])  | Invalid input |
| -_\r\n   | Err(BadCharacter(0)) | Ok(vec![251])   | Invalid input |

We can summarize these discrepancies as follows:

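| Discrepancy                                         | data-encoding | rustc-serialize | GNU base64 |
|-----------------------------------------------------|---------------|-----------------|------------|
| Non-significant bits before padding may be non-null | No            | Yes             | Yes        |
| Non-alphabet ignored characters                     | None          | \r and \n       | \n         |
| Non-alphabet translated characters                  | None          | -_ mapped to +/ | None       |
| Padding is optional                                 | No            | Yes             | No         |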

This crate may provide wrappers to accept these discrepancies in a generic way at some point in the future.
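A caller who needs the more permissive behaviour can pre-treat the input before passing it to a strict decoder. The sketch below shows one way such a pre-treatment might look for standard base64. The normalize function is a hypothetical helper, not something this crate provides, and it covers only the four discrepancies listed above (ignored \r and \n, -_ translated to +/, optional padding, and non-null non-significant bits).

```rust
// Hypothetical pre-treatment helper; NOT part of data-encoding.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Rewrite a permissive base64 input into canonical padded form.
/// Characters that are still invalid afterwards are left in place,
/// so that a strict decoder can report them.
pub fn normalize(input: &[u8]) -> Vec<u8> {
    // 1. Drop ignored characters and translate the URL-safe symbols.
    let mut out: Vec<u8> = input
        .iter()
        .filter(|&&c| c != b'\r' && c != b'\n')
        .map(|&c| match c {
            b'-' => b'+',
            b'_' => b'/',
            other => other,
        })
        .collect();

    // 2. Remove trailing padding so that it can be recomputed.
    while out.last() == Some(&b'=') {
        out.pop();
    }

    // 3. Clear the non-significant bits of the last symbol.
    let mask: usize = match out.len() % 4 {
        2 => 0b1111, // 2 symbols carry 12 bits, only 8 are significant
        3 => 0b11,   // 3 symbols carry 18 bits, only 16 are significant
        _ => 0,
    };
    if let Some(last) = out.last_mut() {
        if let Some(v) = ALPHABET.iter().position(|&s| s == *last) {
            *last = ALPHABET[v & !mask];
        }
    }

    // 4. Restore padding up to a multiple of 4 symbols.
    while out.len() % 4 != 0 {
        out.push(b'=');
    }
    out
}
```

For instance, normalize(b"AA\nB=") yields the bytes of "AAA=", and normalize(b"-_\r\n") yields the bytes of "+w==". These are the canonical encodings of the values that the more permissive decoders return for those inputs ([0, 0] and [251]).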

Performance

This crate is meant to be efficient. It has comparable performance to the rustc-serialize crate and the base64 GNU program. The make bench command runs some benchmarks using cargo and a shell script.