Crate deencode

§Deencode: Reverse engineer encoding errors

The goal of this crate is to automatically explore the results of successively encoding and then decoding a string with different encoding schemes, a process that usually corrupts some of the non-ASCII characters.

§Concepts

  • Engines are objects that represent an encoding scheme, and can be used to encode (String to bytes) or decode (bytes to String). A number of engines are already implemented in this crate and are available as static instances.
  • The structure of deencoding is a tree: from an input string, every engine may give an encoding, then every engine gives a decoding of that encoding, and so on.
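
The tree structure described above can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: the `Scheme` enum and the `explore` function are hypothetical names, the Latin-1 variant handles strict ISO-8859-1 only, and one depth level is assumed to mean one encode-then-decode round.

```rust
/// Hypothetical stand-in for an engine (not the crate's `Engine` trait).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Scheme {
    Utf8,
    Latin1, // strict ISO-8859-1 only, to keep the sketch short
}

impl Scheme {
    /// String to bytes; `None` if a character is not representable.
    fn encode(self, text: &str) -> Option<Vec<u8>> {
        match self {
            Scheme::Utf8 => Some(text.as_bytes().to_vec()),
            Scheme::Latin1 => text
                .chars()
                .map(|c| {
                    let code = u32::from(c);
                    if code <= 0xFF { Some(code as u8) } else { None }
                })
                .collect(),
        }
    }

    /// Bytes to string; `None` if the bytes are invalid for the scheme.
    fn decode(self, bytes: &[u8]) -> Option<String> {
        match self {
            Scheme::Utf8 => String::from_utf8(bytes.to_vec()).ok(),
            // Every byte maps to the code point of the same value.
            Scheme::Latin1 => Some(bytes.iter().map(|&b| char::from(b)).collect()),
        }
    }
}

/// Walk the tree depth-first, recording each path of schemes
/// (encoder, decoder, encoder, ...) with the string it produces.
fn explore(
    input: &str,
    schemes: &[Scheme],
    depth: usize,
    path: &mut Vec<Scheme>,
    out: &mut Vec<(Vec<Scheme>, String)>,
) {
    if depth == 0 {
        return;
    }
    for &enc in schemes {
        let bytes = match enc.encode(input) {
            Some(bytes) => bytes,
            None => continue, // this scheme cannot encode the input
        };
        for &dec in schemes {
            if let Some(text) = dec.decode(&bytes) {
                path.push(enc);
                path.push(dec);
                out.push((path.clone(), text.clone()));
                explore(&text, schemes, depth - 1, path, out);
                path.pop();
                path.pop();
            }
        }
    }
}
```

Calling `explore("Clément", &[Scheme::Utf8, Scheme::Latin1], 1, &mut Vec::new(), &mut out)` records three results: two paths that return `Clément` unchanged, and the UTF-8 then Latin-1 path, which yields `ClÃ©ment`. Encoding as Latin-1 and decoding as UTF-8 produces nothing, because the byte sequence is not valid UTF-8.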

Note: The deencoding process is not optimised to avoid repeating identical steps, so it is recommended to keep the depth small. Deduplication can then be applied to remove duplicate entries from the tree.
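
As a rough illustration of what deduplication achieves, the sketch below keeps only the first path that produces each distinct string. It works on a flat list of (path label, result) pairs rather than on a tree, and `dedup_by_text` is a hypothetical helper; the crate's own `deduplicate` method may work differently.

```rust
use std::collections::HashSet;

/// Keep the first entry for each distinct result string, preserving order.
/// Each entry is a (path label, resulting string) pair.
fn dedup_by_text(results: Vec<(String, String)>) -> Vec<(String, String)> {
    let mut seen = HashSet::new();
    results
        .into_iter()
        // `insert` returns false when the string was already seen.
        .filter(|(_, text)| seen.insert(text.clone()))
        .collect()
}
```

With the three results of a one-level exploration of "Clément" through UTF-8 and Latin-1 (two of which return the input unchanged), this leaves two entries.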

§Usage

use deencode::*;

// List the engines to use.
let engines: Vec<&dyn Engine> = vec![&UTF8, &LATIN1, &MIXED816BE, &MIXED816LE, &UTF7];
// Explore the tree of possible encodings and decodings.
let mut tree = deencode("Clément", &engines, 1);
// Remove duplicate entries from the tree.
let _ = tree.deduplicate();

// Export the tree with box drawings.
println!("{}", tree);
// Export the tree as JSON.
println!("{}", serde_json::to_string(&tree).unwrap());

Re-exports§

pub use engine::Engine;
pub use deencodetree::DeencodeTree;

Modules§

deencodetree
The deencoding process.
engine
latin1engine
Deencoding engine for Latin-1 / Codepage 1252
mixed816beengine
Deencoding engine for a mixed UTF-8/UTF-16BE scheme
mixed816leengine
Deencoding engine for a mixed UTF-8/UTF-16LE scheme
utf7engine
Deencoding engine for UTF-7
utf8engine

Statics§

LATIN1
Provided engine for Latin-1 / ISO-8859-1 / Codepage 1252.
MIXED816BE
Provided engine for a mixed UTF-8/UTF-16BE scheme.
MIXED816LE
Provided engine for a mixed UTF-8/UTF-16LE scheme.
UTF7
Provided engine for UTF-7.
UTF8
Provided engine for UTF-8.

Functions§

deencode
Build a DeencodeTree by successively running encodings and decodings through the engines.