Deencode: Reverse engineer encoding errors
My first name is Clément. Throughout my life, I've encountered my fair share of mangled printings of my name caused by poor encoding handling: the text is encoded (turned from an internal representation into a sequence of bytes) and then decoded (turned from a sequence of bytes back into an internal representation) using different schemes. This often leads to non-ASCII characters being mangled, replaced, or dropped entirely.
For example:
The string "Clément"
└╴encoded as UTF-8 is 43 6C C3 A9 6D 65 6E 74
└╴decoded as Latin-1 / Codepage 1252 is "ClÃ©ment"
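To make the example concrete, here is a small std-only Rust sketch that reproduces the mangling and then reverses it by re-encoding as Latin-1 and decoding as UTF-8. It does not use this crate, and the function names are illustrative only. It treats Latin-1 as ISO-8859-1, where each byte maps to the code point of the same value. Codepage 1252 differs only in the 0x80 to 0x9F range, which the bytes C3 and A9 don't touch.

```rust
/// Encode a string as UTF-8 (Rust strings are already stored as UTF-8).
fn encode_utf8(s: &str) -> Vec<u8> {
    s.as_bytes().to_vec()
}

/// Decode bytes as Latin-1 (ISO-8859-1): each byte becomes the code point of the same value.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

/// Encode a string as Latin-1, or return None if any character is above U+00FF.
fn encode_latin1(s: &str) -> Option<Vec<u8>> {
    s.chars()
        .map(|c| if (c as u32) <= 0xFF { Some(c as u32 as u8) } else { None })
        .collect()
}

fn main() {
    // Forward: the mistake from the example above.
    let bytes = encode_utf8("Clément");
    let hex: Vec<String> = bytes.iter().map(|b| format!("{:02X}", b)).collect();
    println!("{}", hex.join(" ")); // 43 6C C3 A9 6D 65 6E 74
    let mangled = decode_latin1(&bytes);
    println!("{}", mangled); // ClÃ©ment

    // Backward: undo it by re-encoding as Latin-1, then decoding as UTF-8.
    let restored = encode_latin1(&mangled)
        .and_then(|b| String::from_utf8(b).ok())
        .expect("the mangled text should round-trip");
    println!("{}", restored); // Clément
}
```

Reversing the error only works because every character of the mangled string fits in Latin-1. When a step is lossy (a character replaced or dropped), the original bytes can't be recovered this way.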
Producing this sort of visualisation is why I created this crate. You take a number of engines, pass them to deencode::deencode() to get back a tree of possible sequences of encodings and decodings, and then work on that tree.
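The idea behind the tree can be sketched with the standard library alone. This is not the crate's API. `Engine`, `Value`, `Node`, `explore`, `render` and `default_engines` are hypothetical names, and only two engines (UTF-8 and Latin-1) are modelled. Every engine that can represent a text node encodes it into bytes. Every engine that accepts a byte node decodes it back into text. This repeats up to a fixed depth.

```rust
/// A value in the tree: either text or raw bytes (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Text(String),
    Bytes(Vec<u8>),
}

/// One node of the exploration tree (hypothetical structure).
struct Node {
    step: String,
    value: Value,
    children: Vec<Node>,
}

/// A hypothetical engine: a name plus an encoder and a decoder, either of which may fail.
struct Engine {
    name: &'static str,
    encode: fn(&str) -> Option<Vec<u8>>,
    decode: fn(&[u8]) -> Option<String>,
}

fn utf8_encode(s: &str) -> Option<Vec<u8>> {
    Some(s.as_bytes().to_vec())
}

fn utf8_decode(b: &[u8]) -> Option<String> {
    // Invalid UTF-8 sequences make this branch fail.
    String::from_utf8(b.to_vec()).ok()
}

fn latin1_encode(s: &str) -> Option<Vec<u8>> {
    // Characters above U+00FF cannot be represented in Latin-1.
    s.chars()
        .map(|c| if (c as u32) <= 0xFF { Some(c as u32 as u8) } else { None })
        .collect()
}

fn latin1_decode(b: &[u8]) -> Option<String> {
    // Every byte maps to the code point with the same value.
    Some(b.iter().map(|&x| char::from(x)).collect())
}

fn default_engines() -> Vec<Engine> {
    vec![
        Engine { name: "UTF-8", encode: utf8_encode, decode: utf8_decode },
        Engine { name: "Latin-1", encode: latin1_encode, decode: latin1_decode },
    ]
}

/// Apply every engine to `value`, recursing until `depth` steps have been taken.
fn explore(step: String, value: Value, engines: &[Engine], depth: usize) -> Node {
    let mut children = Vec::new();
    if depth > 0 {
        for engine in engines {
            // Text is encoded into bytes; bytes are decoded back into text.
            let next = match &value {
                Value::Text(s) => (engine.encode)(s.as_str())
                    .map(|b| (format!("encoded as {}", engine.name), Value::Bytes(b))),
                Value::Bytes(b) => (engine.decode)(b.as_slice())
                    .map(|s| (format!("decoded as {}", engine.name), Value::Text(s))),
            };
            // A failed encode or decode simply produces no child.
            if let Some((label, v)) = next {
                children.push(explore(label, v, engines, depth - 1));
            }
        }
    }
    Node { step, value, children }
}

/// Render the tree with box drawings, indenting two spaces per level.
fn render(node: &Node, depth: usize, out: &mut String) {
    let shown = match &node.value {
        Value::Text(s) => format!("\"{}\"", s),
        Value::Bytes(b) => b.iter().map(|x| format!("{:02X}", x)).collect::<Vec<_>>().join(" "),
    };
    if depth == 0 {
        out.push_str(&format!("{} {}\n", node.step, shown));
    } else {
        out.push_str(&format!("{}└╴{} is {}\n", "  ".repeat(depth - 1), node.step, shown));
    }
    for child in &node.children {
        render(child, depth + 1, out);
    }
}

fn main() {
    let engines = default_engines();
    let tree = explore("The string".to_string(), Value::Text("Clément".to_string()), &engines, 2);
    let mut out = String::new();
    render(&tree, 0, &mut out);
    print!("{}", out);
}
```

With a depth of 2, the "encoded as UTF-8, decoded as Latin-1" branch reproduces "ClÃ©ment". Decoding the Latin-1 bytes (43 6C E9 ...) as UTF-8 fails because 0xE9 is not followed by a continuation byte, so that branch is pruned.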
This crate is published on crates.io, with documentation on docs.rs.
Example usage
// List the engines to use.
let engines: /* … */ = vec![/* … */];
// Explore the tree of possible encodings and decodings.
let mut tree = deencode(/* … */);
// Remove duplicate entries from the tree.
let _ = tree.deduplicate();
// Export the tree with box drawings.
println!(/* … */);
// Export the tree as JSON.
println!(/* … */);