Crate text_sanitizer

Converts raw text bytes into a valid UTF-8 std::string::String with simplified ASCII characters.

For example, the Unicode symbol Sparkling Heart “U+1F496” (💖) will be converted to “ <3 ”.
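The character-for-text substitution in this example can be sketched with nothing but the standard library. This is a minimal illustration and not the crate's API: the function `simplify` and its `HashMap<char, &str>` map are hypothetical, and the map holds only the single entry from the example above.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of text_sanitizer): replaces every character
/// that has an entry in `map` with its ASCII substitute and copies all other
/// characters unchanged.
fn simplify(text: &str, map: &HashMap<char, &str>) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match map.get(&c) {
            Some(replacement) => out.push_str(replacement),
            None => out.push(c),
        }
    }
    out
}

fn main() {
    // Assumed one-entry map: U+1F496 SPARKLING HEART -> " <3 "
    let mut map = HashMap::new();
    map.insert('\u{1F496}', " <3 ");

    println!("{:?}", simplify("I \u{1F496} Rust", &map));
}
```

Because the replacement keeps its surrounding spaces, the heart between two words produces double spaces on both sides.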

The conversion works by parsing the bytes into Unicode codepoint strings, which are then mapped to simplified ASCII characters through a conversion map.
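A minimal sketch of the two steps described above, assuming the conversion map is keyed by codepoint strings of the form "U+XXXX". The names `codepoint_string` and `convert` are hypothetical, and for brevity this sketch lets `String::from_utf8_lossy` replace invalid bytes with U+FFFD, which is not necessarily what the crate does.

```rust
use std::collections::HashMap;

/// Formats a character as a codepoint string such as "U+1F496" or "U+0041".
fn codepoint_string(c: char) -> String {
    format!("U+{:04X}", c as u32)
}

/// Hypothetical pipeline: decode the bytes, turn each character into its
/// codepoint string and look that string up in `map`. Characters without an
/// entry are copied unchanged.
fn convert(bytes: &[u8], map: &HashMap<&str, &str>) -> String {
    // Simplification: invalid UTF-8 becomes U+FFFD here.
    let text = String::from_utf8_lossy(bytes);
    let mut out = String::new();
    for c in text.chars() {
        let key = codepoint_string(c);
        match map.get(key.as_str()) {
            Some(ascii) => out.push_str(ascii),
            None => out.push(c),
        }
    }
    out
}

fn main() {
    // Assumed conversion map entries keyed by codepoint strings.
    let mut map = HashMap::new();
    map.insert("U+1F496", " <3 ");
    map.insert("U+20AC", "EUR");

    let bytes = "5 \u{20AC} \u{1F496}".as_bytes();
    println!("{:?}", convert(bytes, &map));
}
```

Keying the map by strings rather than by `char` keeps the table easy to write by hand or load from a text file, at the cost of one small allocation per character.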

The conversion map can also rescue unrecognized bytes through custom mappings. For example, a wrongly encoded byte shown as “(?80)” can be mapped to “EUR”; correctly encoded, it would be the euro sign “U+20AC”.
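One way to implement such a rescue, sketched under assumptions: every byte that is not part of a valid UTF-8 sequence is rendered as a marker in the “(?XX)” form shown above, and a custom map may replace that marker with text. The function `rescue` and the marker format are illustrative only; the crate's actual handling may differ. The sketch relies on `std::str::from_utf8` and its `Utf8Error::valid_up_to` / `error_len` methods to locate the bad bytes.

```rust
use std::collections::HashMap;

/// Hypothetical helper: copies valid UTF-8 through, renders each invalid byte
/// as "(?XX)" (uppercase hex) and substitutes custom text when the marker has
/// an entry in `custom`.
fn rescue(bytes: &[u8], custom: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut rest = bytes;
    while !rest.is_empty() {
        match std::str::from_utf8(rest) {
            Ok(valid) => {
                out.push_str(valid);
                break;
            }
            Err(e) => {
                let (valid, after) = rest.split_at(e.valid_up_to());
                // This prefix was just validated, so unwrap cannot fail.
                out.push_str(std::str::from_utf8(valid).unwrap());
                // error_len() is None when the input ends mid-sequence.
                let bad_len = e.error_len().unwrap_or(after.len());
                for b in &after[..bad_len] {
                    let marker = format!("(?{:02X})", b);
                    match custom.get(marker.as_str()) {
                        Some(text) => out.push_str(text),
                        None => out.push_str(&marker),
                    }
                }
                rest = &after[bad_len..];
            }
        }
    }
    out
}

fn main() {
    // Assumed custom mapping: the stray byte 0x80 stands for the euro sign.
    let mut custom = HashMap::new();
    custom.insert("(?80)", "EUR");

    println!("{:?}", rescue(b"Price: 5 \x80", &custom));
    println!("{:?}", rescue(b"a\x81b", &custom));
}
```

Bytes without a custom entry stay visible as markers, so unmapped garbage is never silently dropped.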

Re-exports

Modules