Crate text_sanitizer
Converts raw text bytes into a valid UTF-8 std::string::String with simplified ASCII characters.
For example, the Unicode symbol Sparkling Heart “U+1F496” is converted to “ <3 ”.
The conversion parses the bytes into Unicode code point strings, which a conversion map then maps to simplified ASCII characters.
The conversion map also helps to rescue unrecognized bytes through custom mappings. For example, a wrongly encoded byte such as “(?80)” can be mapped to “EUR”; correctly encoded, it would be “U+20AC”.
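The idea described above can be illustrated with a minimal sketch built only on the standard library. It is not this crate's implementation or API: the functions `sanitize` and `push_mapped`, the use of a plain `HashMap<String, String>` as the map, and the fallback of emitting the unmapped key itself are all assumptions made for illustration. Only the key notations “U+1F496” and “(?80)” are taken from the description.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of text_sanitizer): walks the input,
/// mapping valid non-ASCII characters by their "U+XXXX" key and
/// invalid bytes by their "(?XX)" key.
fn sanitize(bytes: &[u8], map: &HashMap<String, String>) -> String {
    let mut out = String::new();
    let mut rest = bytes;
    while !rest.is_empty() {
        match std::str::from_utf8(rest) {
            Ok(valid) => {
                push_mapped(valid, map, &mut out);
                break;
            }
            Err(err) => {
                // Everything before `valid_up_to` is guaranteed valid UTF-8.
                let (valid, after) = rest.split_at(err.valid_up_to());
                push_mapped(std::str::from_utf8(valid).unwrap(), map, &mut out);
                // `error_len` is None when the input ends mid-sequence;
                // in that case treat all remaining bytes as unrecognized.
                let bad_len = err.error_len().unwrap_or(after.len());
                for b in &after[..bad_len] {
                    let key = format!("(?{:02X})", b);
                    match map.get(&key) {
                        Some(replacement) => out.push_str(replacement),
                        // Assumption: unmapped bytes are emitted as their key.
                        None => out.push_str(&key),
                    }
                }
                rest = &after[bad_len..];
            }
        }
    }
    out
}

/// Hypothetical helper: copies ASCII through and looks up everything
/// else by its code point string.
fn push_mapped(text: &str, map: &HashMap<String, String>, out: &mut String) {
    for ch in text.chars() {
        if ch.is_ascii() {
            out.push(ch);
        } else {
            let key = format!("U+{:04X}", ch as u32);
            match map.get(&key) {
                Some(replacement) => out.push_str(replacement),
                // Assumption: unmapped code points are emitted as their key.
                None => out.push_str(&key),
            }
        }
    }
}
```

With a map containing “U+1F496” → “ <3 ” and “(?80)” → “EUR”, calling `sanitize(b"A\xF0\x9F\x92\x96B\x80", &map)` in this sketch yields `"A <3 BEUR"`: the four-byte Sparkling Heart is decoded and mapped, while the stray byte 0x80 is rescued through its custom mapping.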
Re-exports
pub use sanitizer::ConversionMap;
pub use sanitizer::LanguageMap;
pub use sanitizer::TextSanitizer;