decancer

A library that removes common unicode confusables/homoglyphs from strings.
- Its core is written in Rust and uses a form of binary search to keep lookups fast!
- By default, it's capable of filtering 221,529 (19.88%) different unicode codepoints like:
  - All whitespace characters
  - All diacritics, which also eliminates all forms of Zalgo text
  - Most leetspeak characters
  - Most homoglyphs
  - Several emojis
- Unlike other packages, this package is unicode bidi-aware: it interprets right-to-left characters the same way an application would render them!
- Its behavior is also highly customizable to your liking!
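
To illustrate the binary search idea mentioned above, here is a minimal sketch that looks up confusables in a table sorted by code point. The `CONFUSABLES` table, `lookup`, and `cure_sketch` are hypothetical names made up for this example; they are not decancer's actual data, API, or implementation, which covers far more codepoints and also handles bidi reordering, leetspeak, and repeated characters.

```rust
// Hypothetical lookup table, sorted by code point so it can be binary searched.
// These entries are illustrative only and are not decancer's actual data.
const CONFUSABLES: &[(char, char)] = &[
    ('\u{0430}', 'a'), // CYRILLIC SMALL LETTER A
    ('\u{0435}', 'e'), // CYRILLIC SMALL LETTER IE
    ('\u{043E}', 'o'), // CYRILLIC SMALL LETTER O
    ('\u{2115}', 'n'), // DOUBLE-STRUCK CAPITAL N
    ('\u{24E1}', 'r'), // CIRCLED LATIN SMALL LETTER R
    ('\u{FF25}', 'e'), // FULLWIDTH LATIN CAPITAL LETTER E
];

/// Returns the plain replacement for `c`, if the table has one.
fn lookup(c: char) -> Option<char> {
    CONFUSABLES
        .binary_search_by_key(&c, |&(confusable, _)| confusable)
        .ok()
        .map(|index| CONFUSABLES[index].1)
}

/// Maps every character through the table; characters that are not in the
/// table are kept, with ASCII letters lowercased.
fn cure_sketch(input: &str) -> String {
    input
        .chars()
        .map(|c| lookup(c).unwrap_or_else(|| c.to_ascii_lowercase()))
        .collect()
}

fn main() {
    // "v", FULLWIDTH E, CIRCLED r, "y"
    let cured = cure_sketch("v\u{FF25}\u{24E1}y");
    println!("{cured}");
}
```

Because the table is sorted, each lookup takes a logarithmic number of comparisons in the table size, which is what makes this approach attractive for tables with hundreds of thousands of entries.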
Installation
In your Cargo.toml:
decancer = "3.3.3"
Examples
For more information, please read the documentation.
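let mut cured = decancer::cure!(r"vＥⓡ𝔂 𝔽𝕌Ňℕｙ ţ乇𝕏𝓣 wWiIiIIttHh l133t5p3/-\|<").unwrap();
assert_eq!(cured, "very funny text with leetspeak");
// WARNING: it's NOT recommended to coerce this output to a Rust string
// and process it manually from there, as decancer has its own
// custom comparison measures, including leetspeak matching!
assert_ne!(cured.as_str(), "very funny text with leetspeak");
assert!(cured.contains("funny"));
cured.censor("funny", '*');
assert_eq!(cured, "very ***** text with leetspeak");
cured.censor_multiple(["very", "text"], '-');
assert_eq!(cured, "---- ***** ---- with leetspeak");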
Donations
If you want to support my eyes for manually looking at thousands of unicode characters, consider donating! ❤