§decancer
A library that removes common unicode confusables/homoglyphs from strings.
- Its core is written in Rust and utilizes a form of Binary Search to ensure speed!
- By default, it’s capable of filtering 221,529 (19.88%) different unicode codepoints, including:
  - All whitespace characters
  - All diacritics, which also eliminates all forms of Zalgo text
  - Most leetspeak characters
  - Most homoglyphs
  - Several emojis
- Unlike other packages, this package is unicode bidi-aware: it interprets right-to-left characters the same way an application would render them!
- Its behavior is also highly customizable to your liking!
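The binary search mentioned above can be pictured with a small standard-library sketch. This is only an illustration of the general technique, not decancer's actual implementation: the `CONFUSABLES` table, `translate`, and `cure_sketch` are hypothetical names, and the seven entries are a hand-picked sample rather than the crate's real data.

```rust
/// Hypothetical sample table: (confusable codepoint, plain replacement).
/// It must stay sorted by codepoint, otherwise binary search is invalid.
const CONFUSABLES: &[(char, char)] = &[
    ('\u{0147}', 'n'),  // Ň
    ('\u{0163}', 't'),  // ţ
    ('\u{2115}', 'n'),  // ℕ
    ('\u{24E1}', 'r'),  // ⓡ
    ('\u{4E47}', 'e'),  // 乇
    ('\u{1D53D}', 'f'), // 𝔽
    ('\u{1D54C}', 'u'), // 𝕌
];

/// Looks up one character in O(log n); unknown characters pass through.
fn translate(c: char) -> char {
    match CONFUSABLES.binary_search_by_key(&c, |&(from, _)| from) {
        Ok(index) => CONFUSABLES[index].1,
        Err(_) => c,
    }
}

/// Translates every character, then lowercases the ASCII result.
fn cure_sketch(input: &str) -> String {
    input
        .chars()
        .map(translate)
        .map(|c| c.to_ascii_lowercase())
        .collect()
}
```

Under these assumptions, `cure_sketch("𝔽𝕌Ňℕy")` yields `"funny"`. A sorted table keeps lookups logarithmic without building a hash map at startup, which is one plausible reason to choose binary search for this kind of data.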
§Installation
In your Cargo.toml:
decancer = "3.2.4"
§Examples
For more information, please read the documentation.
let mut cured = decancer::cure!(r"vEⓡ𝔂 𝔽𝕌Ňℕy ţ乇𝕏𝓣 wWiIiIIttHh l133t5p3/-\|<").unwrap();
assert_eq!(cured, "very funny text with leetspeak");
// WARNING: it's NOT recommended to coerce this output to a Rust string
// and process it manually from there, as decancer has its own
// custom comparison measures, including leetspeak matching!
assert_ne!(cured.as_str(), "very funny text with leetspeak");
assert!(cured.contains("funny"));
cured.censor("funny", '*');
assert_eq!(cured, "very ***** text with leetspeak");
cured.censor_multiple(["very", "text"], '-');
assert_eq!(cured, "---- ***** ---- with leetspeak");
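To make the censoring step in the example more concrete, here is a rough standard-library sketch of the idea: find the half-open ranges where a word occurs, then overwrite the characters in those ranges. The names `find_ranges` and `censor_sketch` are hypothetical, and the matching here is exact substring matching only; decancer's own comparison is fuzzier (it handles leetspeak and repeated characters), so this is not how the crate does it.

```rust
use std::ops::Range;

/// Collects the half-open byte ranges of every exact occurrence of `needle`.
fn find_ranges(haystack: &str, needle: &str) -> Vec<Range<usize>> {
    haystack
        .match_indices(needle)
        .map(|(start, matched)| start..start + matched.len())
        .collect()
}

/// Replaces every character that starts inside a matched range with `with`.
fn censor_sketch(text: &str, needle: &str, with: char) -> String {
    let ranges = find_ranges(text, needle);
    text.char_indices()
        .map(|(index, c)| {
            if ranges.iter().any(|range| range.contains(&index)) {
                with
            } else {
                c
            }
        })
        .collect()
}
```

With these helpers, `censor_sketch("very funny text with leetspeak", "funny", '*')` produces `"very ***** text with leetspeak"`, mirroring the `censor` call in the example on plain ASCII input.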
§Donations
If you want to support my eyes for manually looking at thousands of unicode characters, consider donating! ❤
Macros§
- Cures a string with decancer’s default options.
- Cures a single character/unicode codepoint with decancer’s default options.
Structs§
- A small wrapper around the String data type for comparison purposes.
- A matcher iterator around a string that yields a non-inclusive Range whenever it detects a similar match.
- A configuration struct where you can customize decancer’s behavior.
Enums§
- An error enum for unicode bidi errors caused by malformed string inputs.
- The translation for a single character/codepoint.