# rustrict

`rustrict` is a sophisticated profanity filter for Rust.
## Features

- Multiple types (profane, offensive, sexual, mean, spam)
- Multiple levels (mild, moderate, severe)
- Resistant to evasion
  - Alternative spellings (like "fck")
  - Repeated characters (like "craaaap")
  - Confusable characters (like 'ᑭ' vs 'P')
  - Spacing (like "c r_a-p")
  - Accents (like "pÓöp")
  - Bidirectional Unicode (related reading)
  - Self-censoring (like "f*ck")
  - Safe phrase list for known bad actors
  - Battle-tested in Mk48.io
- Resistant to false positives
  - One word (like "assassin")
  - Two words (like "push it")
- Flexible
  - Censor and/or analyze
  - Input `&str` or `Iterator<Type = char>`
  - Can add words with the `customize` feature
  - Plenty of options
- Performant
  - O(n) analysis and censoring
  - No `regex` (uses custom radix trie)
  - 4 MB/s in `release` mode
  - 150 KB/s in `debug` mode
## Limitations

- English only
- Censoring removes diacritics (accents)
- Does not detect right-to-left profanity while analyzing, so...
  - Censoring forces Unicode to be left-to-right
- Doesn't understand context
- Not resistant to false positives affecting profanities added at runtime
## Usage

### Strings (`&str`)

```rust
use rustrict::CensorStr;

let censored: String = "hello crap".censor();
let inappropriate: bool = "f u c k".is_inappropriate();

assert_eq!(censored, "hello c***");
assert!(inappropriate);
```
### Iterators (`Iterator<Type = char>`)

```rust
use rustrict::CensorIter;

let censored: String = "hello crap".chars().censor().collect();

assert_eq!(censored, "hello c***");
```
## Advanced

By constructing a `Censor`, one can avoid scanning text multiple times to get a censored `String` and/or answer multiple `is` queries. This also opens up more customization options (defaults are below).

```rust
use rustrict::{Censor, Type};

let (censored, analysis) = Censor::from_str("123 Crap")
    .with_censor_threshold(Type::INAPPROPRIATE)
    .with_censor_first_character_threshold(Type::OFFENSIVE & Type::SEVERE)
    .with_ignore_false_positives(false)
    .with_ignore_self_censoring(false)
    .with_censor_replacement('*')
    .censor_and_analyze();

assert_eq!(censored, "123 C***");
assert!(analysis.is(Type::INAPPROPRIATE));
assert!(analysis.isnt(Type::PROFANE & Type::SEVERE | Type::SEXUAL & Type::SEVERE));
```
If you cannot afford to let anything slip through, or have reason to believe a particular user is trying to evade the filter, you can check if their input matches a short list of safe strings:

```rust
use rustrict::{CensorStr, Type};

assert!("hi".is(Type::SAFE));
assert!("hello world".is(Type::SAFE));
assert!("nice to meet you".is(Type::SAFE));
assert!("yes".is(Type::SAFE));
assert!("I am a robot".isnt(Type::SAFE));
assert!("f u c k".isnt(Type::SAFE));
```
If you want to add custom profanities, safe words, or characters to strip out, enable the `customize` feature.
## Comparison

To compare filters, the first 100,000 items of this list are used as a dataset. Positive accuracy is the percentage of profanity detected as profanity. Negative accuracy is the percentage of clean text detected as clean.

| Crate | Accuracy | Positive Accuracy | Negative Accuracy | Time |
|---|---|---|---|---|
| rustrict | 90.85% | 91.54% | 90.67% | 8s |
| censor | 76.16% | 72.76% | 77.01% | 23s |
## Development

If you make an adjustment that would affect false positives, you will need to run `false_positive_finder`:

1. Run `./download.sh` to get the required word lists.
2. Run `cargo run --bin false_positive_finder --release --all-features`
## License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
## Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.