sensitive-rs 0.5.0

A Rust library for sensitive data detection and filtering, supporting Chinese and English text with trie-based algorithms.

Sensitive-rs


A high-performance Rust crate for multi-pattern string matching, validation, filtering, and replacement.
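Multi-pattern matching of this kind is commonly built on a character trie, and the crate description mentions trie-based algorithms. The sketch below is a rough illustration of that idea, not the crate's actual implementation: `Trie`, `TrieNode`, and their methods are hypothetical names. Each edge holds one `char`, so Chinese and English text are handled the same way.

```rust
use std::collections::HashMap;

/// Hypothetical trie node: children keyed by character, plus an end-of-word flag.
#[derive(Default)]
struct TrieNode {
    children: HashMap<char, TrieNode>,
    is_end: bool,
}

/// Hypothetical minimal trie; not the sensitive-rs internals.
#[derive(Default)]
struct Trie {
    root: TrieNode,
}

impl Trie {
    /// Inserts a word one `char` per edge, so CJK words are stored per character.
    fn insert(&mut self, word: &str) {
        let mut node = &mut self.root;
        for c in word.chars() {
            node = node.children.entry(c).or_default();
        }
        node.is_end = true;
    }

    /// Returns every stored word found in `text`, ordered by start position
    /// (overlapping matches included). Naive: restarts the walk at each position.
    fn find_all(&self, text: &str) -> Vec<String> {
        let chars: Vec<char> = text.chars().collect();
        let mut found = Vec::new();
        for start in 0..chars.len() {
            let mut node = &self.root;
            for (offset, c) in chars[start..].iter().enumerate() {
                match node.children.get(c) {
                    Some(next) => node = next,
                    None => break, // no stored word continues with this character
                }
                if node.is_end {
                    found.push(chars[start..=start + offset].iter().collect());
                }
            }
        }
        found
    }
}
```

Restarting at every position makes this O(n·m) in the worst case (n text characters, m longest word). Automaton-based matchers such as Aho-Corasick, listed among the engines below, avoid the re-scan by following failure links.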

Features

  • Find all sensitive words: find_all
  • Check whether text contains sensitive words: validate
  • Remove sensitive words: filter
  • Replace sensitive words with a character: replace
  • Multi-algorithm engine: Aho-Corasick, Wu-Manber, Regex
  • Noise removal via configurable regex
  • Variant detection (pinyin and visually similar characters)
  • Parallel search with rayon
  • LRU cache for hot queries
  • Batch processing: find_all_batch
  • Layered matching: find_all_layered
  • Streaming processing: find_all_streaming
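To make the semantics of validate, filter, and replace concrete, here is a dependency-free sketch. The free functions below are hypothetical and only mirror the behavior the feature list describes; they are not the crate's API. `strip_noise` stands in for the configurable regex by keeping only alphanumeric characters (CJK ideographs count as alphabetic in Rust), which is an assumption about what "noise" means.

```rust
/// Hypothetical stand-in for regex-based noise removal: keeps only
/// alphanumeric characters, so "r-u s*t" becomes "rust".
fn strip_noise(text: &str) -> String {
    text.chars().filter(|c| c.is_alphanumeric()).collect()
}

/// Hypothetical `validate`: true if any non-empty word occurs in `text`.
fn validate(words: &[&str], text: &str) -> bool {
    words.iter().any(|w| !w.is_empty() && text.contains(*w))
}

/// Hypothetical `filter`: removes every occurrence of every word.
fn filter(words: &[&str], text: &str) -> String {
    let mut out = text.to_string();
    for w in words.iter().filter(|w| !w.is_empty()) {
        out = out.replace(*w, "");
    }
    out
}

/// Hypothetical `replace`: masks each occurrence with `repl`, one mask
/// character per Unicode scalar, so "敏感词" becomes "***" rather than 9 bytes' worth.
fn replace(words: &[&str], text: &str, repl: char) -> String {
    let mut out = text.to_string();
    for w in words.iter().filter(|w| !w.is_empty()) {
        let mask: String = std::iter::repeat(repl).take(w.chars().count()).collect();
        out = out.replace(*w, &mask);
    }
    out
}
```

Stripping noise also removes spaces, so in English text it can join neighboring words and cause false positives, and any replacement applied to the stripped text no longer lines up with the original positions. A real implementation would need an offset map back to the input.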

Installation

Add to your Cargo.toml:

[dependencies]
sensitive-rs = "0.5.0"

Quick Start

use sensitive_rs::Filter;

fn main() {
    let mut filter = Filter::new();
    filter.add_words(&["rust", "filter", "敏感词"]);

    let text = "hello rust, this is a filter demo 包含敏感词";
    let found = filter.find_all(text);
    println!("Found: {:?}", found);

    let cleaned = filter.replace(text, '*');
    println!("Cleaned: {}", cleaned);
}

Advanced Usage

Batch processing:

let texts = vec!["text1", "text2"];
let results = filter.find_all_batch(&texts);
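Batch processing amounts to running the same matcher over many texts and collecting the results in input order. The sketch below uses scoped std threads (Rust 1.63+) to stay dependency-free; the crate itself lists rayon for parallelism. `find_words` and `batch_find` are hypothetical helpers, and the per-text matcher is a plain substring check rather than the crate's engine.

```rust
use std::thread;

/// Hypothetical per-text matcher: returns the words that occur in `text`.
fn find_words(words: &[&str], text: &str) -> Vec<String> {
    words
        .iter()
        .filter(|w| !w.is_empty() && text.contains(**w))
        .map(|w| w.to_string())
        .collect()
}

/// Hypothetical batch helper: scans each text on its own scoped thread and
/// returns one result vector per input, in the same order as `texts`.
fn batch_find(words: &[&str], texts: &[&str]) -> Vec<Vec<String>> {
    thread::scope(|s| {
        let handles: Vec<_> = texts
            .iter()
            .map(|text| s.spawn(move || find_words(words, text)))
            .collect();
        // Joining in spawn order preserves the input order of the results.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

One thread per text is acceptable for a handful of inputs; for large batches a work-stealing pool such as rayon's amortizes thread creation.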

Layered matching:

let layered = filter.find_all_layered("some long text");

Streaming large files:

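use std::fs::File;
use std::io::BufReader;

// `?` needs an enclosing function that returns a compatible Result,
// e.g. fn main() -> Result<(), Box<dyn std::error::Error>>.
let reader = BufReader::new(File::open("large.txt")?);
let stream_results = filter.find_all_streaming(reader)?;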
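Streaming keeps memory bounded by reading the input incrementally instead of loading the whole file. The sketch below shows the idea with a line-by-line scan over any `BufRead`; `stream_matches` is a hypothetical helper, and the crate's `find_all_streaming` may buffer and report results differently.

```rust
use std::io::{self, BufRead};

/// Hypothetical streaming scan: reads `reader` one line at a time and records
/// (1-based line number, word) for each word found on that line.
/// Words split across a line break are not detected in this simplified version.
fn stream_matches<R: BufRead>(words: &[&str], reader: R) -> io::Result<Vec<(usize, String)>> {
    let mut hits = Vec::new();
    for (idx, line) in reader.lines().enumerate() {
        let line = line?; // propagate I/O or invalid-UTF-8 errors
        for w in words.iter().filter(|w| !w.is_empty()) {
            if line.contains(*w) {
                hits.push((idx + 1, w.to_string()));
            }
        }
    }
    Ok(hits)
}
```

With a file, pass `BufReader::new(File::open("large.txt")?)` as the reader; for testing, `std::io::Cursor` over a string works as well.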

Documentation

For the full API reference, see the crate documentation on docs.rs.

License

Licensed under either of

  • Apache License, Version 2.0
  • MIT license

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 or MIT license, shall be dual licensed as above, without any additional terms or conditions.