§CleanSH Core Library
cleansh-core provides the fundamental, platform-independent logic for data sanitization and redaction. It defines the core data structures for redaction rules, provides mechanisms for compiling these rules, and implements a pluggable SanitizationEngine trait for applying redaction logic.
The library is designed to be pure and stateless, focusing solely on the transformation of input data based on defined rules, without concerns for I/O or application-specific state management.
§Modules
- config: Defines RedactionRule and RedactionConfig for specifying sensitive patterns.
- sanitizers: Contains engine-specific logic for compiling rules, such as regex_sanitizer.
- validators: Provides programmatic validation for specific data types.
- redaction_match: Defines data structures for detailed reporting of redaction events.
- engine: Defines the SanitizationEngine trait, enabling a modular design.
- profiles: Defines data structures for user-specified profiles and post-processing.
- audit_log: Defines the structure and logic for writing redaction events to a log file.
- engines: Contains concrete implementations of the SanitizationEngine trait.
- headless: Convenience wrappers for using core engines in a non-interactive mode.
§Public API
The public API provides a cohesive set of types and functions for configuring and running a sanitization engine. Key components are organized by functionality:
Configuration & Rules
- RedactionConfig: Manages collections of RedactionRule values, including loading, merging, and filtering.
- RedactionRule: Defines a single rule for identifying and replacing sensitive patterns.
- merge_rules: Merges default and user-defined configurations.
- RedactionConfig::load_from_file: Loads rules from a YAML file.
- RedactionConfig::load_default_rules: Loads the built-in set of default rules.
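The merge step can be pictured with a small, dependency-free sketch. Everything in it is an assumption made for illustration: SketchRule and merge_by_name are hypothetical names, not the crate's RedactionRule or merge_rules, and the override-by-name behaviour is one plausible merge policy rather than the documented one.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for a redaction rule (not the crate's type).
#[derive(Debug, Clone, PartialEq)]
pub struct SketchRule {
    pub name: String,
    pub pattern: String,
    pub replace_with: String,
}

/// Small constructor helper to keep call sites short.
pub fn rule(name: &str, pattern: &str, replace_with: &str) -> SketchRule {
    SketchRule {
        name: name.to_string(),
        pattern: pattern.to_string(),
        replace_with: replace_with.to_string(),
    }
}

/// Merge two rule sets keyed by rule name.
/// Assumed policy: a user rule replaces a default rule with the same name.
/// The result is ordered by rule name because a BTreeMap is used.
pub fn merge_by_name(defaults: Vec<SketchRule>, user: Vec<SketchRule>) -> Vec<SketchRule> {
    let mut merged: BTreeMap<String, SketchRule> = BTreeMap::new();
    // Defaults are inserted first, so later user entries overwrite them.
    for r in defaults.into_iter().chain(user) {
        merged.insert(r.name.clone(), r);
    }
    merged.into_values().collect()
}
```

Keying the map by rule name makes the override explicit and keeps the output order deterministic, which is convenient when comparing merged configurations in tests.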
Sanitization Engine
- SanitizationEngine: A trait for pluggable sanitization methods.
- RegexEngine: The concrete implementation of SanitizationEngine that uses regular expressions.
Headless Mode
- headless_sanitize_string: A convenience function for a full, one-shot sanitization.
Redaction Reporting
- RedactionMatch: A detailed record of a single matched and redacted item, including its location.
- RedactionSummaryItem: A summary of all matches for a specific rule.
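As a rough illustration of how per-match records relate to a per-rule summary, the sketch below records byte offsets for each hit and then counts hits per rule. SketchMatch, find_literal, and summarize are hypothetical names introduced here; the crate's RedactionMatch and RedactionSummaryItem may carry different fields, and literal search stands in for real pattern matching.

```rust
use std::collections::BTreeMap;

/// Hypothetical record of one match (not the crate's RedactionMatch).
#[derive(Debug, Clone, PartialEq)]
pub struct SketchMatch {
    pub rule_name: String,
    /// Byte offset where the match starts in the input.
    pub start: usize,
    /// Byte offset one past the end of the match.
    pub end: usize,
}

/// Find every occurrence of a literal and record where it was found.
pub fn find_literal(rule_name: &str, needle: &str, input: &str) -> Vec<SketchMatch> {
    input
        .match_indices(needle)
        .map(|(start, hit)| SketchMatch {
            rule_name: rule_name.to_string(),
            start,
            end: start + hit.len(),
        })
        .collect()
}

/// Collapse individual matches into a count per rule name.
pub fn summarize(matches: &[SketchMatch]) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for m in matches {
        *counts.entry(m.rule_name.clone()).or_insert(0) += 1;
    }
    counts
}
```

Keeping the detailed records separate from the summary lets a caller choose between a compact report and a full location-level trail.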
Audit Logging
- AuditLog: Provides a high-level API for creating and writing structured redaction logs.
§Usage Example
use cleansh_core::{RedactionConfig, headless_sanitize_string, EngineOptions};
use anyhow::Result;

fn main() -> Result<()> {
    // 1. Load default redaction rules.
    let default_config = RedactionConfig::load_default_rules()?;

    // 2. Prepare some content to sanitize.
    let input = "My email is test@example.com and my SSN is 123-45-6789. Another email: user@domain.org.";
    println!("\nOriginal Input:\n{}", input);

    // 3. Configure engine options.
    let options = EngineOptions::default();
    let source_id = "test_document.txt";

    // 4. Sanitize the content in a single, headless function call.
    let sanitized_output = headless_sanitize_string(
        default_config,
        options,
        input,
        source_id,
    )?;
    println!("\nSanitized Output:\n{}", sanitized_output);

    Ok(())
}
§Error Handling
The library uses anyhow::Error for fallible operations and defines specific error types like RuleConfigNotFoundError for clearer error reporting.
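The value of a specific error type is that a caller can recover it from a type-erased error and react to it. The sketch below shows that pattern with the standard library only, using Box<dyn Error> in place of anyhow to stay dependency-free (anyhow::Error offers a comparable downcast_ref). RuleNotFound, find_rule, and describe are hypothetical names, not the crate's RuleConfigNotFoundError or any of its functions.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical specific error type (not the crate's RuleConfigNotFoundError).
#[derive(Debug)]
pub struct RuleNotFound {
    pub rule_name: String,
}

impl fmt::Display for RuleNotFound {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "rule configuration not found: {}", self.rule_name)
    }
}

impl Error for RuleNotFound {}

/// Return the index of `wanted` in `known`, or a type-erased RuleNotFound error.
pub fn find_rule(known: &[&str], wanted: &str) -> Result<usize, Box<dyn Error>> {
    known.iter().position(|name| *name == wanted).ok_or_else(|| {
        Box::new(RuleNotFound {
            rule_name: wanted.to_string(),
        }) as Box<dyn Error>
    })
}

/// Recover the specific error type from a generic error, if it is one.
pub fn describe(err: &(dyn Error + 'static)) -> String {
    match err.downcast_ref::<RuleNotFound>() {
        Some(e) => format!("unknown rule: {}", e.rule_name),
        None => format!("other error: {err}"),
    }
}
```

A caller holding a Box<dyn Error> can pass `&*err` to describe and branch on the result, while code that only needs a message can rely on the Display implementation.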
§Design Principles
- Pluggable Architecture: The SanitizationEngine trait allows for different sanitization methods (e.g., regex, entropy) to be swapped out seamlessly.
- Stateless: The core library does not maintain application state.
- Testable: Logic is easily unit-testable in isolation.
- Extensible: The design supports adding new rule types or engines with minimal changes to the core application logic.
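The pluggable and stateless principles above can be sketched with a minimal trait and two interchangeable implementations. SketchEngine, LiteralEngine, DigitMaskEngine, and run are hypothetical names used only for illustration; the actual SanitizationEngine trait very likely has a different, richer signature.

```rust
/// Hypothetical trait for illustration; the real SanitizationEngine may differ.
pub trait SketchEngine {
    /// Pure transformation: same input and configuration always give the same output.
    fn sanitize(&self, input: &str) -> String;
}

/// Engine that replaces every occurrence of a fixed literal.
pub struct LiteralEngine {
    pub needle: String,
    pub replacement: String,
}

impl SketchEngine for LiteralEngine {
    fn sanitize(&self, input: &str) -> String {
        input.replace(self.needle.as_str(), self.replacement.as_str())
    }
}

/// Engine that masks every ASCII digit with '#'.
pub struct DigitMaskEngine;

impl SketchEngine for DigitMaskEngine {
    fn sanitize(&self, input: &str) -> String {
        input
            .chars()
            .map(|c| if c.is_ascii_digit() { '#' } else { c })
            .collect()
    }
}

/// The caller depends only on the trait, so engines can be swapped freely.
pub fn run(engine: &dyn SketchEngine, input: &str) -> String {
    engine.sanitize(input)
}
```

Because neither engine holds mutable state or performs I/O, each can be unit-tested in isolation by comparing input and output strings.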
License: BUSL-1.1
Re-exports§
pub use config::merge_rules;
pub use config::RedactionConfig;
pub use config::RedactionRule;
pub use config::RedactionSummaryItem;
pub use config::RuleConfigNotFoundError;
pub use config::MAX_PATTERN_LENGTH;
pub use errors::CleanshError;
pub use engine::SanitizationEngine;
pub use engines::regex_engine::RegexEngine;
pub use redaction_match::RedactionLog;
pub use redaction_match::RedactionMatch;
pub use redaction_match::redact_sensitive;
pub use profiles::apply_profile_to_config;
pub use profiles::compute_run_seed;
pub use profiles::DedupeConfig;
pub use profiles::EngineOptions;
pub use profiles::format_token;
pub use profiles::load_profile_by_name;
pub use profiles::PostProcessingConfig;
pub use profiles::ProfileConfig;
pub use profiles::ProfileRule;
pub use profiles::profile_candidate_paths;
pub use profiles::ReportingConfig;
pub use profiles::SamplesConfig;
pub use profiles::sample_score_hex;
pub use profiles::select_samples_for_rule;
pub use audit_log::AuditLog;
pub use headless::headless_sanitize_string;
pub use sanitizers::compiler::compile_rules;
pub use sanitizers::compiler::CompiledRule;
pub use sanitizers::compiler::CompiledRules;
Modules§
- audit_log: audit_log.rs - Handles the creation and management of a secure, append-only audit log for all redaction events.
- config: Configuration management for CleanSH-core, including data structures for redaction rules and methods for loading and merging rule sets.
- engine: Defines the core SanitizationEngine trait and related data structures.
- engines: This module contains different sanitization engine implementations.
- errors: errors.rs - Custom error types for the cleansh-core library.
- headless: headless.rs - Convenience wrappers for using core engines in headless mode (non-UI). Provides helper functions for a full, one-shot sanitization of strings.
- profiles: profiles.rs - Profile configuration, loading, and helpers for CleanSH.
- redaction_match: Provides core data structures and utility functions for managing redaction matches and sensitive data logging within the cleansh-core library.
- sanitizers: Core regex sanitization engine for CleanSH.
- validators: Programmatic validation functions for specific sensitive data types.