Crate cleansh_core

§CleanSH Core Library

cleansh-core provides the fundamental, platform-independent logic for data sanitization and redaction. It defines the core data structures for redaction rules, provides mechanisms for compiling these rules, and defines the SanitizationEngine trait, through which pluggable engines apply redaction logic.

The library is designed to be pure and stateless, focusing solely on the transformation of input data based on defined rules, without concerns for I/O or application-specific state management.

§Modules

  • config: Defines RedactionRule and RedactionConfig for specifying sensitive patterns.
  • sanitizers: Contains engine-specific logic for compiling rules, such as regex_sanitizer.
  • validators: Provides programmatic validation for specific data types.
  • redaction_match: Defines data structures for detailed reporting of redaction events.
  • engine: Defines the SanitizationEngine trait, enabling a modular design.
  • profiles: Defines data structures for user-specified profiles and post-processing.
  • audit_log: Defines the structure and logic for writing redaction events to a log file.
  • engines: Contains concrete implementations of the SanitizationEngine trait.
  • headless: Convenience wrappers for using core engines in a non-interactive mode.
  • errors: Defines custom error types for the library.

§Public API

The public API provides a cohesive set of types and functions for configuring and running a sanitization engine. Key components are organized by functionality:

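Configuration & Rules

  • RedactionConfig and RedactionRule: Data structures for specifying redaction rules.
  • merge_rules: Merges rule sets.
  • compile_rules: Compiles rules into CompiledRules for use by an engine.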

Sanitization Engine

  • SanitizationEngine: A trait for pluggable sanitization methods.
  • RegexEngine: The concrete implementation of SanitizationEngine that uses regular expressions.
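The pluggable design can be illustrated with a minimal sketch. The trait and both engines below are hypothetical stand-ins written for this example: the real SanitizationEngine trait and RegexEngine may have different method names and signatures. The sketch uses only the standard library, so the "engines" are deliberately crude.

```rust
// Hypothetical stand-in for a pluggable engine trait; the real
// `SanitizationEngine` trait in cleansh-core may look quite different.
trait Sanitizer {
    /// Returns a redacted copy of `input`.
    fn sanitize(&self, input: &str) -> String;
}

/// Replaces every occurrence of a fixed literal with a placeholder.
struct LiteralSanitizer {
    needle: String,
    replacement: String,
}

impl Sanitizer for LiteralSanitizer {
    fn sanitize(&self, input: &str) -> String {
        input.replace(self.needle.as_str(), self.replacement.as_str())
    }
}

/// Masks every ASCII digit; a crude stand-in for a pattern-based engine.
struct DigitMaskSanitizer;

impl Sanitizer for DigitMaskSanitizer {
    fn sanitize(&self, input: &str) -> String {
        input
            .chars()
            .map(|c| if c.is_ascii_digit() { '#' } else { c })
            .collect()
    }
}

/// Callers depend only on the trait, so engines can be swapped freely.
fn run(engine: &dyn Sanitizer, input: &str) -> String {
    engine.sanitize(input)
}

fn main() {
    let engines: Vec<Box<dyn Sanitizer>> = vec![
        Box::new(LiteralSanitizer {
            needle: "secret".to_string(),
            replacement: "[REDACTED]".to_string(),
        }),
        Box::new(DigitMaskSanitizer),
    ];
    for engine in &engines {
        println!("{}", run(engine.as_ref(), "secret code 1234"));
    }
}
```

Because `run` accepts `&dyn Sanitizer`, adding a third engine requires no change to the calling code, which is the property the trait-based design is meant to provide.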

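Headless Mode

  • headless_sanitize_string: A convenience wrapper for one-shot sanitization of a string in non-interactive mode.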

Redaction Reporting

  • RedactionMatch and RedactionLog: Data structures for detailed reporting of redaction events.

Audit Logging

  • AuditLog: Provides a high-level API for creating and writing structured redaction logs.
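As a rough illustration of what an append-only redaction log involves, the following sketch writes one line per event using the standard library. It is not the AuditLog API: the RedactionEvent struct, the tab-separated format, the rule names, and the file name are all assumptions made for this example.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical event record; not the real cleansh-core log schema.
struct RedactionEvent<'a> {
    rule_name: &'a str,
    source_id: &'a str,
    line_number: usize,
}

/// Formats one event as a single tab-separated line (assumed format).
fn format_event(event: &RedactionEvent) -> String {
    format!("{}\t{}\t{}", event.rule_name, event.source_id, event.line_number)
}

/// Appends one event to the log, creating the file if it does not exist.
/// Opening with `append(true)` means existing entries are never overwritten.
fn append_event(path: &Path, event: &RedactionEvent) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "{}", format_event(event))
}

fn main() -> io::Result<()> {
    // Hypothetical log location chosen for the demo.
    let path = std::env::temp_dir().join("cleansh_audit_sketch.log");
    let first = RedactionEvent { rule_name: "email", source_id: "test_document.txt", line_number: 1 };
    let second = RedactionEvent { rule_name: "us_ssn", source_id: "test_document.txt", line_number: 1 };
    append_event(&path, &first)?;
    append_event(&path, &second)?;
    println!("{}", std::fs::read_to_string(&path)?);
    Ok(())
}
```

The sketch records which rule fired and where, not the matched text itself, so the log does not become a second copy of the sensitive data.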

§Usage Example

use cleansh_core::{RedactionConfig, headless_sanitize_string, EngineOptions};
use anyhow::Result;

fn main() -> Result<()> {
    // 1. Load default redaction rules.
    let default_config = RedactionConfig::load_default_rules()?;

    // 2. Prepare some content to sanitize.
    let input = "My email is test@example.com and my SSN is 123-45-6789. Another email: user@domain.org.";
    println!("\nOriginal Input:\n{}", input);

    // 3. Configure engine options.
    let options = EngineOptions::default();
    let source_id = "test_document.txt";

    // 4. Sanitize the content in a single, headless function call.
    let sanitized_output = headless_sanitize_string(
        default_config,
        options,
        input,
        source_id,
    )?;
    println!("\nSanitized Output:\n{}", sanitized_output);

    Ok(())
}

§Error Handling

The library uses anyhow::Error for fallible operations and defines specific error types, such as RuleConfigNotFoundError and CleanshError, for clearer error reporting.
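The value of a specific error type is that callers can recover it from an otherwise opaque error. The sketch below shows the pattern using only the standard library, with Box<dyn Error> standing in for anyhow::Error (which offers a comparable downcast_ref method). MissingRuleConfig and find_rule_config are hypothetical names invented for this example; the actual fields of RuleConfigNotFoundError are not shown here.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical typed error; a stand-in for something like
/// `RuleConfigNotFoundError`, whose real definition may differ.
#[derive(Debug)]
struct MissingRuleConfig {
    name: String,
}

impl fmt::Display for MissingRuleConfig {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "rule configuration '{}' not found", self.name)
    }
}

impl Error for MissingRuleConfig {}

/// Hypothetical lookup that fails with the typed error for unknown names.
fn find_rule_config(name: &str) -> Result<&'static str, Box<dyn Error>> {
    match name {
        "default" => Ok("built-in rules"),
        other => Err(MissingRuleConfig { name: other.to_string() }.into()),
    }
}

/// Recovers the typed error, if present, to build a targeted message.
fn describe(err: &(dyn Error + 'static)) -> String {
    match err.downcast_ref::<MissingRuleConfig>() {
        Some(missing) => format!("unknown rule set: {}", missing.name),
        None => format!("unexpected failure: {}", err),
    }
}

fn main() {
    match find_rule_config("strict") {
        Ok(found) => println!("loaded {}", found),
        Err(err) => println!("{}", describe(&*err)),
    }
}
```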

§Design Principles

  • Pluggable Architecture: The SanitizationEngine trait allows for different sanitization methods (e.g., regex, entropy) to be swapped out seamlessly.
  • Stateless: The core library does not maintain application state.
  • Testable: Logic is easily unit-testable in isolation.
  • Extensible: The design supports adding new rule types or engines with minimal changes to the core application logic.
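The stateless and testable principles can be sketched as a pure function that returns both the sanitized text and a report, leaving any I/O to the caller. Everything here is illustrative: LiteralRule, MatchRecord, and redact are hypothetical names, and literal substring matching stands in for the pattern-based rules the library actually uses.

```rust
/// Hypothetical literal rule; real rules in cleansh-core are pattern-based.
struct LiteralRule {
    name: &'static str,
    needle: &'static str,
}

/// Hypothetical record of redactions for one rule, returned rather than logged.
#[derive(Debug, PartialEq)]
struct MatchRecord {
    rule_name: &'static str,
    count: usize,
}

/// Pure function: the result depends only on the arguments, and nothing is
/// read from or written to the outside world.
fn redact(rules: &[LiteralRule], input: &str) -> (String, Vec<MatchRecord>) {
    let mut output = input.to_string();
    let mut records = Vec::new();
    for rule in rules {
        // Count occurrences before replacing so the report stays accurate.
        let count = output.matches(rule.needle).count();
        if count > 0 {
            output = output.replace(rule.needle, "[REDACTED]");
            records.push(MatchRecord { rule_name: rule.name, count });
        }
    }
    (output, records)
}

fn main() {
    let rules = [LiteralRule { name: "api_token", needle: "abc123" }];
    let (output, records) = redact(&rules, "token=abc123");
    println!("{}", output);
    println!("{:?}", records);
}
```

Since `redact` touches no files, globals, or clocks, a unit test only needs to call it and compare the returned tuple, and calling it twice with the same arguments yields the same result.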

License: BUSL-1.1

Re-exports§

pub use config::merge_rules;
pub use config::RedactionConfig;
pub use config::RedactionRule;
pub use config::RedactionSummaryItem;
pub use config::RuleConfigNotFoundError;
pub use config::MAX_PATTERN_LENGTH;
pub use errors::CleanshError;
pub use engine::SanitizationEngine;
pub use engines::regex_engine::RegexEngine;
pub use redaction_match::RedactionLog;
pub use redaction_match::RedactionMatch;
pub use redaction_match::redact_sensitive;
pub use profiles::apply_profile_to_config;
pub use profiles::compute_run_seed;
pub use profiles::DedupeConfig;
pub use profiles::EngineOptions;
pub use profiles::format_token;
pub use profiles::load_profile_by_name;
pub use profiles::PostProcessingConfig;
pub use profiles::ProfileConfig;
pub use profiles::ProfileRule;
pub use profiles::profile_candidate_paths;
pub use profiles::ReportingConfig;
pub use profiles::SamplesConfig;
pub use profiles::sample_score_hex;
pub use profiles::select_samples_for_rule;
pub use audit_log::AuditLog;
pub use headless::headless_sanitize_string;
pub use sanitizers::compiler::compile_rules;
pub use sanitizers::compiler::CompiledRule;
pub use sanitizers::compiler::CompiledRules;

Modules§

audit_log
audit_log.rs - Handles the creation and management of a secure, append-only audit log for all redaction events.
config
Configuration management for CleanSH-core, including data structures for redaction rules and methods for loading and merging rule sets.
engine
Defines the core SanitizationEngine trait and related data structures.
engines
This module contains different sanitization engine implementations.
errors
errors.rs - Custom error types for the cleansh-core library.
headless
headless.rs - Convenience wrappers for using core engines in headless (non-UI) mode. Provides helper functions for full, one-shot sanitization of strings.
profiles
profiles.rs - Profile configuration, loading, and helpers for CleanSH.
redaction_match
Provides core data structures and utility functions for managing redaction matches and sensitive data logging within the cleansh-core library.
sanitizers
Core regex sanitization engine for CleanSH.
validators
Programmatic validation functions for specific sensitive data types.