Crate sanitize_engine

§sanitize-engine

Deterministic, one-way data sanitization engine.

This crate provides the core infrastructure for replacing sensitive values with category-aware, deterministic substitutes. Replacements are one-way only: there is no key file, mapping table, or restore mode. It is the foundation layer consumed by higher-level streaming and CLI components.

§Key Components

  • category::Category — Classification of sensitive values (email, IP, name, etc.) that determines replacement format.
  • generator::ReplacementGenerator — Trait abstracting replacement strategy (HMAC-deterministic or CSPRNG-random).
  • strategy::Strategy — Pluggable replacement strategies that can be called directly without any mapping table.
  • store::MappingStore — Optional thread-safe per-run dedup cache ensuring the same input always maps to the same output within a run.
  • scanner::StreamScanner — Streaming regex scanner with chunk + overlap for bounded-memory processing.
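
The chunk + overlap idea behind the scanner entry above can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: it searches for a literal byte string instead of a regex, and the function name `find_all_chunked` and its parameters are hypothetical. Each window covers one chunk plus an overlap of `needle.len() - 1` bytes, so a match that straddles a chunk boundary is still seen in full, and only matches that start inside the chunk proper are reported, so nothing is counted twice.

```rust
/// Hypothetical helper (not part of sanitize_engine): returns the start
/// offset of every occurrence of `needle` in `input`, examining at most
/// `chunk_size + overlap` bytes per window.
fn find_all_chunked(input: &[u8], needle: &[u8], chunk_size: usize) -> Vec<usize> {
    assert!(chunk_size > 0 && !needle.is_empty());

    // Enough extra bytes to complete a match that starts on the last
    // byte of the chunk.
    let overlap = needle.len() - 1;
    let mut hits = Vec::new();
    let mut start = 0;

    while start < input.len() {
        let end = (start + chunk_size + overlap).min(input.len());
        let window = &input[start..end];

        // Only candidates that begin inside the chunk proper are checked;
        // candidates that begin in the overlap belong to the next window.
        for (i, candidate) in window.windows(needle.len()).take(chunk_size).enumerate() {
            if candidate == needle {
                hits.push(start + i);
            }
        }
        start += chunk_size;
    }
    hits
}
```

Calling `find_all_chunked(data, b"alice@corp.com", 4096)` returns the same offsets for any positive chunk size; the chunk size only bounds how many bytes are inspected per window.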

§Concurrency Model

The MappingStore uses DashMap (shard-level locking) for the forward dedup cache. All types are Send + Sync.
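
To illustrate what shard-level locking means for a dedup cache, here is a rough standard-library sketch. It is an assumption-laden stand-in, not DashMap's or MappingStore's actual code: the type `ShardedCache` and its methods are hypothetical, and it uses a `Vec` of `Mutex<HashMap>` shards. A key is hashed to pick one shard, and only that shard is locked, so threads working on keys in different shards do not block each other.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

/// Hypothetical sharded cache: one mutex per shard instead of one
/// global lock around the whole map.
struct ShardedCache {
    shards: Vec<Mutex<HashMap<String, String>>>,
}

impl ShardedCache {
    fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0);
        let shards = (0..shard_count)
            .map(|_| Mutex::new(HashMap::new()))
            .collect();
        Self { shards }
    }

    /// Picks the shard responsible for `key` by hashing it.
    fn shard_for(&self, key: &str) -> &Mutex<HashMap<String, String>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        &self.shards[(hasher.finish() as usize) % self.shards.len()]
    }

    /// Returns the cached value for `key`, computing it with `make` only
    /// if no thread has stored one yet. The shard lock is held for the
    /// whole lookup-or-insert, so all callers observe the same value.
    fn get_or_insert_with(&self, key: &str, make: impl FnOnce() -> String) -> String {
        let mut shard = self.shard_for(key).lock().unwrap();
        shard.entry(key.to_owned()).or_insert_with(make).clone()
    }
}
```

Because `Mutex<HashMap<..>>` is `Send + Sync`, a `ShardedCache` can be shared by reference across scoped threads or wrapped in an `Arc`, in the same way the examples below share the store.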

§Stability

This crate is pre-1.0. The core guarantees — one-way replacement, deterministic mode, and length preservation — are stable. Processor heuristics, default limits, and report schema may evolve across minor versions.

§Example: Store-Level Replacement

use sanitize_engine::category::Category;
use sanitize_engine::generator::HmacGenerator;
use sanitize_engine::store::MappingStore;
use std::sync::Arc;

// Create a deterministic generator with a fixed seed.
let generator = Arc::new(HmacGenerator::new([42u8; 32]));

// Create the replacement store (optional capacity limit).
let store = MappingStore::new(generator, None);

// Sanitize a value (one-way).
let sanitized = store.get_or_insert(&Category::Email, "alice@corp.com").unwrap();
assert!(sanitized.contains("@corp.com"));
assert_eq!(sanitized.len(), "alice@corp.com".len());

// Same input → same output (per-run consistency).
let again = store.get_or_insert(&Category::Email, "alice@corp.com").unwrap();
assert_eq!(sanitized, again);
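
The properties asserted in the example above (same input gives same output, the domain is kept, the length is unchanged) can be reproduced in a small standalone sketch. This is not how `HmacGenerator` works internally: the function `mask_email_local_part` is hypothetical, and it swaps in the standard library's `DefaultHasher` for HMAC, which is not cryptographically secure and may produce different letters on different Rust versions. It is only meant to show the shape of a deterministic, length-preserving replacement.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: replaces every byte of the local part of `email`
/// with a lowercase letter derived from `seed`, the full input, and the
/// byte position. The "@domain" suffix is copied unchanged.
fn mask_email_local_part(seed: u64, email: &str) -> String {
    // Split just before '@' so `rest` keeps the separator and the domain.
    let (local, rest) = match email.find('@') {
        Some(at) => email.split_at(at),
        None => (email, ""),
    };

    let mut out = String::with_capacity(email.len());
    for index in 0..local.len() {
        // Stand-in for a keyed hash: NOT secure, illustration only.
        let mut hasher = DefaultHasher::new();
        (seed, email, index).hash(&mut hasher);
        let letter = b'a' + (hasher.finish() % 26) as u8;
        // One ASCII letter per input byte keeps the byte length equal.
        out.push(letter as char);
    }
    out.push_str(rest);
    out
}
```

With a fixed seed, `mask_email_local_part(42, "alice@corp.com")` returns the same 14-byte string ending in `@corp.com` every time it is called within a program, which mirrors the assertions in the store example.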

§Example: Streaming Scanner

use sanitize_engine::category::Category;
use sanitize_engine::generator::HmacGenerator;
use sanitize_engine::scanner::{ScanConfig, ScanPattern, StreamScanner};
use sanitize_engine::store::MappingStore;
use std::sync::Arc;

// Build patterns.
let patterns = vec![
    ScanPattern::from_regex(r"alice@corp\.com", Category::Email, "alice_email").unwrap(),
];

// Store with deterministic generator.
let generator = Arc::new(HmacGenerator::new([42u8; 32]));
let store = Arc::new(MappingStore::new(generator, Some(1_000_000)));

// Scanner with default chunk config.
let config = ScanConfig::new(1_048_576, 4096);
let scanner = StreamScanner::new(patterns, store, config).unwrap();

// Scan bytes in-memory.
let input = b"Contact alice@corp.com for details.";
let (output, stats) = scanner.scan_bytes(input).unwrap();

assert_eq!(stats.replacements_applied, 1);
assert_eq!(output.len(), input.len());

§Re-exports

pub use atomic::atomic_write;
pub use atomic::AtomicFileWriter;
pub use category::Category;
pub use error::Result;
pub use error::SanitizeError;
pub use generator::HmacGenerator;
pub use generator::RandomGenerator;
pub use generator::ReplacementGenerator;
pub use processor::archive::ArchiveFormat;
pub use processor::archive::ArchiveProcessor;
pub use processor::archive::ArchiveStats;
pub use processor::archive::DEFAULT_MAX_ARCHIVE_DEPTH;
pub use processor::FieldRule;
pub use processor::FileTypeProfile;
pub use processor::Processor;
pub use processor::ProcessorRegistry;
pub use report::FileReport;
pub use report::ReportBuilder;
pub use report::ReportMetadata;
pub use report::SanitizeReport;
pub use scanner::ScanConfig;
pub use scanner::ScanPattern;
pub use scanner::ScanStats;
pub use scanner::StreamScanner;
pub use secrets::decrypt_secrets;
pub use secrets::encrypt_secrets;
pub use secrets::load_secrets_auto;
pub use secrets::looks_encrypted;
pub use secrets::SecretEntry;
pub use secrets::SecretsFormat;
pub use store::MappingStore;
pub use strategy::EntropyMode;
pub use strategy::FakeIp;
pub use strategy::HmacHash;
pub use strategy::PreserveLength;
pub use strategy::RandomString;
pub use strategy::RandomUuid;
pub use strategy::Strategy;
pub use strategy::StrategyGenerator;

§Modules

  • atomic: Atomic file writes for crash-safe output.
  • category: Data category types for classifying sensitive values.
  • error: Unified error types for the sanitization engine.
  • generator: Replacement generation strategies.
  • processor: Structured processors for format-aware sanitization.
  • report: Structured reporting for sanitization runs.
  • scanner: Streaming scanner for detecting and replacing sensitive data.
  • secrets: Encrypted secrets management.
  • store: Thread-safe, concurrent one-way replacement store.
  • strategy: Pluggable replacement strategies.