//! # sanitize-engine
//!
//! Deterministic, one-way data sanitization engine.
//!
//! This crate provides the core infrastructure for replacing sensitive
//! values with category-aware, deterministic substitutes. Replacements
//! are **one-way only**: there is no key file, mapping table, or restore
//! mode. The crate is the foundation layer consumed by higher-level
//! streaming and CLI components.
//!
//! ## Key Components
//!
//! - [`category::Category`] — Classification of sensitive values (email,
//! IP, name, etc.) that determines replacement format.
//! - [`generator::ReplacementGenerator`] — Trait abstracting replacement
//! strategy (HMAC-deterministic or CSPRNG-random).
//! - [`strategy::Strategy`] — Pluggable replacement strategies that can
//! be called **directly** without any mapping table.
//! - [`store::MappingStore`] — Optional thread-safe per-run dedup cache
//! ensuring the same input always maps to the same output within a run.
//! - [`scanner::StreamScanner`] — Streaming regex scanner with chunk +
//! overlap for bounded-memory processing.
//!
//! ## Concurrency Model
//!
//! The `MappingStore` uses `DashMap` (shard-level locking) for the forward
//! dedup cache. All types are `Send + Sync`.
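//!
//! As a rough illustration of what shard-level locking buys, the sketch
//! below builds a tiny sharded cache from the standard library only. It is
//! not this crate's implementation: `ShardedCache`, `replace_concurrently`,
//! and the `x`-filled placeholder replacement are hypothetical names
//! invented for the example, and `DashMap` differs in its details.
//!
//! ```rust
//! use std::collections::hash_map::DefaultHasher;
//! use std::collections::HashMap;
//! use std::hash::{Hash, Hasher};
//! use std::sync::{Arc, Mutex};
//! use std::thread;
//!
//! // Hypothetical stand-in for a sharded dedup cache (not the crate's type).
//! struct ShardedCache {
//!     shards: Vec<Mutex<HashMap<String, String>>>,
//! }
//!
//! impl ShardedCache {
//!     fn new(shard_count: usize) -> Self {
//!         assert!(shard_count > 0, "need at least one shard");
//!         let shards = (0..shard_count).map(|_| Mutex::new(HashMap::new())).collect();
//!         Self { shards }
//!     }
//!
//!     // Return the cached replacement for `value`, computing it on first use.
//!     fn get_or_insert(&self, value: &str, make: impl FnOnce(&str) -> String) -> String {
//!         let mut hasher = DefaultHasher::new();
//!         value.hash(&mut hasher);
//!         let index = (hasher.finish() as usize) % self.shards.len();
//!         // Only the shard owning this key is locked; other shards stay usable.
//!         let mut shard = self.shards[index].lock().unwrap();
//!         shard.entry(value.to_owned()).or_insert_with(|| make(value)).clone()
//!     }
//! }
//!
//! // Ask `workers` threads to replace the same value and collect their results.
//! fn replace_concurrently(
//!     cache: &Arc<ShardedCache>,
//!     value: &'static str,
//!     workers: usize,
//! ) -> Vec<String> {
//!     let handles: Vec<_> = (0..workers)
//!         .map(|_| {
//!             let cache = Arc::clone(cache);
//!             // Placeholder replacement: same length as the input, all `x`.
//!             thread::spawn(move || cache.get_or_insert(value, |v| "x".repeat(v.len())))
//!         })
//!         .collect();
//!     handles.into_iter().map(|h| h.join().unwrap()).collect()
//! }
//! ```
//!
//! Every worker receives the same replacement for the same input, and
//! threads working on keys in different shards do not contend for a lock.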
//!
//! ## Stability
//!
//! As of 0.8.0 the public API is considered stable and follows Semantic Versioning.
//! Breaking changes require a major version bump. The core guarantees —
//! one-way replacement, deterministic mode, and length preservation — are
//! stable across all 1.x releases. Processor heuristics, default limit
//! values, and report schema may change in minor releases (additive only).
//!
//! ## Example: Store-Level Replacement
//!
//! ```rust
//! use sanitize_engine::category::Category;
//! use sanitize_engine::generator::HmacGenerator;
//! use sanitize_engine::store::MappingStore;
//! use std::sync::Arc;
//!
//! // Create a deterministic generator with a fixed seed.
//! let generator = Arc::new(HmacGenerator::new([42u8; 32]));
//!
//! // Create the replacement store (optional capacity limit).
//! let store = MappingStore::new(generator, None);
//!
//! // Sanitize a value (one-way).
//! let sanitized = store.get_or_insert(&Category::Email, "alice@corp.com").unwrap();
//! assert!(sanitized.contains("@corp.com"));
//! assert_eq!(sanitized.len(), "alice@corp.com".len());
//!
//! // Same input → same output (per-run consistency).
//! let again = store.get_or_insert(&Category::Email, "alice@corp.com").unwrap();
//! assert_eq!(sanitized, again);
//! ```
//!
//! ## Example: Streaming Scanner
//!
//! ```rust
//! use sanitize_engine::category::Category;
//! use sanitize_engine::generator::HmacGenerator;
//! use sanitize_engine::scanner::{ScanConfig, ScanPattern, StreamScanner};
//! use sanitize_engine::store::MappingStore;
//! use std::sync::Arc;
//!
//! // Build patterns.
//! let patterns = vec![
//!     ScanPattern::from_regex(r"alice@corp\.com", Category::Email, "alice_email").unwrap(),
//! ];
//!
//! // Store with deterministic generator.
//! let generator = Arc::new(HmacGenerator::new([42u8; 32]));
//! let store = Arc::new(MappingStore::new(generator, Some(1_000_000)));
//!
//! // Scanner with default chunk config.
//! let config = ScanConfig::new(1_048_576, 4096);
//! let scanner = StreamScanner::new(patterns, store, config).unwrap();
//!
//! // Scan bytes in-memory.
//! let input = b"Contact alice@corp.com for details.";
//! let (output, stats) = scanner.scan_bytes(input).unwrap();
//!
//! assert_eq!(stats.replacements_applied, 1);
//! assert_eq!(output.len(), input.len());
//! ```
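//!
//! To make the chunk + overlap idea concrete, here is a minimal std-only
//! sketch that counts a literal byte pattern while only ever holding one
//! chunk plus a short carried-over tail in memory. It is an illustration
//! under simplifying assumptions, not how `StreamScanner` is implemented:
//! `count_matches_chunked` is a hypothetical helper, it matches a literal
//! rather than a regex, and it fixes the overlap at `needle.len() - 1` bytes.
//!
//! ```rust
//! // Count occurrences of `needle` in `input`, reading `chunk_size` bytes at a time.
//! fn count_matches_chunked(input: &[u8], needle: &[u8], chunk_size: usize) -> usize {
//!     assert!(!needle.is_empty() && chunk_size > 0);
//!     // Carrying `needle.len() - 1` bytes is enough to complete a match that
//!     // started in an earlier chunk, and too short to hold a full match
//!     // itself, so no match is ever counted twice.
//!     let overlap = needle.len() - 1;
//!     let mut carry: Vec<u8> = Vec::new();
//!     let mut count = 0;
//!
//!     for chunk in input.chunks(chunk_size) {
//!         // Window = tail of the data seen so far + the new chunk.
//!         let mut window = std::mem::take(&mut carry);
//!         window.extend_from_slice(chunk);
//!
//!         count += window.windows(needle.len()).filter(|w| *w == needle).count();
//!
//!         // Keep only the last `overlap` bytes for the next iteration.
//!         let keep = overlap.min(window.len());
//!         carry = window[window.len() - keep..].to_vec();
//!     }
//!     count
//! }
//! ```
//!
//! For example, `count_matches_chunked(b"xxabcxxabc", b"abc", 4)` finds both
//! occurrences even though each one straddles a chunk boundary.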
//!
//! ## Example: Log Context Extraction
//!
//! After sanitizing, scan the output for error/warning keywords and capture
//! surrounding lines for LLM-friendly triage:
//!
//! ```rust
//! use sanitize_engine::log_context::{extract_context, LogContextConfig};
//!
//! let sanitized = "INFO request received\n\
//!                  ERROR disk full on /dev/sda1\n\
//!                  INFO retrying mount\n\
//!                  WARN filesystem degraded\n\
//!                  INFO recovery complete";
//!
//! let config = LogContextConfig::new().with_context_lines(1);
//! let result = extract_context(sanitized, &config);
//!
//! // Two keyword hits: "error" and "warn".
//! assert_eq!(result.match_count, 2);
//!
//! // First match: ERROR line with one line of context on each side.
//! assert_eq!(result.matches[0].keyword, "error");
//! assert_eq!(result.matches[0].before, vec!["INFO request received"]);
//! assert_eq!(result.matches[0].after, vec!["INFO retrying mount"]);
//! ```
// Crate-level lint configuration.
// Allow specific pedantic lints that are too noisy for this crate.
// Re-exports for convenience.
pub use ;
pub use Category;
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
pub use DEFAULT_ARCHIVE_DEPTH;
pub use ;
pub use ;
pub use ;
pub use ;
pub use MappingStore;
pub use ;
pub use strip_values_from_text;